Top frequently asked AWS Glue interview questions.
- What is AWS Glue?
- What are the Benefits of AWS Glue?
- What are the components used by AWS Glue?
- What Data Sources are supported by AWS Glue?
- What are Development Endpoints?
- What are AWS Tags in AWS Glue?
- What is AWS Glue Data Catalog?
- What are AWS Glue Crawlers?
- What is AWS Glue Streaming ETL?
- Is AWS Glue Schema Registry open source?
- How can we list Databases and Tables in the AWS Glue Data Catalog?
- How does AWS Glue handle Duplicate Data?
What is AWS Glue?
AWS Glue helps prepare data for analysis by automating extract, transform, and load (ETL) processes. It supports MySQL, Microsoft SQL Server, and PostgreSQL databases that run on Amazon EC2 (Elastic Compute Cloud) instances in an Amazon VPC (Virtual Private Cloud).
AWS Glue is an ETL service that automates the time-consuming steps of preparing data for analytics.
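In practice, an ETL job defined in AWS Glue is usually started through the StartJobRun API. The sketch below assumes the boto3 SDK and a job that already exists in Glue; the job name `orders-daily-etl` and the argument keys are hypothetical. Building the request is kept separate from the network call so that it can be checked without AWS credentials.

```python
from typing import Dict


def build_start_job_run_request(job_name: str, job_args: Dict[str, str]) -> dict:
    """Build keyword arguments for the Glue StartJobRun API.

    Glue job parameters are passed with a leading "--", so the prefix is
    added here when the caller leaves it off.
    """
    arguments = {
        (key if key.startswith("--") else f"--{key}"): value
        for key, value in job_args.items()
    }
    return {"JobName": job_name, "Arguments": arguments}


def start_etl_job(job_name: str, job_args: Dict[str, str]) -> str:
    """Start the job and return its run ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(**build_start_job_run_request(job_name, job_args))
    return response["JobRunId"]
```

For example, `start_etl_job("orders-daily-etl", {"source_path": "s3://example-bucket/raw/"})` would start one run of the hypothetical job, and the script would see the value as its `--source_path` parameter.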
What are the Benefits of AWS Glue?
The benefits of AWS Glue are as follows:
- Fault Tolerance - Failed AWS Glue jobs can be retried, and their logs can be used for debugging.
- Filtering - AWS Glue can filter out bad data during processing.
- Maintenance and Development - Because AWS Glue is a managed service, AWS handles the maintenance and deployment of the underlying infrastructure.
What are the components used by AWS Glue?
AWS Glue consists of:
- Data Catalog - A central metadata repository.
- ETL Engine - Automatically generates Python or Scala code.
- Flexible Scheduler - Handles dependency resolution, job monitoring, and retries.
- AWS Glue DataBrew - Cleans and normalizes data through a visual interface.
- AWS Glue Elastic Views - Replicates and combines data across multiple data stores.
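The scheduler's job scheduling and dependency resolution are configured through Glue triggers. The sketch below assumes the boto3 SDK; the trigger and job names are hypothetical. A SCHEDULED trigger starts a job on a cron schedule, while a CONDITIONAL trigger starts a job only after all of its upstream jobs have succeeded.

```python
from typing import List


def scheduled_trigger(name: str, job_name: str, cron: str) -> dict:
    """Keyword arguments for a SCHEDULED trigger that starts one job on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        # Glue cron has six fields: minutes hours day-of-month month day-of-week year
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def dependent_trigger(name: str, upstream_jobs: List[str], job_name: str) -> dict:
    """Keyword arguments for a CONDITIONAL trigger: run job_name after every upstream job succeeds."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Logical": "AND",  # all conditions must hold
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream, "State": "SUCCEEDED"}
                for upstream in upstream_jobs
            ],
        },
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_triggers(*trigger_specs: dict) -> None:
    """Register the triggers with AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3

    glue = boto3.client("glue")
    for spec in trigger_specs:
        glue.create_trigger(**spec)
```

Chaining a scheduled trigger with a conditional one, for instance a nightly extract followed by a load that waits for it, lets Glue resolve the dependency instead of relying on fixed start times.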
What Data Sources are supported by AWS Glue?
The data sources supported by AWS Glue include:
- Amazon RDS for MySQL
- Amazon RDS for Oracle
- Amazon RDS for PostgreSQL
- Amazon RDS for SQL Server
- Microsoft SQL Server
AWS Glue also supports streaming sources such as:
- Amazon Kinesis Data Streams
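The relational sources above are typically reached through a Glue JDBC connection. The sketch below assumes the boto3 SDK; the connection name, host, subnet, and security-group IDs are hypothetical, and the URL layouts and default ports are the conventional ones for each engine, which should be verified against the AWS Glue connection documentation. The subnet and security groups are what place Glue inside the VPC where the database runs.

```python
from typing import List, Optional

# Conventional JDBC URL layouts and default ports (assumed; verify before use).
JDBC_URL_FORMATS = {
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "oracle": "jdbc:oracle:thin://@{host}:{port}/{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
}
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432, "oracle": 1521, "sqlserver": 1433}


def jdbc_url(engine: str, host: str, database: str, port: Optional[int] = None) -> str:
    """Build a JDBC URL, falling back to the engine's default port."""
    if engine not in JDBC_URL_FORMATS:
        raise ValueError(f"unsupported engine: {engine}")
    return JDBC_URL_FORMATS[engine].format(
        host=host, port=port or DEFAULT_PORTS[engine], database=database
    )


def connection_input(name: str, engine: str, host: str, database: str,
                     username: str, password: str,
                     subnet_id: str, security_group_ids: List[str]) -> dict:
    """ConnectionInput for the Glue CreateConnection API."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url(engine, host, database),
            "USERNAME": username,
            "PASSWORD": password,  # in practice, prefer AWS Secrets Manager over a literal password
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
        },
    }


def create_jdbc_connection(conn_input: dict) -> None:
    """Register the connection (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3

    boto3.client("glue").create_connection(ConnectionInput=conn_input)
```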
What are Development Endpoints?
A development endpoint is an environment where a developer can interactively develop, test, and debug AWS Glue extract, transform, and load (ETL) scripts. The Development Endpoints portion of the AWS Glue API describes the operations for testing scripts with a custom DevEndpoint.
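As a rough sketch of that API, the request below is the kind of input the CreateDevEndpoint operation takes. It assumes the boto3 SDK; the endpoint name, role ARN, public key, node count, and S3 library paths are hypothetical placeholders.

```python
from typing import List, Optional


def dev_endpoint_request(
    name: str,
    role_arn: str,
    public_key: str,
    number_of_nodes: int,
    extra_python_libs: Optional[List[str]] = None,
) -> dict:
    """Keyword arguments for the Glue CreateDevEndpoint API."""
    request = {
        "EndpointName": name,
        "RoleArn": role_arn,               # IAM role the endpoint assumes
        "PublicKey": public_key,           # SSH public key used to connect and debug scripts
        "NumberOfNodes": number_of_nodes,  # DPUs allocated to the endpoint
    }
    if extra_python_libs:
        # Multiple S3 paths are passed as a single comma-separated string
        request["ExtraPythonLibsS3Path"] = ",".join(extra_python_libs)
    return request


def create_dev_endpoint(request: dict) -> str:
    """Create the endpoint and return its initial status (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    response = boto3.client("glue").create_dev_endpoint(**request)
    return response["Status"]
```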
What are AWS Tags in AWS Glue?
AWS tags are labels that we assign to AWS resources.
Each tag consists of a key and an optional value, both of which we define. In AWS Glue, we can use tags to organize and identify our resources. Tags can also be used to create cost accounting reports and to restrict access to resources.
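A minimal sketch of tagging a Glue job, assuming the boto3 SDK. The region, account ID, job name, and tag keys are hypothetical, and the limits used in `validate_tags` are the general AWS tagging limits (50 tags per resource, 128-character keys, 256-character values), which should be confirmed against the current AWS Glue documentation. An empty string stands in for an omitted value.

```python
from typing import Dict

# General AWS tagging limits (assumed to apply to Glue; verify before relying on them).
MAX_TAGS = 50
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def glue_job_arn(region: str, account_id: str, job_name: str) -> str:
    """ARN of a Glue job, the identifier the tagging APIs expect."""
    return f"arn:aws:glue:{region}:{account_id}:job/{job_name}"


def validate_tags(tags: Dict[str, str]) -> Dict[str, str]:
    """Check tags against the limits above; an empty value means 'no value'."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value too long for key {key!r}")
    return tags


def tag_glue_resource(resource_arn: str, tags: Dict[str, str]) -> Dict[str, str]:
    """Add tags, then return all tags now on the resource (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so validation works without boto3

    glue = boto3.client("glue")
    glue.tag_resource(ResourceArn=resource_arn, TagsToAdd=validate_tags(tags))
    return glue.get_tags(ResourceArn=resource_arn)["Tags"]
```

A consistent key such as a cost-center tag is what makes tag-based cost reports and tag-conditioned IAM policies practical.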