AWS Certified Data Analytics - Specialty - Part 10

Mary Smith

Mon, 17 Mar 2025

1. An insurance company has raw data in JSON format that is sent without a predefined schedule through an Amazon Kinesis Data Firehose delivery stream to an Amazon S3 bucket. An AWS Glue crawler is scheduled to run every 8 hours to update the schema in the data catalog of the tables stored in the S3 bucket. Data analysts analyze the data using Apache Spark SQL on Amazon EMR set up with AWS Glue Data Catalog as the metastore. Data analysts say that, occasionally, the data they receive is stale. A data engineer needs to provide access to the most up-to-date data. Which solution meets these requirements?

A) Create an external schema based on the AWS Glue Data Catalog on the existing Amazon Redshift cluster to query new data in Amazon S3 with Amazon Redshift Spectrum.
B) Use Amazon CloudWatch Events with the rate (1 hour) expression to execute the AWS Glue crawler every hour.
C) Using the AWS CLI, modify the execution schedule of the AWS Glue crawler from 8 hours to 1 minute.
D) Run the AWS Glue crawler from an AWS Lambda function triggered by an S3:ObjectCreated:* event notification on the S3 bucket.



2. A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake. The company runs analyses on the last 24 hours of data flow logs for abnormality detection and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities. The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day. How should this data be stored for optimal performance?

A) In Apache ORC partitioned by date and sorted by source IP
B) In compressed .csv partitioned by date and sorted by source IP
C) In Apache Parquet partitioned by source IP and sorted by date
D) In compressed nested JSON partitioned by source IP and sorted by date



3. A banking company is currently using an Amazon Redshift cluster with dense storage (DS) nodes to store sensitive data. An audit found that the cluster is unencrypted. Compliance requirements state that a database with sensitive data must be encrypted through a hardware security module (HSM) with automated key rotation. Which combination of steps is required to achieve compliance? (Choose two.)

A) Set up a trusted connection with HSM using a client and server certificate with automatic key rotation.
B) Modify the cluster with an HSM encryption option and automatic key rotation.
C) Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster.
D) Enable HSM with key rotation through the AWS CLI.
E) Enable Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) encryption in the HSM.


4. A company is planning to do a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. For part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple steps, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project. Which solution meets these requirements?

A) Ingest data into Amazon S3 using AWS DataSync and use Apache Spark scripts to curate the data in an Amazon EMR cluster. Store the curated data in Amazon S3 for ML processing.
B) Create custom ETL jobs on-premises to curate the data. Use AWS DMS to ingest data into Amazon S3 for ML processing.
C) Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing.
D) Take a full backup of the data store and ship the backup files using AWS Snowball. Upload Snowball data into Amazon S3 and schedule data curation jobs using AWS Batch to prepare the data for ML.



5. A US-based sneaker retail company launched its global website. All the transaction data is stored in Amazon RDS and curated historic transaction data is stored in Amazon Redshift in the us-east-1 Region. The business intelligence (BI) team wants to enhance the user experience by providing a dashboard for sneaker trends. The BI team decides to use Amazon QuickSight to render the website dashboards. During development, a team in Japan provisioned Amazon QuickSight in ap-northeast-1. The team is having difficulty connecting Amazon QuickSight from ap-northeast-1 to Amazon Redshift in us-east-1. Which solution will solve this issue and meet the requirements?

A) In the Amazon Redshift console, choose to configure cross-Region snapshots and set the destination Region as ap-northeast-1. Restore the Amazon Redshift Cluster from the snapshot and connect to Amazon QuickSight launched in ap-northeast-1.
B) Create a VPC endpoint from the Amazon QuickSight VPC to the Amazon Redshift VPC so Amazon QuickSight can access data from Amazon Redshift.
C) Create an Amazon Redshift endpoint connection string with Region information in the string and use this connection string in Amazon QuickSight to connect to Amazon Redshift.
D) Create a new security group for Amazon Redshift in us-east-1 with an inbound rule authorizing access from the appropriate IP address range for the Amazon QuickSight servers in ap-northeast-1.



1. Right Answer: D
Explanation: You can use a wildcard (for example, s3:ObjectCreated:*) to request notification when an object is created, regardless of the API used. 'AWS Lambda can run custom code in response to Amazon S3 bucket events. You upload your custom code to AWS Lambda and create what is called a Lambda function. When Amazon S3 detects an event of a specific type (for example, an object-created event), it can publish the event to AWS Lambda and invoke your function in Lambda. In response, AWS Lambda runs your function.' Reference: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
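
As an illustration of option D, here is a minimal sketch of a Lambda handler that starts the crawler whenever an s3:ObjectCreated:* notification fires. The crawler name is a hypothetical placeholder, and the execution role is assumed to allow glue:StartCrawler.

```python
# Minimal sketch, assuming a Glue crawler named "insurance-raw-json-crawler"
# (hypothetical) and a Lambda role permitted to call glue:StartCrawler.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Invoked by the s3:ObjectCreated:* notification on the raw-data bucket.
    try:
        glue.start_crawler(Name="insurance-raw-json-crawler")
    except glue.exceptions.CrawlerRunningException:
        # The crawler is already running; the in-flight run (or the next one)
        # will pick up the newly delivered objects.
        pass
```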

2. Right Answer: A
Explanation: ORC and Parquet are columnar formats that can be processed in parallel and give better query performance than row-based formats such as .csv or nested JSON. Because the company also needs to analyze historical logs dating back 2 years, partitioning by date lets time-range queries read only the relevant partitions, which limits the data that must be loaded and improves query performance.
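
For a concrete picture of this layout, the sketch below writes the flow logs as ORC, partitioned by date and sorted by source IP within each partition. The S3 paths and column names (event_date, source_ip) are illustrative assumptions, not values taken from the question.

```python
# Sketch only: bucket paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flow-log-writer").getOrCreate()

# Raw hourly flow logs landed in the data lake (placeholder path).
flow_logs = spark.read.json("s3://example-bucket/raw-flow-logs/")

(flow_logs
    .repartition("event_date")
    .sortWithinPartitions("source_ip")   # sorted by source IP within each date partition
    .write
    .partitionBy("event_date")           # partitioned by date for time-range pruning
    .orc("s3://example-bucket/curated-flow-logs/"))
```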

3. Right Answer: A,C
Explanation: When you use an HSM, you must use client and server certificates to configure a trusted connection between Amazon Redshift and your HSM. Reference: https://docs.amazonaws.cn/en_us/redshift/latest/mgmt/security-key-management.html To migrate an unencrypted cluster to a cluster encrypted using a hardware security module (HSM), you create a new encrypted cluster and move your data to the new cluster. Reference: https://docs.aws.amazon.com/redshift/latest/mgmt/changing-cluster-encryption.html
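
To illustrate the "create a new HSM-encrypted cluster" step, here is a hedged boto3 sketch; the cluster identifier, node count, credentials, and HSM certificate/configuration names are hypothetical placeholders, and the data still has to be migrated from the old cluster separately (for example, by unload and reload).

```python
# Sketch only: create a new Redshift cluster encrypted through an HSM.
# All identifiers and credentials below are hypothetical placeholders.
import boto3

redshift = boto3.client("redshift")

redshift.create_cluster(
    ClusterIdentifier="banking-encrypted-cluster",
    NodeType="ds2.xlarge",                      # dense storage node type
    NumberOfNodes=4,
    MasterUsername="admin",
    MasterUserPassword="ExamplePassw0rd!",
    Encrypted=True,
    HsmClientCertificateIdentifier="redshift-hsm-client-cert",
    HsmConfigurationIdentifier="redshift-hsm-config",
)
```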

4. Right Answer: C
Explanation: The keyword to look for in the question is 'fastest solution.' AWS Glue, being a fully managed, serverless ETL service, provides near-instant job creation with generated PySpark/Scala code that can be tailored as needed, and AWS DMS supports Amazon S3 as a target. Reference: https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html
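
As a rough sketch of how the curation steps from the question map onto Glue's built-in transforms (ApplyMapping, ResolveChoice, DropNullFields, SplitFields), assuming a hypothetical catalog database, table, columns, and output bucket:

```python
# Sketch of a Glue ETL script; database, table, column, and bucket names are
# hypothetical placeholders for the curation steps named in the question.
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping, DropNullFields, ResolveChoice, SplitFields
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

source = glue_context.create_dynamic_frame.from_catalog(
    database="warehouse_staging", table_name="orders"
)

# Mapping: rename and retype fields.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "string", "amount", "double"),
    ],
)

# Resolving choice: pick a strategy for ambiguous column types.
resolved = ResolveChoice.apply(frame=mapped, choice="make_cols")

# Dropping null fields.
cleaned = DropNullFields.apply(frame=resolved)

# Splitting fields: returns a collection holding two DynamicFrames.
split = SplitFields.apply(frame=cleaned, paths=["order_id"],
                          name1="orders_keys", name2="orders_rest")

# Write the curated data back to S3 for ML processing.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/"},
    format="parquet",
)
```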

5. Right Answer: D
Explanation: QuickSight, when attached to a VPC, cannot connect across Regions, as stated here. Reference: https://docs.aws.amazon.com/quicksight/latest/user/working-with-aws-vpc.html The only option, then, is to allow access from QuickSight's public endpoint in the security group of the Redshift cluster. The list of public QuickSight IP ranges for all Regions is here. Reference: https://docs.aws.amazon.com/quicksight/latest/user/regions.html
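
The sketch below shows what option D could look like with boto3; the security group ID and CIDR are placeholders, and the actual ap-northeast-1 QuickSight range should be taken from the documentation page referenced above.

```python
# Sketch: allow the QuickSight ap-northeast-1 IP range into the Redshift SG.
# GroupId and CidrIp are placeholders; look up the real range in the
# QuickSight documentation referenced above.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",              # hypothetical Redshift security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5439,                        # default Redshift port
        "ToPort": 5439,
        "IpRanges": [{
            "CidrIp": "203.0.113.0/24",          # placeholder for the QuickSight range
            "Description": "Amazon QuickSight ap-northeast-1",
        }],
    }],
)
```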
