
AWS Certified Data Analytics - Specialty - Part 9

Mary Smith

Wed, 19 Nov 2025


1. A company uses Amazon Elasticsearch Service (Amazon ES) to store and analyze its website clickstream data. The company ingests 1 TB of data daily using Amazon Kinesis Data Firehose and stores one day's worth of data in an Amazon ES cluster. The company has very slow query performance on the Amazon ES index and occasionally sees errors from Kinesis Data Firehose when attempting to write to the index. The Amazon ES cluster has 10 nodes running a single index and 3 dedicated master nodes. Each data node has 1.5 TB of Amazon EBS storage attached, and the cluster is configured with 1,000 shards. Occasionally, JVMMemoryPressure errors are found in the cluster logs. Which solution will improve the performance of Amazon ES?

A) Increase the memory of the Amazon ES master nodes.
B) Decrease the number of Amazon ES data nodes.
C) Decrease the number of Amazon ES shards for the index.
D) Increase the number of Amazon ES shards for the index.



2. A financial services company needs to aggregate daily stock trade data from the exchanges into a data store. The company requires that data be streamed directly into the data store, but also occasionally allows data to be modified using SQL. The solution must support complex analytic queries that run with minimal latency. It must also provide a business intelligence dashboard that shows the top contributors to anomalies in stock prices. Which solution meets the company's requirements?

A) Use Amazon Kinesis Data Firehose to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
B) Use Amazon Kinesis Data Streams to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
C) Use Amazon Kinesis Data Firehose to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
D) Use Amazon Kinesis Data Streams to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.



3. A financial company hosts a data lake in Amazon S3 and a data warehouse on an Amazon Redshift cluster. The company uses Amazon QuickSight to build dashboards and wants to secure access from its on-premises Active Directory to Amazon QuickSight. How should the data be secured?

A) Use an Active Directory connector and single sign-on (SSO) in a corporate network environment.
B) Use a VPC endpoint to connect to Amazon S3 from Amazon QuickSight and an IAM role to authenticate Amazon Redshift.
C) Establish a secure connection by creating an S3 endpoint to connect Amazon QuickSight and a VPC endpoint to connect to Amazon Redshift.
D) Place Amazon QuickSight and Amazon Redshift in the security group and use an Amazon S3 endpoint to connect Amazon QuickSight to Amazon S3.



4. A software company hosts an application on AWS, and new features are released weekly. As part of the application testing process, a solution must be developed that analyzes logs from each Amazon EC2 instance to ensure that the application is working as expected after each deployment. The collection and analysis solution should be highly available with the ability to display new information with minimal delays. Which method should the company use to collect and analyze the logs?

A) Enable detailed monitoring on Amazon EC2, use Amazon CloudWatch agent to store logs in Amazon S3, and use Amazon Athena for fast, interactive log analytics.
B) Use the Amazon Kinesis Producer Library (KPL) agent on Amazon EC2 to collect and send data to Kinesis Data Streams to further push the data to Amazon Elasticsearch Service and visualize using Amazon QuickSight.
C) Use the Amazon Kinesis Producer Library (KPL) agent on Amazon EC2 to collect and send data to Kinesis Data Firehose to further push the data to Amazon Elasticsearch Service and Kibana.
D) Use Amazon CloudWatch subscriptions to get access to a real-time feed of logs and have the logs delivered to Amazon Kinesis Data Streams to further push the data to Amazon Elasticsearch Service and Kibana.



5. A manufacturing company has been collecting IoT sensor data from devices on its factory floor for a year and is storing the data in Amazon Redshift for daily analysis. A data analyst has determined that, at an expected ingestion rate of about 2 TB per day, the cluster will be undersized in less than 4 months. A long-term solution is needed. The data analyst has indicated that most queries only reference the most recent 13 months of data, yet there are also quarterly reports that need to query all the data generated from the past 7 years. The chief technology officer (CTO) is concerned about the costs, administrative effort, and performance of a long-term solution. Which solution should the data analyst use to meet these requirements?

A) Create a daily job in AWS Glue to UNLOAD records older than 13 months to Amazon S3 and delete those records from Amazon Redshift. Create an external table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum to join to data that is older than 13 months.
B) Take a snapshot of the Amazon Redshift cluster. Restore the cluster to a new cluster using dense storage nodes with additional storage capacity.
C) Execute a CREATE TABLE AS SELECT (CTAS) statement to move records that are older than 13 months to quarterly partitioned data in Amazon Redshift Spectrum backed by Amazon S3.
D) Unload all the tables in Amazon Redshift to an Amazon S3 bucket using S3 Intelligent-Tiering. Use AWS Glue to crawl the S3 bucket location to create external tables in an AWS Glue Data Catalog. Create an Amazon EMR cluster using Auto Scaling for any daily analytics needs, and use Amazon Athena for the quarterly reports, with both using the same AWS Glue Data Catalog.



1. Right Answer: C
Explanation: Unbalanced shard allocations across nodes, or too many shards in a cluster, can cause JVMMemoryPressure errors. The resolution is to reduce the number of shards, for example by deleting old or unused indices. Reference: https://aws.amazon.com/premiumsupport/knowledge-center/high-jvm-memory-pressure-elasticsearch/
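To see why 1,000 shards is excessive for one day of data, a common sizing guideline is to keep each shard in the tens of gigabytes. The sketch below estimates a reasonable primary shard count; the 30 GB target is an illustrative assumption, not an AWS-published limit:

```python
import math

def recommended_primary_shards(index_size_gb, target_shard_gb=30):
    """Estimate a primary shard count from total index size.

    target_shard_gb (30 GB) is an assumed midpoint of the commonly
    cited 10-50 GB per-shard guideline for Elasticsearch.
    """
    return max(1, math.ceil(index_size_gb / target_shard_gb))

# One day's clickstream data: ~1 TB (1,000 GB) in a single index.
shards = recommended_primary_shards(1000)
print(shards)  # 34 — far fewer than the 1,000 shards in the question
```

With shard counts this far above the guideline, each shard carries per-shard JVM overhead while holding only ~1 GB of data, which matches the JVMMemoryPressure symptoms described.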

2. Right Answer: C
Explanation: Data streamed directly into the data store points to Kinesis Data Firehose, which delivers to the destination natively. Complex analytic queries with minimal latency indicate Amazon Redshift, an OLAP warehouse and a supported Firehose destination that also allows data to be modified with SQL. The business intelligence dashboard is Amazon QuickSight, which connects to Redshift as a data source.
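A minimal sketch of the Firehose-to-Redshift delivery stream from option C, built as parameters for the boto3 `create_delivery_stream` call. All names, ARNs, and the table name are placeholders; Firehose stages records in the S3 bucket and then issues a Redshift COPY on your behalf:

```python
def firehose_redshift_params(role_arn, jdbc_url, table, bucket_arn):
    """Build create_delivery_stream() parameters for a Redshift destination.

    All identifiers passed in are hypothetical placeholders.
    """
    return {
        "DeliveryStreamName": "trades-to-redshift",   # hypothetical name
        "DeliveryStreamType": "DirectPut",
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            "ClusterJDBCURL": jdbc_url,
            "CopyCommand": {
                "DataTableName": table,
                "CopyOptions": "FORMAT AS JSON 'auto'",
            },
            # Firehose always stages data in S3 before the COPY into Redshift.
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": bucket_arn,
            },
        },
    }

params = firehose_redshift_params(
    "arn:aws:iam::123456789012:role/firehose-role",
    "jdbc:redshift://example.abc.us-east-1.redshift.amazonaws.com:5439/dev",
    "daily_trades",
    "arn:aws:s3:::trades-staging",
)
# Usage (requires AWS credentials):
# boto3.client("firehose").create_delivery_stream(**params)
```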

3. Right Answer: A
Explanation: Use an AD connector and SSO in a corporate environment. The key point of the question is to secure access from the company's on-premises AD to QuickSight. QuickSight Enterprise edition supports connecting to Active Directory and using AD groups, SSO, row-level security, encryption at rest, and more. References: https://aws.amazon.com/blogs/security/how-to-connect-your-on-premises-active-directory-to-aws-using-ad-connector/ and https://docs.aws.amazon.com/singlesignon/latest/userguide/connectonpremad.html

4. Right Answer: D
Explanation: CloudWatch Logs subscriptions give you access to a real-time feed of log events, which can be delivered to Amazon Kinesis Data Streams and forwarded to Amazon Elasticsearch Service, where Kibana visualizes new information with minimal delay. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch.html
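The subscription in option D can be sketched as parameters for the boto3 `put_subscription_filter` call on CloudWatch Logs. The filter name, log group, and ARNs below are hypothetical placeholders:

```python
def subscription_filter_params(log_group, stream_arn, role_arn):
    """Build put_subscription_filter() parameters that stream every new
    log event from log_group into a Kinesis data stream in near real time.

    log_group, stream_arn, and role_arn are placeholder inputs.
    """
    return {
        "logGroupName": log_group,
        "filterName": "app-logs-to-kinesis",  # hypothetical filter name
        "filterPattern": "",                  # empty pattern matches all events
        "destinationArn": stream_arn,
        "roleArn": role_arn,  # role CloudWatch Logs assumes to write to Kinesis
    }

params = subscription_filter_params(
    "/app/web-servers",
    "arn:aws:kinesis:us-east-1:123456789012:stream/app-logs",
    "arn:aws:iam::123456789012:role/cwl-to-kinesis",
)
# Usage (requires AWS credentials):
# boto3.client("logs").put_subscription_filter(**params)
```

From the Kinesis stream, a consumer (or a Firehose stream reading from it) can index the events into Amazon Elasticsearch Service for Kibana dashboards.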

5. Right Answer: A
Explanation: A is correct because it moves data older than 13 months to Amazon S3, queries it through Amazon Redshift Spectrum, and frees cluster space. Incorrect answers: B. Restoring to a cluster with dense storage nodes only postpones the problem; the cluster will be undersized again as data keeps growing. C. CTAS creates a new table from a query inside Amazon Redshift; it is not a mechanism for moving data to S3 for Spectrum. D. Amazon EMR is not a data warehouse solution, and there is no need for interactive queries with Athena. Reference: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html
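The daily Glue job in option A boils down to an UNLOAD of the cold rows to S3 followed by a DELETE in the cluster. The sketch below generates those two SQL statements; the table, column, bucket, and IAM role names are all illustrative placeholders:

```python
def archive_old_records_sql(cutoff="DATEADD(month, -13, CURRENT_DATE)"):
    """Return (unload_sql, delete_sql) for archiving records older than
    13 months to S3, as in option A. All identifiers are placeholders."""
    unload = (
        "UNLOAD ('SELECT * FROM sensor_data "
        f"WHERE reading_ts < {cutoff}') "
        "TO 's3://factory-archive/sensor_data/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload' "
        "FORMAT AS PARQUET;"
    )
    delete = f"DELETE FROM sensor_data WHERE reading_ts < {cutoff};"
    return unload, delete

unload_sql, delete_sql = archive_old_records_sql()
# After the UNLOAD, an external table pointing at the S3 prefix lets
# Redshift Spectrum join the archived data for the quarterly reports.
```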
