1. An Amazon Redshift database contains sensitive user data. Logging is necessary to meet compliance requirements. The logs must contain database authentication attempts, connections, and disconnections. The logs must also contain each query run against the database and record which database user ran each query. Which steps will create the required logs?
A) Enable Amazon Redshift Enhanced VPC Routing. Enable VPC Flow Logs to monitor traffic.
B) Allow access to the Amazon Redshift database using AWS IAM only. Log access using AWS CloudTrail.
C) Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI.
D) Enable and download audit reports from AWS Artifact.
2. A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations:
• Station A, which has 10 sensors
• Station B, which has five sensors
These weather stations were placed by onsite subject-matter experts. Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams. Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B. Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput. How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?
A) Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
B) Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
C) Modify the partition key to use the sensor ID instead of the station name.
D) Reduce the number of sensors in Station A from 10 to 5 sensors.
3. Once a month, a company receives a 100 MB .csv file compressed with gzip. The file contains 50,000 property listing records and is stored in Amazon S3 Glacier. The company needs its data analyst to query a subset of the data for a specific vendor. What is the most cost-effective solution?
A) Load the data into Amazon S3 and query it with Amazon S3 Select.
B) Query the data from Amazon S3 Glacier directly with Amazon Glacier Select.
C) Load the data to Amazon S3 and query it with Amazon Athena.
D) Load the data to Amazon S3 and query it with Amazon Redshift Spectrum.
4. A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table. How should the company meet these requirements?
A) Use multiple COPY commands to load the data into the Amazon Redshift cluster.
B) Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
C) Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.
D) Use a single COPY command to load the data into the Amazon Redshift cluster.
5. A data analyst is designing a solution to interactively query datasets with SQL using a JDBC connection. Users will join data stored in Amazon S3 in Apache ORC format with data stored in Amazon Elasticsearch Service (Amazon ES) and Amazon Aurora MySQL. Which solution will provide the MOST up-to-date results?
A) Use AWS Glue jobs to ETL data from Amazon ES and Aurora MySQL to Amazon S3. Query the data with Amazon Athena.
B) Use Amazon DMS to stream data from Amazon ES and Aurora MySQL to Amazon Redshift. Query the data with Amazon Redshift.
C) Query all the datasets in place with Apache Spark SQL running on an AWS Glue developer endpoint.
D) Query all the datasets in place with Apache Presto running on Amazon EMR.
1. Right Answer: C
Explanation: The connection log, user log, and user activity log are enabled together by using the AWS Management Console, the Amazon Redshift API, or the AWS Command Line Interface (AWS CLI). Capturing each query and the database user who ran it also requires setting the enable_user_activity_logging parameter to true in the cluster's parameter group.
Reference: https://aws.amazon.com/premiumsupport/knowledge-center/logs-redshift-database-cluster/
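A minimal boto3 sketch of what enabling these logs might look like; the cluster identifier, bucket name, and parameter group name below are placeholders, not values taken from the question.

```python
import boto3

redshift = boto3.client("redshift")

# Turn on audit logging (connection log, user log, user activity log)
# and deliver the log files to an S3 bucket.
redshift.enable_logging(
    ClusterIdentifier="my-redshift-cluster",   # placeholder
    BucketName="my-audit-log-bucket",          # placeholder
    S3KeyPrefix="redshift-audit-logs/",
)

# The user activity log (one row per query, including the database user)
# additionally requires enable_user_activity_logging=true in the
# cluster's parameter group.
redshift.modify_cluster_parameter_group(
    ParameterGroupName="my-cluster-parameter-group",  # placeholder
    Parameters=[
        {
            "ParameterName": "enable_user_activity_logging",
            "ParameterValue": "true",
            "ApplyType": "static",
        }
    ],
)
```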
2. Right Answer: C
Explanation: Using the unique sensor ID as the partition key spreads Station A's records across both existing shards instead of concentrating them on the single shard mapped to the station name, which removes the hot-shard bottleneck at no extra cost. Splitting (option A) increases the number of shards in the stream and therefore the data capacity of the stream, but because you are charged on a per-shard basis, it also increases the cost of the stream.
Reference: https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html
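A brief sketch of the producer-side change, assuming a hypothetical stream name and sensor payload; the point is simply that the partition key becomes the sensor ID rather than the station name.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_reading(sensor_id: str, temperature_c: float) -> None:
    """Send one temperature reading, partitioned by sensor ID."""
    kinesis.put_record(
        StreamName="weather-temperature-stream",  # placeholder name
        Data=json.dumps(
            {"sensor_id": sensor_id, "temperature_c": temperature_c}
        ).encode("utf-8"),
        # Partitioning by sensor ID (15 distinct keys) distributes records
        # across both shards, instead of one hot shard per station name.
        PartitionKey=sensor_id,
    )

publish_reading("station-a-sensor-07", 21.4)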
3. Right Answer: A
Explanation: Amazon Athena is useful when the data is spread across many files, but in this question the data is a single compressed .csv file, so loading it into Amazon S3 and filtering it with Amazon S3 Select is the most cost-effective option.
Reference: https://aws.amazon.com/athena/faqs/
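A hedged sketch of an S3 Select call against the gzip-compressed CSV; the bucket, key, and vendor column name are assumptions made for illustration.

```python
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="property-listings",        # placeholder
    Key="listings/2024-01.csv.gz",     # placeholder
    ExpressionType="SQL",
    # Filter to a single vendor inside S3, so only matching rows are returned.
    Expression="SELECT * FROM s3object s WHERE s.\"vendor\" = 'ACME'",
    InputSerialization={
        "CSV": {"FileHeaderInfo": "USE"},
        "CompressionType": "GZIP",
    },
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; collect the Records payloads.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```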
4. Right Answer: D
Explanation: D) Use a single COPY command to load the data into the Amazon Redshift cluster. ==> A single COPY command loads the files in parallel, taking advantage of every slice on every node in the cluster.
Incorrect answers:
A) Use multiple COPY commands to load the data into the Amazon Redshift cluster. ==> Running multiple concurrent COPY commands forces Amazon Redshift to serialize the loads, which is slower.
B) Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster. ==> Introducing HDFS/Amazon EMR is unnecessary over-engineering.
C) Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node. ==> This has the same drawback as option A.
Reference: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html
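A short sketch of issuing one COPY for the whole batch of files through the Redshift Data API; the cluster, database, IAM role, table name, and S3 prefix are placeholders. A single COPY against a key prefix lets Redshift split the files across all node slices.

```python
import boto3

redshift_data = boto3.client("redshift-data")

copy_sql = """
    COPY sales_fact
    FROM 's3://my-ingest-bucket/fact-files/'  -- prefix covering all files
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    GZIP;
"""

redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="analytics",                     # placeholder
    DbUser="loader",                          # placeholder
    Sql=copy_sql,
)
```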
5. Right Answer: D
Explanation: Presto supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, and HBase, and relational data sources such as MySQL, PostgreSQL, and Amazon Redshift. Presto can query data where it is stored, without needing to move it into a separate analytics system, so results always reflect the current state of each source, and Presto on Amazon EMR accepts JDBC connections for interactive SQL.
Reference: https://prestodb.io/docs/current/connector/elasticsearch.html
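A sketch of an interactive cross-source join through Presto on EMR using the presto-python-client; the host, catalog names, and table names are assumptions, and the Hive, Elasticsearch, and MySQL connectors would have to be configured on the cluster.

```python
import prestodb

conn = prestodb.dbapi.connect(
    host="emr-master.example.internal",  # placeholder EMR master node
    port=8889,                           # default Presto port on EMR
    user="analyst",
    catalog="hive",                      # ORC data in S3 via the Hive catalog
    schema="default",
)

cur = conn.cursor()
# Join S3 (ORC via Hive), Amazon ES, and Aurora MySQL in place,
# so the results reflect the current state of every source.
cur.execute("""
    SELECT o.order_id, e.click_count, m.customer_name
    FROM hive.default.orders o
    JOIN elasticsearch.default.clicks e ON o.order_id = e.order_id
    JOIN mysql.crm.customers m ON o.customer_id = m.customer_id
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```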