1. Right Answer: A,D
Explanation: Use Kinesis Data Firehose to stream the data into Amazon Elasticsearch Service, then use Kibana for the visualisations.
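As a minimal sketch of that pipeline, the boto3 call below creates a Firehose delivery stream with an Elasticsearch destination. The stream name, IAM role ARN, domain ARN, index name, and backup bucket are placeholders, not values from the question.

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

firehose.create_delivery_stream(
    DeliveryStreamName="clickstream-to-es",        # hypothetical stream name
    DeliveryStreamType="DirectPut",
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-es-role",      # placeholder
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/analytics", # placeholder
        "IndexName": "clickstream",
        "IndexRotationPeriod": "OneDay",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-es-role",  # placeholder
            "BucketARN": "arn:aws:s3:::clickstream-backup",                # placeholder
        },
    },
)
```

Once Firehose is indexing documents into the domain, Kibana (bundled with the Elasticsearch domain) is used for the dashboards; no extra visualisation service is needed.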
2. Right Answer: D
Explanation: The parent shards that remain after the reshard could still contain data that you haven't read yet that was added to the stream before the reshard. If you read data from the child shards before having read all data from the parent shards, you could read data for a particular hash key out of the order given by the data records' sequence numbers. Therefore, assuming that the order of the data is important, you should, after a reshard, always continue to read data from the parent shards until they are exhausted. Only then should you begin reading data from the child shards.
Reference: https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-after-resharding.html
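A minimal boto3 sketch of that pattern is below: drain each closed parent shard completely (GetRecords stops returning a shard iterator once a closed shard is fully read), and only then start on the child shards. The stream name is a placeholder and the record "processing" is just a print.

```python
import boto3

kinesis = boto3.client("kinesis")

def drain_closed_shard(stream_name, shard_id):
    """Read a closed (parent) shard until Kinesis stops returning a shard
    iterator, which happens only after every record in the shard is read."""
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    while iterator:
        response = kinesis.get_records(ShardIterator=iterator, Limit=1000)
        for record in response["Records"]:
            print(record["SequenceNumber"])   # placeholder for real processing
        iterator = response.get("NextShardIterator")

shards = kinesis.list_shards(StreamName="my-stream")["Shards"]   # placeholder name
parent_ids = {s["ParentShardId"] for s in shards if s.get("ParentShardId")}

# Drain every parent shard completely before touching its children, so that
# records for a given hash key stay in sequence-number order.
for shard in shards:
    if shard["ShardId"] in parent_ids:
        drain_closed_shard("my-stream", shard["ShardId"])

# Only now begin consuming the child shards (with the same polling loop,
# or by handing them back to the KCL / Lambda consumer).
```

The KCL handles this ordering automatically; the manual loop above only matters when you consume shards directly through the SDK.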
3. Right Answer: C
Explanation: Since the data is partitioned by user_id, we can safely assume that it is more or less evenly distributed among the shards. No other applications are mentioned in the question, so we have to assume that Lambda is the only consumer of the stream. During peak loads, the Lambda processing lags so far behind that it is an hour behind the stream. Even moving to enhanced fan-out, which pushes records to consumers over HTTP/2, may not raise read throughput enough to catch up on a one-hour lag. A better option is to increase the number of shards, which increases read throughput and improves latency.
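A minimal sketch of scaling the stream out, assuming a placeholder stream name and target shard count:

```python
import boto3

kinesis = boto3.client("kinesis")

# Doubling the shard count roughly doubles both read and write capacity;
# UNIFORM_SCALING lets Kinesis split the shards evenly for you.
kinesis.update_shard_count(
    StreamName="user-activity-stream",   # hypothetical stream name
    TargetShardCount=16,                 # e.g. double the current shard count
    ScalingType="UNIFORM_SCALING",
)
```

With more shards, Lambda also gets more concurrent invocations (one poller per shard), which is what actually closes the processing lag.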
4. Right Answer: C,D
Explanation: A data analyst finding that write performance has significantly degraded a few months after launch means the load has been growing steadily and will not decrease; it is not a temporary spike. The likely causes are therefore an insufficient number of shards or a poor partition key that concentrates writes on a few "hot" shards over time.
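The sketch below contrasts a low-cardinality partition key with a higher-cardinality one; the stream name, field names, and sample record are placeholders used only to illustrate the hot-shard problem.

```python
import json
import uuid
import boto3

kinesis = boto3.client("kinesis")
event = {"device_type": "mobile", "user_id": "u-123", "payload": "..."}  # sample record

# A low-cardinality key (e.g. device_type) hashes most writes onto a few
# "hot" shards, so those shards start throttling as volume grows.
kinesis.put_record(
    StreamName="ingest-stream",                  # hypothetical stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["device_type"],
)

# A high-cardinality key (e.g. user_id, optionally with a random suffix)
# spreads writes evenly across the whole hash key range.
kinesis.put_record(
    StreamName="ingest-stream",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=f"{event['user_id']}-{uuid.uuid4().hex[:8]}",
)
```

Fixing the key distribution and adding shards address the two causes named above; either alone may not be enough if both problems are present.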
5. Right Answer: A
Explanation: Option A improves ingestion and storage efficiency by leveraging Firehose. It is clearly the option that 'improves the efficiency of the data processing jobs and is well architected' while letting the team 'continue to use PySpark'.
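As a minimal sketch of the downstream side, the PySpark job below reads the objects that Firehose has already buffered and delivered to S3 and aggregates them in one pass. The bucket, prefix, and column names are assumptions for illustration, not details from the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("firehose-batch").getOrCreate()

# Firehose writes batched objects under a date-based prefix, so the job can
# read a whole day's delivery instead of ingesting individual small records.
events = spark.read.json("s3://analytics-bucket/firehose/2024/01/15/*")  # placeholder path

daily_counts = (
    events.groupBy("user_id")                       # assumed column name
          .agg(F.count("*").alias("event_count"))
)

daily_counts.write.mode("overwrite").parquet(
    "s3://analytics-bucket/output/daily_counts/"    # placeholder output path
)
```

Letting Firehose handle buffering, compression, and delivery keeps the existing PySpark code largely unchanged while removing the small-file and ingestion overhead.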