AWS’s second specialty exam, Big Data, is designed to validate technical skills and experience in designing and implementing AWS services to derive value from data. The exam is aimed at individuals who perform complex Big Data analyses, and validates an individual’s ability to:
- Implement core AWS Big Data services according to basic architecture best practices
- Design and maintain Big Data
- Leverage tools to automate data analysis
So far, so good; but what does this mean in real terms? After all, one person’s Big Data is another person’s Jupyter notebooks on EMR. AWS, as always, breaks this down across domains (totalling 100%):
| Domain | Topic | Weighting |
| --- | --- | --- |
| Domain 1 | Collection | 17% |
| Domain 2 | Storage | 17% |
| Domain 3 | Processing | 17% |
| Domain 4 | Analysis | 17% |
| Domain 5 | Visualisation | 12% |
| Domain 6 | Data Security | 20% |
An obvious problem, and one that was certainly reflected in the exam, is the weight put on Domain 6: Data Security. Architects and Data Scientists probably aren’t going to have much real exposure to this domain, and the questions I was given focussed on logging and S3 bucket security controls, rather than on configuring encryption on EMR or a Redshift Data Warehouse (or “DWH”, as one question just throws in there).
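To give a flavour of the S3 security controls the exam leans on: a common pattern is a bucket policy that denies any upload not requesting server-side encryption. This is an illustrative sketch only; the bucket name is hypothetical, and you’d use `aws:kms` instead of `AES256` if you require KMS-managed keys:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-logs-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```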
As with all exams above Associate level, there’s a prerequisite to hold at least one AWS Associate-level certification. The exam itself is 65 questions, with 170 minutes to complete.
Starting Point
Review the AWS Certified Big Data Specialty page: https://aws.amazon.com/certification/certified-big-data-specialty/
This page includes a PDF link to the Exam Guide, which covers the above domains in more detail.
Next Steps
If your focus has been infrastructure, development, or another non-Big-Data track, you’ll want to start with some basics to get a good feel for what the exam will focus on. Firstly, I’d strongly encourage watching “AWS re:Invent 2017: Big Data Architectural Patterns and Best Practices on AWS (ABD201)”. The session is presented by Siva Raghupathy, who has been at AWS for nearly 10 years, and it shows in the breadth and depth of his knowledge of AWS’s Big Data offerings.
FAQs
AWS’s FAQ sections are always worth reading as part of any pre-sales, architecting, or exam study. For this exam, I’d recommend the following FAQs to get started:
- EMR: https://aws.amazon.com/emr/faqs/
- Data Pipeline: https://aws.amazon.com/datapipeline/faqs/
- Kinesis: https://aws.amazon.com/kinesis/data-streams/faqs/
- Redshift: https://aws.amazon.com/redshift/faqs/
- DynamoDB: https://aws.amazon.com/dynamodb/faqs/
- IAM: https://aws.amazon.com/iam/faqs/
Online Study Resources
AWS have a list of resources here: https://aws.amazon.com/blogs/big-data/getting-started-training-resources-for-big-data-on-aws/
I followed the A Cloud Guru course for the Certified Big Data Specialty: https://acloud.guru/learn/aws-certified-big-data-specialty
There’s also a Linux Academy course available here: https://linuxacademy.com/amazon-web-services/training/course/name/aws-certified-big-data
And a Pluralsight course available here: https://www.pluralsight.com/courses/big-data-amazon-web-services
re:Invent Videos
I watched the following re:Invent videos on YouTube to supplement my online learning; to find each one, search for “reinvent” followed by the session code and year (e.g. “reinvent ABD304 2017”):
Redshift
BDM402 2016
ABD304 2017
GPSTEC315 2017
EMR
BDT305 2015
BDM401 2016
ABD305 2017
DynamoDB
DAT310-R 2017
Kinesis
ABD301 2017
Free Answers
I’m not here to give you free answers, but keep an eye out for the following phrases/learning points as you approach the exam:
1) Redshift Distribution Keys
2) Redshift Star Schemas
3) Redshift data loading: slices
4) Machine Learning models: what they’re called, what they’re used for, and (briefly) what their outputs are
5) Have a clear understanding of the difference between Kinesis Streams and Kinesis Firehose
6) EMR:
a. How/when a metastore is used
b. Why consistent views are used
7) DynamoDB architecture:
a. Performance considerations
b. GSI/LSI use
8) Security:
a. S3 bucket ACLs
b. Sending log data to CloudWatch/Elasticsearch
c. Encryption options on EMR
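On the Kinesis learning point above, the practical difference shows up in capacity planning: Firehose scales automatically, while with Streams you size the shard count yourself. A minimal sketch of that arithmetic, assuming the published per-shard limits (1 MB/s or 1,000 records/s in, 2 MB/s out); the function name and example figures are my own:

```python
import math

# Assumed Kinesis Data Streams per-shard limits (check current quotas):
# ingest: 1 MB/s and 1,000 records/s; egress: 2 MB/s.
WRITE_MB_PER_SHARD = 1.0
RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0

def shards_needed(write_mb_s: float, records_s: int, read_mb_s: float) -> int:
    """Smallest shard count satisfying all three per-shard limits."""
    return max(
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
        1,  # a stream always has at least one shard
    )

# e.g. 5 MB/s inbound across 4,000 records/s, consumers reading 10 MB/s total
print(shards_needed(5.0, 4000, 10.0))  # → 5
```

Here the egress requirement (10 / 2 = 5 shards) is the binding constraint; with Firehose none of this sizing applies, which is exactly the sort of distinction the exam probes.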