Small Notes from the AWS Certified Big Data – Specialty Exam

corcoran
3 min read · Aug 7, 2020

AWS’s second speciality exam, Big Data, is designed to validate technical skills and experience in designing and implementing AWS services to derive value from data. The exam is for individuals who perform complex Big Data analyses, and it validates an individual’s ability to:

  • Implement core AWS Big Data services according to basic architecture best practices
  • Design and maintain Big Data
  • Leverage tools to automate data analysis

So far, so good; but what does this mean in real terms, since one person’s Big Data is another person’s Jupyter notebooks on EMR? AWS, as always, break this down across domains (weights totalling 100%):

| Domain | Title | Weight |
| --- | --- | --- |
| Domain 1 | Collection | 17% |
| Domain 2 | Storage | 17% |
| Domain 3 | Processing | 17% |
| Domain 4 | Analysis | 17% |
| Domain 5 | Visualisation | 12% |
| Domain 6 | Data Security | 20% |

An obvious problem, and one that was certainly reflected in the exam, is the weight put on Domain 6, Data Security. Architects and Data Scientists probably aren’t going to have much real exposure to this domain, and the questions I was given were focussed on logging and S3 bucket security controls rather than on configuring encryption on EMR or a Redshift data warehouse (or DWH, as one question simply calls it).

As with all exams above Associate level, there’s a prerequisite to hold at least one AWS Associate-level certification. The exam itself is 65 questions, with 170 minutes to complete.

Starting Point

Review the AWS Certified Big Data Specialty page: https://aws.amazon.com/certification/certified-big-data-specialty/

This page includes a PDF link to the Exam Guide, which covers the above domains in more detail.

Next Steps

If your focus has been infrastructure, development, or another track that isn’t specifically Big Data, you’ll want to start with some basics so you get a good feel for where the exam will focus. First, I’d strongly encourage watching “AWS re:Invent 2017: Big Data Architectural Patterns and Best Practices on AWS (ABD201)”. The session is presented by Siva Raghupathy, who has been at AWS for nearly 10 years, and it shows in the breadth and depth of his knowledge of AWS’s Big Data offerings.

FAQs

AWS’s FAQ sections are always worth reading as part of any pre-sales, architecting, or exam study. For this exam, I’d recommend the following FAQs to get started:

  • EMR: https://aws.amazon.com/emr/faqs/
  • Data Pipeline: https://aws.amazon.com/datapipeline/faqs/
  • Kinesis: https://aws.amazon.com/kinesis/data-streams/faqs/
  • Redshift: https://aws.amazon.com/redshift/faqs/
  • DynamoDB: https://aws.amazon.com/dynamodb/faqs/
  • IAM: https://aws.amazon.com/iam/faqs/

Online Study Resources

AWS have a list of resources here: https://aws.amazon.com/blogs/big-data/getting-started-training-resources-for-big-data-on-aws/

I followed the A Cloud Guru course for the Certified Big Data Specialty: https://acloud.guru/learn/aws-certified-big-data-specialty

A Linux Academy course is available here: https://linuxacademy.com/amazon-web-services/training/course/name/aws-certified-big-data

A Pluralsight course is available here: https://www.pluralsight.com/courses/big-data-amazon-web-services

re:Invent Videos

I watched the following re:Invent videos on YouTube to supplement my online learning; to find each one, search for “reinvent <session code> <year>”:

Redshift

  • BDM402 (2016)
  • ABD304 (2017)
  • GPSTEC315 (2017)

EMR

  • BDT305 (2015)
  • BDM401 (2016)
  • ABD305 (2017)

DynamoDB

  • DAT310-R (2017)

Kinesis

  • ABD301 (2017)

Free Answers

I’m not here to give you free answers, but keep an eye out for the following phrases/learning points as you approach the exam (a few illustrative sketches follow the list):

1) Redshift Distribution Keys

2) Redshift Star Schemas

3) Redshift data loading and slices

4) Machine Learning Models: what they’re called, what they’re used for, and what the outputs are (briefly)

5) Have a clear understanding of the difference between Kinesis Streams and Kinesis Firehose

6) EMR
   a. How and when a metastore is used
   b. Why EMRFS consistent view is used

7) DynamoDB architecture
   a. Performance considerations
   b. GSI/LSI use

8) Security
   a. S3 bucket ACLs
   b. Sending log data to CloudWatch/Elasticsearch
   c. Encryption options on EMR
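
To make the distribution key point concrete, here’s a minimal sketch of a Redshift table definition that chooses a KEY distribution style and a sort key. The table, column names, and connection details are hypothetical, and psycopg2 is just one common way to run DDL against a cluster; any SQL client works as well.

```python
import psycopg2  # pip install psycopg2-binary

# Hypothetical Redshift connection details -- substitute your own cluster endpoint.
conn = psycopg2.connect(
    host="my-cluster.abc123xyz.eu-west-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="awsuser",
    password="...",
)

# DISTKEY(customer_id) co-locates rows with the same customer on the same slice,
# which lets joins on customer_id avoid shuffling data between nodes.
# SORTKEY(sale_date) keeps blocks ordered by date, so range-restricted scans
# (e.g. "last 7 days") can skip most of the table.
ddl = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(10, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (sale_date);
"""

# The connection context manager commits the transaction on success.
with conn, conn.cursor() as cur:
    cur.execute(ddl)
```

The exam tends to test when KEY, EVEN, or ALL distribution is appropriate rather than the DDL syntax itself, so focus on the reasoning in the comments above.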
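
On the Streams vs Firehose point, the producer-side API difference is a useful memory hook. Below is a rough boto3 sketch with made-up stream names: with Streams you pick a partition key and manage your own consumers and retention; with Firehose you hand records to a delivery stream and it buffers and delivers them to a destination such as S3, Redshift, or Elasticsearch for you.

```python
import json
import boto3

payload = json.dumps({"user_id": 42, "event": "click"}).encode("utf-8")

# Kinesis Data Streams: you choose a partition key (it determines the shard),
# and you build or run your own consumers (KCL, Lambda, Spark on EMR, etc.).
streams = boto3.client("kinesis", region_name="eu-west-1")
streams.put_record(
    StreamName="clickstream",        # hypothetical stream name
    Data=payload,
    PartitionKey="user-42",
)

# Kinesis Data Firehose: no partition key and no consumers to manage; the
# delivery stream buffers records and writes them to its configured destination.
# Firehose doesn't add delimiters, so append a newline if the destination
# expects newline-delimited JSON.
firehose = boto3.client("firehose", region_name="eu-west-1")
firehose.put_record(
    DeliveryStreamName="clickstream-to-s3",   # hypothetical delivery stream
    Record={"Data": payload + b"\n"},
)
```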
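
For the DynamoDB GSI/LSI point, it helps to have seen what defining a global secondary index actually looks like. Here is a minimal boto3 sketch with made-up table and attribute names; an LSI, by contrast, must share the table’s partition key and can only be created at table creation time.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")

# Base table keyed on order_id; the GSI lets us query all orders for a
# customer in date order without scanning the whole table.
dynamodb.create_table(
    TableName="Orders",  # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "customer-orders-by-date",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            # A GSI in provisioned mode has its own read/write capacity.
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```

The performance angle to remember: a GSI has its own partition key and (in provisioned mode) its own throughput, whereas an LSI shares the table’s partition key and capacity.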
