This is not a course on Python. It is a serious data science program for candidates with serious quantitative backgrounds. We consider programming languages to be tools for accomplishing our goals. Candidates can expect to dive into the theory and build robust models from the ground up, tailoring them to the problem at hand.
With support from our industry experts, candidates will navigate the stages of machine learning from theory to deployment. A strong foundation in supervised, unsupervised and deep learning approaches will be established before candidates branch off and specialize in a particular area during the capstone project.
Before we embark on the adventure of building a model, we must explore the data to uncover patterns, identify new avenues of interest, and extract the impactful variables that drive a particular outcome.
Data exploration is an iterative process that combines data processing, feature engineering and visualization to support a proposed hypothesis or set of hypotheses.
Data Management and Processing
Data Scientists in today's industries are confronted with data ranging from simple tables to files that are multiple terabytes in size. The source and format of these data are equally variable.
A core emphasis is placed on developing the skills required to properly manage and scalably process the variety of data formats candidates will encounter in the industry. Candidates will gain exposure to Extract, Transform and Load (ETL) operations through tools such as SQL, Python, R, Hadoop and Spark, guided by industry professionals.
Data Visualization
Data visualization is an essential component of the data science workflow. Through visualization, high-level patterns and insights can be derived quickly to guide analyses. Visualization is also vital in a production setting, where key insights must be conveyed to stakeholders who will use them in their decision-making processes.
Candidates will learn to complement their analyses with polished visualizations in R, Python and Tableau that meet industry expectations for summarizing key insights. More sophisticated dimension reduction techniques for visualizing high-dimensional data will also be covered rigorously.
Big Data and Cloud Computing
Expect to find massive amounts of data in the industry. Processing these data and training models on them demands that we rethink our strategies and adopt more advanced tools and techniques. With AQM's Google Cloud infrastructure, candidates will learn how to wrangle “big” data and scale their machine learning models to tackle the largest of problems.
The theoretical aspects of parallelization, as well as the technical aspects of setting up a cloud-computing platform, will also be covered in detail and may be leveraged during the capstone project.