- Throughout AQM, familiarity with statistical programming and scientific computing in R and Python will be emphasized.
- Core computer science skills, such as algorithm development, analytical problem solving, goal-oriented development and program development cycles, will be incorporated throughout the curriculum.
- GitHub will be used extensively throughout AQM as a collaborative tool for version control of software projects.
- Students will be introduced to powerful tools for data wrangling, exploration and visualization. These tools are used extensively in practical data science, and knowing how to use them effectively is crucial for any data science role.
- Students will have opportunities to apply these and other tools first to example databases, and then to larger, more complex data-sets such as the TransLink data-set from the previous semester.
Regression, Classification, Resampling methods
- In this section, applications of linear least-squares regression and classification will be covered.
- The relationship between matrix algebra and regression will be emphasized, using matrix algebra as a formalism to represent regression mathematically.
- Students will progress from smaller learning examples to larger projects with more complex data-sets.
Topics in Regression: least squares, GLM, assumption testing, modelling, diagnostics, statistical significance testing, ANOVA.
Topics in Classification: logistic regression, linear and quadratic discriminant analysis, K-nearest neighbours.
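As an illustration of how matrix algebra represents regression, the least-squares coefficients for a design matrix X and response y solve the normal equations (X^T X) beta = X^T y. The following is a minimal sketch in Python for the single-predictor case, where X has a column of ones and a column of x values. The function name `fit_line` and the sample data are hypothetical and are not taken from the course materials.

```python
def fit_line(xs, ys):
    """Fit y = b0 + b1 * x by solving the 2x2 normal equations (X^T X) beta = X^T y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)

    # Entries of X^T X for the design matrix X = [1, x]: [[n, sx], [sx, sxx]]
    sx = sum(xs)
    sxx = sum(x * x for x in xs)

    # Entries of X^T y: [sy, sxy]
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of X^T X; it is zero when all x values are identical
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")

    # beta = (X^T X)^{-1} X^T y, using the closed-form inverse of a 2x2 matrix
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

With several predictors the same equations apply, but in practice a linear algebra library would be used instead of a hand-written inverse.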
- This section will begin with introductions to probability and probabilistic modelling through concrete examples, then move to more abstract descriptions of distributions, expectation and inference.
- Aspects of probabilistic simulation and Monte Carlo methods and their applications in practical statistics will be covered.
- In practice, data-sets are rarely well behaved. We will learn what can be done when the parameters of a distribution or the initial assumptions change during the analysis of a data-set.
- Advanced regression techniques and theoretical considerations of linear least squares will be covered in this section once the probabilistic background has been developed.
- Further applications to Bayesian statistics, stochastic processes and statistical learning will be covered.
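To give a flavour of the Monte Carlo methods mentioned above, the sketch below estimates pi by drawing random points in the unit square and counting the fraction that fall inside the quarter circle of radius 1. This is a standard textbook illustration chosen here as an assumption; the function name `estimate_pi` is hypothetical and the course may use different examples.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of uniform random points inside a quarter circle."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    rng = random.Random(seed)  # seeded generator so that runs are reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:           # point lies inside the quarter circle
            inside += 1
    # The quarter circle has area pi / 4, so the hit fraction approximates pi / 4
    return 4.0 * inside / n_samples


print(estimate_pi(100_000))
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of samples, which is why simulation studies often need many draws.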
Practical Modelling and Machine Learning
- Once students have a comfortable grasp of the basics of R and applied statistics, practical modelling techniques and machine learning models and their applications will be introduced.
- Concepts such as cross validation, training and test sets and accuracy measures for predictive models will be covered.
- The principles behind machine learning, such as optimization and the paradigms of learning (supervised, unsupervised and reinforcement learning), will be covered briefly.
- Students will have opportunities to apply machine learning models to classify, forecast and find patterns in data, compare results and gain a practical understanding of machine learning models in R and beyond.
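The idea of cross validation with training and test sets can be sketched in a few lines of Python. The example below splits the data into k contiguous folds and scores a deliberately simple model (predicting the mean of the training responses) by mean squared error on each held-out fold. The function names `k_fold_indices` and `cv_mse_of_mean` are hypothetical, the data are assumed to be already shuffled, and in practice a library routine would normally be used.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of nearly equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds receive one extra observation
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cv_mse_of_mean(ys, k):
    """Cross-validated mean squared error of a model that predicts the training mean."""
    fold_errors = []
    for train, test in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)  # "fit" on the training fold
        mse = sum((ys[i] - prediction) ** 2 for i in test) / len(test)  # score on the test fold
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


# Hypothetical responses; each observation is held out exactly once
print(cv_mse_of_mean([2.0, 4.0, 6.0, 8.0, 10.0], k=5))
```

Because every observation is used for testing exactly once and never for fitting the model that predicts it, the averaged error is a fairer measure of predictive accuracy than the error on the training data.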
The capstone project will combine all aspects of practical statistics, algorithm design, program development in R and problem-solving skills in an effort to solve a problem or discover insights for a large, multi-million-dollar company, using a large data-set that the company provides. The capstone projects are student-led and allow student teams to demonstrate their knowledge as well as showcase their creative skills in tackling and overcoming real-world data analysis problems.