Data Scientists Course

2 days | Online or in-person

This is a two-day course for professionals who can build, or have already built, data-driven models. The course aims to develop the technical skills needed to build systems that use machine learning to make automated decisions whilst accounting for ethical objectives.

“The calibre was high, it felt rather authentic and each of the presenters was passionate. Your team truly did a great job and should be commended for being so engaging in the sessions.”

Data Scientist

By the end of the course, participants will have explored, tested and scrutinised simple model systems that rely on machine learning to make automated decisions whilst also accounting for ethical objectives. Participants will understand some of the technical pitfalls that prevent machine learning systems from behaving ethically, and will know how to identify and correct for them. While many of the concepts discussed in this course are applicable across a wide range of AI systems, the course primarily focuses on models built using structured and labelled training data.

This course is for people who have experience building data-driven models and interpreting graphs, and who are comfortable discussing terms such as “parameter optimisation”, “overfitting” and “model validation”. Exercises and activities are based on interactive models and visualisations; however, no coding is needed during the course.

The course is run in classes of up to 15 participants and led by two instructors from Gradient Institute’s team of machine learning specialists. At the start of each topic, there is a short presentation on key concepts, followed by class discussion. Participants also learn by working through exercises and examples in open-source Jupyter Notebooks. The notebook solutions and the presentation material are provided after the course as a reference.


Automated decision making

A review of the foundations of machine learning and model validation, with an emphasis on building a strong conceptual understanding and on the ethical implications of algorithmic decision making. Covers core concepts underlying supervised learning, overfitting and underfitting, model uncertainty, and classification and regression.


Automated decision making and uncertainty (advanced option)

An alternative to the automated decision-making module for more advanced participants with coding skills. The foundations of machine learning are reviewed, the importance of quantifying uncertainty for ethical decision-making is discussed and some approaches for estimating uncertainty with machine learning models are explored. Other topics are machine learning as optimisation; estimating parameterised uncertainty via maximum likelihood methods; quantifying model uncertainty with Bayesian modelling and MCMC; and bootstrapping models.
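
As a rough illustration of the bootstrapping idea mentioned above, the sketch below handles the simplest case: resampling observations with replacement to get a percentile interval for a mean, rather than refitting a full model. The function name, the data values and the default settings are all hypothetical and are not taken from the course material.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values` (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Made-up observations, for illustration only.
observations = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2, 2.7, 2.6]
lower, upper = bootstrap_mean_interval(observations)
print(f"Approximate 95% interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

The same resampling loop could in principle wrap a model-fitting step instead of a mean, which gives a spread of fitted models rather than a spread of averages.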


Loss functions and robust modelling

Building a data-driven automated decision system requires explicitly specifying its objectives, often in the form of a loss function and constraints. The particular choice of loss function, including which considerations to include and which to omit, is the primary mechanism of control that designers have over the ethical operation of a system. In data-driven systems, losses are specified with respect to data and rely on assumptions about that data. We examine the design choices and assumptions that are made when translating a real-world problem into an algorithmic decision-making system, and the ethical issues that can arise from this process. Other topics include encoding values in loss functions; cost-sensitive classification; calibration and decision-making based on predicted probabilities; and dataset shift.
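
One way to picture cost-sensitive classification with predicted probabilities is the minimal sketch below. It assumes the probabilities are well calibrated and that correct decisions cost nothing; the function names and cost values are hypothetical. Under those assumptions, acting on a case has lower expected cost than not acting when the predicted probability exceeds cost_fp / (cost_fp + cost_fn).

```python
def decision_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which predicting positive has the lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def decide(probability: float, cost_false_positive: float, cost_false_negative: float) -> bool:
    """Return True (act) when the expected cost of acting is lower than not acting."""
    return probability > decision_threshold(cost_false_positive, cost_false_negative)


# Hypothetical costs: a missed positive is four times as costly as a false alarm.
threshold = decision_threshold(cost_false_positive=1.0, cost_false_negative=4.0)
decisions = [decide(p, 1.0, 4.0) for p in (0.1, 0.25, 0.5)]
print(threshold, decisions)
```

The choice of the two cost values is where the system's priorities are encoded, which is why it is a design decision rather than a purely technical one.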


Causal versus predictive models

Machine learning models rely on correlations in data to predict outcomes, on the assumption that the data-generating process is fixed. Where models are used to drive decisions and interventions, failure to consider causality can lead to poor results and unintended consequences despite good intentions. We clarify the distinction between causal and predictive models and how they can be used and interpreted. Other topics include identifying when a causal model is required and understanding Simpson’s Paradox.
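
Simpson’s Paradox can be seen in a few lines of arithmetic. The sketch below uses illustrative counts (not course data) in which treatment A has the higher success rate within each subgroup, yet treatment B has the higher rate once the subgroups are pooled, because the two treatments were applied to very different mixes of cases. All names are hypothetical.

```python
from fractions import Fraction

# (successes, trials) for each treatment within each subgroup; illustrative counts.
outcomes = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(treatment):
    """Success rate for a treatment after ignoring the subgroup split."""
    successes = sum(outcomes[group][treatment][0] for group in outcomes)
    trials = sum(outcomes[group][treatment][1] for group in outcomes)
    return rate(successes, trials)


a_better_in_every_group = all(
    rate(*outcomes[group]["A"]) > rate(*outcomes[group]["B"]) for group in outcomes
)
b_better_overall = pooled_rate("B") > pooled_rate("A")
print(a_better_in_every_group, b_better_overall)
```

Which comparison is the right one to act on depends on the causal structure of the problem, not on the numbers alone.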


Fair machine learning

Machine learning systems can perform well on average but still systematically err against, or discriminate against, individuals or groups in the wider population. We examine some of the common notions of algorithmic fairness that seek to measure and correct for such disparate treatment or outcomes in machine learning systems. Other topics include sources of unfairness in machine learning models; fairness metrics; and approaches to removing bias.
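
As a hedged sketch of what a fairness metric can look like, the example below computes two commonly used group-level gaps on toy data: the difference in selection rates (often called demographic parity difference) and the difference in true positive rates (often called equal opportunity difference). The data, group names and helper functions are made up for illustration; they are not the metrics or datasets used in the course.

```python
def selection_rate(predictions):
    """Fraction of cases that received a positive prediction."""
    return sum(predictions) / len(predictions)


def true_positive_rate(labels, predictions):
    """Fraction of truly positive cases that were predicted positive."""
    hits = [p for y, p in zip(labels, predictions) if y == 1]
    return sum(hits) / len(hits)


# Toy labels and binary predictions for two hypothetical groups.
group_a = {"labels": [1, 1, 0, 0, 1, 0, 1, 0], "predictions": [1, 1, 0, 1, 1, 0, 1, 0]}
group_b = {"labels": [1, 1, 0, 0, 1, 0, 1, 0], "predictions": [1, 0, 0, 0, 1, 0, 0, 0]}

demographic_parity_gap = (
    selection_rate(group_a["predictions"]) - selection_rate(group_b["predictions"])
)
equal_opportunity_gap = (
    true_positive_rate(group_a["labels"], group_a["predictions"])
    - true_positive_rate(group_b["labels"], group_b["predictions"])
)
print(demographic_parity_gap, equal_opportunity_gap)
```

Different fairness notions can disagree with one another on the same predictions, so the metric chosen reflects a judgement about which kind of disparity matters in context.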


Interpretability, transparency and accountability

Interpretability and transparency help identify when models might break down, when they lack vital context, and whether they have been designed and motivated in an acceptable way. This module introduces some of the tools and techniques available for making models more interpretable and transparent, and discusses how to communicate key information about model behaviour and ethical risks to those ultimately accountable for the system. Other topics include motivations and audience for interpretability; feature importance, partial dependence and causality; local interpretability and LIME; and global interpretability and surrogate models.


Applied project

The final component of the course is a project that challenges participants to put the concepts they have learned into practice. They work in teams to analyse an algorithmic system, identify potential ethical issues, propose solutions and present the results to the rest of the class at the end of the day.