When applying machine learning in the wild, we often deploy a model under conditions that differ from those it was trained in. A popular example is vocabulary drift in natural language processing systems. Under such circumstances, there is no guarantee that predictions remain accurate, and often the only remedy is to retrain the model on fresh data.

In this talk, we explore how machine learning models are affected by changes in the underlying data-generating process and look at methods for identifying such changes.
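As a rough illustration of what identifying such a change can look like (not necessarily one of the methods covered in the talk), the sketch below compares one numeric feature between training data and deployment data using the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical distribution functions. The function name `ks_statistic`, the synthetic Gaussian data, and the alert level `THRESHOLD` are assumptions made purely for illustration.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 for identical samples, 1.0 for disjoint ones.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        # i / len(a) and j / len(b) are the empirical CDFs evaluated at x.
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap


# Synthetic data (assumption): one feature at training time, plus two
# deployment samples, one from the same distribution and one shifted.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]

THRESHOLD = 0.1  # hypothetical alert level, chosen for this toy example

stat_same = ks_statistic(train, same)
stat_shifted = ks_statistic(train, shifted)
print("same distribution:", stat_same, "drift flagged:", stat_same > THRESHOLD)
print("shifted mean:", stat_shifted, "drift flagged:", stat_shifted > THRESHOLD)
```

In practice one would more likely rely on a library routine such as `scipy.stats.ks_2samp`, which also returns a p-value. A per-feature test like this only sees changes in marginal distributions, so it can miss shifts in how features relate to each other or to the target.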

In the last part of the talk we will go one step further and see how we can build predictive models in the face of dataset shift.

Peter Prettenhofer

Peter Prettenhofer is the VP of Engineering at DataRobot. He studied computer science at Graz University of Technology, Austria, and Bauhaus University Weimar, Germany, focusing on machine learning and natural language processing. He is a contributor to scikit-learn, where he co-authored a number of modules such as Gradient Boosted Regression Trees, Stochastic Gradient Descent, and Decision Trees.

Event Timeslots (1)

Track A (Upper Floor)