I attended this seminar presentation by Mr. Geoff Webb from the Faculty of IT at Monash University in April 2016. The seminar focussed on theoretical tools for analysing non-stationary distributions and discussed the insights they provide. The underlying idea of the presentation was: “The world is dynamic – in a constant state of flux – but most learned models are static. Models learned from historical data are likely to decline in accuracy over time.”
Date: Wednesday 13 April, 2016
Time: 2:00pm – 3:00pm
Venue: Clayton CL_26_G12A, VC to 7.84 Caulfield
My PhD project deals with non-stationary data from learning management systems, and “concept drift” is highly prevalent in these environments. The seminar highlighted that models falling out of sync or out of date is a serious problem in the real world. Google Flu Trends was one famous project that succumbed to the reality of concept drift.
Important Dimensions of Concept Drift
- Re-learning (preferred in practice) and/or Incremental Learning (most researched)
- Window / Ageing – notion of decay (used in practice)
- Detection and Fixed Schedule (when to update the model: when drift is detected or on a fixed schedule)
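To make the window and detection ideas above concrete, here is a minimal sketch of my own, not something shown in the seminar. It assumes we can observe whether each prediction was right or wrong, and it flags drift when the error rate in the most recent window exceeds the error rate in the preceding window by some margin. The class name `WindowedDriftMonitor` and the parameters `window_size` and `threshold` are hypothetical choices for illustration.

```python
from collections import deque


class WindowedDriftMonitor:
    """Hypothetical helper: compares error rates in two adjacent sliding windows."""

    def __init__(self, window_size=50, threshold=0.2):
        self.window_size = window_size
        self.threshold = threshold  # assumed margin; would need tuning on real data
        self.reference = deque(maxlen=window_size)  # older outcomes
        self.recent = deque(maxlen=window_size)     # newest outcomes

    def update(self, error):
        """Record one outcome (1 = wrong prediction, 0 = correct).

        Returns True when drift is flagged, i.e. a signal to re-learn the model.
        """
        if len(self.recent) == self.window_size:
            # The oldest recent outcome ages into the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(error)
        if len(self.reference) < self.window_size:
            return False  # not enough history to compare yet
        reference_rate = sum(self.reference) / len(self.reference)
        recent_rate = sum(self.recent) / len(self.recent)
        return recent_rate - reference_rate > self.threshold

    def reset(self):
        """Forget all history, e.g. after the model has been re-learned."""
        self.reference.clear()
        self.recent.clear()
```

In use, one would call `update` after every prediction and re-learn the model (then call `reset`) whenever it returns True. A fixed-schedule strategy would instead re-learn every N observations regardless of what the monitor reports.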
Summarising the presentation (as per the presenter’s conclusions):
- Concept drift is a critical problem
- Much current work is ad hoc
- Quantitative models provide a rigorous theoretical basis for:
  - designing mechanisms to detect, characterise and resolve drift
  - understanding which forms of drift are best handled by each mechanism
  - developing synthetic drift data generators
  - evaluating stream mining algorithms
  - designing incremental learning techniques that are robust to a diversity of situations
- Need different windows in different attribute subspaces.
- Need different bias-variance profiles for different windows.
- Useful to exploit cycles
- Useful to map drift
- Do we need high-dimensional forecasting?
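As an illustration of what a quantitative treatment of drift might look like, the sketch below measures how far apart the empirical distributions of two data windows are. The choice of total variation distance is my own assumption for this example; the seminar may well have used a different measure. The function name `total_variation_distance` is hypothetical, and the sketch only handles a single categorical attribute.

```python
from collections import Counter


def total_variation_distance(window_a, window_b):
    """Total variation distance between the empirical distributions of two windows.

    Returns a value between 0.0 (identical distributions) and 1.0 (no overlap).
    Assumes both windows hold hashable categorical values.
    """
    if not window_a or not window_b:
        raise ValueError("both windows must be non-empty")
    counts_a, counts_b = Counter(window_a), Counter(window_b)
    n_a, n_b = len(window_a), len(window_b)
    values = set(counts_a) | set(counts_b)
    # Half the sum of absolute differences in relative frequency.
    return 0.5 * sum(abs(counts_a[v] / n_a - counts_b[v] / n_b) for v in values)


# Example: a hypothetical student-activity attribute in two time periods.
earlier = ["forum", "forum", "quiz", "quiz"]
later = ["forum", "quiz", "quiz", "quiz"]
drift_magnitude = total_variation_distance(earlier, later)
```

Computing such a magnitude separately per attribute, or per attribute subspace, is one way to see where drift is concentrated, which relates to the point above about needing different windows in different subspaces.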
Overall, the seminar was very enlightening and will contribute positively towards my development as a research student.