Evaluating machine learning for projecting completion rates for VET programs

By Michelle Hall, Melinda Lees, Cameron Serich, Richard Hunt
Technical paper, 19 April 2023, ISBN 978-1-922801-11-1

Description

This technical paper evaluates the effectiveness of using machine learning approaches to calculate projected completion rates for VET programs compared with the current Markov chains methodology. Projected rates were calculated for the 2016 commencing cohort, and the completion rate predictions from the machine learning algorithms were generally more accurate than those from the Markov chains methodology. While the machine learning results show how well the model can generate projected rates for the 2016 commencing year, its ability to make consistently accurate predictions for other commencing years is untested, and it is unclear whether the assumptions underlying either methodology remain valid for years in which training may have been disrupted by the pandemic.

Summary

About the research

This paper summarises exploratory analysis undertaken to evaluate the effectiveness of using machine learning approaches to calculate projected completion rates for vocational education and training (VET) programs, and compares this with the Markov chains methodology, the approach currently used at the National Centre for Vocational Education Research (NCVER).

NCVER publishes annual observed VET qualification completion rates for qualifications that commenced four years prior to the most recent data collection period, based on the assumption that sufficient time has passed for all students who intended to complete their qualification to have done so. Projected rates are published for the more recent years, as the actual completion rates cannot be known until enough time has passed for the qualifications to be completed and the outcomes reported to NCVER.

While the Markov chains methodology currently used by NCVER has proven reliable, with predictions aligning well with the actual rates of completion for historical estimates, it has not been reviewed for some time and it does have some limitations. Machine learning techniques for predicting VET program completion rates were evaluated with a view to overcoming some of these limitations and improving our current predictions.

This report includes:

  • an overview of the methodologies: Markov chains and two machine learning algorithms that were applied to predict completion rates for VET programs (XGBoost and CatBoost)
  • a comparison of the accuracy of the predictions generated by both methodologies
  • an evaluation of the relative strengths and limitations of both methodologies.

Key messages

  • For the 2016 commencing cohort, the completion rate predictions using machine learning algorithms were generally more accurate than the rates achieved using Markov chains methodology. When evaluated against actual published completion rates:
    • The ‘XGBoost’ machine learning approach produced the most accurate predictions overall, with a high level of recall and precision.
    • The ‘XGBoost’ machine learning approach also had fewer instances where the prediction for a training attribute deviated from the actual completion rate by more than three percentage points, as compared with the Markov chains methodology.
  • Both projection approaches have strengths and limitations:
    • The key advantage of the Markov chains methodology is that the projected rates are calculated from a three-year period of recent enrolments (and their transitions between enrolment states), without requiring the full history of all qualification enrolments. That said, a key limitation of this methodology is the 12-month delay before projected rates can be calculated: the transitional probabilities that form the basis of the completion rate projection for a given year rely on data that include the following year.
    • Markov chains projected completion rates for VET qualifications commencing in the most recent years are overinflated (particularly the current year projections). The alignment of projections to actual rates improves as time passes and as more records reach their final state of ‘completed’ or ‘discontinued’.
    • A key advantage anticipated from adopting a machine learning model is the timeliness of the predictions. The model is expected to allow projections to be calculated for a new cohort as soon as the enrolment data are received from the various training providers. However, this method relies on a four-year window of historical training activity data to train the model.
    • While the results from the machine learning model demonstrate how accurately the model can generate projected rates for the 2016 commencing year, the model’s ability to consistently make accurate predictions for other commencing years is as yet untested.
    • Due to the significant disruption to the VET sector from the COVID-19 pandemic, it is not clear whether the assumptions underlying either methodology remain valid for the years where training may have been disrupted by the pandemic.
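
To illustrate the Markov chains idea described in the key messages, the following is a minimal sketch in Python. It is not NCVER's implementation. It assumes a single transient state ('continuing') and two final states ('completed' and 'discontinued'). The function names (estimate_transitions, project_completion_rate) and all counts are hypothetical and chosen for illustration only. Transition probabilities are estimated from observed year-to-year transitions, and a cohort is then propagated through the chain until almost every enrolment has reached a final state.

```python
# Hypothetical counts of year-to-year transitions out of the 'continuing' state.
# These numbers are illustrative only and are not NCVER data.
OBSERVED_COUNTS = {
    "continuing": {"continuing": 500, "completed": 300, "discontinued": 200},
}


def estimate_transitions(counts):
    """Turn observed transition counts into row-normalised probabilities."""
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {target: n / total for target, n in row.items()}
    # Final states are absorbing: once reached, an enrolment stays there.
    probs["completed"] = {"completed": 1.0}
    probs["discontinued"] = {"discontinued": 1.0}
    return probs


def project_completion_rate(transitions, start="continuing",
                            tol=1e-12, max_steps=1000):
    """Propagate a cohort year by year until it has (almost) all been absorbed."""
    dist = {state: 0.0 for state in transitions}
    dist[start] = 1.0
    for _ in range(max_steps):
        nxt = {state: 0.0 for state in transitions}
        for state, mass in dist.items():
            for target, p in transitions[state].items():
                nxt[target] += mass * p
        dist = nxt
        # Share of the cohort still in a non-final (transient) state.
        transient = sum(
            mass for state, mass in dist.items()
            if transitions[state].get(state, 0.0) < 1.0
        )
        if transient < tol:
            break
    return dist["completed"]


transitions = estimate_transitions(OBSERVED_COUNTS)
rate = project_completion_rate(transitions)
print(round(rate, 3))
```

With these illustrative counts the projected rate converges to 0.3 / (0.3 + 0.2) = 0.6. A realistic application would likely need more enrolment states and separate transition estimates for different training attributes, and because the estimates depend on transitions observed into the following year, the sketch is also consistent with the 12-month delay noted above.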

