Evaluation, refinement and advancement of apprentice and trainee and VET completion rate machine learning models

Commenced

June 2023

Estimated publish date

June 2025

Principal researcher(s)

Michelle Hall, Senior Data Analyst, NCVER
Melinda Lees, Team Leader, NCVER

Research sponsor(s)

N/A

Contact

Melinda Lees, Team Leader, Research & Data Analytics, MelindaLees@ncver.edu.au, +61 8 8230 8487

Project code

50105

Project purpose

The purpose of this project is to further evaluate, refresh and refine the machine learning models that NCVER has developed for estimating apprentice and trainee (A&T) and VET completion rates, with a view to extending the evidence for their effectiveness and improving the usability of the models.

NCVER publishes annual VET qualification and A&T completion rates. A life tables approach is used to project completion rates for apprentice and trainee contracts, and a Markov chains methodology is used for VET qualifications.
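As an illustration of the general idea behind a Markov chain projection, the following minimal sketch follows a commencing student through a small set of states until they either complete or withdraw. The state names, the transition probabilities and the function `project_completion_rate` are hypothetical and chosen only for illustration; they do not represent the states or estimates used in NCVER's methodology.

```python
# Hypothetical transition probabilities between training states, for
# illustration only. "completed" and "withdrawn" are absorbing states:
# once entered, they are never left.
TRANSITIONS = {
    "first_year": {"continuing": 0.7, "completed": 0.1, "withdrawn": 0.2},
    "continuing": {"continuing": 0.4, "completed": 0.4, "withdrawn": 0.2},
    "completed": {"completed": 1.0},
    "withdrawn": {"withdrawn": 1.0},
}


def project_completion_rate(start_state, transitions, periods=200):
    """Propagate a unit of probability mass through the chain and return
    the share that ends up in the 'completed' state."""
    distribution = {start_state: 1.0}
    for _ in range(periods):
        next_distribution = {}
        for state, mass in distribution.items():
            for target, probability in transitions[state].items():
                next_distribution[target] = (
                    next_distribution.get(target, 0.0) + mass * probability
                )
        distribution = next_distribution
    return distribution.get("completed", 0.0)


projected_rate = project_completion_rate("first_year", TRANSITIONS)
print(f"Projected completion rate: {projected_rate:.3f}")
```

The number of periods is set high enough that almost no probability mass remains in the non-absorbing states, so the result approximates the eventual completion rate implied by the assumed transition probabilities.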

The initial machine learning model projections developed by NCVER were generally comparable to and, in some instances, more accurate than those of the current methodologies in predicting actual rates. However, the evaluation of their performance needs to be extended to additional commencing years and to various cohorts of interest. In addition, the efficiency and maintainability of the machine learning models could be improved through further refinements.

Research questions

The proposed analysis is designed to address the following questions:

How well do the machine learning models generalise to future cohorts?
How do the machine learning projections compare to those generated by the current methodology over time?
Can the machine learning models produce accurate predictions for various student/training cohorts?
Can alternative machine learning algorithms, such as gradient boosting methods, provide a more user-friendly approach to projecting A&T completion rates and achieve alignment with the VET completion rates machine learning methodology, without compromising the accuracy of predictions?

Methodology

The evaluation and refinement of the apprentice and trainee and the VET completion rate machine learning models will be undertaken as independent streams of work.

Apprentice and trainee model evaluation and refinement stages:

Retrain the current machine learning models with alternative gradient boosting algorithms
Evaluate how well the models generalise to more recent commencing cohorts
Compare deep learning and/or gradient boosting model outputs with the current methodology over time
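One possible way to compare two sets of projections over time is to measure how far each falls from the actual completion rate for each commencing year and then average those differences. The sketch below assumes mean absolute error as the accuracy measure, which is an assumption rather than a metric named in this project. The figures in `records` are made-up placeholders, not NCVER results.

```python
# Made-up placeholder figures for illustration only; not NCVER results.
# Each record: (commencing year, actual rate, machine learning projection,
#               current methodology projection)
records = [
    (2016, 0.56, 0.55, 0.58),
    (2017, 0.54, 0.55, 0.57),
    (2018, 0.55, 0.53, 0.54),
]


def mean_absolute_error(pairs):
    """Average absolute difference between actual and projected rates."""
    pairs = list(pairs)
    return sum(abs(actual - projected) for actual, projected in pairs) / len(pairs)


ml_mae = mean_absolute_error((actual, ml) for _, actual, ml, _ in records)
current_mae = mean_absolute_error(
    (actual, current) for _, actual, _, current in records
)

print(f"Machine learning MAE:    {ml_mae:.4f}")
print(f"Current methodology MAE: {current_mae:.4f}")
```

Reporting the error for each commencing year separately, rather than only the average, would show whether either approach drifts as cohorts become more recent.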

VET model evaluation and refinement stages:

Evaluate how well the machine learning model generalises to more recent commencing cohorts
Evaluate the accuracy of the models for various student and training cohorts
Conduct exploratory analysis with a view to extending the machine learning model to capture completion rate predictions for non-qualifications (e.g. training package skill sets and accredited courses)
Compare XGBoost machine learning model outputs with the current methodology over time
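Accuracy for individual student and training cohorts could be assessed by aggregating predicted completion probabilities within each cohort and comparing the result with the observed completion rate. The following is a minimal sketch under that assumption. The cohort labels, the records and the function `cohort_accuracy` are hypothetical, and the values are placeholders rather than NCVER data.

```python
from collections import defaultdict

# Placeholder records for illustration only; not NCVER data.
# Each record: (cohort label, predicted completion probability,
#               1 if the student completed, otherwise 0)
records = [
    ("certificate_iii", 0.60, 1),
    ("certificate_iii", 0.50, 0),
    ("certificate_iii", 0.70, 1),
    ("diploma", 0.40, 0),
    ("diploma", 0.50, 1),
]


def cohort_accuracy(rows):
    """Return projected and actual completion rates, and their gap, by cohort."""
    grouped = defaultdict(list)
    for cohort, probability, completed in rows:
        grouped[cohort].append((probability, completed))

    summary = {}
    for cohort, values in grouped.items():
        projected = sum(p for p, _ in values) / len(values)
        actual = sum(c for _, c in values) / len(values)
        summary[cohort] = {
            "n": len(values),
            "projected": projected,
            "actual": actual,
            "gap": projected - actual,
        }
    return summary


summary = cohort_accuracy(records)
for cohort, stats in summary.items():
    print(cohort, stats)
```

A negative gap indicates that the model under-projects completions for that cohort. Small cohorts would produce noisy estimates, so the cohort size is returned alongside the rates.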