Project purpose
The purpose of this project is to further evaluate, refresh, and refine the machine learning models that NCVER has developed for estimating apprentice and trainee (A&T) and VET completion rates, with a view to extending the evidence for their effectiveness and improving the usability of the models.
NCVER publishes annual VET qualification and A&T completion rates. A life tables approach is used to project completion rates for apprentice and trainee contracts, and a Markov chain methodology is used for VET qualifications.
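As a rough illustration of the Markov chain idea (a minimal sketch, not NCVER's actual implementation), the example below places each student in one of three states per year and iterates the chain to obtain a projected completion rate. The state names, the annual time step, and every transition probability are assumptions invented for illustration.

```python
# Illustrative absorbing Markov chain. All values below are hypothetical.
STATES = ["continuing", "completed", "withdrawn"]

# Annual transition probabilities: row = current state, column = next state.
TRANSITIONS = [
    [0.50, 0.30, 0.20],  # continuing -> continuing / completed / withdrawn
    [0.00, 1.00, 0.00],  # completed is absorbing
    [0.00, 0.00, 1.00],  # withdrawn is absorbing
]


def step(distribution, transitions):
    """Advance the state distribution by one time step."""
    n = len(transitions)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


def project_completion_rate(transitions, years):
    """Share of a commencing cohort expected to have completed after `years`."""
    distribution = [1.0, 0.0, 0.0]  # everyone starts in "continuing"
    for _ in range(years):
        distribution = step(distribution, transitions)
    return distribution[STATES.index("completed")]


projected = project_completion_rate(TRANSITIONS, years=10)
```

With a single transient state, the projected rate approaches the ratio of the completion probability to the combined completion and withdrawal probabilities as the horizon lengthens.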
The initial machine learning model projections developed by NCVER were generally comparable to, and in some instances more accurate than, those of the current methodologies in predicting actual rates. However, the evaluation of their performance needs to be extended to additional commencing years and to various cohorts of interest. In addition, the efficiency and maintainability of the machine learning models could be improved through further refinements.
Research questions
The proposed analysis is designed to address the following questions:
- How well do the machine learning models generalise to future cohorts?
- How do the machine learning projections compare to those generated by the current methodology over time?
- Can the machine learning models produce accurate predictions for various student/training cohorts?
- Can alternative machine learning algorithms, such as gradient boosting methods, provide a more user-friendly approach to projecting A&T completion rates and achieve alignment with the VET completion rates machine learning methodology, without compromising the accuracy of predictions?
Methodology
The evaluation and refinement of the machine learning models for A&T completion rates and for VET completion rates will be undertaken as independent streams of work.
Apprentice and trainee model evaluation and refinement stages:
- Retrain the current machine learning models with alternative gradient boosting algorithms
- Evaluate how well the models generalise to more recent commencing cohorts
- Compare deep learning and/or gradient boosting model outputs with the current methodology over time
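One common way to check how well a model generalises to more recent commencing cohorts is a temporal hold-out: fit on contracts that commenced up to a cutoff year and evaluate only on those that commenced later. The sketch below shows the split alone. The field names, the records, and the cutoff year are hypothetical; the proposal does not specify which years or features are used.

```python
# Hypothetical contract records; field names and values are illustrative only.
CONTRACTS = [
    {"contract_id": 1, "commencing_year": 2015, "completed": True},
    {"contract_id": 2, "commencing_year": 2016, "completed": False},
    {"contract_id": 3, "commencing_year": 2017, "completed": True},
    {"contract_id": 4, "commencing_year": 2018, "completed": True},
    {"contract_id": 5, "commencing_year": 2019, "completed": False},
]


def split_by_commencing_year(contracts, cutoff_year):
    """Split records so that no hold-out cohort commenced before training ends."""
    train = [c for c in contracts if c["commencing_year"] <= cutoff_year]
    holdout = [c for c in contracts if c["commencing_year"] > cutoff_year]
    return train, holdout


train, holdout = split_by_commencing_year(CONTRACTS, cutoff_year=2017)
```

Splitting on commencing year rather than at random keeps later cohorts out of training, so the hold-out error reflects performance on cohorts the model has not seen.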
VET model evaluation and refinement stages:
- Evaluate how well the machine learning model generalises to more recent commencing cohorts
- Evaluate the accuracy of the models for various student and training cohorts
- Undertake exploratory analysis with a view to extending the machine learning model to produce completion rate predictions for non-qualifications (e.g., training package skill sets and accredited courses)
- Compare XGBoost machine learning model outputs with the current methodology over time
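The comparison of model outputs with the current methodology over time could be summarised as an error measure per commencing year and method. The sketch below uses mean absolute error between projected and actual completion rates. The cohort labels, method names, and all rates are made-up placeholders, not NCVER results, and the choice of mean absolute error is an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (commencing_year, cohort, method, projected_rate, actual_rate)
RECORDS = [
    (2017, "trade", "current", 0.55, 0.58),
    (2017, "trade", "ml", 0.57, 0.58),
    (2017, "non-trade", "current", 0.50, 0.54),
    (2017, "non-trade", "ml", 0.53, 0.54),
    (2018, "trade", "current", 0.56, 0.57),
    (2018, "trade", "ml", 0.55, 0.57),
    (2018, "non-trade", "current", 0.51, 0.52),
    (2018, "non-trade", "ml", 0.50, 0.52),
]


def mae_by_year_and_method(records):
    """Mean absolute error of projected vs actual rates per (year, method)."""
    errors = defaultdict(list)
    for year, _cohort, method, projected, actual in records:
        errors[(year, method)].append(abs(projected - actual))
    return {key: sum(values) / len(values) for key, values in errors.items()}


mae = mae_by_year_and_method(RECORDS)
for (year, method), error in sorted(mae.items()):
    print(f"{year} {method:8s} MAE = {error:.3f}")
```

Grouping on the cohort field instead of, or as well as, the commencing year would give the equivalent breakdown for student and training cohorts.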