Unpacking data science in an education context

Issue: Volume 101, Number 2

Posted: 23 February 2022
Reference #: 1HASxy

Data is everywhere and it’s used by everyone, but what is it and how is it used in schools and kura? Ministry of Education chief data scientist Chris Casey explains.

Data is an observation, a measurement, a comment, a place, a description, a person, an address, the position of an atom, the magnitude of an earthquake, the speed of a vehicle, the contents of your shopping basket, a score in a test, even something as seemingly ephemeral as a colour descriptor.

Data is all of these and a whole lot more. The neat thing about all of these types of data is that they can be codified – converted to a basic set of symbols that both humans and machines can read: letters and numbers.

Lots of fancy names have been applied to the art of how computers process data – Artificial Intelligence (AI), Machine Learning (ML), Data Mining (DM), Business Intelligence (BI), Natural Language Processing (NLP) to name a few.

Don’t let the jargon scare you, though. All of these rely on relatively simple algorithms to process, sort, extrapolate, classify and clean large amounts of letters and numbers. Much of the computer code consists of ‘IF this, THEN do that’ statements. It’s not rocket science – it’s data science!
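Here is a minimal Python sketch of the ‘IF this, THEN do that’ idea. It sorts students into attendance bands using nothing but IF/THEN rules. The band names, the percentage cut-offs and the student figures are illustrative assumptions for this example, not official definitions.

```python
def attendance_band(percent_present: float) -> str:
    """Classify an attendance rate (0-100) using simple IF/THEN rules.

    The cut-offs below are illustrative assumptions only.
    """
    if percent_present < 0 or percent_present > 100:
        raise ValueError("attendance must be between 0 and 100")
    if percent_present > 90:
        return "regular"
    elif percent_present > 80:
        return "irregular"
    elif percent_present > 70:
        return "moderate absence"
    else:
        return "chronic absence"


# Hypothetical attendance rates for four students (not real data).
rates = {"Student A": 95.0, "Student B": 85.5, "Student C": 72.0, "Student D": 60.0}
for name, rate in rates.items():
    print(name, attendance_band(rate))
```

Each student passes through the rules from top to bottom and receives the first band whose condition is true.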

Data science is an interdisciplinary field that draws, in roughly equal parts, on computer programming, applied statistics and mathematics, and subject-matter expertise. It is used to process large amounts of data and turn it into information from which a human can then derive insight.

We often hear the term data mining, and the metaphor fits well. Data is like the ore in the ground: the bauxite. Machines and their algorithms mine and refine it, turning the bauxite into alumina. More complex machines and smarter algorithms then smelt this refined data into information: shiny aluminium metal. Finally (and crucially), humans design useful tools and implements from this metal. They turn the information into useful, actionable insight.

This is key – the insight or intelligence should be actionable; there is absolutely no point doing all that mining and refining if the end product has no “so what?”

Data science in education

So, how do we begin in education? It begins with the data, the ore from which the insight we want is derived. In our education system we rightly do not want to reduce our ākonga to just letters and numbers – of course they are so much more than that.

Our students have context, they have stories and lived experiences, but to simplify data collection we start with the basics:

  • The system (school sizes, locations, types, number of staff and their demographics)
  • Student demographics (age, gender, ethnicity, location)
  • Administrative data (enrolment dates, year levels, interventions)
  • Attendance and achievement data, which are so important that they deserve their own categories.

In terms of measuring ākonga engagement (which is a huge factor in a student’s success and wellbeing), attendance data is the most critical piece of a complex puzzle.

Attendance data is exact and tells a compelling story at student, regional and national levels. We know attendance has been falling nationally over the past decade, but we don’t know the exact causes or triggers.

Collecting timely and accurate attendance data lets us apply data science tools to cross-reference attendance with other national datasets, such as socio-economic, macro-economic, housing, welfare and migration trends.
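The following minimal sketch shows what cross-referencing can look like in practice. It uses only Python’s standard library. Two datasets are joined on a shared key (here, a region name) and then compared. The region names and all the numbers are invented placeholders. Pearson correlation is simply one assumed way to look for a link, and a correlation on its own does not show cause.

```python
import math

# Hypothetical attendance rates (%) by region: placeholder values, not real data.
attendance = {"Region A": 88.0, "Region B": 82.0, "Region C": 91.0, "Region D": 79.0}

# Hypothetical second dataset keyed the same way, e.g. a housing-pressure index.
housing_index = {"Region A": 40.0, "Region B": 65.0, "Region C": 30.0,
                 "Region D": 72.0, "Region E": 50.0}


def join_on_key(left: dict, right: dict) -> dict:
    """Inner join: keep only keys present in both datasets."""
    return {key: (left[key], right[key]) for key in left if key in right}


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


joined = join_on_key(attendance, housing_index)
xs = [pair[0] for pair in joined.values()]
ys = [pair[1] for pair in joined.values()]
print(sorted(joined))  # Region E is dropped: it has no attendance record
print(round(pearson(xs, ys), 2))
```

An inner join quietly drops any region that appears in only one dataset. That is exactly the kind of gap worth checking before drawing conclusions.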

There may be bigger forces at play: links that, until now, we have only been able to suggest through conjecture and hearsay, not prove.

This is the power of good data: objective analysis without bias. But therein lies the rub – we must also collect the data without bias.

We must also learn to capture data that better describes our students and their outcomes. NCEA results are only a small part of what a student gets from their education.

Support and tools for schools and kura

The Analysis and Insights team at the Ministry welcome interaction with schools and are happy to explore any ideas on data and its collection.

Fortunately, some of the best tools for data science are open-source and free: you can simply download them. Many of these tools can also be used by teachers with their students.

The current doyen of the applied statistics crowd is a software tool that originated at the University of Auckland. It’s called, rather tersely, ‘R’.

The beauty of this tool is that, on its own, it’s a fantastic analysis and plotting tool (invaluable as a teaching aid for statistics and basic programming). Even better, an ever-expanding collection of downloadable ‘packages’ adds functionality and tools for almost any data science problem you are likely to meet.

The online community is massive and active. Few questions go unanswered, and sample code and examples exist on a multitude of topics. R can read data in many formats, including exports from school Student Management Systems (SMSs), as well as data extracts available from Education Counts and the Ministry’s own collections.

There are other free tools too. ‘Python’ is a slightly more advanced language, but not beyond the reach of senior students and teachers. That is not to discourage more prosaic tools such as Microsoft Excel and Google Sheets. They can handle large quantities of data, but unlike R and Python, they were not designed with data science in mind.

Keys to good data science

There are two keys to doing good data science: collection and connection.

Collection involves choosing the right sort of data to collect, and starting early enough to build a decent data set.

Often, we need to be prescient about the sort of things we might want to ask – and start collecting the data well before we need to tease an insight from it.

A good example of this is attendance. We may collect a year’s worth of data, see that attendance in term 4 is lower than in the other terms, and conclude that attendance is falling.

If, however, we had collected attendance data for the past five years, we might well have seen that term 4 attendance is lower than term 1 attendance every year, but that the gap is getting smaller on average. In fact, we may be seeing genuine improvement.
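Here is a short Python sketch of the five-year comparison. The attendance figures are invented purely to mirror the scenario in the text: term 4 is always lower than term 1, but by less each year. The least-squares slope is one simple, assumed way to summarise whether the gap is shrinking.

```python
# Hypothetical term 1 and term 4 attendance rates (%) over five years.
# Invented numbers chosen to match the scenario described in the text.
term_rates = {
    2017: {"term1": 92.0, "term4": 84.0},
    2018: {"term1": 92.0, "term4": 85.0},
    2019: {"term1": 92.0, "term4": 86.0},
    2020: {"term1": 92.0, "term4": 87.0},
    2021: {"term1": 92.0, "term4": 88.0},
}


def yearly_gaps(rates: dict) -> dict:
    """Term 1 minus term 4 attendance for each year, in year order."""
    return {year: r["term1"] - r["term4"] for year, r in sorted(rates.items())}


def slope(xs: list, ys: list) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


gaps = yearly_gaps(term_rates)
trend = slope(list(gaps), list(gaps.values()))
print(gaps)
print(f"Change in the term 1 to term 4 gap per year: {trend:+.2f} percentage points")
```

Any single year viewed alone shows term 4 below term 1. Only the multi-year view reveals that the gap is narrowing.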

Also, if we had measured minutes attended rather than half-days attended, the more granular measurement might have uncovered more subtle trends.
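This small illustration shows why the finer measure can matter. One hypothetical student’s attendance is recorded in minutes and then collapsed into half-day ‘present’ marks. The 180-minute half-day, the 120-minute threshold for being marked present and the attendance figures are all assumptions made for this sketch. They are not the Ministry’s coding rules.

```python
HALF_DAY_MINUTES = 180   # assumed length of a morning half-day
PRESENT_THRESHOLD = 120  # assumed minutes needed to be marked present

# Hypothetical minutes attended each morning over two school weeks.
minutes_attended = [180, 175, 170, 160, 150, 140, 135, 130, 125, 120]


def half_day_marks(minutes: list, threshold: int = PRESENT_THRESHOLD) -> list:
    """Collapse minutes attended into present (True) / absent (False) marks."""
    return [m >= threshold for m in minutes]


marks = half_day_marks(minutes_attended)
half_day_rate = 100 * sum(marks) / len(marks)
minute_rate = 100 * sum(minutes_attended) / (HALF_DAY_MINUTES * len(minutes_attended))

print(f"Half-day attendance: {half_day_rate:.0f}%")
print(f"Minute-level attendance: {minute_rate:.1f}%")
print("Minutes missed each morning:", [HALF_DAY_MINUTES - m for m in minutes_attended])
```

Under these assumptions, the half-day marks show perfect attendance. The minute-level data reveals a student arriving steadily later each morning.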

These collection ideas may be summed up in that now-national catchphrase, ‘go early and go hard!’ The best time to start collecting data was yesterday, and the best data to collect is as much as you can.

Connection is the idea that data should be connected to the stories of our learners and to the story we’re attempting to tell.

If we’re interested in measuring wellbeing, we must choose data that accurately, and in an unbiased way, connects a person to their wellbeing.

We must ask hard questions of the data. Is it available for all of our people? How do we cope with gaps? Will any people be left unconnected, with no data?

Only when we connect the data to each person can we connect all of the dots to get the trends, insights and stories. Connection also reminds us that the data we have refers to real people. We must always be mindful of how we use it, at all times preserving the privacy, mana and dignity of our students and schools.
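Once data is linked to each person, the questions about gaps can be checked mechanically. This minimal sketch assumes a hypothetical shared student identifier and a hypothetical wellbeing survey. It links the two and lists the students who would be left unconnected. Real student data would need the privacy protections described above.

```python
# Hypothetical enrolment roll and wellbeing survey responses, keyed by an
# invented student ID. No real student data is used here.
roll = {"S001": "Year 9", "S002": "Year 9", "S003": "Year 10", "S004": "Year 10"}
wellbeing_scores = {"S001": 4, "S003": 2, "S999": 5}


def link_records(roll: dict, survey: dict):
    """Link survey responses to enrolled students and report the gaps."""
    linked = {sid: (roll[sid], survey[sid]) for sid in roll if sid in survey}
    missing = sorted(sid for sid in roll if sid not in survey)    # enrolled, no data
    unmatched = sorted(sid for sid in survey if sid not in roll)  # data, no enrolment
    return linked, missing, unmatched


linked, missing, unmatched = link_records(roll, wellbeing_scores)
coverage = 100 * len(linked) / len(roll)
print(f"Coverage: {coverage:.0f}% of enrolled students")
print("No wellbeing data for:", missing)
print("Survey records with no matching enrolment:", unmatched)
```

Reporting both kinds of mismatch, rather than silently dropping them, keeps the unconnected people visible.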

The 4 Rs: Reading, Writing, Arithmetic and Data

Where to from here? Start with the R software or even just Excel. Try importing files from your school’s SMS or go online to some of the data resources shared in this article.
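You may want to try this in Python rather than R or Excel. Here is a minimal sketch of importing a CSV export using only the standard library. Real SMS exports vary, so the column names (student_id, year_level, half_days_present, half_days_possible) are hypothetical. An in-memory sample stands in for the exported file.

```python
import csv
import io

# Hypothetical CSV export; the column names are assumptions, not a real SMS format.
SAMPLE_EXPORT = """student_id,year_level,half_days_present,half_days_possible
S001,9,180,190
S002,9,150,190
S003,10,,190
S004,10,171,190
"""


def load_attendance(source) -> list:
    """Read attendance rows from a CSV file object, skipping incomplete ones."""
    rows = []
    for row in csv.DictReader(source):
        if not row["half_days_present"] or not row["half_days_possible"]:
            continue  # incomplete record: skipped here, but worth logging
        present = int(row["half_days_present"])
        possible = int(row["half_days_possible"])
        rows.append({
            "student_id": row["student_id"],
            "year_level": int(row["year_level"]),
            "percent_present": 100 * present / possible,
        })
    return rows


records = load_attendance(io.StringIO(SAMPLE_EXPORT))
for r in records:
    print(r["student_id"], f"{r['percent_present']:.1f}%")
```

S003 is skipped because its half_days_present field is empty. With a real export you would pass an open file instead, for example `with open("export.csv", newline="") as f: records = load_attendance(f)`, where `export.csv` is a placeholder name.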

Persevere, and get your students, who are digital natives, to help.

The ability to gather data and work with it is fast becoming as important as literacy and numeracy – in fact it combines the two. Data science can be embedded into almost any subject, adding both richness and extra interest for students. And it’s not just academic – it’s a highly marketable and useful skill for teachers and students alike. So, dive in! 

Storytelling with data

Data science allows us to overlay school enrolment data, geo-coded student data (randomly shifted slightly, or ‘jittered’, for privacy) and LINZ road data. Here we see the fascinating interplay of socio-economic, school-zoning and parental-choice forces, which cause the student catchments of the three Thorndon colleges to cluster and spread. Compare the average commutes of students to St Mary’s with those to WGC. What are the effects on student wellbeing and quality learning time? We can further overlay attendance and attainment data to widen the scope of our questions and theories.
Every layer of data makes our storytelling that much richer.
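Jittering can be sketched as follows. Each geocoded point is shifted by a small random distance in a random direction, so that an individual home cannot be read off the map, while the overall clustering is preserved. The 200-metre maximum offset, the coordinates and the choice of a uniform offset within a disc are assumptions for illustration. The article does not say how its map was jittered.

```python
import math
import random

MAX_OFFSET_METRES = 200          # assumed jitter radius
METRES_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def jitter(lat: float, lon: float, rng: random.Random,
           max_offset: float = MAX_OFFSET_METRES):
    """Shift a point up to max_offset metres in a random direction."""
    distance = max_offset * math.sqrt(rng.random())  # sqrt spreads points evenly over the disc
    bearing = rng.uniform(0, 2 * math.pi)
    d_north = distance * math.cos(bearing)
    d_east = distance * math.sin(bearing)
    d_lat = d_north / METRES_PER_DEGREE_LAT
    d_lon = d_east / (METRES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + d_lat, lon + d_lon


rng = random.Random(42)  # fixed seed so the example is repeatable
# Hypothetical home locations near Wellington (not real addresses).
homes = [(-41.2790, 174.7780), (-41.2865, 174.7762), (-41.2700, 174.7850)]
jittered = [jitter(lat, lon, rng) for lat, lon in homes]
for original, moved in zip(homes, jittered):
    print(original, "->", tuple(round(v, 5) for v in moved))
```

Jittering reduces re-identification risk but does not remove it. A larger radius gives more privacy at the cost of less precise commute estimates.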

The Thorndon Girls' College Storytelling

Data science resources

For more information and resources, contact
chris.casey@education.govt.nz or analytics.insights@education.govt.nz

By Education Gazette editors
Education Gazette | Tukutuku Kōrero, reporter@edgazette.govt.nz

Posted: 12:38 PM, 23 February 2022
