Calculating Student Pathways in Excel
A common starting point for planning the forthcoming year's timetable is to look at what combinations of courses students enrolled for in the previous year.
Unfortunately, most timetabling or student record systems don't make it easy to obtain this information. It's easy enough to pull out lists of which students were enrolled on which course units, but they typically come out with one choice per row, like this:
Student ID | Student Choice |
---|---|
010123 | MATH102 |
010123 | PHYS104 |
010123 | MATH101 |
As timetablers, we want to work out the unique combinations of student choices and the number of students for each combination. That transforms the data to look like this:
Pathway | Students |
---|---|
MATH101, MATH102, PHYS104 | 1 |
This sort of transformation isn't straightforward. I used to do it by copying the student IDs to a new worksheet and using the "remove duplicates" feature in Excel, leaving me with one row per student. I would then write a VBA macro to find the choices for each student, sort them into alphabetical order and then concatenate them. Once this had run, I would then create a pivot table to count how many students had the same combinations of choices. This approach worked, but it was slow and fiddly – it was also difficult to troubleshoot for non-technical users.
However, it turns out that Excel for Microsoft 365 has some new features which massively simplify this process. If you have this version of Excel you can now perform the whole transformation using native Excel functions, no VBA or pivot tables required.
Step 0 – Sort your data before you start!
The first and most crucial step is to sort your student enrolment data alphabetically by course unit / module code. This will ensure that our process knows "MATH101, MATH102" is the same as "MATH102, MATH101". This will change the data shown in Table 1 to look like this:
A | B |
---|---|
010123 | MATH101 |
010123 | MATH102 |
010123 | PHYS104 |
Step 1 – Get a list of unique student IDs
The first function we need to use is UNIQUE
. This function takes an array of data and returns only the unique rows. Therefore if our student IDs are listed in column A then =UNIQUE(A:A)
will give us a list of student IDs with each student appearing only once. We can store this in colum C as shown below.
A | B | C |
---|---|---|
010123 | MATH101 | 010123 |
010123 | MATH102 | |
010123 | PHYS104 |
Step 2 – Get the choices for a student
You can now use the FILTER
function to return a list of the course units for a particular student. If we have our original choice data in columns A (student IDs) and B (course unit codes) then =FILTER(B:B, A:A=C1, "")
will return the course unit choices for the student ID listed in cell C1.
If you try this out, you'll quickly spot a problem - the course units are listed vertically, which means the choices for a second student would start overwriting the choices of the first...
A | B | C | D |
---|---|---|---|
010123 | MATH101 | 010123 | MATH101 |
010123 | MATH102 | MATH102 | |
010123 | PHYS104 | MATH103 |
Step 3 – Concatenate the choices
Our third and final function is TEXTJOIN
. This will take the list of values in the previous step and concatenate them into a single string. We need to wrap TEXTJOIN
around our previous FILTER
function so it looks like =TEXTJOIN(",", TRUE, FILTER(B:B, A:A=C1, ""))
. This will connect the codes using a comma, and the "true" indicates that we should ignore any blank entries. Now our spreadsheet looks like this:
A | B | C | D |
---|---|---|---|
010123 | MATH101 | 010123 | MATH101, MATH102, MATH103 |
010123 | MATH102 | ||
010123 | PHYS104 |
Step 4 – Things to be aware of
In theory these three functions – UNIQUE
, FILTER
and TEXTJOIN
give us exactly what we want. In practice, if you have a large number of students then Excel will struggle to process these. Be prepared for it to take a while to produce your final results.
But also, because this processing takes a while Excel cheats a little. It only performs the function for the data you're looking at. If you save the spreadsheet you'll find that the later rows just contain duplicates of the top rows. You actually need to scroll down to the bottom of the spreadsheet to make sure that Excel actually calculates all the combinations.
It's a little fiddly, but much simpler than the options we've had available to user before now.