Paul Briërley

Calculating Student Pathways in Excel

A common starting point for planning the forthcoming year's timetable is to look at what combinations of courses students enrolled for in the previous year.

Unfortunately, most timetabling or student record systems don't make it easy to obtain this information. It's easy enough to pull out lists of which students were enrolled on which course units, but they typically come out with one choice per row, like this:

Student IDStudent Choice
010123MATH102
010123PHYS104
010123MATH101
Table 1 - Student choice data.

As timetablers, we want to work out the unique combinations of student choices and the number of students for each combination. That transforms the data to look like this:

PathwayStudents
MATH101, MATH102, PHYS1041
Table 2 – Desired output.

This sort of transformation isn't straightforward. I used to do it by copying the student IDs to a new worksheet and using the "remove duplicates" feature in Excel, leaving me with one row per student. I would then write a VBA macro to find the choices for each student, sort them into alphabetical order and then concatenate them. Once this had run, I would then create a pivot table to count how many students had the same combinations of choices. This approach worked, but it was slow and fiddly – it was also difficult to troubleshoot for non-technical users.

However, it turns out that Excel for Microsoft 365 has some new features which massively simplify this process. If you have this version of Excel you can now perform the whole transformation using native Excel functions, no VBA or pivot tables required.

Step 0 – Sort your data before you start!

The first and most crucial step is to sort your student enrolment data alphabetically by course unit / module code. This will ensure that our process knows "MATH101, MATH102" is the same as "MATH102, MATH101". This will change the data shown in Table 1 to look like this:

AB
010123MATH101
010123MATH102
010123PHYS104
Table 3 – Our original data, sorted by course unit code (column B).

Step 1 – Get a list of unique student IDs

The first function we need to use is UNIQUE. This function takes an array of data and returns only the unique rows. Therefore if our student IDs are listed in column A then =UNIQUE(A:A) will give us a list of student IDs with each student appearing only once. We can store this in colum C as shown below.

ABC
010123MATH101010123
010123MATH102
010123PHYS104
Table 4 – We've added column C, the list of unique student IDs.

Step 2 – Get the choices for a student

You can now use the FILTER function to return a list of the course units for a particular student. If we have our original choice data in columns A (student IDs) and B (course unit codes) then =FILTER(B:B, A:A=C1, "") will return the course unit choices for the student ID listed in cell C1.

If you try this out, you'll quickly spot a problem - the course units are listed vertically, which means the choices for a second student would start overwriting the choices of the first...

ABCD
010123MATH101010123MATH101
010123MATH102MATH102
010123PHYS104MATH103
Table 5 – FILTER gives us the student's choices in column D, but listed vertically.

Step 3 – Concatenate the choices

Our third and final function is TEXTJOIN. This will take the list of values in the previous step and concatenate them into a single string. We need to wrap TEXTJOIN around our previous FILTER function so it looks like =TEXTJOIN(",", TRUE, FILTER(B:B, A:A=C1, "")). This will connect the codes using a comma, and the "true" indicates that we should ignore any blank entries. Now our spreadsheet looks like this:

ABCD
010123MATH101010123MATH101, MATH102, MATH103
010123MATH102
010123PHYS104
Table 6 – Adding TEXTJOIN gives us the data we want in columns C and D.

Step 4 – Things to be aware of

In theory these three functions – UNIQUE, FILTER and TEXTJOIN give us exactly what we want. In practice, if you have a large number of students then Excel will struggle to process these. Be prepared for it to take a while to produce your final results.

But also, because this processing takes a while Excel cheats a little. It only performs the function for the data you're looking at. If you save the spreadsheet you'll find that the later rows just contain duplicates of the top rows. You actually need to scroll down to the bottom of the spreadsheet to make sure that Excel actually calculates all the combinations.

It's a little fiddly, but much simpler than the options we've had available to user before now.