Visualization of Enrollment Data using Chord Diagrams

Laia Blasco-Soplón, Josep Grau-Valldosera and Julià Minguillón

Computer Science, Multimedia and Telecommunication Studies, Universitat Oberta de Catalunya, Rambla Poblenou 156, Barcelona, Spain

Marketing Department, Universitat Oberta de Catalunya, Rambla Poblenou 156, Barcelona, Spain

Keywords:

Enrollment, Dropping Out, Chord Diagrams, Radial Visualization, Distance Education, Higher Education.

Abstract:

Distance and online universities usually have more flexible academic requirements than brick-and-mortar ones, so students do not advance through the programme they enroll in at the uniform pace they are supposed to. Furthermore, due to the intrinsic nature of distance learning, distance students need additional support when deciding which subjects to take each semester. On the other hand, teachers usually have no idea which other subjects the students enrolled in their subjects are taking. This paper proposes the use of chord diagrams for visualizing the relationships between pairs of subjects taken by students, with the aim of providing both university managers and students with a clear picture of possible bottlenecks, that is, combinations of subjects that might affect students’ performance. Chord diagrams can be used to analyze intra-semester data (subjects taken simultaneously) and inter-semester data (sequences of subjects taken by students), and they provide a fast overview of what is happening within a given programme at subject level. Finally, users can interact with chord diagrams in order to filter them and obtain additional subject details if desired.

1 INTRODUCTION

Enrollment and dropping out in higher education are two sides of the same coin. In the case of distance and online universities, dropping out mostly occurs after the end of the first academic semester or year (Grau-Valldosera and Minguillón, 2014; Tyler-Smith, 2006). In (Campbell et al., 2007), the authors describe several examples of learning analytics related to enrollment, some of them oriented towards improving student retention. Usually, all these studies are user-centered, trying to determine whether a student will be accepted or not, or trying to predict her academic performance. Although such data-driven decision support systems can be built using historical enrollment data (see (Sacín et al., 2011; Siraj and Abdoulha, 2009), for instance), we are more interested in obtaining a fast overview of what is happening within a given programme, adopting a subject-centered approach and following the basic premises of information visualization (Card et al., 1999). In (Grau-Valldosera and Minguillón, 2014), the authors proposed a novel definition of dropping out adapted to the specific characteristics of distance universities, based on the minimum number of break semesters (N) a student must take before it can be concluded that she will never enroll again (i.e. she has dropped out), for a given error rate (fewer than 5% of students come back after N semesters). Using this definition, the authors were able to compute the dropout rate for different programmes, which led to strong evidence: half of the dropouts occur after the first academic semester, and up to 75% of accumulated dropouts occur after the second semester (i.e. the first year). As most students take only two or three subjects during their first semester, they should know, from accumulated enrollment data, which combinations are known to be problematic. Therefore, it is very important to provide both university managers and students with visualization tools that allow them, respectively, to detect possible bottlenecks in a given programme and to adjust their expectations about subject enrollment. As stated in (Park and Choi, 2009), “internal factors such as subject design strategies and learners’ motivation should be prioritized at the subject development stage in order to make the subject participatory and interesting and to keep learners engaged”. We think that these strategies should be moved one level up, trying to detect problems not only at subject level but also at programme level, visualizing how students advance within programmes and the possible barriers caused by wrong enrollment decisions.
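The dropout criterion summarized above can be illustrated as a search for the smallest N. The following Python sketch is not the procedure of (Grau-Valldosera and Minguillón, 2014); it assumes a hypothetical input in which every observed break is summarized as a pair (length, returned), where length is the number of consecutive semesters without enrollment (counted up to the end of the data if the student never came back) and returned tells whether the student enrolled again afterwards. The function name and the toy data are illustrative only.

```python
def min_break_semesters(breaks, error_rate=0.05, max_n=20):
    """Return the smallest N such that fewer than `error_rate` of the breaks
    lasting at least N semesters end with the student enrolling again.

    breaks: iterable of (length, returned) pairs (hypothetical summary format).
    Returns None if no N up to `max_n` satisfies the criterion.
    """
    breaks = list(breaks)
    for n in range(1, max_n + 1):
        # Outcomes (True = came back) of every break lasting at least n semesters.
        outcomes = [returned for length, returned in breaks if length >= n]
        if not outcomes:
            return None  # nobody stayed away this long: criterion undefined
        if sum(outcomes) / len(outcomes) < error_rate:
            return n
    return None


# Toy data: 61 of 100 breaks end with a return, but only 1 of the 40 breaks
# lasting three or more semesters does (2.5% < 5%).
toy = [(1, True)] * 50 + [(2, True)] * 10 + [(3, True)] + [(3, False)] * 39
print(min_break_semesters(toy))  # 3
```

Counting breaks rather than students, and capping the search at max_n, are simplifying choices of this sketch.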

This paper is structured as follows: the dataset with enrollment data used in this paper is analyzed in Section 2. Section 3 proposes the use of chord diagrams for visualizing subject relationships with respect to enrollment. Finally, conclusions and future work are stated in Section 4.

Blasco-Soplón L., Grau-Valldosera J. and Minguillón J.
Visualization of Enrollment Data using Chord Diagrams.
DOI: 10.5220/0005360505110516
In Proceedings of the 10th International Conference on Computer Graphics Theory and Applications (GRAPP-2015), pages 511-516
ISBN: 978-989-758-087-1
© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

Table 1: Subjects and degree plan for the Economics programme (S: semester; x.x: optional subject or subject from another programme).

S   Subjects
1   01.001   01.079   01.003   01.004   00.010   00.002
2   01.005   01.006   01.007   01.078   01.009   00.003
3   01.080   01.021   01.086   01.020   00.004   x.x
4   01.014   01.015   01.012   01.087   01.011   x.x
5   01.018   01.019   01.016   01.022   x.x      x.x
6   01.008   01.023   x.x      x.x      x.x      x.x

2 ENROLLMENT DATA

The dataset used in this paper is taken from the Universitat Oberta de Catalunya (UOC) academic databases. Only valid enrollments have been included, i.e. those that have been formalized and paid for, thus excluding enrollments that were subsequently canceled. As we are just exploring the possibility of using chord diagrams for visualizing relationships between subjects, we have chosen the largest programme among all available data, namely Economics. The dataset contains the students enrolled in Economics between Spring 1999 and Spring 2011: 21792 students and 501 different subjects, giving a total of 328467 subject enrollments during 25 consecutive semesters.
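As an illustration of the selection criteria and of the semester count, the sketch below assumes a hypothetical record layout (the paper does not describe the UOC database schema) and assumes two semesters per year, labeled "Spring" and "Fall". Numbering semesters consecutively from Spring 1999 gives Spring 2011 the index 25, matching the 25 consecutive semesters of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrollment:
    # Hypothetical record layout: the paper does not describe the UOC schema.
    student: str
    subject: str
    year: int
    term: str  # assumed to be "Spring" or "Fall"
    formalized: bool
    paid: bool
    canceled: bool


def semester_index(year: int, term: str, first_year: int = 1999) -> int:
    """Number semesters consecutively, with Spring of `first_year` as semester 1."""
    if term not in ("Spring", "Fall"):
        raise ValueError(f"unknown term: {term!r}")
    return 2 * (year - first_year) + (1 if term == "Spring" else 2)


def valid_enrollments(records: list[Enrollment]) -> list[Enrollment]:
    """Keep enrollments that were formalized and paid and not subsequently canceled."""
    return [r for r in records if r.formalized and r.paid and not r.canceled]


print(semester_index(2011, "Spring"))  # 25
```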

Table 1 shows the expected sequence of subjects for the Economics programme. It is supposed to be finished in 6 semesters (3 years), taking 6 subjects every semester. Here x.x means subjects chosen from a pool of optional subjects or from other programmes. This structure is only a recommendation, so students can take subjects in any order and number. In fact, the only requirement is subject 00.010 (a basic subject on online competencies), which is mandatory for all students during the first semester. Before deciding which subjects to enroll in, students have access to a more complete version of Table 1 that includes information about each subject. As expected, due to the nature of distance students (most of them have full-time jobs, family responsibilities, and so on), they usually do not enroll in a complete semester (i.e. 6 subjects), but in fewer subjects. Furthermore, they do not even follow the predetermined order of subjects, that is, they can enroll in subjects from the second semester without having taken subjects from the first one. Therefore, the concept of cohort is completely undermined.

Table 2 partitions students according to the number of subjects (C) they enroll in, as well as the number of subjects they successfully pass.

Table 2: Number of students taking / passing C subjects during the first two academic semesters. In the Pass columns, the Total row counts students passing at least one subject.

C            1st Sem.   Pass    2nd Sem.   Pass
0            -          6069    -          3924
1            814        3155    1289       2641
2            5518       5190    5760       4284
3            9451       5102    5286       3367
4            4181       1692    2557       1450
5            1130       388     929        463
6            521        150     354        167
7 or more    177        46      181        60
Total        21792      15723   16356      12432
Mean         3.073      1.78    2.873      1.888
Median       3          2       3          2
[Q1, Q3]     [1,4]      [0,3]   [1,3]      [1,3]

Notice that students take more subjects on average during the first semester than during the second one (3.073 vs. 2.873). This could be a sign that students learn to regulate their learning process by narrowing their enrollment once they have acquired the experience of being online learners. As stated in (Kiernan et al., 2004), the process of becoming a good “e-learner” depends not only on the student herself but also on institutional support, so the more information learners have, the better the decisions they might take.
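A per-semester summary such as each column pair of Table 2 could be computed along the following lines. This Python sketch assumes a hypothetical input of (student, subject, passed) tuples with one row per subject a student enrolled in that semester; it pools counts of seven or more into one row, and it reports the number of students passing at least one subject, which is how the Total row of the Pass columns appears to be computed (e.g. 21792 - 6069 = 15723).

```python
from collections import Counter, defaultdict
from statistics import mean, median


def semester_summary(records, cap=7):
    """Summarize, for one semester, how many subjects each student takes and passes.

    records: iterable of (student, subject, passed) tuples, assumed to hold one
    row per student and subject enrolled in that semester (hypothetical format).
    Counts of `cap` or more are pooled into a single row, as in Table 2.
    """
    taken = defaultdict(int)
    passed = defaultdict(int)
    for student, _subject, ok in records:
        taken[student] += 1
        passed[student] += int(ok)  # a student with no passes keeps a count of 0
    students = list(taken)
    return {
        "taken": Counter(min(taken[s], cap) for s in students),
        "passed": Counter(min(passed[s], cap) for s in students),
        "students": len(students),
        # Students passing at least one subject (the Pass-column totals).
        "passing_at_least_one": sum(1 for s in students if passed[s] > 0),
        "mean_taken": mean(taken[s] for s in students),
        "mean_passed": mean(passed[s] for s in students),
        "median_taken": median(taken[s] for s in students),
    }
```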

On the other hand, Table 2 also shows that, in the first semester, most students do not pass all the subjects they enrolled in. Preliminary results show that the most important variable for predicting dropping out after the first semester is the number of subjects successfully passed, so adjusting the number of subjects taken during the first semester becomes a key issue for both the institution and the students. One of the main differences between brick-and-mortar universities and distance ones is that the latter have more flexible requirements: no minimum enrollment, the possibility of taking one or more consecutive semester breaks, and so on. But are students following the institutional recommendations? We propose to analyze the subjects taken by students in order to determine the most common subject combinations and try to visually detect any relationship with whether students pass a subject or not. We call this “intra-semester analysis”. We are also interested in analyzing how students advance within a given programme, that is, which subjects they enroll in the next semester according to the results of the preceding one. We call this “inter-semester analysis”.

2.1 Intra-semester Analysis

Table 3 shows the most popular subjects taken by students in their first academic semester. From the total pool of 501 possible subjects, students choose only among 128, following a long-tail distribution. Roughly, 90% of the students take at least one of the five most popular subjects, 95% take at least one of the ten most popular subjects, and 99% take at least one of the eighteen most popular subjects. These will be the subjects selected for the visualization using chord diagrams, in order to see whether the density of the graph is appropriate to include as much information as possible without being too complex. Notice also that some students enroll in subjects that are not supposed to be taken during the first academic semester (those marked with ∗ in Table 3). This is a typical situation in distance universities, where students are not compelled to enroll in a predetermined set of subjects.

Table 3: Number of students (N) taking / passing a given subject in the first academic semester, sorted by popularity. Cum. N: students taking at least one of the subjects ranked so far; Pct.: Cum. N as a percentage of all 21792 students; ∗: subject not intended for the first semester.

Rank   Subject    N       Pass    Cum. N   Pct.
1      00.010     15229   10500   15229    69.9%
2      01.001     7433    4519    17566    80.6%
3      01.079     6662    3481    18528    85.0%
4      01.005∗    6615    3698    19328    88.7%
5      00.002     4654    2621    19795    90.8%
...    ...        ...     ...     ...      ...
10     01.009∗    1432    756     20790    95.4%
...    ...        ...     ...     ...      ...
18     01.020∗    527     233     21562    98.9%
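The ranking and cumulative coverage of Table 3, and the number of subjects needed to reach a given coverage level, can be sketched as follows. The sketch assumes a hypothetical input mapping each student to the set of subjects taken in the semester; a student counts as covered by the top-k subjects if she takes at least one of them, which is why the cumulative column grows more slowly than the sum of the N column.

```python
from collections import Counter


def coverage_ranking(student_subjects):
    """Rank subjects by popularity and, for each rank k, count the students who
    take at least one of the top-k subjects (the cumulative column of Table 3).

    student_subjects: {student: set of subject codes taken that semester}.
    Ties in popularity are broken arbitrarily.
    """
    popularity = Counter(
        subject for subjects in student_subjects.values() for subject in subjects
    )
    covered = set()
    rows = []
    for k, (subject, n) in enumerate(popularity.most_common(), start=1):
        covered |= {st for st, subjects in student_subjects.items() if subject in subjects}
        rows.append((k, subject, n, len(covered)))
    return rows


def subjects_needed(student_subjects, level):
    """Smallest k such that the top-k subjects cover at least `level` of the students."""
    total = len(student_subjects)
    for k, _subject, _n, cumulative in coverage_ranking(student_subjects):
        if cumulative / total >= level:
            return k
    return None
```

With the real data, calling subjects_needed at the levels 0.90, 0.95 and 0.99 is the kind of computation that yields the 5, 10 and 18 subjects quoted above.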

Table 4 reproduces the same analysis for the most popular subjects taken in the second academic semester. Notice that in this case “second” has a more complex definition, as we take into account breaks taken by students. For instance, if a student takes a break during her second semester, the next semester in which she takes a subject is used as her second one. Notice also that, as some students drop out after the first semester, the number of students is smaller (16356). Compared to Table 3, the accumulated percentage of students taking one of the most popular subjects grows more slowly. In fact, students choose among 181 subjects out of the total pool of 501, so the number of possible combinations is larger than in the first academic semester. Therefore, in order to cover the same percentages of students (90%, 95% and 99%), more subjects must be included in the visualization (12, 16 and 24, respectively).
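The notion of the second (or n-th) semester used above, which skips breaks, can be sketched by grouping each student's enrollments by semester and keeping only the semesters in which she took at least one subject. The input format, (student, semester_index, subject) tuples with consecutively numbered semesters, is an assumption of this illustration.

```python
from collections import defaultdict


def active_semesters(records):
    """Group each student's enrollments by *active* semester.

    records: iterable of (student, semester_index, subject) tuples (hypothetical
    format, with semesters numbered consecutively). Returns
    {student: [subjects of 1st active semester, subjects of 2nd, ...]}, so break
    semesters without any enrollment are simply skipped.
    """
    per_student = defaultdict(lambda: defaultdict(set))
    for student, semester, subject in records:
        per_student[student][semester].add(subject)
    return {
        student: [semesters[s] for s in sorted(semesters)]
        for student, semesters in per_student.items()
    }


records = [("ana", 1, "00.010"), ("ana", 1, "01.001"), ("ana", 3, "01.005"),
           ("joan", 2, "00.010")]
print(active_semesters(records)["ana"][1])  # {'01.005'}
```

In the example, Ana takes no subject in semester 2, so her semester 3 becomes her second active semester.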

This analysis can be repeated for the third and further semesters. In summary, in the third semester, at least one of the 15, 18 and 26 most popular subjects is taken by 90%, 95% and 99% of the students, respectively (13718 in total), from a pool of 200 subjects. In the fourth semester, the figures are 16, 20 and 30, respectively, for a total of 11831 students and 191 different subjects. Notice that 30 different subjects generate $\binom{30}{2} = 435$ different pairwise combinations, which is probably too large a figure for visualization purposes. Therefore, in the following sections we will use the 95% level for visualizing relationships between subjects.

Table 4: Number of students (N) taking / passing a given subject in the second academic semester, sorted by popularity. Columns as in Table 3, with Pct. relative to the 16356 students; ∗: subject not intended for the second semester.

Rank   Subject    N      Pass   Cum. N   Pct.
1      01.005     4022   2470   4022     24.6%
2      01.006     3704   2699   6611     40.4%
3      01.079∗    3369   2005   8670     53.0%
4      01.001∗    3362   2135   10266    62.8%
5      01.078     2942   2247   11172    68.3%
...    ...        ...    ...    ...      ...
12     00.002∗    1827   1087   14810    90.5%
...    ...        ...    ...    ...      ...
16     00.004∗    1299   1059   15665    95.8%
...    ...        ...    ...    ...      ...
24     01.012∗    481    333    16185    99.0%

Table 5: Most popular combinations of subjects taken in consecutive enrollments: number of students taking the row subject in their first semester and the column subject in their second semester.

1st \ 2nd   01.005   01.006   01.079   01.001   01.078
00.010      3230     3026     2533     2654     2061
01.001      1915     1711     1473     898      1447
01.079      1597     1434     1048     1166     1775
01.005      1075     1398     1178     1321     886
00.002      915      807      725      767      644

2.2 Inter-semester Analysis

In this case we are interested in analyzing the sequence of subjects taken by students, that is, which subjects they enroll in once they know the results of the previous semester. Usually, when a student fails a subject, she is more likely to enroll in that subject again the next semester, along with other new subjects, following (or, unfortunately, not) the recommendations given by the university (Table 1).

Table 5 shows, for each of the five most popular subjects in the first semester (rows), the number of students who take each of the five most popular subjects in the second semester (columns). Only 65.3% of the 16356 students take one of these 25 combinations. In order to represent 95% of students’ enrollments, this table would need at least 10 rows × 16 columns, which is clearly unwieldy.
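A cross-tabulation in the style of Table 5 can be sketched from each student's first and second active semesters. The inputs are hypothetical dictionaries mapping students to subject sets; students who never enroll again contribute nothing, and a subject present in both the row and the column list (such as 01.001 in Table 5) counts students who take it again.

```python
from collections import Counter


def transition_counts(first, second, rows, cols):
    """Cross-tabulate consecutive enrollments in the style of Table 5.

    first, second: {student: set of subjects} for each student's first and second
    active semesters (hypothetical inputs). rows, cols: subject codes to keep.
    Cell [r][c] counts students taking rows[r] in the first semester and
    cols[c] in the second one.
    """
    row_set, col_set = set(rows), set(cols)
    counts = Counter()
    for student, subjects_1 in first.items():
        subjects_2 = second.get(student, set())  # empty if she never enrolled again
        for r in subjects_1 & row_set:
            for c in subjects_2 & col_set:
                counts[r, c] += 1
    return [[counts[r, c] for c in cols] for r in rows]
```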

3 VISUALIZING ENROLLMENT DATA

Visualizing large volumes of data is not a trivial problem, especially when several dimensions are involved (temporal, relationships among elements, and so on). As stated by (Shneiderman, 1996), it is very important to find the appropriate visual representations of different types of relationships between data entries. Among the huge number of possibilities (see (McCandless, 2009; Yau, 2011)), we have chosen to explore radial visualizations (Draper et al., 2009), as they are well suited for describing relationships between hierarchical data. We will focus on visualizing adjacency edges (i.e. relationships between subjects), following the approach described in (Holten, 2006), while also considering the e-learning visualization context (Gómez-Aguilar et al., 2010).

Figure 1: Example of a chord diagram.

3.1 Chord Diagrams

In the light of the results described in the previous section, we propose to use chord diagrams for visualizing both intra-semester and inter-semester data (not included due to paper length restrictions). A chord diagram is composed of segments, namely nodes and chords: nodes are arranged radially, and thick curves (i.e. chords) are drawn between them. The thickness of each chord encodes the frequency of a given aspect between the two connected nodes. In our case, each node is a subject, and the more students take a subject, the bigger its node; chords between nodes represent the number of students taking both subjects at the same time.
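The data behind an intra-semester chord diagram can be sketched as a symmetric co-enrollment matrix plus one size per subject, together with the fail/pass breakdown of each pair described in the next paragraph. This is an illustrative Python sketch over a hypothetical input ({student: {subject: passed}} for one semester), not the authors' code. A square matrix of this kind is what the d3.chord() layout of D3.js consumes; note, however, that d3.chord() sizes each arc by its row sum, so encoding node size as the number of students, as described here, may require extra handling.

```python
from itertools import combinations


def chord_matrix(results, subjects):
    """Build node sizes and a symmetric co-enrollment matrix for a chord diagram.

    results: {student: {subject: passed_bool}} for one semester (hypothetical).
    subjects: subject codes to display, in node order.
    Returns (sizes, matrix, breakdown): sizes[i] counts students taking subject i,
    matrix[i][j] counts students taking i and j together, and
    breakdown[(i, j)][a][b] (i < j) splits that count by fail/pass (0/1)
    in subject i (a) and subject j (b).
    """
    index = {s: k for k, s in enumerate(subjects)}
    n = len(subjects)
    sizes = [0] * n
    matrix = [[0] * n for _ in range(n)]
    breakdown = {}
    for taken in results.values():
        chosen = sorted((index[s], ok) for s, ok in taken.items() if s in index)
        for i, _ok in chosen:
            sizes[i] += 1
        for (i, pass_i), (j, pass_j) in combinations(chosen, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # symmetric: co-enrollment has no direction
            cell = breakdown.setdefault((i, j), [[0, 0], [0, 0]])
            cell[int(pass_i)][int(pass_j)] += 1
    return sizes, matrix, breakdown
```

By construction, the four cells of breakdown[(i, j)] add up to matrix[i][j].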

For building chord diagrams we used D3.js, a JavaScript library for manipulating documents based on data (Bostock et al., 2011). The appearance of a chord diagram is determined by the following variables: the number of subjects N; a vector of N elements containing several attributes for each subject (in our case, number of students and success rate, which determine node size and color, respectively); and an N × N adjacency matrix whose elements have the form

$N_{ij} = N_{ij}^{00} + N_{ij}^{01} + N_{ij}^{10} + N_{ij}^{11}$

where $N_{ij}$ is the number of students simultaneously taking subjects i and j, and the superscripts 00, 01, 10 and 11 index the 2 × 2 matrix containing the number of students failing/passing (0/1) subject i and subject j, respectively.

Figure 2: Relationships between the 10 most popular subjects in the first academic semester.

Due to printing restrictions, chord diagrams are shown here using gray tones only. Interactive chord diagrams use color to improve the ability to discover patterns: the success rate [0, 1] is mapped to a continuous [red, green] interval (containing yellow). This interval can be quantized into three bins [0, a), [a, b) and [b, 1] (a < b), representing “under average”, “average” and “above average”, respectively. More complex coloring strategies could be designed as well, especially when visualizing subject combinations as chords.
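The color encoding can be sketched as two small functions: a continuous red-yellow-green scale and the three-bin quantization. The placement of yellow at a success rate of 0.5 and the linear interpolation in RGB are assumptions of this sketch; the default thresholds a = 0.4 and b = 0.6 are the values used in Section 3.2.

```python
def success_color(rate):
    """Map a success rate in [0, 1] to an RGB triple on a red -> yellow -> green scale.

    Assumption: yellow sits at 0.5 and channels are interpolated linearly.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("success rate must lie in [0, 1]")
    if rate <= 0.5:
        return (255, round(255 * rate / 0.5), 0)  # red to yellow
    return (round(255 * (1.0 - rate) / 0.5), 255, 0)  # yellow to green


def success_bin(rate, a=0.4, b=0.6):
    """Quantize a success rate into the bins [0, a), [a, b) and [b, 1]."""
    if not 0.0 <= a < b <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= a < b <= 1")
    if rate < a:
        return "under average"
    if rate < b:
        return "average"
    return "above average"
```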

3.2 Visualization of Intra-semester Data

Figure 2 shows the chord diagram generated for the data described in Table 3, for the 10 most popular subjects of the first semester. We have used a = 0.4 and b = 0.6 for quantizing node colors. Subjects are arranged in clockwise order according to the number of students taking each subject. On the other hand, Figure 3 reproduces the same visualization for the 16 most popular subjects of the second semester. In this case we show only the chords starting from a particular subject, as an example of the interaction described in Section 3.4. The complexity of these diagrams with 10 and 16 subjects is quite high, so the 90% level is probably better for exploration purposes, with more subjects included dynamically if desired.

Figure 3: Relationships between the 16 most popular subjects in the second academic semester, showing data only for a given subject.

3.3 Visualization of Inter-semester Data

In this case we will think of chord diagrams as having two halves: the left one contains the subjects from the first semester, while the right one contains the subjects from the second one. Obviously, some subjects can appear in both halves (i.e. students taking the same subject again), but no chords are drawn between subjects in the same half. In effect, this visualizes a bipartite graph while maintaining the same metaphor. Segment width and color follow the rules described in the previous sections. Chord width is determined by the number of students taking a subject from the right half after having taken one from the left half. Chord color can be determined by the percentage of students successfully passing both subjects, revealing dangerous or suitable subject enrollment sequences. Figure 4 shows the chord diagram generated with the data in Table 5. This diagram could also be created so as to reproduce the institutional recommendations (Table 1), including one more subject every semester, in order to compare reality with the predetermined programme sequence. Nevertheless, as some of the most popular subjects in the second semester are from the first one (because students have not taken them yet, or have taken them but failed), the number of subjects in the right half of the chord diagram would have to be larger, increasing its complexity.
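The bipartite layout described above can be obtained by embedding the r × c table of consecutive enrollments (such as Table 5) into a square (r + c) × (r + c) matrix whose within-half blocks are zero, so that no chord joins two subjects of the same half. This Python sketch is an assumption about how such a matrix could be prepared, not the authors' implementation. Mirroring the table into both off-diagonal blocks gives both halves the same total, so a layout that sizes arcs by row sums, like d3.chord(), would give first-semester and second-semester subjects equal shares of the circle.

```python
def bipartite_chord_matrix(transitions):
    """Embed an r x c table of consecutive-enrollment counts into a square matrix.

    transitions[i][j]: students taking first-semester subject i and then
    second-semester subject j. Indices 0..r-1 are the first-semester nodes and
    r..r+c-1 the second-semester nodes; within-half blocks stay zero.
    """
    r = len(transitions)
    c = len(transitions[0]) if r else 0
    size = r + c
    matrix = [[0] * size for _ in range(size)]
    for i, row in enumerate(transitions):
        for j, value in enumerate(row):
            matrix[i][r + j] = value  # first half -> second half
            matrix[r + j][i] = value  # mirrored so both chord ends have equal width
    return matrix
```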

Figure 4: Relationships between the 5 most popular subjects each semester, seen as consecutive enrollments.

3.4 Adding Interaction

Following Shneiderman’s mantra, “overview first, zoom and filter, then details on demand” (Shneiderman, 1996), we can use the possibilities of D3.js to add some interaction to chord diagrams. While the basic chord diagram (as shown in Figure 2) covers the first task (“overview”), the second task (“zoom and filter”) and the third one (“details on demand”) can be supported using the following approach:

• “zoom and filter”: if the user moves the cursor (i.e. the mouse) over any segment, only the segments directly connected to it remain visible; the rest are filtered out by fading. Therefore, if the user moves the cursor over a node, only the node and the chords connecting it to other nodes are visible. On the other hand, if the user moves the cursor over a chord, only that chord and the two nodes it connects are visible. Zooming can help users perform the “filter” task for small segments.

• “details on demand”: if the user selects (i.e. clicks) any segment, a small window containing information about that segment is displayed. If the segment is a node, data about the subject is displayed, namely its code, name, academic semester, percentage of students passing it and percentage of students taking it for the second or further time. If the segment is a chord, the percentage of students taking the two subjects simultaneously (or consecutively) is shown, together with the 2 × 2 matrix of pass/fail rates.
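The filtering and details-on-demand rules above can be sketched independently of any drawing library. In a D3.js front end, functions like these would be called from mouse event listeners to decide which elements keep full opacity and what to show in the details window; the function names and the 2 × 2 input format are hypothetical. Following the text, hovering over a node keeps only that node and its chords visible, while hovering over a chord also keeps its two end nodes.

```python
def visible_on_hover(matrix, hovered):
    """Elements left visible when the cursor is over a node or a chord.

    matrix: square co-enrollment matrix; hovered: a node index (int) or a chord
    given as a pair of node indices. Returns (visible_nodes, visible_chords),
    with chords as sorted index pairs; everything else would be faded out.
    """
    if isinstance(hovered, int):
        chords = {
            tuple(sorted((hovered, j)))
            for j, value in enumerate(matrix[hovered])
            if value > 0 and j != hovered
        }
        return {hovered}, chords
    i, j = hovered
    return {i, j}, {tuple(sorted((i, j)))}


def chord_details(cell, total_students):
    """Details shown when a chord is clicked.

    cell: 2 x 2 fail/pass (0/1) counts for the two subjects; total_students:
    students in the semester. Returns the percentage of students taking both
    subjects and the 2 x 2 table as proportions of those students.
    """
    together = sum(map(sum, cell))
    return {
        "pct_students": 100.0 * together / total_students,
        "rates": [[count / together for count in row] for row in cell],
    }
```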

4 CONCLUSIONS

Enrollment in educational institutions with flexible requirements (such as distance and online universities) does not follow uniform patterns, either with respect to the subjects each student enrolls in or with respect to the recommended sequence proposed by the institution. The concept of cohort (students taking the same subjects and advancing within a programme at the same pace) is completely unsuitable. Students get scattered across semesters, so their only nexus is the subjects they take simultaneously. Therefore, programme planning becomes a complex issue involving tables of numbers that are too large for university managers to comprehend. On the other hand, students have no support for determining which combinations of subjects are more suitable, especially in their first academic semester, when they have no experience of what being an online learner means.

In this paper we have described the use of chord diagrams for visualizing intra-semester enrollment data, namely the combinations of subjects taken by students simultaneously. The number of subjects (and their relationships) visualized in chord diagrams is determined by a threshold, trying to capture as many students as possible. We have created diagrams that include 95% of the students, but they are quite complex, so a threshold of 90% is probably enough for exploration purposes. Currently, university managers use tabular data for detecting programme bottlenecks; we expect to introduce and evaluate the use of chord diagrams as a simple way to visualize such information as part of an internal institutional innovation project. These visualizations will be part of an enrollment support system that will guide students and their mentors through the large number of subject combinations, according to both their personal interests and background, while taking into account the academic performance of previous students with similar enrollment patterns.

Current and future work on this topic should include the creation of more complex visualizations involving more than two semesters, using concentric chords or a 3D version that arranges chords on an imaginary 3D cone or cylinder. More experimentation is also needed to determine the best number of subjects and coloring scheme, as well as the information provided through interaction with the chord diagram.

ACKNOWLEDGEMENTS

This work is supported by the Spanish Ministry of Science and Innovation project MAVSEL (ref. TIN2010-21715-C02-02).

REFERENCES

Bostock, M., Ogievetsky, V., and Heer, J. (2011). D3: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309.

Campbell, J., DeBlois, P., and Oblinger, D. (2007). Academic analytics: A new tool for a new era of educational research. EDUCAUSE Review, 42(4):40–57.

Card, S. K., Mackinlay, J. D., and Shneiderman, B. (1999). Readings in information visualization: using vision to think. Academic Press.

Draper, G. M., Livnat, Y., and Riesenfeld, R. F. (2009). A survey of radial methods for information visualization. IEEE Transactions on Visualization and Computer Graphics, 15(5):759–776.

Gómez-Aguilar, D. A., Suárez-Guerrero, C., Theron-Sánchez, R., and García-Peñalvo, F. (2010). Visual analytics to support e-learning. In Advances in Learning Processes.

Grau-Valldosera, J. and Minguillón, J. (2014). Rethinking dropout in online higher education: The case of the Universitat Oberta de Catalunya. The International Review of Research in Open and Distance Learning, 15(1).

Holten, D. (2006). Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. IEEE Transactions on Visualization and Computer Graphics, 12(5):741–748.

Kiernan, M., Woodroffe, M., and Thomas, P. (2004). Open 24/7: The journey from e-user to e-learner. In Nall, J. and Robson, R., editors, Proceedings of World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2004, pages 95–97, Washington, DC, USA. AACE.

McCandless, D. (2009). Information is beautiful. Collins, London.

Park, J.-H. and Choi, H. J. (2009). Factors influencing adult learners’ decision to drop out or persist in online learning. Educational Technology & Society, 12(4):207–217.

Sacín, C. V., Chue, J., Peche, J. P., Alvarado, G., Vinatea, B., Estrella, J., and Ortigosa, A. (2011). A data mining approach to guide students through the enrollment process based on academic performance. User Modeling and User-Adapted Interaction, 21(1-2):217–248.

Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages, pages 336–343.

Siraj, F. and Abdoulha, M. A. (2009). Uncovering hidden information within university’s student enrollment data using data mining. In Al-Dabass, D., Triweko, R., Susanto, S., and Abraham, A., editors, Asia International Conference on Modelling and Simulation, pages 413–418. IEEE Computer Society.

Tyler-Smith, K. (2006). Early attrition among first time elearners: A review of factors that contribute to drop-out, withdrawal and non-completion rates of adult learners undertaking elearning programmes. Journal of Online Learning and Teaching, 2(2).

Yau, N. (2011). Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. John Wiley & Sons.

GRAPP2015-InternationalConferenceonComputerGraphicsTheoryandApplications

516