2 METHODS
2.1 The Student Career Tracking Data
Warehouse Architecture
Building a data warehouse has always been an
important decision for every enterprise and
organization, including universities. The most critical
decision in its design is finding the right way to
follow to build it, whether a Bill Inmon Top-down
approach which advocates that a global data
warehouse is constructed first and serves as a basis
for small data marts (Inmon, 2005). Or to follow a
Bottom up data warehouse design approach
recommended by Ralph Kimball (Kimball and Ross,
2013), based on building data marts first to provide
the reporting and analytics capability for specific
business processes and then compounding them to
make a data warehouse also named dimensional
modelling (Kimball and Ross, 2016).
The university is composed of faculty, institutions
and departments that operate separately and
independently. This is why a bottom-up Kimball’s
approach is recommended for implementing our data
warehouse (Vogelgesang and Appelrath, 2016).
We will begin first by gathering the primary data
sources that will connect to an ETL (Extract
Transform Load) tool to clean, unify and load data in
data marts that answers some specific questions and
through a bus architecture we will chain them up then
build the data warehouse.
In Figure 1 below, we explain the process of
extracting data from our primary data sources, which
are the university database that is connected to the
data collection platform the university web
application that collects data from students and
graduates. StudentDB a database that contains all
students path before university data such as personal
data, high school data, family data, etc, and a bunch
of files containing data about the university, different
study programs, which are subject to change and
delivered every year from the Moroccan minister of
research and higher education.
Through an ETL process, we clean and unify our
data to load it in the data marts and extract reports.
We decided to gather the most important
questions that we need to understand the pattern of
our graduates then collected a bunch of inquiries like
the following:
Q 1: How many graduates by faculty and by
degree or specialty?
Q 2: What is the average mark of our graduates
by the institute and by degree?
Q 3: How many graduates are hired within the
first six months after graduation?
Q 4: Which are the most hiring sectors of our
graduates?
Q 5: What is the average time for our graduates to
find a Job?
Q 6: How many graduates used to work while
studying?
Based on these questions, we started designing
our data marts, extracted three principal data marts
based on three main events: enrollment, graduation,
and hiring (Rahman et al., 2015):
Figure 1. Data extraction, transformation, and loading in
Bottom-up approach
2.2 Design of Data Marts
2.2.1 Defining the Scope of Data Marts
Enrolment Data Mart
The first data mart is the enrollment data mart, which
provides access to meaningful data that is specific to
the student registration phase.
This data mart will provide answers to some specific
questions such as:
- How many students enrolled in institute X and
degree YY?
- Where our students are coming from?
- How many foreign students per institute and
degree?
- What is the popularity of each degree?
- How many foreign students overall?
Logical design of enrollment data mart
A logical design is a conceptual design, which is
highly abstracted from the physical layer, and it is
called dimensional modelling in data mart design and
Ralph Kimball first introduced this concept in
(Kimball and Ross, 2013). We begin by defining a
central fact table that models an event which can be a
single transaction such as enrollment by a student, a
periodic time where a snapshot of events are collected
such as registered students in spring session or a