Analyzing the Determinant Characteristics for a Good Performance at ENADE Brazilian Exam Stratified by Teaching Modality: Face-to-face versus Online

Eric Gondran (1, a), Giancarlo Lucca (1, b), Rafael Berri (1, c), Helida Santos (1, 2, d) and Eduardo N. Borges (1, e)

(1) C3, Universidade Federal do Rio Grande, Av. Itália km 08, Rio Grande, Brazil

(2) ISC, Universidad Pública de Navarra, Campus Arrosadia, Pamplona, Spain

Keywords:

ENADE, Feature Selection, Educational Data Mining, Data Science.

Abstract:

The National Student Performance Exam (ENADE) annually evaluates different Brazilian higher education courses. This exam considers both face-to-face and distance learning courses. Distance learning has been growing steadily, especially during the coronavirus (COVID-19) pandemic. This study applies different feature selection techniques to the ENADE 2018 database, namely information gain, gain ratio, symmetric uncertainty, Pearson correlation, and Relief F. The objective of the work is to discover which personal and socioeconomic characteristics are decisive for the student's performance at ENADE, whether the student is in distance education or face-to-face. It can be concluded, among other results, that: the father's level of education directly influences performance; the higher the income, the better the performance; and white students perform better than black and brown-skinned ones. Thus, the results obtained in this study may initiate analyses of public policies aimed at improving performance at ENADE.

1 INTRODUCTION

Higher education is booming in Brazil. According to CES (acronym in Portuguese for Higher Education Census) (Inep, 2019), enrollment in higher education increased by 43.7% from 2009 to 2019. CES is an essential instrument for obtaining data that support the formulation, monitoring, and evaluation of public policies. In 2019, the number of students enrolled in higher education reached 8.6 million, a growth of 5.4% compared to 2018. This increase is due to distance learning (DL) (Barreto and Amaral, 2019), which grew by 378.9% from 2009 to 2019. DL is a form of education in which learning occurs at a distance (physical and temporal), mediated by a technology tool that allows communication and interaction between participants.

Directly linked to teaching, we have the National Student Performance Exam (ENADE), an assessment that annually analyzes the different Brazilian higher education courses. This exam considers both on-site and distance learning courses. The National Institute of Educational Studies and Research Anísio Teixeira (Inep), a federal agency linked to the Ministry of Education (MEC), currently prepares and administers this exam. The examination generates the ENADE microdata, the finest level of granularity of the collected data. They meet the demand for specific information by providing tests, templates, information about items (manual and dictionary), grades, and the student questionnaire, which contains different information regarding the candidate.

a https://orcid.org/0000-0002-0013-3298

b https://orcid.org/0000-0002-3776-0260

c https://orcid.org/0000-0002-5125-2756

d https://orcid.org/0000-0003-2994-2862

e https://orcid.org/0000-0003-1595-7676

Considering the microdata from ENADE, this work aims to find which personal and socioeconomic characteristics are decisive for the student's performance, whether he/she is a DL or a face-to-face (F2F) student. To achieve our objective, this work employs a well-established methodology called Knowledge Discovery in Databases (KDD) (Tan et al., 2016). Applying data mining techniques in the field of education has shown promising results, giving rise to a new area of scientific investigation called Educational Data Mining (EDM) (Baker et al., 2011).


Gondran, E., Lucca, G., Berri, R., Santos, H. and Borges, E.

Analyzing the Determinant Characteristics for a Good Performance at ENADE Brazilian Exam Stratiﬁed by Teaching Modality: Face-to-face versus Online.

DOI: 10.5220/0011070600003179

In Proceedings of the 24th International Conference on Enterprise Information Systems (ICEIS 2022) - Volume 1, pages 234-242

ISBN: 978-989-758-569-2; ISSN: 2184-4992

This article is organized as follows. Section 2 reviews related work. Section 3 presents the feature selection techniques. Section 4 describes the adopted methodology. Section 5 presents the results, and Section 6 draws the conclusions.

2 RELATED WORK

Baker et al. (2011) present the possibilities for applying EDM techniques in Brazil. The study demonstrates how this area of research can contribute to a better understanding of teaching and learning processes and student motivation. Gottardo et al. (2012) propose a broad and generalizable set of attributes for making inferences about student performance; their experiments reached 76% accuracy in predicting performance. Romero et al. (2008) highlight the possibilities of using data mining to extract relevant information about students in the educational context.

Specifically for ENADE, Araújo et al. (2019) proposed the use of knowledge discovery techniques to develop a tool for exploring the exam data. In addition to evaluating the structure and distribution of test data, they also proposed a model based on the CART algorithm (Breiman et al., 1984) capable of predicting student performance. The primary objective of Faria (2017) was to identify the determining factors in the performance of students in Business Administration courses, using ENADE 2012 microdata from the Federal District. The research is predominantly quantitative, based on descriptive and multivariate statistical techniques. Multiple regression was used to verify whether the student's characteristics, such as personal and socioeconomic aspects, the institution, and the didactic-pedagogical organization, would be significant variables in explaining the test result. The results showed that the student-related factors that helped predict performance were family income, male gender, and the mother's and father's education levels.

Unlike the works exposed above, our proposal

aims to apply different techniques for feature selec-

tion, to point out which personal and socio-economic

aspects are signiﬁcant characteristics in explaining

the result of ENADE 2018.

3 FEATURE SELECTION

Feature Selection (FS) is the process of identifying and removing irrelevant attributes and redundant information as much as possible (Miao and Niu, 2016). FS reduces the dimensionality of the data. It can improve the performance of classifiers, as it eliminates attributes that do not add value to the classification or that deteriorate the results. It also contributes to a better understanding and analysis of the results obtained and allows learning algorithms to operate more quickly and effectively. The main objective of FS is to identify, within a given context, the set of attributes that best represents the useful information in the data (Tasca, 2015).

3.1 Entropy

Entropy (Shannon, 1948) is defined as a measure of the average degree of uncertainty regarding sources of information, which consequently allows quantifying the information that flows in the system. In simple terms, the concept of entropy is associated with the idea that the more uncertain the outcome of a random experiment, the more information you get from observing its occurrence.

It can also be defined as the amount of uncertainty in a message, which decreases as the symbols are transmitted; that is, as the message becomes known, information is obtained, which can be seen as uncertainty reduction. Entropy is calculated by:

$$E(A) = -\sum_{a \in A} P(a) \log_2 P(a),$$

where $A$ is the attribute to be calculated, $a$ is a value of this attribute, and $P(a)$ is the relative frequency of that value.
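As a minimal sketch of the entropy definition above, the following Python function computes it from relative frequencies. The base-2 logarithm and the toy columns are assumptions made for illustration; they are not taken from the ENADE microdata.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of categorical values."""
    n = len(values)
    counts = Counter(values)
    # P(a) is the relative frequency of each distinct value a.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical toy columns, used only to illustrate the two extremes.
balanced = ["Low", "High", "Low", "High"]  # maximal uncertainty for two values
constant = ["Low", "Low", "Low", "Low"]    # no uncertainty at all

print(entropy(balanced))
print(entropy(constant))
```

The balanced column yields the largest entropy possible for two values, while the constant column carries no uncertainty.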

3.2 Information Gain

The information gain (IG) (Hall and Smith, 1998) is defined as the amount of information obtained about a random variable or signal from the observation of another random variable. It measures the significance of the attribute with respect to the target class; i.e., it measures the reduction of uncertainty (entropy) obtained by splitting the data on the attribute. As a disadvantage, it tends to prefer splits that result in numerous partitions, each one being small but pure. Information gain is calculated by:

$$IG(A) = E(C) - \sum_{a \in A} \frac{n_a}{n} E(a),$$

where $E$ denotes the entropy function (with $E(a)$ computed over the class values of the instances in which $A = a$), $C$ is the class, $A$ is the attribute to be evaluated, $n_a$ is the number of instances of the category belonging to the attribute, $n$ is the total number of instances, and $a$ is the attribute value.
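The weighted-entropy reading of the IG definition can be sketched as follows. This is an illustrative implementation under the assumption that each partition's class entropy is weighted by its share of the instances; the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy, in bits, of a sequence of categorical values."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(attribute, target):
    """Class entropy minus the weighted class entropy within each attribute value."""
    n = len(target)
    groups = defaultdict(list)
    for a, c in zip(attribute, target):
        groups[a].append(c)  # class values of the instances where A = a
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


# Hypothetical toy data: one attribute that determines the class, one that does not.
performance = ["Low", "Low", "High", "High"]
informative = ["public", "public", "private", "private"]
uninformative = ["x", "y", "x", "y"]

print(information_gain(informative, performance))
print(information_gain(uninformative, performance))
```

An attribute that fully determines the class recovers all of the class entropy, whereas an attribute that splits every class evenly gains nothing.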

3.3 Gain Ratio

The gain ratio (GR) (Karegowda et al., 2010) was developed to address the bias of IG toward attributes with many values. It is the ratio between IG and the attribute entropy, i.e., the relative IG used as an evaluation criterion. It adjusts the IG by the partitioning entropy, so that high-entropy partitioning (a large number of small partitions) is penalized. GR is defined by:

$$GR(A) = \frac{IG(A)}{E(A)},$$

where $IG(A)$ is the information gain and $E$ is the entropy.

3.4 Symmetric Uncertainty

Symmetric uncertainty (SU) (Yu and Liu, 2003) is a nonlinear correlation measure developed with the same purpose as GR, that is, an attempt to normalize the IG of an attribute $A$ with the class $C$. SU is defined by:

$$SU(A) = 2 \cdot \frac{IG(A)}{E(A) + E(C)},$$

where $IG(A)$ is the information gain and $E$ is the entropy.
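To illustrate how GR and SU normalize IG, the sketch below compares a binary attribute with an identifier-like attribute that has one value per instance. The helper functions, the zero-denominator guards, and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(attribute, target):
    n = len(target)
    groups = defaultdict(list)
    for a, c in zip(attribute, target):
        groups[a].append(c)
    return entropy(target) - sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(attribute, target):
    """IG divided by the entropy of the attribute itself."""
    e_a = entropy(attribute)
    return information_gain(attribute, target) / e_a if e_a > 0 else 0.0


def symmetric_uncertainty(attribute, target):
    """Twice the IG divided by the sum of attribute and class entropies."""
    denom = entropy(attribute) + entropy(target)
    return 2 * information_gain(attribute, target) / denom if denom > 0 else 0.0


performance = ["Low", "Low", "High", "High"]
binary_attr = ["a", "a", "b", "b"]  # two partitions of size 2
id_like_attr = [1, 2, 3, 4]         # four partitions of size 1

for name, attr in [("binary", binary_attr), ("id-like", id_like_attr)]:
    print(name,
          information_gain(attr, performance),
          gain_ratio(attr, performance),
          symmetric_uncertainty(attr, performance))
```

Both attributes reach the same IG, but the identifier-like attribute has a higher entropy of its own, so its GR and SU are lower. This is the penalty on numerous small partitions described above.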

3.5 Pearson Correlation

Pearson Correlation (PC) (Hall, 1998), also known as the linear correlation coefficient, measures the degree of correlation between two metric-scale variables. It expresses the degree of relationship between two quantitative attributes through values between −1 (negative or inverse correlation) and 1 (positive linear relationship). A correlation coefficient near zero indicates no linear relationship between the attributes. The PC is given by:

$$PC(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) \cdot Var(Y)}},$$

where $Cov$ is the covariance between the two attributes $X$ and $Y$, and $Var$ is the variance of each attribute. To calculate the correlation of qualitative attributes, the data are first converted into binary data.
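A minimal sketch of this computation follows, including a binary (one-hot) encoding of a nominal attribute. How the per-category correlations are combined into a single attribute score is not stated here, so the example stops at the per-category values; the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def one_hot(values):
    """Turn a nominal column into one 0/1 column per category."""
    return {cat: [1 if v == cat else 0 for v in values]
            for cat in sorted(set(values))}


# Hypothetical toy data: a nominal attribute and a binary class indicator.
school = ["public", "public", "private", "private"]
is_high = [0, 0, 1, 1]

for category, column in one_hot(school).items():
    print(category, pearson(column, is_high))
```

Each category becomes its own 0/1 column, so the coefficient is well defined even though the original attribute has no numeric scale.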

3.6 Relief F

Over the years, a Relief extension called Relief F (Kononenko, 1994; Kira and Rendell, 1992; Urbanowicz et al., 2018) has been developed, aiming to improve the original algorithm by estimating probabilities more reliably. It handles multiclass and incomplete datasets, while the complexity remains the same. It is calculated using a function $W$, updated for each sampled instance by:

$$W(A) = W(A) - \frac{diff(A, R_i, H)}{m} + \frac{diff(A, R_i, M)}{m},$$

where $A$ is the attribute, $W$ is a vector holding the score of each attribute, $R_i$ is the target instance, $H$ is the closest instance of the same class, $M$ is the closest instance of the other class, $m$ is the number of random instances selected to be part of the calculation, and the function $diff$ calculates the difference between the values of attribute $A$ in two instances.
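The weight update can be sketched with the basic Relief scheme for nominal attributes, using one nearest hit and one nearest miss per sampled instance. This is a simplified illustration, not the full Relief F procedure (which averages over several neighbors and handles multiple classes and missing values); the function names, the 0/1 difference function, and the toy data are assumptions.

```python
import random


def diff(attr_index, x, y):
    """Difference between two instances on one nominal attribute (0 or 1)."""
    return 0.0 if x[attr_index] == y[attr_index] else 1.0


def distance(x, y):
    """Number of attributes on which two instances differ."""
    return sum(diff(i, x, y) for i in range(len(x)))


def relief(X, y, m, seed=0):
    """Basic Relief weights; assumes every class has at least two instances."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    for _ in range(m):
        i = rng.randrange(len(X))
        target, cls = X[i], y[i]
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == cls]
        misses = [X[j] for j in range(len(X)) if y[j] != cls]
        nearest_hit = min(hits, key=lambda h: distance(target, h))
        nearest_miss = min(misses, key=lambda s: distance(target, s))
        for a in range(len(weights)):
            # Penalize attributes that differ within the class,
            # reward attributes that differ across classes.
            weights[a] += (diff(a, target, nearest_miss)
                           - diff(a, target, nearest_hit)) / m
    return weights


# Hypothetical toy data: attribute 0 determines the class, attribute 1 is noise.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
y = ["Low", "Low", "High", "High"]
print(relief(X, y, m=10))
```

In this toy data set the class-determining attribute accumulates a positive weight and the noise attribute a negative one, regardless of which instances are sampled.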

4 METHODOLOGY

In this section, we present the methodology used for

this study. We emphasize that the work is supported

by the KDD process, which comprises ﬁve stages.

1. Selection. This work takes into account ENADE

2018 microdata. They have 548,127 instances and

137 attributes of the numeric or character type.

The attributes are divided, among others, into the

institution and course information, student infor-

mation, the number of items in the objective part,

types of presence (participant present, absent or

canceled test), test perception questionnaire, and

student questionnaire. The original database was

divided into online students (96,927 instances)

and F2F students (451,200 instances). After analyzing all database attributes, we focus on the personal and socioeconomic aspects and on the participant's course. At this point, 23 attributes were kept in each database.

2. Preprocessing. The first preprocessing operation was the application of a filter to select only those participants who had actually taken the test. We removed 32,285 participants from the online modality and 115,765 F2F students. The criteria for removing instances include absent candidates, candidates with a blank test in the objective and discursive parts of general education, candidates with a blank test in the objective and discursive parts of the specific component, and participation with a result disregarded by the applicator. The second step checked for null or incomplete data, including blank test grades and blank parts of the questionnaire. We excluded 15 online cases and 103 F2F cases. After preprocessing, the online database had 64,627 instances, and the F2F database had 335,332.

3. Transformation. The first operation was to rename the attributes. At this stage, the 23 attributes had names referring to the student's questionnaire number (QE_I01 to QE_In). The nominal values of the attributes (A, B, etc.) were also renamed; for example, the values of the father's level of schooling were renamed to (None, Elementary 1, Elementary 2, High school, Undergraduate, Graduate).

The courses were also grouped according to their

primary areas, according to the tables provided by

CNPq and CAPES, Brazilian funding agencies.

ENADE evaluates each set of courses every three years, so not all courses took the test in 2018. The scores obtained by the candidates were also categorized, with their values discretized into three frequency categories (low, medium, and high performance), keeping the original distribution. Discretized grades of online students: Low (≤ 30), Medium (30 < grade ≤ 60) and High (> 60). Discretized grades of face-to-face students: Low (≤ 31), Medium (31 < grade ≤ 62) and High (> 62).

The original database and the complete list of attributes are available at https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/enade

4. Data Mining. In this step, we apply the ﬁve dis-

tinct methods for FS. We consider the database

(online and F2F), taking into account 23 pre-

selected attributes. In Sect. 3 we present the ap-

plied FS algorithms in detail. For each algorithm,

the most relevant characteristics are selected ac-

cording to the ranking generated by the algo-

rithms. We also perform an exploratory analysis

of the most frequent attributes in all methods.

5. Interpretation and Evaluation. After applying the five FS methods and analyzing the rankings generated by the algorithms, we consider the attributes that appear among the top 10 of at least four of the FS algorithms, both for online and F2F students. After choosing the best attributes, we perform an exploratory analysis of each attribute. We generate graphics for each modality and compare the online and face-to-face profiles. In Sect. 5, we present and discuss the results of the exploratory analysis.
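The instance counts and the grade thresholds reported in the steps above can be checked with a short sketch. The function and variable names are hypothetical; only the numbers come from the text.

```python
# Grade thresholds (upper bounds of Low and Medium) reported for each modality.
THRESHOLDS = {"online": (30, 60), "f2f": (31, 62)}


def discretize(grade, modality):
    """Map a numeric grade to Low, Medium or High for the given modality."""
    low_max, medium_max = THRESHOLDS[modality]
    if grade <= low_max:
        return "Low"
    if grade <= medium_max:
        return "Medium"
    return "High"


# Instance counts reported in the Selection and Preprocessing steps.
initial = {"online": 96_927, "f2f": 451_200}
did_not_take_test = {"online": 32_285, "f2f": 115_765}
incomplete = {"online": 15, "f2f": 103}

remaining = {
    modality: initial[modality] - did_not_take_test[modality] - incomplete[modality]
    for modality in initial
}

print(sum(initial.values()))  # total number of instances in the microdata
print(remaining)              # instances kept per modality
print(discretize(30.5, "online"), discretize(30.5, "f2f"))
```

The same grade can fall into different categories depending on the modality, because each database was discretized with its own thresholds.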

5 RESULTS

This section presents the results of applying the dif-

ferent FS algorithms.

5.1 Online Modality Results

This subsection presents the results obtained for the FS techniques on the online modality data. These results are available in Table 1, where the rows are the 22 attributes considered and the columns are the different FS methods. Each cell shows the ranking obtained by the attribute, followed by the score computed by the method. The top ten values for each method are underlined. In addition, we highlight in bold the attributes ranked in the first ten positions by at least four of the algorithms used. At the end of the table, the column Rank contains the sum of the four best positions obtained across the different FS methods. To define the most relevant characteristics for explaining the online students' performance at ENADE 2018, we considered the attributes ranked among the top 10 by at least four of the feature selection algorithms.
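The Rank column and the selection criterion can be reproduced from the per-method positions. The sketch below uses the positions of four attributes as listed in Table 1; the function names are hypothetical.

```python
def rank_sum(positions, keep=4):
    """Sum of the `keep` best (smallest) positions across the FS methods."""
    return sum(sorted(positions)[:keep])


def is_selected(positions, top=10, min_methods=4):
    """True if the attribute is within the top positions for enough methods."""
    return sum(p <= top for p in positions) >= min_methods


# Positions in the order PC, IG, GR, SU, RF, copied from Table 1.
online_positions = {
    "Knowledge Area": [1, 2, 1, 1, 1],
    "Gender": [2, 5, 4, 4, 13],
    "Mother's Level of Education": [18, 8, 13, 9, 9],
    "Weekly Study Time": [9, 11, 14, 12, 7],
}

for name, positions in online_positions.items():
    print(name, rank_sum(positions), is_selected(positions))
```

Dropping the worst of the five positions keeps a single dissenting method from dominating the aggregate, which is why Gender remains highly ranked despite its thirteenth place under Relief F.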

Analyzing gender (Figure 1), most participants (60.8%) are female; however, they obtained inferior results compared to males. Analyzing skin color (Fig. 2), white people are predominant, with 33,845 participants (52.4%), and have the most outstanding high-performance rate (9%) and the lowest low-performance rate (25%). Brown-skinned people are the second-largest group of participants (35.5%), with the worst high-performance rate (5%) and one of the highest low-performance rates (32%). Black, yellow, and indigenous people show similar low-performance rates. Finally, the best overall performance came from those who did not declare their skin color.

[Bar chart omitted: number of participants per gender (Male, Female) and performance level (Low, Medium, High).]

Figure 1: Relationship between gender and participant performance in the online modality.

[Bar chart omitted: number of participants per skin color and performance level (Low, Medium, High).]

Figure 2: Relationship between skin color and participant performance in the online modality.

5.2 Face-to-face Modality Results

This subsection presents the results obtained when the FS techniques are applied to the F2F modality database. These results are available in Table 2, which follows the same layout as Table 1.

The same criterion used for online education was used to define the most important characteristics. Most participants (289,580, or 86.4%) did not receive any academic scholarship; 10% and 20% of them presented high and low performance, respectively. Those with a scholarship (Fig. 3) show notably superior performance compared to those without one. We highlight that scientific research and PET (acronym in Portuguese for Tutorial Education Program) participants have the highest high-performance rates (32% and 35%) and the lowest low-performance rates (7% and

8%). Analyzing the weekly study time (Fig. 4), the


Table 1: Online modality results obtained with different feature selection methods. Method cells show rank (score).

| # | Characteristic | PC | IG | GR | SU | RF | Rank |
|---|---|---|---|---|---|---|---|
| 1 | Knowledge Area | 1 (0.1255) | 2 (0.0404) | 1 (0.0377) | 1 (0.0358) | 1 (0.0415) | 4 |
| 2 | Family Income | 3 (0.0445) | 1 (0.0424) | 2 (0.0177) | 2 (0.0237) | 2 (0.0412) | 7 |
| 3 | Gender | 2 (0.0510) | 5 (0.0108) | 4 (0.0112) | 4 (0.0100) | 13 (0.0102) | 15 |
| 4 | High School Education | 6 (0.0390) | 3 (0.0139) | 3 (0.0131) | 3 (0.0124) | 11 (0.0120) | 15 |
| 5 | Scholarship/Funding | 4 (0.0436) | 7 (0.0099) | 6 (0.0050) | 6 (0.0063) | 3 (0.0229) | 20 |
| 6 | Father's Level of Education | 17 (0.0163) | 4 (0.0113) | 5 (0.0051) | 5 (0.0067) | 6 (0.0190) | 20 |
| 7 | Chosen Course | 13 (0.0229) | 6 (0.0102) | 7 (0.0042) | 7 (0.0057) | 4 (0.0228) | 24 |
| 8 | Skin Color | 7 (0.0373) | 10 (0.0063) | 10 (0.0040) | 8 (0.0046) | 10 (0.0128) | 35 |
| 9 | Mother's Level of Education | 18 (0.0128) | 8 (0.0070) | 13 (0.0031) | 9 (0.0041) | 9 (0.0173) | 39 |
| 10 | Weekly Study Time | 9 (0.0333) | 11 (0.0058) | 14 (0.0031) | 12 (0.0038) | 7 (0.0189) | 39 |
| 11 | Age | 8 (0.0369) | 12 (0.0057) | 12 (0.0034) | 10 (0.0040) | 20 (0.0029) | 42 |
| 12 | Financial Status | 15 (0.0197) | 9 (0.0069) | 16 (0.0029) | 11 (0.0039) | 8 (0.0189) | 44 |
| 13 | High School Modality | 12 (0.0244) | 13 (0.0046) | 11 (0.0037) | 13 (0.0038) | 12 (0.0111) | 48 |
| 14 | Family's Undergraduate Degree | 14 (0.0197) | 15 (0.0040) | 8 (0.0042) | 14 (0.0037) | 14 (0.0101) | 50 |
| 15 | Work Status | 10 (0.0314) | 14 (0.0046) | 15 (0.0029) | 16 (0.0033) | 16 (0.0083) | 55 |
| 16 | Marital Status | 5 (0.0402) | 18 (0.0029) | 17 (0.0019) | 17 (0.0021) | 17 (0.0081) | 56 |
| 17 | People Living in your Household | 19 (0.0123) | 16 (0.0039) | 20 (0.0015) | 18 (0.0021) | 5 (0.0211) | 58 |
| 18 | Social Inclusion Program | 16 (0.0163) | 17 (0.0038) | 9 (0.0041) | 15 (0.0036) | 15 (0.0083) | 60 |
| 19 | Household Location and People in it | 11 (0.0310) | 19 (0.0024) | 19 (0.0017) | 19 (0.0019) | 19 (0.0065) | 68 |
| 20 | Chosen Education Institution | 21 (0.0058) | 20 (0.0016) | 22 (0.0008) | 20 (0.0010) | 18 (0.0079) | 79 |
| 21 | Academic Scholarship | 20 (0.0074) | 21 (0.0004) | 21 (0.0012) | 21 (0.0005) | 21 (0.0021) | 83 |
| 22 | Student financial aid | 22 (0.0020) | 22 (0.0001) | 18 (0.0017) | 22 (0.0002) | 22 (0.0001) | 84 |

Table 2: Face-to-face modality results obtained with different feature selection methods. Method cells show rank (score).

| # | Characteristic | PC | IG | GR | SU | RF | Rank |
|---|---|---|---|---|---|---|---|
| 1 | Scholarship/Funding | 5 (0.0272) | 1 (0.0337) | 2 (0.0124) | 1 (0.0173) | 1 (0.0482) | 5 |
| 2 | Academic scholarship | 2 (0.0431) | 3 (0.0166) | 1 (0.0206) | 2 (0.0168) | 18 (0.0080) | 8 |
| 3 | High School Education | 3 (0.0360) | 4 (0.0150) | 3 (0.0111) | 3 (0.0119) | 13 (0.0130) | 13 |
| 4 | Family Income | 11 (0.0196) | 2 (0.0168) | 5 (0.0065) | 5 (0.0090) | 2 (0.0360) | 14 |
| 5 | Knowledge Area | 1 (0.0561) | 6 (0.0124) | 4 (0.0106) | 4 (0.0106) | 6 (0.0246) | 15 |
| 6 | Chosen Course | 10 (0.0205) | 5 (0.0130) | 6 (0.0056) | 6 (0.0075) | 3 (0.0327) | 20 |
| 7 | Weekly Study Time | 8 (0.0218) | 9 (0.0104) | 7 (0.0053) | 9 (0.0066) | 5 (0.0254) | 29 |
| 8 | Father's Level of Education | 15 (0.0149) | 7 (0.0123) | 8 (0.0052) | 7 (0.0069) | 8 (0.0229) | 30 |
| 9 | Mother's Level of Education | 14 (0.0155) | 8 (0.0121) | 9 (0.0051) | 8 (0.0068) | 7 (0.0231) | 32 |
| 10 | Social Inclusion Program | 9 (0.0215) | 11 (0.0063) | 11 (0.0050) | 10 (0.0051) | 11 (0.0173) | 41 |
| 11 | Family's Undergraduate Degree | 6 (0.0229) | 14 (0.0045) | 10 (0.0050) | 11 (0.0043) | 16 (0.0090) | 41 |
| 12 | Age | 4 (0.0297) | 10 (0.0080) | 16 (0.0030) | 12 (0.0041) | 21 (0.0017) | 42 |
| 13 | Work Status | 7 (0.0219) | 12 (0.0058) | 15 (0.0030) | 14 (0.0038) | 10 (0.0174) | 43 |
| 14 | Household Location and People in it | 16 (0.0138) | 13 (0.0053) | 13 (0.0035) | 13 (0.0039) | 15 (0.0093) | 54 |
| 15 | Skin Color | 17 (0.0136) | 15 (0.0040) | 18 (0.0025) | 15 (0.0029) | 12 (0.0158) | 59 |
| 16 | Financial Status | 20 (0.0108) | 16 (0.0037) | 20 (0.0015) | 19 (0.0021) | 9 (0.0229) | 64 |
| 17 | Marital Status | 13 (0.0161) | 18 (0.0030) | 17 (0.0029) | 16 (0.0027) | 20 (0.0054) | 64 |
| 18 | Chosen Education Institution | 12 (0.0182) | 17 (0.0037) | 19 (0.0018) | 18 (0.0023) | 17 (0.0088) | 64 |
| 19 | People Living in your household | 21 (0.0077) | 19 (0.0030) | 22 (0.0011) | 21 (0.0016) | 4 (0.0269) | 65 |
| 20 | High School Modality | 22 (0.0064) | 20 (0.0026) | 14 (0.0031) | 17 (0.0026) | 19 (0.0067) | 70 |
| 21 | Student financial aid | 19 (0.0124) | 21 (0.0015) | 12 (0.0044) | 20 (0.0019) | 22 (0.0014) | 72 |
| 22 | Gender | 18 (0.0135) | 22 (0.0014) | 21 (0.0015) | 22 (0.0013) | 14 (0.0112) | 75 |

largest group (46.3%) studies from 1 to 3 hours a week, with a high-performance rate of 8% and a low-performance rate of 22%. In general, as the number of weekly study hours increases, the high-performance rate increases and the low-performance rate decreases. Considering the mother's education level (Fig. 5), students whose mothers have no education or have completed only elementary school 1 or 2 have the worst high-performance rates and the highest low-performance rates. Students whose mothers hold undergraduate or graduate degrees show better performance rates.

[Bar chart omitted: number of participants per type of academic scholarship and performance level (Low, Medium, High).]

Figure 3: Relation of the type of academic scholarship and the participant performance in the face-to-face modality.


[Bar chart omitted: number of participants per weekly study time (None, Between 1-3 hours, Between 4-7 hours, Between 8-12 hours, Over 12 hours) and performance level (Low, Medium, High).]

Figure 4: Relation of weekly study hours and participant performance in the face-to-face modality.

[Bar chart omitted: number of participants per mother's level of education and performance level (Low, Medium, High).]

Figure 5: Relationship of mother's level of education and participant's performance in the face-to-face modality.

5.3 Comparing Results between Online and Face-to-face Modalities

In this section, we compare the characteristics in common between online and F2F participants, seeking to understand the differences between these two profiles. Comparisons are shown in Figures 6 to 11. For each figure (attribute), we perform an exploratory analysis.

Analyzing Fig. 6, we can see that the share of applied social sciences grows slightly, from 66.5% in the online modality to 74.9% in F2F. The other categories are more evenly distributed in the F2F data. Regarding performance, F2F students in applied social sciences perform better than their online counterparts. Among all categories in the online modality, human sciences have the best overall performance, with rates of 21% (high) and 11% (low). In the F2F modality, the highlight is Linguistics, Languages, and Arts, with rates of 18% (high) and 8% (low).

Observing the students who receive scholarships or funding grants (Fig. 7), the F2F modality has more students in free courses and with FIES (acronym in Portuguese for Finance Fund for Higher Education Students). Better overall performance can be seen for all on-site students, especially those in free courses and fully funded ProUni students, whose rates rose from 10% and 6% to 21% and 17%, respectively. Sadly,

[Bar charts omitted: number of participants per knowledge area (Applied Social Sciences; Human Sciences; Linguistics, Languages and Arts; Others) and performance level (Low, Medium, High). Panels: (a) Online Modality, (b) Face-To-Face Modality.]

Figure 6: Relation between knowledge area and participant's online and face-to-face modality performance.

[Bar charts omitted: number of participants per scholarship/funding type and performance level (Low, Medium, High). Panels: (a) Online Modality, (b) Face-To-Face Modality.]

Figure 7: Relation between scholarship/funding type and participant's online and face-to-face modality performance.

those with some scholarship or funding grants are the

worst performers in both modalities.

Considering the school type attended in high


[Bar charts omitted: number of participants per high school type (Public, Private, Overseas, Mostly Public, Mostly Private, Partially Overseas) and performance level (Low, Medium, High). Panels: (a) Online Modality, (b) Face-To-Face Modality.]

Figure 8: Relation between participants' high school type and their performance in online and F2F modalities.

school (Fig. 8), we see that most students are from public schools. It is also noticeable that the proportion of face-to-face students from private schools (26.8%) is more than twice that of online students (11.7%). Students from private schools present the best results for both high and low performance.

Regarding the parents' level of education (Fig. 9), elementary school 1 is the most common level in the online modality (39.4%), whereas high school predominates in the F2F modality (32.6%). The F2F modality has a larger share of students whose fathers hold an undergraduate or graduate degree. As the level of education increases, the high-performance rate also increases.

Job market inclusion is the most common reason for choosing a course in both the online and F2F modalities, as seen in Fig. 10. Focusing on F2F participants, we notice better performance compared to online students. We note that those who opt for an online course have the best results in this same modality.

Analyzing family income (Fig. 11), most students have an income lower than 4.5 minimum wages (mw). The F2F modality has a larger share of students with a family income above 10 mw. Regarding performance, F2F students with an income below 6 mw show higher high-performance rates. In both modalities, as income increases, the high-performance rate also increases.

[Bar charts omitted: number of participants per parents' level of education and performance level (Low, Medium, High). Panels: (a) Online Modality, (b) Face-To-Face Modality.]

Figure 9: Relation between parents' level of education and participant's online and face-to-face modality performance.

[Bar charts omitted: number of participants per reason for choosing the course and performance level (Low, Medium, High). Panels: (a) Online Modality, (b) Face-To-Face Modality.]

Figure 10: Relation between reason for choosing the course and participant's online and F2F modality performance.


[Bar charts omitted: number of participants per family income bracket (in minimum wages, from "Up to 1.5" to "Over 30") and performance level (Low, Medium, High). Panels: (a) Online Modality, (b) Face-To-Face Modality.]

Figure 11: Relation between family income and participant's online and F2F modality performance.

6 CONCLUSIONS

In this paper, we applied educational data mining to the ENADE 2018 data set to find the main characteristics related to performance, in both the distance learning and F2F modalities. We applied five different attribute selection algorithms to 23 pre-selected personal and socioeconomic characteristics. The essential characteristics selected in both modalities were: knowledge area, family income, public or private high school, scholarships and funding, father's level of education, and reason for choosing the course. Gender and skin color were also important for the online modality, while the type of academic scholarship, weekly study time, and the mother's schooling were also important for the F2F modality.

Parents' education directly influenced the results: the higher the level of education, the better the performance. Public universities stood out, as students who paid no fees performed better than those in private institutions, even those financed by government programs or institutional scholarships. We see a direct relationship between family income and student performance: the higher the income, the higher the high-performance rate and the lower the low-performance rate. Participants who attended (fully or partially) private high schools have an advantage over those who attended public schools. Finally, we conclude that analyzing the factors that influence the performance of undergraduate students contributes significantly to a better understanding of the national education panorama. In addition, this study can help authorities make decisions and propose new public policies concerning Higher Education.

ACKNOWLEDGEMENTS

This study was supported by CNPq (305805/2021-5)

and PNPD/CAPES (464880/2019-00).


ICEIS 2022 - 24th International Conference on Enterprise Information Systems
