Addressing Educational Disparities: Assessing the Gap for Indigenous
Community
Shafaq Khan, Viutika Rathod, Abhirup Ranjan, Anika Anjum Una and Neel Manish Pandya
School of Computer Science, University of Windsor, Ontario, Canada
Keywords:
Indigenous Knowledge, Indigenous Perspectives, Educational Disparities, Indigenous Students, Database.
Abstract:
This research paper explores the integration of Indigenous knowledge and perspectives in education to ad-
dress the educational disparities faced by Indigenous students in Canada [Change.org, 2023]. It proposes a
management system utilizing a database and logistic regression model to predict student dropout rates based
on key factors such as Cultural Identity, Gender, Government Funding, and more [Government of Canada,
2021]. The logistic regression model achieved an accuracy rate of 0.831 on the testing dataset and provided
valuable insights into the factors influencing dropout rates among Indigenous students. The paper emphasizes
the importance of cultural sensitivity, ethical considerations, and collaboration with Indigenous communities
throughout the research process. While logistic regression offers interpretability and simplicity, future work
may explore the use of other machine learning models and qualitative data to enhance accuracy and gain
deeper insights. The goal is to promote educational equity and inclusivity while respecting Indigenous
knowledge and aspirations in Canadian education. Early forecasts of which students might drop out will
help local and federal authorities take action in time. Data and results from different regions can also
show how each region is performing and whether one region's action plan can be applied in another
to reduce the dropout rate, supporting the overall development of the Indigenous community.
1 INTRODUCTION
The Indigenous communities in Canada experience
significantly higher dropout rates compared to non-
Indigenous communities in this part of the world, at-
tributed to a range of factors such as historical and
inter-generational impacts, financial inequities, so-
cial marginalization, lack of support systems, and
geographic barriers (Government of Ontario, 2017).
These factors restrict access to quality education for
Indigenous students, resulting in lower educational
achievements compared to the general population of
Canada (CFS Ontario, 2021).
The educational disparities faced by Indigenous
students in Canada stem from multiple factors:
the legacy of colonialism, residential schools,
racism, and insufficient funding all hinder access
to post-secondary education, which is
recognized as a treaty right for Indigenous students.
Funding inadequacy and access difficulties contribute
to a significant disparity in educational opportuni-
ties (Brown, 2023). Also, the lack of necessary in-
frastructure has hindered the distribution of cultur-
ally relevant educational curricula to Indigenous com-
munities. Teaching practices in non-Indigenous in-
stitutions need refinement, focusing on incorporating
Indigenous history, cultures, and perspectives, and
addressing racism and marginalization (Government
of Canada, Interagency Advisory Panel on Research
Ethics, 2019).
Our work in this research paper is motivated by
a deep understanding of past injustices and current
hardships faced by Indigenous communities. We aim
to address the lack of cultural responsiveness in the
mainstream educational system and promote justice,
fairness, and reconciliation (Weston, 2019). Our goal
is to assist the government authorities of Canada in
understanding the dropout rates of Indigenous stu-
dents so that they can devise appropriate strategies for
their advancement and development. By embracing
empathy and recognizing the generational effects of
policies like the residential school system, we seek to
restore pride, dignity, and self-respect among Indige-
nous children (Kim, 2019). Through cultural respon-
siveness, we aim to foster respect, understanding, and
inclusive education for all students.
Our objective here is to bridge the educational gap
between Indigenous and non-Indigenous populations
in Canada by addressing issues such as insufficient
funding, cultural disconnection, discrimination, and
Khan, S., Rathod, V., Ranjan, A., Una, A. and Pandya, N.
Addressing Educational Disparities: Assessing the Gap for Indigenous Community.
DOI: 10.5220/0012253600003693
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 16th International Conference on Computer Supported Education (CSEDU 2024) - Volume 2, pages 453-460
ISBN: 978-989-758-697-2; ISSN: 2184-5026
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
lack of support systems for Indigenous communities.
We aim to assess and analyze the quality of educa-
tion for Indigenous students, promote evidence-based
decision making, and ensure educational equity. In-
tegrating Indigenous knowledge systems, languages,
and histories into the curriculum is a crucial focus,
empowering Indigenous students and improving the
educational experience for all.
To achieve our goals, we propose a model that
utilizes a database to address educational disparities
faced by Indigenous students. Using a logistic
regression model, we capture information such as
cultural identity, language proficiency, community
involvement, gender, and other relevant details, while
adhering to Indigenous data governance principles.
This will assist us in determining whether a particular
person may drop out, and if so, it will enable con-
cerned local or governmental bodies to take appro-
priate action. It will also enable comprehensive data
integration from various sources, evaluating the suc-
cess of educational initiatives and facilitating timely
support and prevention of widening educational gaps.
There is a lack of published papers addressing our
identified issue, making our research work unique.
Through our comprehensive model and data-driven
approach, we aim to contribute to the development
of effective strategies and policies that empower In-
digenous students and bridge the educational gap in
Canada.
2 BACKGROUND STUDY
The paper by Wu, J., Lin, S., Kong, H., and Shi, H.
(2019) titled ”The Combination Forecasting Model of
Telecommunication User Tricking Account Overdraft
Limit Based on Logistic Regression and SVM”
explores the role of educators, particularly those
teaching aspiring conservation practitioners, in responding
to the Truth and Reconciliation Commission (TRC)
and the National Inquiry into Missing and Murdered
Indigenous Women and Girls (MMIWG). The em-
phasis is on the significance of reconciliation and eth-
ical engagement with Indigenous Peoples through a
revolutionary approach to teaching indigenous knowl-
edge. The focus is on fostering understanding, empa-
thy, and respect for Indigenous knowledge and aspi-
rations while equipping students with critical analysis
skills for ethical engagement. The paper calls for uni-
versities to ”Indigenize” their approaches, embracing
anti-racism, humility, reciprocity, and confronting on-
going colonialism and white supremacy. The goal is
to create a learning environment that respects and
values Indigenous scholars, knowledge, and voices,
fostering reconciliatory relationships between
conservation practitioners and Indigenous Peoples. Efforts
should go beyond course content and focus on build-
ing an anti-racist, anti-oppressive campus culture that
centers Indigenous perspectives and enables Indige-
nous intellectual expression. The paper advocates for
hiring more Indigenous faculty members and center-
ing Indigenous Peoples as experts about themselves
in the curriculum. The goal is to prepare students
to engage with Indigenous Peoples in a just and af-
firming manner while respecting Indigenous knowl-
edge and aspirations. The possible difficulties of
successfully integrating Indigenous knowledge within
the current curriculum and ensuring cultural sensitiv-
ity and authenticity in its application might, however,
be a drawback of this strategy (Wu et al., 2019).
In this study by Li, J., Brar, A., and Roihan, N.
(2021) “The Use of Digital Technology to Enhance
Language and Literacy Skills for Indigenous People:
A Systematic Literature Review,” the focus is on the
use of digital technologies to support Indigenous peo-
ple’s language and literacy learning, particularly in
English. Here, the emphasis is on addressing the neg-
ative inter-generational impacts of colonization and
socioeconomic stress on Indigenous academic perfor-
mance. The systematic review of 25 empirical stud-
ies provides insights into the efficacy of digital tech-
nology in supporting Indigenous learners. While the
studies demonstrate positive outcomes, there are limi-
tations, such as the lack of rigorous research methods
and comprehensive reporting. To improve the effec-
tiveness of digital technology-based interventions, fu-
ture research should consider culturally relevant mul-
tiliteracies frameworks, engage in longitudinal stud-
ies to track students’ progress and incorporate more
Indigenous cultural elements. Additionally, there is a
need for data coding schemes to capture the nuances
of Indigenous language and literacy learning. The
research emphasizes the importance of culturally re-
sponsive practices, partnership with Indigenous com-
munities, and addressing the unique contextual fac-
tors affecting Indigenous education. The reliance on
digital infrastructure and access to technology, how-
ever, may be a weakness of their strategy or approach
and present difficulties in isolated or disadvantaged
Indigenous communities where dependable internet
connectivity may be scarce (Li et al., 2021).
Exploring the NOW (Northern Oral Language and
Writing) Play project, Stagg Peterson, S., and Dwyer,
B. (2016) in their paper titled ”Research in Canada’s
Northern Rural and Indigenous Communities: Sup-
porting Young Children’s Oral Language and Writ-
ing” shed light on efforts to enhance oral language
and writing skills among young children in north-
ern rural and Indigenous communities. The research
emerged from the concerns raised by kindergarten
teachers in northern Canadian communities about stu-
dents’ limited language abilities upon entering school.
The project involved collaborating with public school
divisions in small rural communities and local
education authorities in Indigenous regions of four
Canadian provinces, focusing on the assessment and
support of young children’s oral language and writing
development in play contexts. These communities
face challenges related to geographic isolation, lim-
ited teaching resources, and a lack of opportunities for
teachers’ professional learning. The project empha-
sized the use of play contexts to enhance children’s
language and literacy skills. Additionally, the project
aimed to develop a culturally relevant oral language
assessment tool. This tool was designed to capture
the diverse ways in which children use language for
various social purposes during play and small-group
academic activities. Teachers and researchers collab-
oratively analyzed video recordings and transcripts
of children’s play to identify different language uses
and track individual children’s language development
(Stagg Peterson and Dwyer, 2016). However, the
project also acknowledges challenges such as parental
resistance to play-based approaches and the necessity
for cultural sensitivity in research methods. Careful
navigation of these obstacles is crucial to ensure the
project’s ultimate success and positive impact.
The paper, ”Educational Equity in Canada: The
Case of Ontario’s Strategies and Actions to Ad-
vance Excellence and Equity for Students” (Camp-
bell, 2020) by Carol Campbell, delves into the edu-
cational equity initiatives in Ontario, Canada, focus-
ing on two main strands: the Literacy and Numer-
acy Strategy and the Student Success/Learning to 18
Strategy. With a commitment to multiculturalism and
diverse outcomes, the paper discusses policies tar-
geting gender, language learners, special education
needs, and Indigenous students within the context of
two specific policy strands. While the literature re-
view provides a comprehensive historical perspective,
it could benefit from a more critical analysis of the
strategies’ effectiveness, including a nuanced explo-
ration of Indigenous issues and a discussion of poten-
tial limitations. The paper considers students across
K-12 levels but could be strengthened by specifying
the age groups within this range.
Moules et al.’s (2014) (Schaub, 2020) research
delves into the educational disparity between
Indigenous and non-Indigenous students in Canada,
employing hermeneutics for an in-depth understanding.
Referring to it as an ”educating gap”, the study
emphasizes the potential for actionable solutions.
Examining historical issues in Indigenous education, the
paper advocates for meaningful change while recog-
nizing leadership challenges. The investigation ex-
plores the gap from various perspectives, including
quantitative data, personal narratives, and perceptual
insights. Despite the strengths in its approach, the
study could strengthen its discussion by explicitly ad-
dressing limitations and providing a detailed analysis
of the term ”deficit model” in education. The authors
acknowledge constraints such as time, financial re-
sources, personal boundaries, and regional focus in
coastal British Columbia, expressing confidence in
the broader applicability of their findings.
The studies by Wu et al. (2019), Li et al. (2021),
Stagg Peterson and Dwyer (2016), Campbell (2020),
and Moules et al. (2014) collectively focus on the
integration of Indigenous knowledge across various
educational contexts. While acknowledging challenges
and limitations in implementing this integration,
these papers emphasize the importance of
reconciliatory relationships, culturally responsive practices,
and addressing unique contextual factors affecting
Indigenous education. Notably, the background studies
concentrate specifically on younger students, offering
insights into enhancing language and literacy skills
among children in northern rural and Indigenous
communities.
3 PROPOSED MODEL
In this section, we propose a predictive model us-
ing Logistic Regression, a binary classification algo-
rithm. The model predicts whether a student will drop
out (1) or not (0) by using the key factors that have
been identified as input variables. Its interpretable re-
sults allow for a clear understanding of the impact of
predictor variables on dropout rates, helping to iden-
tify key factors contributing to the disparities (Das,
2021). Moreover, logistic regression provides prob-
ability estimates, prioritizing higher-risk students for
support systems. With its low complexity, minimal
data preprocessing requirements, and ease of imple-
mentation, logistic regression offers a practical and
efficient solution for handling the data management
system. While it may not capture complex relation-
ships as effectively as advanced algorithms, combin-
ing it with other techniques through ensembling could
further enhance its predictive performance and lead to
more comprehensive strategies for bridging the edu-
cational gap among Indigenous communities.
The proposed model takes a systematic approach
to predicting the likelihood of dropout among Indige-
nous students (as shown in Figure 1). It starts by
compiling pertinent data from various sources, such
as governmental agencies and academic institutions
(Bisong, 2019).
Figure 1: Workflow diagram.
Key traits like cultural identity, language profi-
ciency, and educational level are chosen after the data
has been preprocessed. Following that, the dataset is
divided into training and test sets. To determine the
correlation between the selected features and dropout
outcomes, the logistic regression model is trained us-
ing the training data. The model is tested on the test-
ing set, and metrics like accuracy, precision, recall,
F1-score, and ROC-AUC (K, 2020) are used to as-
sess its performance. An effective predictive model
for comprehending dropout patterns among Indige-
nous students is being developed through this thor-
ough process.
4 METHODOLOGY AND
EXPERIMENTATION
The logistic regression model was implemented as
part of our proposed management system to address
educational disparities faced by Indigenous students
in Canada. The model’s objective was to predict
the likelihood of student dropout (1) or non-dropout
(0) based on key factors, including Cultural Identity,
Gender, Government Funding, Type of Educational
Institute, Employment Sector, Language Proficiency,
Community Involvement, Age, and Level of Educa-
tion. We trained the model using a binary classifi-
cation algorithm and evaluated its performance using
various metrics.
4.1 Data Collection
Data for this Indigenous education research were
collected from a variety of sources, including the
Alberta Open Data Portal, Statistics Canada, the FNIGC
(First Nations Information Governance Centre), and the
Government of British Columbia. The dataset, consisting
of Indigenous headcount enrolment within the Alberta
post-secondary education system with a sample size of
10000 records, includes various types of information,
such as demographic data (e.g., age, gender), cultural
identity, government funding status, enrollment
figures, and academic performance metrics. To ensure
accuracy and suitability for analysis, the data under-
went rigorous cleaning, transformation, and organi-
zation processes. Outliers were addressed, features
were normalized or encoded as required, and missing
values were imputed. Such pre-processing steps are
crucial, especially with a dataset of this size and com-
plexity, to prepare the data for subsequent analysis,
such as logistic regression modeling. This modeling
aims to shed light on the variables influencing the aca-
demic performance and dropout rates of Indigenous
students, providing valuable insights into the educa-
tional landscape.
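The cleaning steps described above (imputing missing values, addressing outliers, and normalizing features) can be sketched as follows. This is a minimal illustration on made-up records, not the authors' pipeline: the column names, the median/mode imputation, the 1.5 x IQR clipping rule, and the min-max scaling are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; column names are illustrative, not the real schema.
raw = pd.DataFrame({
    "age": [19.0, 22.0, np.nan, 24.0, 95.0],
    "gender": ["Female", "Male", "Female", None, "Male"],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Impute missing values: median for numeric, most frequent value for categorical.
    out["age"] = out["age"].fillna(out["age"].median())
    out["gender"] = out["gender"].fillna(out["gender"].mode()[0])
    # Address outliers by clipping to the 1.5 * IQR fences (an assumed rule).
    q1, q3 = out["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["age"] = out["age"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    # Normalize the numeric feature to the [0, 1] range (min-max scaling).
    span = out["age"].max() - out["age"].min()
    out["age_scaled"] = (out["age"] - out["age"].min()) / span
    return out

clean = preprocess(raw)
print(clean)
```

In practice the imputation and scaling statistics would be computed on the training split only and then applied to the test split, so that no information from the test records leaks into training.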
Figure 2 illustrates the credentials that Indigenous
students have achieved so far in the Alberta region,
based on the dataset available on the Alberta Open
Data Portal website.
Figure 2: Distribution of Indigenous students based on their
education level in Alberta Region.
4.2 Data Processing
For data preprocessing, we categorize the
fields into two groups: binary and numerical
variables (Rao et al., 2023). After binary (one-hot)
encoding, each resulting categorical indicator takes
only two values, so we treat it as a binary variable.
The categorization is as follows:
Binary Variables
  Cultural Identity (after binary encoding)
    First Nations (1 for ”First Nations”, 0 for others)
    Inuit (1 for ”Inuit”, 0 for others)
    Métis (1 for ”Métis”, 0 for others)
  Gender (after binary encoding)
    Gender (1 for ”Female”, 0 for ”Male”)
  Government Funding (after binary encoding)
    Government Funding (1 for ”Yes”, 0 for ”No”)
Numerical Variables
  Language Proficiency (after label encoding)
    Language Proficiency (0 for ”Basic”, 1 for ”Intermediate”, 2 for ”Advanced”)
  Community Involvement (after label encoding)
    Community Involvement (0 for ”Low”, 1 for ”Medium”, 2 for ”High”)
  Age (Numerical variable)
    Age (numeric values)
  Level of Education (after label encoding)
    Level of Education (0 for ”School”, 1 for ”High School”, 2 for ”Bachelor’s”, 3 for ”Master’s”, etc.)
  Type of Educational Institute (after label encoding)
    Type of Educational Institute (0 for ”Public School”, 1 for ”Private School”, 2 for ”Homeschooling”, 3 for ”Online Learning”)
  Employment Sector (after label encoding)
    Employment Sector (0 for ”Government”, 1 for ”Private”, 2 for ”Others”)
Table 1: Summary of Categorical and Numerical Features
in the Dataset.
Field                            Category
First Nations                    Binary (Cultural Identity)
Inuit                            Binary (Cultural Identity)
Métis                            Binary (Cultural Identity)
Gender                           Binary
Government Funding               Binary
Language Proficiency             Numerical (Ordinal)
Community Involvement            Numerical (Ordinal)
Age                              Numerical
Level Of Education               Numerical (Ordinal)
Type of Educational Institute    Numerical (Ordinal)
Employment Sector                Numerical (Ordinal)
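The encoding scheme in this section can be sketched with pandas as below. The records and column names are hypothetical; only the value-to-code mappings are taken from the categorization above. Only a subset of the fields is shown to keep the sketch short.

```python
import pandas as pd

# Label-encoding maps copied from the categorization in this section.
LANGUAGE = {"Basic": 0, "Intermediate": 1, "Advanced": 2}
INVOLVEMENT = {"Low": 0, "Medium": 1, "High": 2}

# Hypothetical records; column names are illustrative only.
students = pd.DataFrame({
    "cultural_identity": ["First Nations", "Inuit", "Métis"],
    "gender": ["Female", "Male", "Female"],
    "government_funding": ["Yes", "No", "Yes"],
    "language_proficiency": ["Basic", "Advanced", "Intermediate"],
    "community_involvement": ["High", "Low", "Medium"],
    "age": [19, 23, 21],
})

def encode(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame({"age": df["age"]})
    # One binary indicator column per cultural identity.
    for identity in ["First Nations", "Inuit", "Métis"]:
        out[identity] = (df["cultural_identity"] == identity).astype(int)
    # Binary fields: 1 for "Female" / "Yes", 0 otherwise.
    out["gender"] = (df["gender"] == "Female").astype(int)
    out["government_funding"] = (df["government_funding"] == "Yes").astype(int)
    # Ordinal fields: integer codes that preserve the stated order.
    out["language_proficiency"] = df["language_proficiency"].map(LANGUAGE)
    out["community_involvement"] = df["community_involvement"].map(INVOLVEMENT)
    return out

encoded = encode(students)
print(encoded)
```

One design point worth noting: integer label codes imply an order, which suits fields such as Language Proficiency. For fields without a natural order (for example Employment Sector), one-hot indicators may be a safer choice for a linear model.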
4.3 Feature Selection
Recursive feature elimination (RFE) (Misra and Ya-
dav, 2020) is used in the proposed algorithm for
addressing educational disparities among Indigenous
students in Canada to pinpoint the key characteristics
that significantly influence the likelihood of dropout
prediction made by logistic regression. Initially, the
logistic regression model is trained using all perti-
nent features, including binary and numerical vari-
ables. Following a systematic elimination of less sig-
nificant features, the RFE process ranks the remain-
ing features according to how well they predict aca-
demic outcomes. The subset of crucial elements with
the greatest influence on Indigenous students’ dropout
rates is identified through this iterative process.
Algorithm 1: Feature Selection using RFE and Logistic Regression.
Data: Input data X, Target variable y
Result: Selected features X_selected
Encode categorical features;
    X_enc ← OneHotEncode(X);
Split data into training and testing sets;
    X_train, X_test, y_train, y_test ← split(X_enc, y, test_size = 0.2, random_state = 42, stratify = y);
Create logistic regression model with balanced class weights;
    model ← LogisticRegression(class_weight = ’balanced’);
Implement RFE for feature selection;
    num_features ← 5;
    rfe ← RFE(model, n_features_to_select = num_features);
    rfe.fit(X_train, y_train);
Fit model on selected features;
    X_train_sel ← X_train[:, rfe.support_];
    model.fit(X_train_sel, y_train);
Predict target variable on the testing set;
    X_test_sel ← X_test[:, rfe.support_];
    y_pred ← model.predict(X_test_sel);
Calculate F1-score and accuracy;
    f1 ← F1-score(y_test, y_pred);
    accuracy ← accuracy_score(y_test, y_pred);
Print results;
    print(”F1-score for ’Dropout’:”, f1);
    print(”Accuracy:”, accuracy);
The F1-score is used in the above algorithm to
evaluate how well the logistic regression model pre-
dicted educational outcomes for Indigenous students.
Particularly when working with datasets that are un-
balanced, it aids in assessing the model’s capacity to
balance precision and recall. The computational
complexity of RFE, which is roughly $O(np^2/2)$, is also
taken into account when choosing the features. With
RFE’s iterative design, subsets of features are used to
train the logistic regression model, allowing for the
effective identification of key predictors to reduce ed-
ucational disparities among Indigenous communities.
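Algorithm 1 can be written as runnable scikit-learn code along the following lines. This is a sketch under stated assumptions: the real dataset is not available here, so `make_classification` generates a synthetic, imbalanced stand-in with 11 already-numeric features (the encoding step is therefore omitted), and `max_iter=1000` is added only to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded student data (assumption, not the real dataset).
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5,
                           weights=[0.7, 0.3], random_state=42)

# Stratified 80/20 split, as in Algorithm 1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Logistic regression with balanced class weights, wrapped in RFE.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
rfe = RFE(model, n_features_to_select=5)
rfe.fit(X_train, y_train)

# Refit on the selected columns only, then predict on the test set.
X_train_sel = X_train[:, rfe.support_]
X_test_sel = X_test[:, rfe.support_]
model.fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)

f1 = f1_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print("Selected feature indices:", np.flatnonzero(rfe.support_))
print("F1-score for 'Dropout':", f1)
print("Accuracy:", accuracy)
```

The scores printed here describe the synthetic data only and say nothing about the results reported in this paper.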
4.4 Data Split
The train-test split, in which the dataset is divided
into two parts, a training set and a test set, is the
most popular data splitting technique.
1. The dataset is separated into a training set and
a test set. The logistic regression
model will be trained using the training set, and
its performance will be assessed using the test set.
The training set is utilized to train the machine
learning model. During this phase, the model
learns patterns and relationships within the data.
The test set is reserved for evaluating the model’s
performance. It consists of data that the model has
not seen during the training phase, allowing for an
unbiased assessment of its generalization capabil-
ity.
2. The split ratio is typically 80% training data and
20% test data, but it can be changed depending
on the size of the dataset and the particular
requirements. Given our dataset size of 10000, we
have allocated 8000 records for training and 2000
records for testing.
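The 80/20 split described above can be reproduced with scikit-learn's `train_test_split`. The feature matrix and labels below are random placeholders (an assumption for illustration); only the record count of 10000 and the 0.2 test fraction come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 9))      # placeholder feature matrix (9 features)
y = rng.integers(0, 2, size=10_000)   # placeholder dropout labels (0 or 1)

# Stratifying on y keeps the dropout proportion similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)
```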
4.5 Model Training
In order to accurately predict whether a student is
likely to drop out (1) or not (0) based on the selected
features (”Cultural Identity”, ”Gender”, ”Government
Funding”, ”Type of Educational Institute”, ”Employment
Sector”, ”Language Proficiency”, ”Community
Involvement”, ”Age”, and ”Level of Education”), the
logistic regression model must be trained to recognize
patterns and relationships within the training data
(Parker et al., 2013).
For each data point in the training set, the calcu-
lated values in the context of logistic regression re-
fer to the linear combinations of the chosen features
(’Cultural Identity’, ’Gender’, ’Government Fund-
ing’, ’Type of Educational Institute’, ’Employment
Sector’, ’Language Proficiency’, ’Community In-
volvement’, ’Age’, ’Level of Education’). For each
data point, the model will compute a weighted sum
of the feature values and add a bias term. The linear
combination (y) for a single data point (x) is com-
puted mathematically as follows:
$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{1}$$
Here:
y is the calculated value for a given data point.
b0 is the bias term (intercept).
b1, b2, ..., bn are the weights (coefficients) as-
signed to each feature.
x1, x2, ..., xn are the feature values for the corre-
sponding data point.
The calculated values (y) will then be subjected to
the sigmoid function to convert them into probabili-
ties. The formula for the function is:
$$p = \frac{1}{1 + \exp(-y)} \tag{2}$$
Figure 3: The S Curve: Visualization of Logistic Regres-
sion’s Probabilistic Model.
Thus, p is the logistic regression model’s out-
put—the likelihood that a student will drop out—for
a specific data point.
Here $\exp(-y)$ denotes the exponential of $-y$. No
matter the range of the calculated values (y), the
sigmoid function (as shown in Figure 3
(Essampally, 2020)) ensures that the output
probabilities (p) are bounded between 0 and 1. For
the corresponding data point, the model will predict a
dropout (1) when p is greater than or equal to 0.5, and
a non-dropout (0) when p is less than 0.5.
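The computation for a single data point, from the linear combination through the standard sigmoid $1/(1 + \exp(-y))$ to the 0.5 decision rule, can be sketched as follows. The weights, bias, and feature values are made up for illustration and are not fitted coefficients from this study.

```python
import numpy as np

def sigmoid(y):
    """Map any real value y to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def predict_dropout(x, weights, bias, threshold=0.5):
    """Return (probability, label) for one data point x."""
    y = bias + np.dot(weights, x)   # linear combination b0 + b1*x1 + ... + bn*xn
    p = sigmoid(y)                  # convert to a dropout probability
    return p, int(p >= threshold)   # 1 = dropout, 0 = non-dropout

# Hypothetical coefficients and feature values (assumptions for illustration).
weights = np.array([0.8, -1.2, 0.5])
bias = -0.3
x = np.array([1.0, 0.0, 2.0])

p, label = predict_dropout(x, weights, bias)
print(round(p, 3), label)
```

For these values the linear combination is 1.5, so the probability is about 0.818 and the predicted label is 1 (dropout).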
The regression algorithm will iteratively adjust the
weights (b1, b2,..., bn) and the bias term (b0) during
the model training process using optimization tech-
niques like gradient descent. The goal is to identify
the best combination of weights and biases to mini-
mize the discrepancy between the predicted probabil-
ities and the actual binary outcomes (dropout or non-
dropout) in the training data, thereby enhancing the
model’s capacity to make precise predictions on fresh,
untested data.
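The iterative weight adjustment can be illustrated with a plain batch gradient descent loop on the log-loss. This is a teaching sketch, not the solver used in the study (library implementations typically use more elaborate optimizers); the learning rate, epoch count, and synthetic data are assumptions.

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit bias b0 and weights b by batch gradient descent on the log-loss."""
    n_samples, n_features = X.shape
    b = np.zeros(n_features)
    b0 = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ b + b0)))   # predicted probabilities
        error = p - y                              # gradient of log-loss w.r.t. the linear output
        b -= lr * (X.T @ error) / n_samples        # update weights
        b0 -= lr * error.mean()                    # update bias
    return b0, b

# Synthetic data (assumption): the label depends on the sign of x1 - x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)

b0, b = train_logistic_regression(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ b + b0))) >= 0.5).astype(float)
train_accuracy = (pred == y).mean()
print("weights:", b, "bias:", b0, "training accuracy:", train_accuracy)
```

On this toy problem the learned weight for the first feature comes out positive and the second negative, matching how the labels were generated.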
4.6 Model Evaluation
Evaluation metrics like accuracy, precision, recall,
F1-score, and ROC-AUC (Area Under the Receiver
Operating Characteristic Curve) are used to assess the
performance of the logistic regression model, which
was trained using the features like Cultural Identity,
Gender, Government Funding, Type of Educational
Institute, Employment Sector, Language Proficiency,
Community Involvement, Age, and Level of Educa-
tion. These metrics evaluate how well the model per-
forms in accurately predicting Indigenous students’
likelihood of dropping out of school. We can assess
the model’s efficacy in addressing educational dispar-
ities and promoting educational equity among Indige-
nous communities in Canada by analyzing these eval-
uation results on the testing dataset.
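The metrics named in this section are all available in `sklearn.metrics`. The sketch below computes them on a tiny hand-made example; the labels and probabilities are hypothetical and unrelated to the study's data.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical true labels and predicted dropout probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3])
y_pred = (y_prob >= 0.5).astype(int)   # apply the 0.5 decision threshold

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is computed from the probabilities, not the thresholded labels.
    "roc_auc": roc_auc_score(y_true, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here three of four dropouts are caught and one non-dropout is flagged by mistake, so accuracy, precision, recall, and F1 all equal 0.75, while ROC-AUC is 0.875.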
4.7 Interpretability
The logistic regression model’s feature importance
(coefficients) offers important insights into how each
feature affects the likelihood that Indigenous students
will drop out of school. While negative coefficients
suggest factors linked to lower dropout rates, pos-
itive coefficients point to factors linked to higher
dropout rates. Stakeholders can pinpoint key causes
of educational disparities by looking at these coef-
ficients for categories like Cultural Identity, Gender,
Government Funding, Type of Educational Institute,
Employment Sector, Language Proficiency, Commu-
nity Involvement, Age, and Level of Education. In
order to promote educational equity for Indigenous
communities, targeted interventions and support sys-
tems are informed by this understanding. While there
are other interpretability techniques, Feature Impor-
tance (Coefficients) stands out for its clarity, usabil-
ity, and universal interpretability, enabling evidence-
based decision-making to effectively close the educa-
tional gap for Indigenous students.
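Reading coefficients as described above can be sketched as follows. The data are synthetic and built so that both features lower dropout risk (an assumption made purely for illustration); the feature names are placeholders. Exponentiating a coefficient gives an odds ratio, which is often easier for stakeholders to read.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["government_funding", "community_involvement"]  # illustrative names

# Synthetic data (assumption): funding is 0/1, involvement is 0/1/2,
# and both are constructed to reduce the dropout probability.
rng = np.random.default_rng(1)
n = 2000
funding = rng.integers(0, 2, size=n)
involvement = rng.integers(0, 3, size=n)
X = np.column_stack([funding, involvement]).astype(float)
logit = 1.0 - 1.5 * funding - 0.8 * involvement
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)
coefs = model.coef_[0]
odds_ratios = np.exp(coefs)

for name, coef, ratio in zip(feature_names, coefs, odds_ratios):
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: coefficient={coef:+.2f}, odds ratio={ratio:.2f} ({direction} dropout odds)")
```

An odds ratio below 1 means a one-unit increase in the feature is associated with lower dropout odds, holding the other features fixed. Coefficients are only comparable across features when the features are on similar scales.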
5 RESULTS
The logistic regression model demonstrated promis-
ing results in predicting dropout rates among Indige-
nous students. The model achieved an accuracy rate
of 0.831 on the testing dataset, indicating its ability
to correctly classify dropout outcomes. Moreover, the
F1 score was 0.831 (as shown in Figure 4), showcas-
ing its effectiveness in identifying students at risk of
dropping out and minimizing false positives.
Figure 4: Showing the Accuracy for Dropped out instances
for Indigenous Students.
The interpretability of the logistic regression
model allowed us to identify key factors influencing
dropout rates among Indigenous students. The coeffi-
cients for various features provided valuable insights
into the impact of Cultural Identity, Gender, Gov-
ernment Funding, Type of Educational Institute, Em-
ployment Sector, Language Proficiency, Community
Involvement, Age, and Level of Education on edu-
cational outcomes. For instance, positive coefficients
indicated factors associated with higher dropout rates,
while negative coefficients pointed to factors linked
to lower dropout rates. This understanding empow-
ered us to design targeted interventions and support
systems to enhance educational equity and bridge the
educational gap for Indigenous students in Canada.
6 LIMITATIONS/CHALLENGES
The proposed approach to addressing educational dis-
parities faced by Indigenous students in Canada, uti-
lizing a database and logistic regression model, shows
promise but has several limitations. First, the
approach is highly data-driven, relying on the
quantity and availability of data, which poses challenges due to
privacy concerns, historical factors, and limited data
collection in Indigenous communities. Missing val-
ues and biased data, along with outliers, can impact
the accuracy of predictions, while sampling bias and
data heterogeneity may hinder model generalization.
Additionally, the logistic regression model, though
insightful, may not capture all complexities, assum-
ing linear relationships and struggling with high-
dimensional and non-numeric data.
Understanding the rich tapestry of diversity
among Canada’s over 600 Indigenous communities
is essential for addressing educational disparities
effectively. These communities, comprising First
Nations, Métis, and Inuit, each possess distinct cultural
and socio-economic backgrounds. Factors such as
cultural diversity, socioeconomic variations, histori-
cal traumas like colonization, and the intertwining of
language and identity directly influence educational
experiences. Disparities between remote and urban
communities, driven by geographic barriers and in-
frastructure challenges, further complicate the edu-
cational landscape. Limited funding, technical con-
straints, and resistance to technological changes add
to the challenges, requiring collaborative efforts and
strategic initiatives to overcome resource constraints.
While the logistic regression model provides insights
into factors contributing to dropout rates, it may not
capture the full spectrum of complexities in Indigenous
education. Its limitations in handling non-linear
relationships, high-dimensional data, and non-numeric
variables highlight the need for a more comprehensive
approach. Complementing it with other machine learning
models, such as support vector machines (SVM) and
decision trees, and with statistical techniques such as
correlation analysis can offer a
broader understanding of educational disparities.
Upholding ethical principles in Indigenous education
research, including informed consent, confidentiality,
cultural sensitivity, and community involvement, is
crucial for ensuring positive, inclusive outcomes and
promoting long-term sustainability.
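Correlation analysis of the kind mentioned above could be sketched as follows. The snippet computes the Pearson correlation coefficient between a numeric factor and a binary dropout indicator. The funding values and dropout labels are hypothetical and are not results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: funding per student (arbitrary units) and
# whether the student dropped out (1) or not (0).
funding = [0.95, 0.60, 0.80, 0.40, 0.90, 0.55]
dropout = [0, 1, 0, 1, 0, 1]

r = pearson(funding, dropout)
print(f"correlation between funding and dropout: {r:.3f}")
```

A negative coefficient would indicate that higher values of the factor tend to accompany lower dropout in the sample, although correlation alone does not establish a causal effect.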
7 CONCLUSION AND FUTURE WORK
Future work on this research can extend its impact and
tackle additional challenges in Indigenous education in
several directions. Ensuring data privacy and security
while collaborating with
Indigenous communities is essential for comprehen-
sive data collection. Cultural sensitivity should be
prioritized throughout the research process to respect
community values and align research outcomes with
their needs. Incorporating qualitative data alongside
quantitative measures can provide deeper insights into
Indigenous students’ experiences. Exploring various
machine learning models and forming collaborations
between academic institutions, government agencies,
and Indigenous communities can lead to more accu-
rate predictions and sustainable change. Ultimately,
future research should focus on promoting equitable
and inclusive education while addressing educational
disparities faced by Indigenous students.
In conclusion, the logistic regression model proved to
be a valuable tool for studying educational disparities
faced by Indigenous students. Its accuracy (0.831 on the
testing dataset) and interpretability allowed for the
identification of significant predictors of dropout,
enabling the development of evidence-based strategies
for educational improvement. By incorporating Indigenous
data governance principles and using MongoDB as a NoSQL
database, our management system offers a unique and
comprehensive solution to empower Indigenous com-
munities and promote educational equity in Canada.
The integration of Indigenous knowledge systems,
languages, and histories into the curriculum further
enhances the educational experience for all students
and fosters reconciliation. With our research and
data-driven approach, we aim to contribute to the de-
velopment of effective policies and strategies that pro-
mote justice, fairness, and inclusivity in education for
Indigenous communities.
REFERENCES
Bisong, E. (2019). Logistic regression. pages 243–250.
Brown, D. (2023). Ontario has ‘come quite far’ on
Indigenous education but there’s much more to be done:
report. CBC.
Campbell, C. (2020). Educational equity in Canada: the
case of Ontario’s strategies and actions to advance
excellence and equity for students. School Leadership &
Management, 41(4-5):1–20.
CFS Ontario (2021). Post secondary education and treaty
rights.
Das, A. (2021). Logistic regression. pages 1–2.
Essampally, D. (2020). Logistic regression. Medium.
Government of Canada, Interagency Advisory Panel on
Research Ethics (2019). Tri-Council Policy Statement:
Ethical conduct for research involving humans - TCPS
2 (2018) - Chapter 9: Research involving the First
Nations, Inuit and Métis peoples of Canada.
Government of Ontario (2017). Aboriginal students - col-
leges and universities.
K, G. M. (2020). Machine learning basics: Logistic regres-
sion.
Kim, P. J. (2019). Social determinants of health inequities
in Indigenous Canadians through a life course approach
to colonialism and the residential school system.
3(1):378–381.
Li, J., Brar, A., and Roihan, N. (2021). The use of digital
technology to enhance language and literacy skills
for Indigenous people: A systematic literature review.
Computers and Education Open, 2.
Misra and Yadav (2020). Improving the classification ac-
curacy using recursive feature elimination with cross-
validation. 11(3).
Parker, P. D., Bodkin-Andrews, G., Marsh, H. W., Jerrim,
J., and Schoon, I. (2013). Will closing the achievement
gap solve the problem? An analysis of primary
and secondary effects for Indigenous university entry.
Journal of Sociology.
Rao, K. M., Saikrishna, G., and Supriya, K. (2023). Data
preprocessing techniques: emergence and selection
towards machine learning models - a practical review
using HPA dataset. Multimedia Tools and Applications.
Schaub, A. (2020). Examining an apparent educating gap
between non-Indigenous and Indigenous learners: A
hermeneutic phenomenology approach.
Stagg Peterson, S. and Dwyer, B. (2016). Research in
Canada’s northern rural and Indigenous communities:
Supporting young children’s oral language and writing.
The Reading Teacher, 70.
Weston, D. (2019). Importance of safe water for Aboriginal
children’s education. Heart and Art.
Wu, J., Lin, S., Kong, H., and Shi, H. (2019). The combination
forecasting model of telecommunication user
tricking account overdraft limit based on logistic
regression and SVM. pages 411–415.
CSEDU 2024 - 16th International Conference on Computer Supported Education