health problems, which shows high potential as a data source for early detection. A. B. R. Shatte et al adopted a scoping review method to rapidly map the field of machine learning in mental health (Shatte et al 2019). The data extraction covered mental health applications, machine learning techniques, data types, and study outcomes. Support vector machines, decision trees, and neural networks were among the techniques used. The application of machine learning has shown a range of benefits for mental health, but most studies concentrate on the identification and treatment of mental health disorders; machine learning applications still have considerable room to grow in other fields.
Jetli Chung and Jason Teo followed the PRISMA methodology, collecting relevant research articles and studies by searching reliable databases (Chung and Teo 2022). The review reflected the challenges and limitations researchers face, and also offered specific suggestions for potential future research and development. At present, there is no model that can reliably predict a person's likelihood of having mental health issues. Machine learning techniques could potentially improve on logistic regression, the standard prediction modelling technique. Ashley E. Tate et al aimed to evaluate whether machine learning techniques are superior to logistic regression and to build a model forecasting mental health problems in mid-adolescence (Tate et al 2020). The study used nearly 500 predictors drawn from register data and parental reports. In the end, the best performing model was not fit for clinical use, and the authors concluded that it is not necessary to forgo logistic regression in favour of more complex methods for similar studies.
Sumathi M.R. and B. Poorna identified eight algorithms and evaluated their efficacy in diagnosing five basic mental health problems using various measures (Sumathi and Poorna 2016). A data set of sixty cases was collected in the study to train the algorithms and assess their accuracy. Ayako Baba and Kyosuke Bunji obtained data from about 6,000 undergraduate students at a Japanese national university, with a response rate of 63% (Baba and Bunji 2023). The study compared the results of different machine learning models, including elastic net, logistic regression, XGBoost, random forest, and LightGBM. According to the comparison, the LightGBM model performed best, reaching adequate performance across the conditions and analyses considered.
Konda Vaishnavi et al examined five machine learning techniques, including KNN, Random Forest, and Decision Tree (Vaishnavi et al 2021). The study applied several accuracy criteria to assess how well mental health problems were identified, and the most accurate model, based on the Stacking technique, achieved a prediction accuracy of 81.75%. Jetli Chung and Jason Teo evaluated several popular machine learning algorithms (Chung and Teo 2023). The data set consisted of responses to a survey conducted by Open Sourcing Mental Illness. The techniques included Gradient Boosting, Logistic Regression, KNN, Neural Networks, and Support Vector Machine, as well as an ensemble approach based on these algorithms. Gradient Boosting reached the highest accuracy at 88.80%, offering a highly promising approach toward automated clinical diagnosis for mental health professionals.
3 METHODOLOGY
This study employs three machine learning methods: Random Forest, Support Vector Machine, and Logistic Regression. Their performance on the data set is compared in order to identify the best model, as outlined in the sketch below.
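A minimal sketch of such a comparison is given below, assuming the survey data have already been encoded into a numeric feature matrix X with binary labels y; the model parameters and the 5-fold accuracy metric are illustrative assumptions rather than the exact pipeline of this study.

    # Sketch: compare Random Forest, SVM, and Logistic Regression by cross-validation.
    # Assumes X (numeric features) and y (binary labels) have already been prepared.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    def compare_models(X, y):
        models = {
            "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
            "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
            "Logistic Regression": make_pipeline(StandardScaler(),
                                                 LogisticRegression(max_iter=1000)),
        }
        # 5-fold cross-validated accuracy for each candidate model
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
            print(f"{name}: mean accuracy = {scores.mean():.3f}")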
The Support Vector Machine (SVM) is a classification algorithm based on margin maximization: it separates data points of different classes by finding an optimal hyperplane. The fundamental idea of SVM is to map data points into a high-dimensional space in which they become easier to separate. SVM is a commonly used machine learning algorithm with high accuracy and generalization ability. Its goal is to find an ideal hyperplane that maximizes the separation between the different categories of data points. This distance is known as the margin, and the support vectors are the data points lying closest to the hyperplane. The fundamentals of SVM can be summarized in the following stages: mapping data points into a high-dimensional space; finding an optimal hyperplane in that space that maximizes the distance between the hyperplane and the data points of the different categories; and using the hyperplane to assign data points, including new data points, to their categories. In SVM, the mapping of data points can be implemented using different kernel functions. Commonly used kernels include the Gaussian (RBF), polynomial, and linear kernel functions. These kernel functions map data points into higher-dimensional spaces, making them easier to separate there.
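To make the role of the kernel concrete, the following minimal sketch fits SVMs with linear, polynomial, and Gaussian (RBF) kernels; the make_moons toy data and default parameters are placeholder assumptions, not the data set analysed in this study.

    # Sketch: the same SVM with three different kernel functions on toy data.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic, non-linearly separable data (placeholder for the study's data set)
    X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for kernel in ["linear", "poly", "rbf"]:
        clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        clf.fit(X_train, y_train)
        print(f"{kernel} kernel: test accuracy = {clf.score(X_test, y_test):.3f}")

On such non-linearly separable data, the RBF and polynomial kernels would typically outperform the linear kernel, which illustrates why mapping the data into a higher-dimensional space helps.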
Random forest belongs to the category of
ensemble learning, which creates a strongly
supervised model by mixing weakly supervised