and rate it manually and categorize the comments as
positive or negative. This method is feasible only when
the number of respondents is small; it becomes
inefficient when the number of respondents is very large.
2 THEORY
2.1 Machine Learning
Machine learning is a field of computer science that
studies how computers can learn. According to the
Expert.ai Team, machine learning is an application of
artificial intelligence (AI) that gives systems the
ability to learn and improve automatically from
experience without being explicitly programmed.
Machine learning focuses on developing computer
programs that can access data and use it to learn for
themselves (Expert.ai Team, 2020).
Machine learning becomes a powerful tool for
automation by combining data science and analysis to
analyze data quickly and effectively. Machine learning
algorithms use statistics to find patterns in large
amounts of data. Data here can be many things: numbers,
words, images, clicks, or anything else. The data is
stored digitally and then fed into machine learning
algorithms (Hao, 2018).
Machine learning is an area within artificial
intelligence that deals with the development of
techniques that can be programmed and that learn from
past data (Kazmaier et al., 2020). The terms pattern
recognition, data mining, and machine learning are
often used to describe the same thing. The field
intersects with probability, statistics, and sometimes
optimization. The application of machine learning
methods to large databases is called data mining
(Vairetti et al., 2020). The analogy is with mining: a
large area of land containing raw materials is
excavated to yield a small amount of very valuable
material. Similarly, in data mining, large amounts of
data are processed to build simple models that yield
valuable information.
Currently, machine learning approaches are used in
many applications, such as spam detection, optical
character recognition (OCR), facial recognition, online
fraud detection, named entity recognition (NER), and
part-of-speech tagging (Ozyurt et al., 2020).
In machine learning, the learning process can be
grouped into several scenarios, namely Supervised
Learning, Unsupervised Learning, and
Reinforcement Learning (Kusuma, 2020).
2.1.1 Supervised Learning
Supervised learning uses input data that has been
labeled. The system is trained on the labeled data so
that it can predict labels for new data. A real-world
application of supervised learning is the
recommendation of shows on Netflix, where the
algorithm suggests shows to a viewer by finding similar
ones.
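As a minimal sketch of training on labeled data and then predicting, the following uses a nearest-centroid classifier. The classifier choice, the two-number feature encoding (counts of positive and negative words in a comment), and all names are illustrative assumptions, not the method used in this paper.

```python
def train_centroids(samples, labels):
    """Average the feature vectors of each label into one centroid per class."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled data: [positive-word count, negative-word count].
train_x = [[3, 0], [2, 1], [0, 3], [1, 2]]
train_y = ["positive", "positive", "negative", "negative"]

model = train_centroids(train_x, train_y)
prediction = predict(model, [4, 1])  # an unseen, unlabeled comment
```

The labels in `train_y` are what make this supervised: the model only learns where each class sits because every training sample states its class.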
2.1.2 Unsupervised Learning
Unsupervised learning uses input data that is not
labeled. The algorithm tries to group the data based on
the characteristics it finds. Unsupervised learning
techniques are less popular because their applications
are less clear-cut. Even so, they have gained traction
in cybersecurity.
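Grouping unlabeled data by its characteristics can be sketched with a one-dimensional k-means loop. K-means is only one possible grouping technique, chosen here for brevity; the data values and starting centers are made up for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers by repeatedly assigning each to its nearest center
    and moving every center to the mean of its assigned values."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous center.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled measurements; no class information is given.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

No labels are supplied at any point; the two groups emerge only from how close the values are to one another.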
2.1.3 Reinforcement Learning
Reinforcement learning mixes learning and testing.
The system actively collects learning information by
interacting with its environment. Reinforcement
learning algorithms learn through trial and error to
achieve a goal: the algorithm tries many different
actions and is rewarded or punished depending on
whether each action helps or hinders achieving that
goal.
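The trial-and-error loop can be illustrated with an epsilon-greedy agent choosing among a few actions. This is a deliberately simple strategy picked for illustration, not one named in the source; the fixed reward values, `epsilon`, and the function name are all assumptions.

```python
import random


def run_trial_and_error(rewards, steps=100, epsilon=0.1, seed=0):
    """Estimate the value of each action from the rewards it produces."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            # exploit the action that currently looks best
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # feedback from the (toy) environment
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: action 0 is punished, action 1 is rewarded,
# and action 2 is neutral.
rewards = [-1.0, 1.0, 0.0]
estimates, counts = run_trial_and_error(rewards)
best_action = max(range(len(rewards)), key=lambda a: estimates[a])
```

The agent is never told which action is correct; it only sees the reward or punishment that follows each choice and shifts its behavior accordingly.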
2.2 Term Frequency-Inverse Document
Frequency (TF-IDF)
The TF-IDF method weights the relationship of a word
(term) to a document by combining two concepts. The
first concept is the frequency of occurrence of a word
in a particular document, called Term Frequency (TF),
which indicates how important the word is in that
document. The second concept is the inverse of the
number of documents that contain the word, called
Inverse Document Frequency (IDF). The weight of a word
in a document is high when the word occurs frequently
in that document and the number of documents containing
the word is low across the document set (Amrizal,
2019).
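The weighting described above can be sketched in a few lines. This sketch assumes raw term counts for TF and the common variant idf(t) = log(N / df(t)) for IDF, where N is the number of documents and df(t) the number of documents containing t; other variants exist, and the sample comments are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = tf * idf."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # raw count of each term in this document
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical student comments.
docs = [
    "the lecturer is good",
    "the campus is clean",
    "the lecturer is good good",
]
weights = tf_idf(docs)
```

Terms that appear in every document, such as "the" and "is" here, receive a weight of zero, while a term confined to one document, such as "campus", receives the highest IDF.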
TF-IDF is the product of TF (Term Frequency) and IDF
(Inverse Document Frequency). There are many ways to
determine the exact value of the two statistics. In the
case of term frequency tf(t, d), the simplest way is to
use the raw frequency in the document, i.e., the number of
Determination of Student Satisfaction Perceptions at Bali State Polytechnic using the TF-IDF Method with Linear Regression and Logistic