6 CONCLUSION
This paper presented a framework for sentiment analysis
of Arabic text in the automobile domain. It described
the project phases: data collection, annotation, data
cleaning, feature selection, data splitting, graphical
depiction of the data, and classification, together
with the results of the twenty-two machine learning
classifiers adopted in this contribution. The highest
accuracy obtained was 83.79%, achieved by the Ensemble
Hard Vote classifier. The results therefore suggest
that the Ensemble Hard Vote classifier is well suited
to sentiment analysis of Arabic automobile datasets,
given its high scores on the four evaluation measures.
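To make the hard-voting idea concrete, the following is a minimal sketch and not the implementation used in this work. It assumes each base classifier is a callable that returns one label per text; the three rule-based classifiers, the English sample texts and the names hard_vote, ensemble_predict and accuracy are hypothetical stand-ins introduced only for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by most classifiers for one sample.

    Ties are broken in favour of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(classifiers, samples):
    """Apply every classifier to every sample and majority-vote the labels."""
    return [hard_vote([clf(text) for clf in classifiers]) for text in samples]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy classifiers (keyword rules), not the paper's models.
def clf_a(text):
    return "positive" if "good" in text else "negative"


def clf_b(text):
    return "positive" if "reliable" in text else "negative"


def clf_c(text):
    return "negative" if "noisy" in text else "positive"


samples = ["good and reliable car", "noisy engine", "good but noisy"]
labels = ["positive", "negative", "negative"]

predicted = ensemble_predict([clf_a, clf_b, clf_c], samples)
print(predicted)
print(accuracy(labels, predicted))
```

In the third sample the classifiers disagree (one positive vote, two negative votes), and the majority label is returned. A hard vote uses only the predicted labels, so base classifiers that do not output probabilities can still be combined.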
In future work, further experiments will be conducted
to improve accuracy by refining the cleaning process,
incorporating a dictionary as part of a hybrid
approach, and adopting advanced deep learning
algorithms.