Table 3: F1 scores for Jobs 1, 2 and 3 given models M0, M1 and M2.

Job    M0       M1      M2
1      0.4615   0.667   0.667
2      0.75     0.947   0.947
3      0.889    0.571   0.571
calculated as the proportion of relevant documents among those retrieved up to that point. Recall, on the other hand, is incremented by 0.1 (1/10) every time a relevant document is found. The final F1 score is computed from the precision and recall obtained in the tenth row. Hence, for the illustrative example in Table 2 we have F1 = 2 × (0.5 × 0.5)/(0.5 + 0.5) = 0.5.
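The computation above can be sketched as follows; the relevance flags and the total of 10 relevant documents are illustrative, matching the worked example, and the function name is our own:

```python
def f1_at_k(relevant_flags, total_relevant, k=10):
    """Precision@k, recall@k and F1 for a ranked list of 0/1 relevance flags."""
    top_k = relevant_flags[:k]
    hits = sum(top_k)
    precision = hits / k                 # proportion of relevant docs in the top k
    recall = hits / total_relevant       # each hit adds 1/total_relevant
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative: 5 of the top 10 results are relevant, 10 relevant documents overall.
flags = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(f1_at_k(flags, total_relevant=10))  # 0.5
```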
As can be seen in Table 3, the best fit when considering only standard N-grams (model M0, no skips) is Job 3 (the software development role), with an F1 score of 0.889. Job 2 (the business development role) also had a decent F1 score of 0.75, while Job 1 (the sales executive role) had a very poor F1 score. For model M1, on the other hand, the F1 scores for Job 1 and Job 2 improved greatly, while the F1 score for Job 3 was much lower. There was no difference in results between model M1 and model M2. We can also see that the description for Job 1 performed the poorest in the résumé search when taking the F1 scores of all models into account.
6 DISCUSSION
It can be noted that different models can perform better for different applications. The no-skips model performs better for highly specific job descriptions such as the software development role. On the other hand, when considering less specific job descriptions, models with skips tend to return more relevant documents. It is therefore advisable to experiment with different settings for K in the word embedding model when performing the search.
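The effect of varying K can be illustrated with a minimal skip-bigram extractor; this is a generic sketch of k-skip-bigrams in the sense of Guthrie et al. (2006), not the paper's implementation, and the function name is our own:

```python
def skip_bigrams(tokens, k):
    """Ordered token pairs separated by at most k intervening tokens."""
    pairs = []
    for i, left in enumerate(tokens):
        # with k = 0 this yields standard (adjacent) bigrams
        for j in range(i + 1, min(i + 2 + k, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs

tokens = "senior software development engineer".split()
print(skip_bigrams(tokens, 0))  # 3 adjacent bigrams
print(skip_bigrams(tokens, 1))  # 5 pairs: one intervening token is allowed
```

Larger K produces more (and looser) term pairs, which is why skips help vaguer job descriptions match a wider range of résumés but dilute precision for highly specific ones.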
As a side note, it was observed that the model did not perform well if a job description included job titles of other roles, for example "... reporting directly to the chief executive officer...". This is because the applied model does not look at the order in which words are presented, but rather at the collection of terms within each document. Hence, when using this model, recruiters need to take into consideration that some words might be related more to other job descriptions than to the one in question, as this can lead the information retrieval system to return seemingly irrelevant documents, and this issue may need to be rectified.
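The underlying bag-of-words behaviour can be demonstrated in a few lines; this is a generic sketch of the order-free representation described above, not the paper's code:

```python
from collections import Counter

def bag_of_words(text):
    """Order-free term counts: the model sees only which terms occur, not where."""
    return Counter(text.lower().split())

a = "reporting directly to the chief executive officer"
b = "officer executive chief the to directly reporting"
print(bag_of_words(a) == bag_of_words(b))  # True: word order is discarded
```

Because the scrambled sentence produces an identical representation, a mention of another role's title contributes to the match score exactly as if the description were for that role.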