Authors:
Joscha Grüger (1) and Georg J. Schneider (2)
Affiliations:
(1) Computer Science Department, Trier University of Applied Sciences, Main Campus, Trier, Germany; University of Trier, Department of Business Information Systems II, 54286 Trier, Germany
(2) Computer Science Department, Trier University of Applied Sciences, Main Campus, Trier, Germany
Keyword(s):
Data Analysis, Web Mining, Natural Language Processing, Information Retrieval, Machine Learning, Job Ads, Skills.
Abstract:
The paper presents a concept and a system for the automatic identification of skills in German-language job advertisements. The identification process is divided into four steps: Data Acquisition, Language Detection, Section Classification, and Skill Recognition. Online job exchanges served as the data source. To identify the part of a job advertisement that contains the requirements, different machine-learning approaches were compared; the pre-trained LinearSVC model identified this part correctly for 100% of the tested job advertisements. Skills were then extracted using POS templates. Extraction is difficult because skills can be written in different ways in German, especially since the language allows the ad-hoc creation of compounds. The POS-template approach worked for 87.33% of the skills. To assign the extracted skills to predefined skill classes, different similarity measures were compared. The combination of a fastText model and Levenshtein distance assigned 75.33% of the recognized skills to the correct skill class. The results show that extracting required skills from German-language job ads is complex.
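The abstract mentions Levenshtein distance as one half of the skill-class assignment. The following is a minimal sketch of only that half, under stated assumptions: it omits the fastText component entirely, and the normalization by the longer string, the cut-off `max_ratio=0.3`, the helper name `assign_skill_class`, and the example class labels are all hypothetical choices made for illustration, not the authors' configuration. It shows how a spelling variant such as "Teamfaehigkeit" could still be mapped to a predefined class "Teamfähigkeit".

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop; distance is symmetric
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        previous = current
    return previous[-1]


def assign_skill_class(skill, classes, max_ratio=0.3):
    """Hypothetical helper: return the class label closest to `skill` by
    normalized edit distance, or None if no label is close enough.
    `max_ratio` is an assumed threshold, not a value from the paper."""
    best_label, best_ratio = None, 1.0
    for label in classes:
        dist = levenshtein(skill.lower(), label.lower())
        ratio = dist / max(len(skill), len(label), 1)  # normalize to [0, 1]
        if ratio < best_ratio:
            best_label, best_ratio = label, ratio
    return best_label if best_ratio <= max_ratio else None


# Hypothetical skill classes, for illustration only.
classes = ["Projektmanagement", "Java", "Teamfähigkeit"]
print(assign_skill_class("Teamfaehigkeit", classes))    # umlaut spelled out
print(assign_skill_class("Projektmanagment", classes))  # typo: missing "e"
print(assign_skill_class("Kochen", classes))            # no close class
```

Edit distance alone cannot relate semantically similar but differently spelled skills (e.g. a synonym), which is presumably why the paper pairs it with a fastText model; the sketch covers only surface-level spelling variation.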