Automated Analysis of Job Requirements for Computer Scientists in Online Job Advertisements

Joscha Grüger, Georg Schneider

Abstract

This paper presents a concept and a system for automatically identifying skills in German-language job advertisements. The identification process is divided into four steps: Data Acquisition, Language Detection, Section Classification and Skill Recognition. Online job exchanges served as the data source. To identify the part of a job advertisement that contains the requirements, different machine-learning approaches were compared. The pre-trained LinearSVC model identified this part correctly for 100% of the tested job advertisements. Extracting skills is difficult because skills can be written in many different ways in German, especially since the language allows the ad-hoc creation of compounds. Skills were extracted using POS templates, an approach that worked for 87.33% of the skills. To classify the extracted skills into predefined skill classes, different similarity measures were compared. The combination of a fastText model and Levenshtein distance assigned 75.33% of the recognized skills to the correct skill class. The results show that extracting required skills from German-language job ads is complex.
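
To make the last step more concrete, the following is a minimal sketch of assigning a recognized skill to a skill class using a normalized Levenshtein similarity only. It is not the authors' implementation: the skill classes, the example terms, the `threshold` value and the function names are all hypothetical, and the fastText embedding component that the paper combines with the edit distance is omitted so that the example stays self-contained.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical skill classes with a few example terms each (assumption).
SKILL_CLASSES = {
    "programming": ["Java", "Python", "JavaScript"],
    "databases": ["SQL", "MySQL", "PostgreSQL"],
    "soft skills": ["Teamfähigkeit", "Kommunikationsfähigkeit"],
}


def assign_skill_class(skill, classes=SKILL_CLASSES, threshold=0.6):
    """Return (class name or None, best similarity) for a recognized skill.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    best_class, best_score = None, 0.0
    for name, examples in classes.items():
        for example in examples:
            score = similarity(skill, example)
            if score > best_score:
                best_class, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_class, best_score
```

In this sketch, an inflected form such as `assign_skill_class("Teamfähigkeiten")` still lands in the hypothetical "soft skills" class because it is only two edits away from a known term. Pure string distance cannot relate terms that are spelled differently but mean the same thing, which is presumably why the paper pairs it with an embedding model.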
