what should students learn to best meet the requirements of the industry? One possibility would be to
manually review the job advertisements in different
newspapers and online platforms. However, this is a
tedious and time-consuming task. Our research hypothesis is that this task can be automated in such a way that the findings are at least nearly as good as those of a human reviewer. German, however, is a challenge compared to English, for which similar approaches already exist: German has a rich morphology, umlauts, and four cases, and far fewer corpora are available for training.
This paper describes an approach for an automated survey of current job requirements for computer scientists on the German labor market. Obviously, such a survey has to be carried out regularly in order to identify trends and technologies that are becoming established.
The suggested process for realizing the automated survey is based on Natural Language Processing and Machine Learning techniques. The results of the
analysis can be used, for example, to check the content
of university curricula or to identify new technological
trends at an early stage. In addition, the procedure can
be used to implement a skill-based job search. The
next section discusses the current state of research and
various papers regarding the extraction of skills from
job ads. Section 3 introduces our approach to job ad
analysis. Both the processing steps and the evaluation
of each step are described in detail. Finally, Section 4 summarizes the results and discusses future work.
2 FOUNDATIONS AND RELATED
WORK
Some studies conclude that education does not meet the requirements of employers in the IT sector. Kwon Lee and Han (2008), for example, concluded for the US labor market that most universities attach great importance to hardware and operating systems, although the employers surveyed were rarely interested in these skills. They also saw deficits in the teaching of skills in the economic and social category. Yongbeom et al. (2006) also speak of a skill gap between employers' requirements and universities' IT curricula. Among other things, they criticize the lack of project management, Enterprise Resource Planning (ERP), and information security modules in the curricula. Scott et al. (2002) criticize graduates' poor database knowledge and their lack of skills in CASE/modeling tools and Business Process Reengineering (BPR) techniques. Students also lacked skills in XML and iterative development.
There are a number of reasons for this skill gap: one is rapid technological change, another is the mismatch between the content of university curricula and the competencies required by industry (Scott et al., 2002; Milton, 2000). In addition, curriculum revision cycles that are too long relative to the speed of technological change, as well as a lack of knowledge at universities about new and upcoming technologies, are cited as reasons for the gap (Lee et al., 2002).
In order to keep curricula up to date, universities need to know which competencies are required by industry both now and in the long term. Various approaches exist for identifying the required skills.
Prabhakar et al. (2005) researched online job advertisements for computer scientists in the US with regard to the changing demand for skills over time. For this purpose, they examined each job advertisement to see whether or not it contained one of 59 keywords.
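In essence, such a keyword check is a membership test per advertisement. The following minimal Python sketch illustrates the idea; the keyword list and the example ad are hypothetical placeholders, not the 59 keywords actually used by Prabhakar et al. (2005).

```python
import re

# Hypothetical excerpt of a fixed keyword list (placeholders for illustration;
# Prabhakar et al. used 59 predefined keywords).
KEYWORDS = ["java", "sql", "c++", "unix", "oracle", "project management"]

def keywords_in_ad(ad_text: str, keywords=KEYWORDS) -> set:
    """Return the subset of keywords that occur in the advertisement text."""
    text = ad_text.lower()
    found = set()
    for kw in keywords:
        # Boundary checks so that e.g. "java" does not match inside "javascript".
        if re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", text):
            found.add(kw)
    return found

if __name__ == "__main__":
    ad = "We are looking for a developer with solid Java and SQL skills."
    print(keywords_in_ad(ad))  # expected: {'java', 'sql'}
```

Approaches of this kind inherit all limitations of the underlying keyword list, which is one of the points criticized below.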
Gallagher et al. (2010) took an interview-based approach to identifying the skills requested in the IT sector. They interviewed 104 senior IT managers. The questions were very general: for example, they asked whether programming skills were required at all, not about concrete programming languages such as Java. Sibarani et al. (2017) developed an ontology-guided job market demand analysis process. Their method is based on the self-defined SARO ontology and a predefined set of skills. Using these skills and the ontology, they perform named-entity tagging. The identified skills are then linked by means of a co-word analysis.
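At its core, this combination of a skill dictionary and a co-word analysis can be approximated by a gazetteer lookup followed by a co-occurrence count. The sketch below is a deliberately simplified stand-in: it assumes a small hand-written skill set instead of the SARO ontology and plain token matching instead of proper named-entity tagging.

```python
from collections import Counter
from itertools import combinations

# Hypothetical skill gazetteer standing in for the predefined skill set / ontology.
SKILLS = {"python", "django", "docker", "sql", "linux"}

def tag_skills(ad_text: str) -> set:
    """Dictionary-based tagging: return all known skills mentioned in the ad."""
    tokens = {t.strip(".,;:()").lower() for t in ad_text.split()}
    return SKILLS & tokens

def co_word_counts(ads: list) -> Counter:
    """Count how often two skills are requested together in the same ad."""
    pair_counts = Counter()
    for ad in ads:
        for pair in combinations(sorted(tag_skills(ad)), 2):
            pair_counts[pair] += 1
    return pair_counts

if __name__ == "__main__":
    ads = [
        "Backend developer with Python, Django and SQL experience.",
        "DevOps engineer: Docker, Linux and Python required.",
    ]
    print(co_word_counts(ads).most_common(3))
```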
Litecky et al. (2010) used web and text mining techniques to retrieve and analyse a data set of 244,460 job ads. They scraped the data from online job exchanges and extracted titles and requirements based on predefined keywords.
Wowczko (2015) took a different approach: the descriptions of vacancies were analysed and the words used in them were reduced until only significant words remained. Custom word lists, stemming, stopword removal, number removal, whitespace stripping, and similar steps were used to clean up the data.
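A cleaning pipeline of this kind can be assembled from standard NLP tooling. The following rough sketch uses NLTK (the stopwords corpus must be downloaded first) and is only an approximation of such a preprocessing step, not Wowczko's original implementation.

```python
import re
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer

# Requires: import nltk; nltk.download("stopwords")
STOPWORDS = set(stopwords.words("english"))
STEMMER = SnowballStemmer("english")

def clean_description(text: str) -> list:
    """Reduce a vacancy description to a list of stemmed, significant words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and other symbols
    tokens = text.split()                  # split on whitespace, stripping it
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [STEMMER.stem(t) for t in tokens]

if __name__ == "__main__":
    desc = "We offer 3 positions for developers with excellent SQL and Java skills."
    print(clean_description(desc))
    # e.g. ['offer', 'posit', 'develop', 'excel', 'sql', 'java', 'skill']
```

Note that generic words such as excellent survive this kind of cleaning, which illustrates the false positives mentioned below.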
The problem with all these approaches, except those of Wowczko (2015) and Gallagher et al. (2010), is that they are based on fixed keyword lists. Thus, only abilities contained in the lists are recognized; new technologies, or skills described in any other way, cannot be recognized. A term and its abbreviation, such as Active Directory and AD, are assigned to different classes or remain unrecognized. Additionally, some process steps were performed manually; they are therefore rather time-consuming and are only carried out periodically. Wowczko (2015) also reports false positives such as strong, excellent and can. Moreover, the approaches also search in parts of the job advertisement where no requirements are described (e.g. in the company description). Consequently, an automated procedure that monitors job advertisements continuously would simplify this process enormously.