User Modeling of Skills and Expertise from Resumes
Hua Li, Daniel J. T. Powell, Mark Clark, Tifani O'Brien and Rafael Alonso
Leidos, Inc., Virginia, U.S.A.
Keywords: User Modeling, Expertise Modeling, Resume, Profile, Skill.
Abstract: Job applicants describe their skills and expertise in resumes and curricula vitae (CVs). These biographic
data are often evaluated by human resource personnel or a search committee. This manual approach works
well when the number of resumes is small. However, in this information age, the volume of available
resumes can be overwhelming and there is a need for automatic evaluation of applicant skills and expertise.
In this paper, we describe a user modeling algorithm to quantitatively identify skills and expertise from
biographic data. This algorithm is called REMA (Resume Expertise Modeling Algorithm). REMA takes
data from a resume document as input and produces an expertise model. The expertise model details the
expertise topics for which the resume owner has claimed competency. Each topic carries a weight indicating
the level of competency. There are two key insights for this algorithm. First, one's expertise is the
cumulative result of the various “learning events” in one's career. These learning events are mentioned in
various sections of the resume, such as earning a degree, writing a paper, or getting a patent. Second, one's
knowledge and skills can become outdated or forgotten over time if not reinforced by learning. We have
developed a prototype resume evaluation system based on REMA and are in the process of evaluating
REMA’s performance.
1 INTRODUCTION
A resume is a written summary of one’s education,
work experience, skills, credentials, and
accomplishments that is often used to apply for jobs.
One key function of a resume is to provide
information regarding one’s skills and expertise. A
resume evaluator must manually judge the
applicant’s competence in a skill, i.e., the level of
mastery of that skill, and/or expertise in a skill
domain, i.e., the level of mastery of all skills in that
domain. Expertise modeling automatically produces
a quantitative assessment of competence in the
various skills and of expertise in the relevant skill
domains from a resume.
Expertise modeling is valuable for evaluators
when faced with a large number of resumes to
review. Given a position description, expertise
modeling can be used to find suitable candidates by
matching a set of skills between a position and a
candidate’s resume. There are many other
applications of expertise modeling, some of which
are listed below.
Expertise finder: given a skill, find experts with that skill, i.e., find resumes with a high expertise level for that skill.
Virtual expertise group finder: given a set of resumes, cluster them based on skill set similarities.
Job finder: given a candidate resume, find matching positions within a career market.
Next skill to learn: suggest skills for career development. Given a user resume, find career-conducive skills (CCS) that the user should learn. The CCS are skills that frequently co-occur with a user’s skill set in expert resumes.
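The "next skill to learn" application above can be illustrated with a small co-occurrence count. This is only a sketch under stated assumptions: the function name `career_conducive_skills`, the representation of each expert resume as a plain set of skill names, and the scoring rule (one credit per shared skill) are all hypothetical and are not taken from the REMA prototype.

```python
from collections import Counter

def career_conducive_skills(user_skills, expert_resumes, top_n=3):
    """Rank skills the user lacks by how often they co-occur with the user's skills.

    user_skills: set of skill names from the user's resume.
    expert_resumes: list of skill sets, one per expert resume (hypothetical input).
    """
    user_skills = set(user_skills)
    counts = Counter()
    for expert_skills in expert_resumes:
        shared = user_skills & set(expert_skills)
        if not shared:
            continue  # this expert has nothing in common with the user
        for skill in set(expert_skills) - user_skills:
            # Credit each missing skill once per shared skill it co-occurs with.
            counts[skill] += len(shared)
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [skill for skill, _ in ranked[:top_n]]

experts = [
    {"java", "sql", "hadoop"},
    {"java", "sql", "spark", "hadoop"},
    {"painting", "sculpture"},
]
suggestions = career_conducive_skills({"java", "sql"}, experts)
print(suggestions)
```

In a fuller system the skill sets would presumably come from expertise models rather than raw sets, so that the topic weights could also contribute to the ranking.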
Although there is an extensive literature on expertise
recommendation, to our knowledge, expertise
modeling from resumes is still an open research
area. Our algorithm is called the Resume Expertise
Modeling Algorithm (REMA). It takes a resume, CV,
or other biographical document as input and
produces an expertise model. The algorithm focuses
on the concept of an “expertise mention” in the
resume, e.g., earning a Ph.D. in psychology in 2000
or publishing a paper on psychology in 2001. Such
mentions imply a certain level of education on a
given topic (i.e., psychology) at a particular point in
time (i.e., 2000 and 2001, respectively). The greater
the number of learning events on a specific topic,
the more competent the person is with that topic.
This process is referred to as “reinforcement”.
Li, H., Powell, D., Clark, M., O’Brien, T. and Alonso, R.
User Modeling of Skills and Expertise from Resumes.
In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - Volume 3: KMIS, pages 229-233
ISBN: 978-989-758-158-8
Copyright © 2015 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Conversely, skills can get rusty over time if a person
stops learning. This process is referred to as
“forgetting”. The expertise modeling algorithm
processes the expertise mentions incrementally, in
chronological order. New expertise topics are
inserted into the expertise model, and their weights
are increased by reinforcement and decreased by
forgetting.
2 RELATED WORK
The related work is mainly in the area of expertise
recommendation or expert finding, which attempts to
find the right person with the appropriate skills and
knowledge. This is useful for many purposes,
including problem solving, question answering, and
collaboration. A significant amount of research has
been generated in the Information Retrieval
community (Smirnova and Balog, 2011; Balog et
al., 2007; Balog et al., 2009; Liebregts and Bogers,
2009). This line of research focuses on content-based
algorithms, similar to document search. These
algorithms identify experts based on the content of
documents that they are associated with (Liebregts
and Bogers, 2009; Serdyukov et al., 2007). While
these approaches have been very effective in finding
the most knowledgeable people on a given topic
based on a large collection of documents from an
enterprise or the internet, it is not clear how they can
be used to assess expertise on multiple topics
based on a single resume.
3 THE REMA ALGORITHM
The REMA algorithm is shown in Figure 1. An
input resume is first parsed into expertise mentions.
The mentions are evaluated by Natural Language
Processing (NLP) tools to extract expertise topics.
Figure 1: REMA algorithm diagram.
These topics are then processed by REMA’s
expertise model adaptation component to generate
the expertise model. This algorithm is an extension
of our user modeling algorithm RAMA
(Reinforcement and Aging Modeling Algorithm),
described in detail elsewhere (Li and Alonso, 2014;
Li and Alonso, 2012; Alonso et al., 2010).
3.1 Expertise Mentions
Expertise mentions are phrases or statements in the
resume that indicate significant learning events. For
example, a resume may mention a paper on
databases in a certain year in the publication section.
When parsing the expertise mentions, the associated
resume section and the date of the event are captured
because they are important indicators of level of
expertise. REMA uses a source relevance parameter
to register the fact that expertise mentions in
different parts of a resume carry different
significance. For example, a mention in a patent or
publication section should indicate more expertise
than one in an experience or education section.
Even within the same section, mentions originating
from different sources may carry different
significance. For example, within the publication
section, mentions of a book or journal paper are
more indicative of expertise than those of a
conference paper. The date of the mentioned event
reflects the recency of the learning. In other words,
skills or expertise acquired more recently are more
up-to-date and less likely to have been forgotten. We
use regular expressions and GATE to extract the date
and time information from the expertise mention.
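The information captured for each expertise mention (section, source, date, and source relevance) can be sketched as a small record plus a regular-expression date extractor. This is a minimal illustration, not the REMA implementation: the relevance values, the `ExpertiseMention` and `parse_mention` names, and the rule of taking the latest four-digit year in the text are all assumptions, and the GATE-based extraction is not reproduced.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical source relevance values; the paper does not publish its settings.
# They only respect the ordering described in the text.
SOURCE_RELEVANCE = {
    ("publication", "book"): 1.0,
    ("publication", "journal"): 1.0,
    ("publication", "conference"): 0.8,
    ("patent", "patent"): 1.0,
    ("education", "degree"): 0.6,
    ("experience", "job"): 0.6,
}
DEFAULT_RELEVANCE = 0.5  # assumed fallback for unknown section/source pairs

# Four-digit years from 1900 to 2099.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

@dataclass
class ExpertiseMention:
    text: str
    section: str
    source: str
    year: Optional[int]
    relevance: float

def parse_mention(text, section, source):
    """Build a mention record, dating it by the latest year found in the text."""
    years = [int(y) for y in YEAR_PATTERN.findall(text)]
    year = max(years) if years else None
    relevance = SOURCE_RELEVANCE.get((section, source), DEFAULT_RELEVANCE)
    return ExpertiseMention(text, section, source, year, relevance)

mention = parse_mention("Ph.D. in psychology, 1996-2000", "education", "degree")
print(mention.year, mention.relevance)
```

Taking the latest year treats a date range such as "1996-2000" as a learning event that ended in 2000, which is one plausible reading of recency; other conventions are possible.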
3.2 Expertise Topics
Expertise topics are terms indicating skills or
expertise such as database or machine learning.
They are extracted from expertise mentions using
NLP tools. In particular, Apache Lucene®¹ is used
to extract simple terms from text. WordNet®² is
used to identify nouns. GATE is used to extract
noun chunks and named entities. The OpenCalais
web service is used to extract expertise-related
tags, including "Industry Term", "Technology", and
"Programming Language". Relationships between
expertise topics are also extracted from expertise
mentions. For example, WordNet’s semantic relation
“hyponym” is useful to map a broader expertise to a
narrower one. We can also use Wikipedia’s®³
categories to establish a similar kind of relationship
between expertise topics.
¹ Apache Lucene is a registered trademark of the Apache
Software Foundation within the United States and/or
other countries.
² WordNet is a registered trademark of the Trustees of
Princeton University within the United States and/or
other countries.
KMIS 2015 - 7th International Conference on Knowledge Management and Information Sharing
Figure 2: Left: Effects of Reinforcement Factor (Learning Rate) on Weight; Right: Effects of decay half-life (Forgetting Factor) on weight over time.
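To make the topic extraction step of Section 3.2 concrete, the sketch below uses only the Python standard library as a stand-in for the tools named in the text. It does not call Lucene, WordNet, GATE, or OpenCalais: the stop list, the hand-written `NARROWER` table, and both function names are hypothetical and serve only to show the shape of the output (simple terms plus broader/narrower relationships).

```python
import re

# Tiny illustrative stop list; a real analyzer would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Hand-written broader -> narrower links standing in for WordNet hyponyms
# or Wikipedia categories (hypothetical data).
NARROWER = {
    "database": ["relational database", "graph database"],
    "machine learning": ["deep learning"],
}

def extract_terms(mention_text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", mention_text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def extract_relations(topics):
    """Return (broader, narrower) pairs where both ends are known topics."""
    known = set(topics)
    return [
        (broader, narrower)
        for broader in sorted(known)
        for narrower in NARROWER.get(broader, [])
        if narrower in known
    ]

terms = extract_terms("A paper on the design of databases")
relations = extract_relations(["deep learning", "machine learning", "database"])
print(terms, relations)
```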
3.3 Expertise Model Adaptation
This component has two subcomponents: content
adaptation and weight adaptation. The former refers
to the addition of new expertise topics in the
expertise model. As expertise mentions are being
processed in chronological order, unseen expertise
topics will be automatically inserted into the
expertise model with a default weight. The size of
the default weight is equal to the weight change
from the reinforcement of one expertise mention
(see below).
Weight adaptation refers to the dynamic
adjustment of the weights of expertise topics in the
model by reinforcement and forgetting mechanisms.
Reinforcement increases the topic weight. The more
expertise mentions there are on a given topic, the
more learning events the person has had on that
topic and, as a result, the more competent the person
is with the topic. This process is termed
“reinforcement”. The parameter that controls the
rate of learning is called the reinforcement factor.
The effects of this factor on the learning rate in
simulation experiments are shown in the left panel
of Figure 2.
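The paper does not state the reinforcement formula, so the following is only one plausible update rule, assumed for illustration: each mention closes a fixed fraction of the gap between the current weight and the maximum weight of one. Under this assumption the default weight of a new topic, defined in the text as the weight change from one reinforcement, equals the reinforcement factor itself (scaled by source relevance). The function names are hypothetical.

```python
def reinforce(weight, reinforcement_factor, relevance=1.0):
    """Move the weight toward 1.0 by a fraction of the remaining gap (assumed rule)."""
    step = reinforcement_factor * relevance
    if not 0.0 <= step <= 1.0:
        raise ValueError("reinforcement_factor * relevance must lie in [0, 1]")
    return weight + step * (1.0 - weight)

def default_weight(reinforcement_factor, relevance=1.0):
    """Weight of a newly inserted topic: one reinforcement starting from zero."""
    return reinforce(0.0, reinforcement_factor, relevance)

# Five mentions of the same topic with a reinforcement factor of 0.2.
weight = default_weight(0.2)
history = [weight]
for _ in range(4):
    weight = reinforce(weight, 0.2)
    history.append(weight)
print([round(w, 3) for w in history])
```

A rule of this form keeps weights within the zero-to-one range of the expertise model and gives diminishing returns for repeated mentions; a larger reinforcement factor makes the curve rise faster.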
Conversely, natural forgetting decreases the
topic weight. In general, a person’s skills will
stagnate over time if the person stops learning. There
are at least two contributing factors to this
stagnation. The first is memory decay, as
described in decay theory⁴. This theory proposes that
memory fades and information becomes less
available for later retrieval as time passes. The
second factor is technological knowledge
depreciation, or decay (Nemet, 2012; Park, Shin, and
Park, 2006). The average decay rate is estimated at
13.3% per year, which corresponds to a half-life of
4.86 years (Park, Shin, and Park, 2006). The effects
of five different decay half-life values over time are
shown in the right panel of Figure 2.
³ Wikipedia is a registered trademark of the Wikimedia
Foundation, Inc., in the United States and/or other
countries.
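The relation between the quoted decay rate and half-life can be checked numerically. With an annual decay rate d, the fraction retained after t years is (1 - d)^t, so the half-life is ln(0.5) / ln(1 - d). The sketch below also applies exponential forgetting to a topic weight; the exponential form and the function names are assumptions for illustration, since the paper does not give its forgetting formula.

```python
import math

def half_life_from_decay_rate(annual_decay_rate):
    """Years until half the value remains, given a constant annual decay rate."""
    return math.log(0.5) / math.log(1.0 - annual_decay_rate)

def forget(weight, elapsed_years, half_life_years):
    """Exponentially decay a topic weight over the elapsed time (assumed form)."""
    return weight * 0.5 ** (elapsed_years / half_life_years)

half_life = half_life_from_decay_rate(0.133)
print(round(half_life, 2))          # half-life implied by a 13.3% annual decay rate
print(forget(0.8, 4.86, 4.86))      # weight after exactly one half-life
```

A shorter half-life makes unreinforced topics fade faster, which matches the role of the forgetting factor described above.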
3.4 Expertise Model
The expertise model consists of weighted expertise
topics and their relationships. Each topic carries an
authority label that denotes the origin of the
expertise term, such as WordNet, Wikipedia, or a
web service like OpenCalais. The collection of
expertise topics represents the breadth of the
person’s skills and expertise. The topic weights
indicate the level of expertise and have a range from
zero to one. The larger the weight, the more
competent the person is with that skill or expertise.
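A data structure for such a model, together with the chronological adaptation loop of Section 3.3, might look like the sketch below. It is illustrative only: the class and field names, the default reinforcement factor of 0.2, the gap-closing reinforcement rule, and the exponential forgetting with a 4.86-year half-life are assumptions, not details of the Java prototype.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    weight: float      # level of expertise, kept within [0, 1]
    authority: str     # origin of the term, e.g. "WordNet" or "OpenCalais"
    last_year: int     # year of the most recent mention

@dataclass
class ExpertiseModel:
    topics: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (broader, narrower) pairs

    def update(self, name, authority, year, factor=0.2, half_life=4.86):
        topic = self.topics.get(name)
        if topic is None:
            # Content adaptation: insert an unseen topic with the default weight.
            self.topics[name] = Topic(name, factor, authority, year)
            return
        # Weight adaptation: forget over the elapsed time, then reinforce.
        elapsed = max(0, year - topic.last_year)
        decayed = topic.weight * 0.5 ** (elapsed / half_life)
        topic.weight = decayed + factor * (1.0 - decayed)
        topic.last_year = max(topic.last_year, year)

    def weights_at(self, year, half_life=4.86):
        """Topic weights after forgetting up to the given evaluation year."""
        return {
            name: t.weight * 0.5 ** (max(0, year - t.last_year) / half_life)
            for name, t in self.topics.items()
        }

def build_model(mentions):
    """mentions: iterable of (year, topic_name, authority) tuples."""
    model = ExpertiseModel()
    for year, name, authority in sorted(mentions):  # chronological order
        model.update(name, authority, year)
    return model

model = build_model([
    (2001, "psychology", "WordNet"),
    (2000, "psychology", "WordNet"),
    (2000, "statistics", "Wikipedia"),
])
print(model.weights_at(2010))
```

Evaluating the weights at a later year, as `weights_at` does, shows how a topic mentioned twice still outranks one mentioned once even after both have decayed.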
4 REMA RESULTS
We implemented a prototype REMA system in Java.
It has a simple Graphical User Interface (GUI) that
allows the user to process a directory of resumes and
then display the resulting expertise models. This
prototype will be used to conduct evaluation
experiments in the near future to assess the
performance of the REMA algorithm. In this section
we show some preliminary results.
⁴ http://en.wikipedia.org/wiki/Decay_theory
Figure 3: Table view of an example expertise model.
4.1 Topic Cloud View for Expertise Model
An expertise model can be shown as a topic cloud in
which a larger font size and a redder color indicate
a higher expertise level (Figure 4).
Figure 4: Topic Cloud view of an example expertise model.
4.2 Table View for Expertise Model
An example expertise model is shown as a table with
the following six columns (Figure 3):
ID – the sequence number of each model element (expertise topic)
ELEMENT – the expertise topic
WEIGHT – the level of expertise
TYPE – the type of topic: XENTITY denotes an entity topic; XRELATION denotes a synset (set of synonyms) -> hypernym relationship defined in WordNet
NLP – the natural language processing tool used for extracting this element
Features – features associated with this element
5 CONCLUSIONS
We developed REMA, a user modeling algorithm
that quantitatively identifies skills and expertise
from biographic data. REMA takes data from a
resume document as input and produces an expertise
model. There are two key concepts for this
algorithm. First, one's expertise is the cumulative
result of the various “learning events” in one's
career. Second, one's knowledge and skills can
become outdated or forgotten over time if not
reinforced by learning. We have developed a
prototype resume evaluation system based on
REMA and are in the process of evaluating REMA’s
performance.
REFERENCES
Alonso, R., Bramsen, P., and Li, H., 2010. Incremental user modeling with heterogeneous user behaviors. In International Conference on Knowledge Management and Information Sharing (KMIS 2010).
Li, H. and Alonso, R., 2014. User Modeling for Contextual Suggestion. In Proceedings of 21st Text REtrieval Conference (TREC 2014). NIST. http://trec.nist.gov/pubs/trec23/papers/pro-RAMA_cs.pdf
Li, H. and Alonso, R., 2012. Managing Analysis Context. In ESAIR'12: Fifth International Workshop on Exploiting Semantic Annotations in Information Retrieval.
Nemet, G., 2012. Historical Case Studies of Energy Technology Innovation, 1–11.
Park, G., Shin, J., and Park, Y., 2006. Measurement of depreciation rate of technological knowledge: Technology cycle time approach. Journal of Scientific and Industrial Research 65 (2), 121–127.
Balog, K., Bogers, T., Azzopardi, L., de Rijke, M., and van den Bosch, A., 2007. Broad expertise retrieval in sparse data environments. In SIGIR 2007, pp. 551–558.
Balog, K., Soboroff, I., Thomas, P., Craswell, N., de Vries, A.P., and Bailey, P., 2009. Overview of the TREC 2008 enterprise track. In TREC 2008.
Smirnova, E. and Balog, K., 2011. A User-Oriented Model for Expert Finding. In P. Clough et al. (Eds.), ECIR 2011, LNCS 6611, pp. 580–592. Springer-Verlag Berlin Heidelberg.
Liebregts, R. and Bogers, T., 2009. Design and evaluation of a university-wide expert search engine. In Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (Eds.), ECIR 2009, LNCS, vol. 5478, pp. 587–594. Springer, Heidelberg.
Serdyukov, P., Hiemstra, D., Fokkinga, M.M., and Apers, P.M.G., 2007. Generative modelling of persons and documents for expert search. In SIGIR 2007, pp. 827–828.