different weighted information gain ratio. Discipline classification rules are deduced from the decision tree, and an automatic classification system is implemented. The paper investigates the application of data mining technology to discipline classification and provides advice for discipline construction in universities and colleges.
2 RELATED WORK
Advantageous disciplines and newly emerging disciplines are the basis for developing world first-class disciplines. They play important roles in the development of discipline clusters and, at the same time, draw on complementary disciplines to promote cooperative development among related disciplines. At present, discipline evaluation is commonly used to determine whether a discipline is advantageous, newly emerging, or neither. Research on discipline evaluation falls into several categories. The first is based on university rankings, such as the Times Higher Education World University Rankings (THE) (Marijk, 2008), the U.S. News College Rankings (USNWR) (Jamil and Alenoush, 2007), and the China Discipline Ranking (CDR) from the China Academic Degrees and Graduate Education Development Center (CDGDC, 2013). The second is based on science mapping, such as the Bibliometric Rankings from the Centre for Science and Technology Studies (CWTS) at Leiden University in the Netherlands (Moed, 2006). The third is based on research into development trends, such as the discipline value evaluation of the National Center for Scientific Research (CNRS) in France. The fourth is based on scientific fund management, such as the evaluation of funded disciplines by the Biotechnology and Biological Sciences Research Council (BBSRC) (Aghion et al., 2010). All of these approaches use traditional methods that combine subjective and objective evaluation, such as expert assessment and bibliometrics. The evaluation process is complex, and the results are easily affected by subjective judgment, among other factors.
To address these problems in discipline evaluation, this paper proposes a new discipline decision tree classification algorithm based on a weighted information gain ratio. An automatic discipline classification system is implemented to verify the algorithm and to analyze data from universities in Shanghai. The system provides advice and guidance for comprehensive discipline evaluation and for discipline development strategy.
3 DISCIPLINE DECISION TREE CLASSIFICATION ALGORITHM
The discipline decision tree classification algorithm selects evaluation attributes according to their weighted information gain ratios and the correlations between them, and then builds the decision tree. A decision tree is a directed graph used to classify items. It consists of a root node (a node to which no other node points), internal nodes (nodes that are pointed to by another node and that point to other nodes), and leaves (nodes that do not point to any other node) (Han et al., 2011). An item to be classified travels from the root to one of the leaves, where the classification is made. Discipline classification rules can then be deduced from the decision tree.
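To make this structure concrete, the following is a minimal Python sketch of a decision tree as a directed graph, of classification by walking from the root to a leaf, and of deducing one IF-THEN rule per root-to-leaf path. The `Node` class, the functions `classify` and `extract_rules`, the attributes `A1`/`A2` with values `high`/`low`, and the class labels are hypothetical illustrations rather than the paper's implementation; building the tree with the weighted information gain ratio is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Internal nodes test one attribute; leaves (no children) carry a class label."""
    attribute: str | None = None                               # attribute tested here (None for leaves)
    children: dict[str, Node] = field(default_factory=dict)    # attribute value -> child node
    label: str | None = None                                   # class label (leaves only)

    def is_leaf(self) -> bool:
        return not self.children


def classify(root: Node, item: dict[str, str]) -> str | None:
    """Travel from the root to a leaf, following the branch matching the item's value."""
    node = root
    while not node.is_leaf():
        node = node.children[item[node.attribute]]
    return node.label


def extract_rules(node: Node, conditions: tuple[str, ...] = ()) -> list[str]:
    """Deduce one rule per root-to-leaf path: the tests on the path form the premise."""
    if node.is_leaf():
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node.label}"]
    rules: list[str] = []
    for value, child in node.children.items():
        rules.extend(extract_rules(child, conditions + (f"{node.attribute} = {value}",)))
    return rules


# Hypothetical tree over two evaluation attributes A1 and A2.
tree = Node(attribute="A1", children={
    "high": Node(attribute="A2", children={
        "high": Node(label="advantageous"),
        "low": Node(label="newly-emerging"),
    }),
    "low": Node(label="other"),
})

print(classify(tree, {"A1": "high", "A2": "low"}))  # newly-emerging
for rule in extract_rules(tree):
    print(rule)
# IF A1 = high AND A2 = high THEN advantageous
# IF A1 = high AND A2 = low THEN newly-emerging
# IF A1 = low THEN other
```

Each rule's premise is simply the conjunction of the attribute tests on one path, so the number of rules equals the number of leaves.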
3.1 Basic Definitions
Definition 1 Let $S$ be the set of training samples and $C_i$ ($i = 1, \ldots, m$) the classes of the classification attribute. Let $S_{C_i}$ be the subset of $S$ whose samples belong to class $C_i$, and let $T(S)$ denote the cardinality of $S$. The probability mass function $P_i$ is defined as (1).

$$P_i = \frac{T(S_{C_i})}{T(S)} \tag{1}$$
Definition 2 The entropy of $S$ relative to the classes $C_i$ is defined as (2), where the accumulation (accum, written $\sum$) runs over $i = 1, \ldots, m$.

$$I(S) = -\sum_{i=1}^{m} P_i \log_2 P_i \tag{2}$$
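Definitions 1 and 2 can be sketched in Python as follows, assuming the training set $S$ is represented simply as a list of class labels, one per sample, and taking $T(\cdot)$ as the number of elements. The function names `class_probabilities` and `entropy` and the example labels are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Hashable, Sequence


def class_probabilities(labels: Sequence[Hashable]) -> dict[Hashable, float]:
    """Eq. (1): P_i = T(S_Ci) / T(S) for every class C_i present in S."""
    total = len(labels)        # T(S)
    counts = Counter(labels)   # T(S_Ci) for each class C_i
    return {c: n / total for c, n in counts.items()}


def entropy(labels: Sequence[Hashable]) -> float:
    """Eq. (2): I(S) = -sum_i P_i * log2(P_i).

    Only classes that actually occur are counted, so log2(0) is never evaluated.
    """
    return -sum(p * math.log2(p) for p in class_probabilities(labels).values())


labels = ["advantageous", "advantageous", "newly-emerging", "other"]
print(class_probabilities(labels))  # {'advantageous': 0.5, 'newly-emerging': 0.25, 'other': 0.25}
print(entropy(labels))              # 1.5
```

For this example, $I(S) = -(0.5 \log_2 0.5 + 2 \cdot 0.25 \log_2 0.25) = 0.5 + 1.0 = 1.5$ bits; a set whose samples all belong to one class has entropy 0.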
Definition 3 Let $D$ be a subset of $S$ that contains several different evaluation attributes $\{A_1, A_2, \ldots, A_n\}$. If attribute $A_i$ has $k_i$ distinct values (characteristics), $D$ can be divided into $k_i$ subsets $D_{ij}$ ($j = 1, \ldots, k_i$) according to $A_i$. The entropy of $D$ relative to $A_i$ is defined as (3).
E(D, A
i
) = - accum{ [T(D
ij
) /accum T(D
ij
) ]
*I(D
i