2.2 Metabolic Syndrome
In Japan, metabolic syndrome is diagnosed from the
following measurements: waist circumference, blood
sugar level, HbA1c, systolic blood pressure, diastolic
blood pressure, triglycerides, HDL cholesterol, and
LDL cholesterol. Based on these measurements, people
are classified into a metabolic syndrome group, a
preliminary group, or a normal group. The screening
method is shown in Figure 1 (Wataru et al. 2008).
Figure 1: Screening for metabolic syndrome in Japan.
This screening checks only four risk factors: obesity
combined with hyperglycemia, hypertension, or
dyslipidemia. However, other risk factors have been
reported, such as mental disorders (Maria D. Llorente
et al. 2006; H. Klar Yaggi 2006). Therefore, many
persons at high risk for lifestyle-related diseases
cannot be identified by this method.
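The decision rule of Figure 1 can be sketched in a few lines of code. The following is only an illustrative sketch: the thresholds are read from Figure 1 as printed, the class and field names (`Checkup`, `classify_metabolic`, and so on) are hypothetical, and "any two of the additional factors" is assumed to mean "two or more".

```python
from dataclasses import dataclass


@dataclass
class Checkup:
    """One person's checkup record (field names are hypothetical)."""
    sex: str  # "male" or "female"
    waist_cm: float
    blood_sugar_mg_dl: float
    hba1c_percent: float
    triglyceride_mg_dl: float
    hdl_mg_dl: float
    systolic_mmhg: float
    diastolic_mmhg: float
    on_diabetes_medication: bool = False
    on_dyslipidemia_medication: bool = False
    on_hypertension_medication: bool = False


def classify_metabolic(c: Checkup) -> str:
    """Return the screening group according to the rule in Figure 1."""
    # Step 1: visceral fat accumulation, judged by waist circumference.
    waist_limit = 85.0 if c.sex == "male" else 90.0
    visceral_fat = c.waist_cm > waist_limit

    # Step 2: each additional factor holds if any one of its conditions is met.
    hyperglycemia = (c.blood_sugar_mg_dl > 110 or c.hba1c_percent > 5.5
                     or c.on_diabetes_medication)
    dyslipidemia = (c.triglyceride_mg_dl > 150 or c.hdl_mg_dl < 40
                    or c.on_dyslipidemia_medication)
    hypertension = (c.systolic_mmhg > 130 or c.diastolic_mmhg > 85
                    or c.on_hypertension_medication)
    n_additional = sum([hyperglycemia, dyslipidemia, hypertension])

    # Step 3: decision (assumption: "any two" is read as "two or more").
    if visceral_fat and n_additional >= 2:
        return "metabolic syndrome"
    if visceral_fat and n_additional == 1:
        return "preliminary"
    return "normal"


person = Checkup("male", 90, 120, 5.0, 160, 50, 120, 80)
print(classify_metabolic(person))
```

Note that a person whose waist circumference is below the limit is always placed in the normal group, whatever the other values are. This is the limitation discussed above.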
3 PROPOSED METHODS
To realize a screening method that can identify
various health risk factors, we propose a
machine-learning-based screening method that uses
medical checkup data and medical billing data. In
general, medical checkup data contain blood test
results that indicate health status, and medical
billing data contain a personal medical history with
information about all diseases. Hand-crafting decision
rules from medical billing data, as is done for
metabolic syndrome, requires expert knowledge. We
therefore apply machine-learning techniques to handle
the large volume of data statistically. In this paper,
we propose using latent Dirichlet allocation (LDA)
(Blei et al. 2003). LDA is a topic model, a class of
machine-learning techniques used mainly in natural
language processing. With LDA, we can model each data
item (such as a document) as a mixture of multiple
topics, which is more expressive than models that
assign each item to a single cluster, such as k-means
or a Gaussian mixture model.
3.1 Latent Dirichlet Allocation
LDA models documents conveniently and is now applied
to various data mining tasks, such as information
retrieval and voice recognition (Ishiguro et al. 2012;
Otsuka et al. 2012), data visualization, and image
processing (Fei-Fei et al. 2005; Wang et al. 2009;
Wang and Mori 2009; Niebles et al. 2008). LDA infers
the topics of the documents in a document set by
assigning each word to a topic. In LDA, each document
is handled as a bag-of-words representation, and the
document set is described by a word-topic probability
matrix (ϕ matrix) and a topic-document probability
matrix (θ matrix). Several approximate inference
techniques can estimate the parameters of LDA, such as
variational Bayes (Blei et al. 2003), Gibbs sampling
(Griffiths and Steyvers 2004), and collapsed
variational Bayes (Teh et al. 2006). In this paper, we
use Gibbs sampling (Griffiths and Steyvers 2004). We
now review LDA following the notation of
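(Griffiths and Steyvers 2004). Let there be $T$ topics
and let $\mathbf{w}_1, \ldots, \mathbf{w}_D$ represent
the bag-of-words representations of the $D$ documents.
(Document $d$ becomes
$\mathbf{w}_d = (w_{d,1}, w_{d,2}, \ldots, w_{d,N})$,
where $N$ is the number of distinct word types.) Also,
let $z_i$ be the hidden topic from which word $w_i$ is
generated, $\phi^{(t)}_w = P(w \mid z = t)$, and
$\theta^{(d)}_t = P(z = t \mid d)$ for document $d$.
LDA involves the following generative model:
$$\theta \sim \mathrm{Dir}(\alpha)$$
$$z_i \mid \theta^{(d_i)} \sim \mathrm{Mult}(\theta^{(d_i)})$$
$$\phi \sim \mathrm{Dir}(\beta)$$
$$w_i \mid z_i, \phi^{(z_i)} \sim \mathrm{Mult}(\phi^{(z_i)})$$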
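To make the generative model concrete, the following sketch draws a small synthetic corpus from it with NumPy. It is an illustration only, not the procedure used in this paper: the function name `generate_lda_corpus`, the fixed document length, and all parameter values are assumptions, and α and β are treated as scalars (symmetric Dirichlet priors).

```python
import numpy as np


def generate_lda_corpus(n_docs, n_topics, vocab_size, doc_len,
                        alpha, beta, seed=0):
    """Sample documents from the LDA generative model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # phi[t] ~ Dir(beta): word distribution of topic t, one row per topic.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # theta[d] ~ Dir(alpha): topic distribution of document d.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)

    docs, topics = [], []
    for d in range(n_docs):
        # z_i | theta^(d) ~ Mult(theta^(d)): one hidden topic per word.
        z = rng.choice(n_topics, size=doc_len, p=theta[d])
        # w_i | z_i, phi^(z_i) ~ Mult(phi^(z_i)): word drawn from its topic.
        w = np.array([rng.choice(vocab_size, p=phi[t]) for t in z])
        docs.append(w)
        topics.append(z)
    return phi, theta, docs, topics


phi, theta, docs, topics = generate_lda_corpus(
    n_docs=4, n_topics=2, vocab_size=6, doc_len=10, alpha=0.5, beta=0.5)
print(docs[0])    # word ids of the first document
print(topics[0])  # hidden topic of each of those words
```

In inference the direction is reversed: only the word ids are observed, and the topics, ϕ, and θ have to be estimated.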
Dir and Mult denote the Dirichlet distribution and the
multinomial distribution, respectively. α and β are
hyperparameters of the document-topic and topic-word
Dirichlet distributions, respectively. Here we assume
that α and β are scalars, resulting in symmetric
Dirichlet priors. Given the observed words, we have to
infer the hidden topics. To approximate this
posterior, we resort to a Markov chain Monte Carlo
(MCMC) sampling scheme, specifically collapsed
Gibbs sampling:
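As an illustration of this scheme, the sketch below implements a basic collapsed Gibbs sampler for LDA in NumPy, assuming the standard update of Griffiths and Steyvers (2004), in which each word's topic is resampled in proportion to (topic-word count + β) / (topic total + Wβ) × (document-topic count + α), with the current word excluded from the counts. The function name `collapsed_gibbs_lda`, the toy corpus, and the parameter values are hypothetical, and no claim is made that this matches the implementation used in this paper.

```python
import numpy as np


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha, beta,
                        n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA on lists of word ids (sketch)."""
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    n_dt = np.zeros((n_docs, n_topics))      # words in doc d assigned to topic t
    n_tw = np.zeros((n_topics, vocab_size))  # times word w is assigned to topic t
    n_t = np.zeros(n_topics)                 # total words assigned to topic t

    # Random initial topic assignment for every word.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            n_dt[d, t] += 1
            n_tw[t, w] += 1
            n_t[t] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current word from all counts.
                t = z[d][i]
                n_dt[d, t] -= 1
                n_tw[t, w] -= 1
                n_t[t] -= 1
                # Conditional distribution over topics for this word.
                p = ((n_tw[:, w] + beta) / (n_t + vocab_size * beta)
                     * (n_dt[d] + alpha))
                t = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = t
                n_dt[d, t] += 1
                n_tw[t, w] += 1
                n_t[t] += 1

    # Point estimates of the word-topic and topic-document probabilities.
    phi = (n_tw + beta) / (n_t[:, None] + vocab_size * beta)
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True)
                              + n_topics * alpha)
    return phi, theta, z


# Toy corpus: word ids 0-2 and 3-5 tend to occur in different documents.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 4, 3, 3]]
phi, theta, z = collapsed_gibbs_lda(toy_docs, n_topics=2, vocab_size=6,
                                    alpha=0.1, beta=0.1, n_iter=50)
print(np.round(theta, 2))
```

The estimates are taken from the final sample only. In practice one would usually discard a burn-in period and average over several samples.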
(Figure 1, flowchart text)
1. Confirm the risk factor of accumulation of visceral fat (risk factor No. 1)
・Waist circumference > 85 centimeters (male) or > 90 centimeters (female)
2. Confirm additional risk factors (risk factors No. 2 to 4; each holds if any one of its conditions is met)
Hyperglycemia
・Blood sugar level > 110 mg/dl
・HbA1c > 5.5%
・Taking medicine for diabetes mellitus
Hyperlipidemia
・Triglyceride > 150 mg/dl
・HDL cholesterol < 40 mg/dl
・Taking medicine for dyslipidemia
Hypertension
・Systolic BP > 130 mmHg
・Diastolic BP > 85 mmHg
・Taking medicine for hypertension
3. Decision
・Metabolic syndrome group: risk factor No. 1 and any two of the additional factors (No. 2 to 4)
・Preliminary group: risk factor No. 1 and any one of the additional factors (No. 2 to 4)
・Not metabolic group: all cases except those above