ics, and providing per-label latent topics (Ramage
et al., 2011).
2.4 Deep Generative Model (DGM)
Recently, deep neural networks (DNNs) have become
very popular and are widely used across machine
learning, although they were first introduced in the
1980s. They can adapt to many kinds of problems and
achieve excellent results compared with other
approaches, but they suffer from high computational
costs, and the learned parameters are hard to
interpret. Thanks to huge improvements in computing
power, the amount of available data, and mathematical
understanding, these models now take less time to
optimize and can be used in practical applications.
The most common deep neural network-based
generative models are the variational autoencoder
(VAE) and the generative adversarial network (GAN).
To train these models, we rely on Bayesian deep
learning, variational approximations, and Markov
chain Monte Carlo (MCMC) estimation, or the old
faithful: stochastic gradient descent (SGD).
A DNN version of LDA has been proposed
and tested using a VAE (LDA–VAE) as the inference
method, with a special technique for modeling Dirichlet
beliefs. Surprisingly, it achieves more meaningful
topics and takes less time to train than standard LDA,
but it has many hyperparameters to determine when
constructing the model (Srivastava and Sutton, 2017).
A way to infer a stick-breaking construction within
a VAE (SB–VAE) has also been presented and tested
on an image classification task. The results illustrate
that this approach has greater discriminative quality
than the standard VAE, a finding also supported by
t-SNE projections (Nalisnick and Smyth, 2017).
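To make the stick-breaking construction concrete, the following is a minimal sketch (a hypothetical helper of ours, not the SB–VAE implementation) of how stick fractions $v_k \in (0,1)$, which SB–VAE samples from a Kumaraswamy posterior, are mapped to mixture weights:

```python
import numpy as np

def stick_breaking(v):
    """Map stick fractions v_k in (0, 1) to weights
    pi_k = v_k * prod_{j<k} (1 - v_j), which sum to at most 1."""
    # Stick length remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

# Example with five breaks; the last fraction is 1, so the weights sum to 1.
print(stick_breaking(np.array([0.5, 0.5, 0.5, 0.5, 1.0])))
# -> [0.5    0.25   0.125  0.0625 0.0625]
```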
Although DNNs excel at most supervised
problems, they cannot directly output lists. One way
to provide label recommendations is to treat the
problem as multilabel classification. The main
approach using DGMs is to embed items and their
related labels separately, and then learn a joint
representation for the multilabel classification
process. This approach has been tested on multilabel
image data, where images and labels are embedded
via a convolutional neural network (CNN) and a
recurrent neural network (RNN), respectively; the
network is called CNN–RNN for this reason. Labels
are recommended via predicted probabilities. The
experimental results indicate that this approach
outperforms its competitors, including a previous
DGM approach (Wang et al., 2016).
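As an illustration only, a minimal PyTorch-style sketch of this joint-embedding idea (with hypothetical layer sizes and fusion, not the architecture of Wang et al., 2016) might look like:

```python
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    """Embed images with a CNN and label sequences with an RNN,
    then score labels from the fused (joint) representation."""
    def __init__(self, num_labels, embed_dim=128):
        super().__init__()
        # Image branch: a small CNN producing one embedding per image.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, embed_dim),
        )
        # Label branch: an RNN over the sequence of labels seen so far.
        self.label_embed = nn.Embedding(num_labels, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, num_labels)

    def forward(self, images, label_seq):
        img_vec = self.cnn(images)                    # (B, D)
        _, h = self.rnn(self.label_embed(label_seq))  # h: (1, B, D)
        joint = img_vec + h.squeeze(0)                # fuse the two embeddings
        return self.out(joint)                        # scores over all labels
```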
In our problem, one network may be used for
document embedding, such as a CNN, while another
network embeds the keywords corresponding to each
document. We explain how we define the network
for our problem in the next section.
3 PROPOSED MODEL
We now build an improved model for polylingual
data. As we have seen from the LDA–VAE model,
VAE-based inference substantially improves topic
quality compared with the standard LDA model. We
expect a similar improvement from applying a VAE to
the PLTM, but the main challenge is how to apply this
inference method to polylingual documents.
Topic models for polylingual data like PLDA
are usually more complicated and difficult to learn
than those for unilingual data, because polylingual
documents can be considered as data with mul-
tiple sources, where each source provides docu-
ments or even parts of documents in a different lan-
guage. Moreover, there are relationships among those
sources to be modeled. A model for multiple data
sources is called a "multimodal model." Fortunately,
there is a variational autoencoder for multimodal
learning, which relates information from multiple
sources and can model the joint relationships among
them: the joint multimodal variational autoencoder
(JMVAE) (Suzuki et al., 2016). A general evidence
lower bound (ELBO) of the model is written as:
\begin{equation}
\mathcal{L}_{\mathrm{JMVAE}} =
-\underbrace{D_{\mathrm{KL}}\!\left( q_{\Phi}\!\left( z \,\middle|\, \{x^{(s)}\}_{s \in S} \right) \,\middle\|\, p(z) \right)}_{\text{Regularization term}}
+ \underbrace{\sum_{s \in S} \mathbb{E}_{q_{\Phi}\left( z \,\mid\, \{x^{(s')}\}_{s' \in S} \right)}\!\left[ \log p_{\Theta_{x^{(s)}}}\!\left( x^{(s)} \,\middle|\, z \right) \right]}_{\text{Expected reconstruction error}} .
\tag{1}
\end{equation}
where $\Theta_{x^{(s)}}$ is the set of model parameters
relating to the observed data $x^{(s)}$ from source $s$,
and $\Phi$ is the set of variational distribution
parameters. The only difference from the original VAE
is the summation over all data sources $S$.
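As a minimal sketch, Eq. (1) could be computed as follows, assuming (for simplicity, unlike the concrete model below) a standard normal prior and a diagonal Gaussian joint posterior; `recon_logps` is a hypothetical list of per-source Monte Carlo estimates of the reconstruction terms:

```python
import torch

def jmvae_elbo(recon_logps, mu, logvar):
    """Sketch of Eq. (1): per-source reconstruction terms minus the KL of
    the joint posterior q_Phi(z | {x^(s)}) from the prior. Assumes
    p(z) = N(0, I) and a diagonal Gaussian q with mean mu, var exp(logvar)."""
    # Closed-form KL(q || N(0, I)) for a diagonal Gaussian, per example.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
    # Expected reconstruction error summed over all sources in S.
    recon = sum(recon_logps)
    return (recon - kl).mean()
```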
We can directly apply the JMVAE to polylingual
documents, as in LDA–VAE (Srivastava and Sutton,
2017), and write the KL term of the ELBO as:
\begin{equation}
D_{\mathrm{KL}}\!\left( q_{\Phi}\!\left( z \,\middle|\, \{x^{(s)}\}_{s \in S} \right) \,\middle\|\, p(z) \right)
= \frac{1}{2} \sum_{s \in S} \left(
\operatorname{tr}\!\left( \Sigma_{1}^{-1} \Sigma_{0}^{(s)} \right) - K
+ \log \frac{\lvert \Sigma_{1} \rvert}{\lvert \Sigma_{0}^{(s)} \rvert}
+ \left( \mu_{1} - \mu_{0}^{(s)} \right)^{\!\top} \Sigma_{1}^{-1} \left( \mu_{1} - \mu_{0}^{(s)} \right)
\right)
\tag{2}
\end{equation}
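For completeness, each summand in Eq. (2) is the standard closed-form KL divergence between two multivariate Gaussians; a numerical sketch (ours, not the paper's implementation) for a single source:

```python
import numpy as np

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, Sigma0) || N(mu1, Sigma1) ): one summand of Eq. (2)."""
    K = mu0.shape[0]
    sigma1_inv = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(sigma1_inv @ sigma0) - K
                  + np.log(np.linalg.det(sigma1) / np.linalg.det(sigma0))
                  + diff @ sigma1_inv @ diff)
```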