(a) MoMR worked well: it selected the expected model MLPs(J = 2) as the best for the artificial data, and it selected a model composed of one linear regression and two MLPs as the best for the Abalone data. These best models show smaller (better) BIC and RSS values than any mixture of linear regressions and any single MLP regression; a minimal sketch of this BIC comparison is given after point (c).
(b) The learning of MoMR proceeds in a double loop: EM controls the outer loop, and an MLP learning method controls the inner loop. For MLP learning, a quasi-Newton method called BPQ worked well within MoMR, whereas BP worked rather poorly: it frequently found inferior solutions whose RSS was larger (worse) than BPQ's, and it selected inadequate models different from those obtained with BPQ. This tendency is attributable to BP's weak ability to find excellent solutions. A sketch of the double loop is also given after point (c).
(c) MoMR using EM+BPQ is expected to improve the goodness of fit for data that are fit poorly by any single regression model or by any mixture of linear regressions.
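To make the model comparison in point (a) concrete, the following is a minimal sketch, not the implementation used in the experiments. Schwarz's BIC is -2 ln L + M ln n, where L is the maximized likelihood, M the number of free parameters, and n the sample size; smaller is better. All candidate names, log-likelihood values, and parameter counts below are hypothetical placeholders, not results from this paper.

```python
import numpy as np

def bic(log_likelihood, n_params, n_samples):
    # Schwarz's BIC: -2 ln L + M ln n; smaller values indicate better models.
    return -2.0 * log_likelihood + n_params * np.log(n_samples)

# Hypothetical candidates: name -> (maximized log-likelihood, #free params).
candidates = {
    "single MLP":           (-512.3, 31),
    "mixture of 2 linears": (-505.8, 19),
    "MoMR, MLPs(J = 2)":    (-471.6, 43),
}
n = 400  # hypothetical sample size
scores = {name: bic(ll, m, n) for name, (ll, m) in candidates.items()}
best = min(scores, key=scores.get)
print("selected by BIC:", best)
```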
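Similarly, the following is a minimal runnable sketch of the double loop in point (b), under stated assumptions: the one-hidden-layer tanh MLP, the Gaussian noise model, and all sizes are illustrative choices, and scipy's generic BFGS stands in for BPQ. This is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mlp_predict(w, X, J):
    # One hidden layer of J tanh units; w packs W1 (d x J), b1 (J), W2 (J), b2.
    d = X.shape[1]
    W1 = w[:d * J].reshape(d, J)
    b1 = w[d * J:d * J + J]
    W2 = w[d * J + J:d * J + 2 * J]
    b2 = w[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def em_momr(X, y, K=2, J=2, n_outer=20, seed=0):
    # Outer loop: EM over a K-component mixture of MLP regressors.
    # Inner loop: weighted least squares per component via quasi-Newton.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_w = d * J + 2 * J + 1
    ws = [rng.normal(scale=0.1, size=n_w) for _ in range(K)]
    pi = np.full(K, 1.0 / K)       # mixing proportions
    sig2 = np.full(K, y.var())     # per-component noise variances
    for _ in range(n_outer):
        # E-step: responsibilities under the Gaussian noise model.
        dens = np.stack([
            pi[k] / np.sqrt(2 * np.pi * sig2[k])
            * np.exp(-(y - mlp_predict(ws[k], X, J)) ** 2 / (2 * sig2[k]))
            for k in range(K)])
        r = dens / (dens.sum(axis=0, keepdims=True) + 1e-300)
        # M-step: refit each component (inner loop), then update pi and sig2.
        for k in range(K):
            obj = lambda w: np.sum(r[k] * (y - mlp_predict(w, X, J)) ** 2)
            ws[k] = minimize(obj, ws[k], method="BFGS").x
            res2 = (y - mlp_predict(ws[k], X, J)) ** 2
            sig2[k] = max(np.sum(r[k] * res2) / r[k].sum(), 1e-8)
            pi[k] = r[k].mean()
    return ws, pi, sig2
```

The generic BFGS call is only a stand-in for the inner loop; BPQ combines a partial BFGS update with an efficient step-length calculation, which the experiments found far more reliable than BP.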
5 CONCLUSIONS
This paper has proposed the modeling and learning of a mixture of MLP regressions (MoMR). The learning of MoMR proceeds in a double loop: the outer loop is controlled by EM and the inner loop by MLP learning. For MLP learning in MoMR, a quasi-Newton method worked satisfactorily, while BP did not. Our experiments showed that MoMR worked well for both artificial and real datasets. In the future, we plan to apply MoMR using EM+BPQ to more datasets, to show that MoMR can serve as a useful regression model for noisy data.
ACKNOWLEDGMENT
This work was supported by Grants-in-Aid for Scientific Research (C) 16K00342.