4 CONCLUSIONS
This paper has investigated deep classifier structures that use stacked autoencoders for higher-level feature extraction. The proposed approach can mitigate overfitting and the vanishing/exploding gradient problems that arise when deep networks are trained with limited data. The experimental results show that the deep multilayer perceptron trained with the proposed three-stage learning algorithm significantly outperformed the pre-trained stacked autoencoder with a support vector machine classifier. Moreover, the proposed method (M3) exhibits a much smaller gap between training and testing accuracy than methods M1 and M2, which can be regarded as evidence of less severe overfitting (a rough sketch of such a three-stage pipeline is given below).
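The three stages themselves are detailed earlier in the paper; purely as an illustration, the PyTorch sketch below assumes a common instantiation of such a pipeline: greedy layer-wise autoencoder pre-training, training of a classifier layer on the learned features, and end-to-end fine-tuning. The layer sizes, optimiser settings, and the stage breakdown itself are assumptions for this sketch, not the paper's exact procedure.

```python
# Hedged sketch of a three-stage stacked-autoencoder classifier pipeline.
# The stage breakdown is an assumed, typical instantiation, not the
# paper's exact algorithm.
import torch
import torch.nn as nn

def pretrain_layer(encoder, decoder, data, epochs=10, lr=1e-3):
    """Stage 1 (per layer): train the autoencoder to reconstruct its input."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(decoder(encoder(data)), data)
        loss.backward()
        opt.step()
    return encoder(data).detach()          # features fed to the next layer

def build_and_train(x, y, sizes=(784, 256, 64), n_classes=10):
    # Stage 1: greedy layer-wise pre-training of the encoder stack.
    encoders, h = [], x
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
        dec = nn.Sequential(nn.Linear(d_out, d_in), nn.Sigmoid())
        h = pretrain_layer(enc, dec, h)
        encoders.append(enc)

    classifier = nn.Linear(sizes[-1], n_classes)
    ce = nn.CrossEntropyLoss()

    # Stage 2: train only the classifier layer on the pre-trained features.
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    for _ in range(20):
        opt.zero_grad()
        ce(classifier(h), y).backward()
        opt.step()

    # Stage 3: fine-tune the whole network (encoders + classifier) end to end.
    model = nn.Sequential(*encoders, classifier)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(20):
        opt.zero_grad()
        ce(model(x), y).backward()
        opt.step()
    return model
```

Calling build_and_train(x, y) with x as an [N, 784] float tensor and y as integer class labels would run all three stages on the full batch; mini-batching, validation, and early stopping are omitted for brevity.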
These preliminary experimental results demonstrate the advantages of the proposed method. In future work, the algorithm will be applied to deep neural networks with more layers, where it is expected to further improve performance, and will be evaluated on other applications.