The N-ary Tree-structured GRU achieves, on average, slightly better performance than the N-ary Tree-structured LSTM (Table 1). However, it is important to notice from Table 2 that the standard deviation of the individual predictions of the N-ary Tree-GRU is larger than that of the N-ary Tree-LSTM, so it is possible that this difference in performance is due to chance.
An important observation about the training process for fine-grained classification is that the N-ary Tree-structured GRU stops at the 9th iteration, whereas the N-ary Tree-structured LSTM trains all the way to the 13th iteration. Moreover, the N-ary Tree-LSTM for fine-grained classification appears to have room for further training before it overfits, in contrast with the N-ary Tree-GRU, which overfits before completing twelve iterations, as can be observed above (Figure 8). This may be a consequence of the hyperparameters we have chosen. We set the early-stopping patience to 2 iterations (as the authors of the N-ary Tree-structured LSTM paper did); with a patience of 3, the N-ary Tree-GRU might keep training until the 12th iteration.
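For clarity, the stopping criterion discussed above can be summarised with the following minimal Python sketch. The function names train_one_epoch and evaluate are placeholders for our training and validation routines, and the loop is an illustration of patience-based early stopping rather than the exact code used in our experiments.

    # Sketch of patience-based early stopping (assumed setup, not the exact
    # training loop used in the experiments): training stops once the
    # validation score has not improved for `patience` consecutive iterations.
    def train_with_early_stopping(train_one_epoch, evaluate, max_iters=20, patience=2):
        best_score = float("-inf")
        best_iter = 0
        for it in range(1, max_iters + 1):
            train_one_epoch()                 # one pass over the training set
            score = evaluate()                # validation accuracy after this iteration
            if score > best_score:
                best_score, best_iter = score, it
            elif it - best_iter >= patience:  # no improvement for `patience` iterations
                break                         # e.g. the Tree-GRU stops at iteration 9
        return best_score, best_iter

With patience=2, two consecutive iterations without improvement end training; raising the patience to 3 would allow one more non-improving iteration before stopping, which is why the Tree-GRU might then reach the 12th iteration.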
Moreover, the N-ary Tree-GRU's training and validation scores fluctuate more in fine-grained classification (Figure 8), which may indicate unstable predictions and the need for further training.
Table 1: Sentiment Classification Accuracy (%).

Model              Binary    Fine-grained
N-ary Tree-LSTM    84.43     45.71
N-ary Tree-GRU     85.61     46.43
Table 2: Sentiment Classification Standard Deviation.

Model              Binary    Fine-grained
N-ary Tree-LSTM    0.93      0.35
N-ary Tree-GRU     0.98      0.55
5 CONCLUSION AND FUTURE DIRECTIONS
We can conclude that there is a difference in performance between the tree-structured LSTM and the tree-structured GRU, although it is not significant. Moreover, tree-structured GRUs are computationally faster to train, since they have fewer parameters, which makes them a good alternative, if not a substitute. Natural Language Processing is a very active area of research, and tree-structured architectures have proved to be very powerful for Natural Language Processing tasks, mostly because of their capability of handling negations. Many potential projects can be developed around tree-based GRUs, namely a Child-Sum approach (sketched below), the use of a separate reset and update gate for each child, or experimenting with different GRU architectures (Dosovitskiy and Brox, 2015).
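As a starting point for the Child-Sum direction mentioned above, the following PyTorch sketch shows one possible Child-Sum Tree-GRU composition function. The class name ChildSumTreeGRUCell, the choice of framework, and the exact gating equations are our own assumptions, written by analogy with the Child-Sum variant of the tree-structured LSTM; this is not the N-ary cell evaluated in this paper.

    import torch
    import torch.nn as nn

    class ChildSumTreeGRUCell(nn.Module):
        """Hypothetical Child-Sum Tree-GRU cell: the children's hidden states
        are summed, and a single reset and update gate act on that sum."""

        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.W_r = nn.Linear(input_size, hidden_size)
            self.U_r = nn.Linear(hidden_size, hidden_size, bias=False)
            self.W_z = nn.Linear(input_size, hidden_size)
            self.U_z = nn.Linear(hidden_size, hidden_size, bias=False)
            self.W_h = nn.Linear(input_size, hidden_size)
            self.U_h = nn.Linear(hidden_size, hidden_size, bias=False)

        def forward(self, x, child_hidden):
            # x: (input_size,) input vector at this node
            # child_hidden: (num_children, hidden_size); at a leaf, pass a
            # single row of zeros so that h_sum is the zero vector
            h_sum = child_hidden.sum(dim=0)                 # Child-Sum: order-insensitive aggregation
            r = torch.sigmoid(self.W_r(x) + self.U_r(h_sum))          # reset gate
            z = torch.sigmoid(self.W_z(x) + self.U_z(h_sum))          # update gate
            h_tilde = torch.tanh(self.W_h(x) + self.U_h(r * h_sum))   # candidate state
            return z * h_sum + (1.0 - z) * h_tilde          # interpolate between summed children and candidate

Because the children are summed before gating, this variant is insensitive to the number and order of children, whereas the per-child gates mentioned above would require separate parameters for each child position, as in the N-ary formulation.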
REFERENCES
Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.

Berant, J. and Liang, P. (2014). Semantic parsing via paraphrasing. In Association for Computational Linguistics (ACL).

Bowman, S. R., Potts, C., and Manning, C. D. (2014). Recursive neural networks for learning logical semantics. CoRR, abs/1406.1827.

Charniak, E. and Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, pages 173–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Chen, D. and Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750.

Chung, J., Gülçehre, Ç., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555.

Dhingra, B., Li, L., Li, X., Gao, J., Chen, Y., Ahmed, F., and Deng, L. (2016). End-to-end reinforcement learning of dialogue agents for information access. CoRR, abs/1609.00777.

Dosovitskiy, A. and Brox, T. (2015). Inverting convolutional networks with convolutional networks. CoRR, abs/1506.02753.

Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159.

Goller, C. and Kuchler, A. (1996). Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, volume 1, pages 347–352.

Graves, A., Jaitly, N., and Mohamed, A.-r. (2013). Hybrid speech recognition with deep bidirectional LSTM. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

Gülçehre, Ç., Cho, K., Pascanu, R., and Bengio, Y. (2013). Learned-norm pooling for deep neural networks. CoRR, abs/1311.1780.

Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 6(2):107–116.