Recall may be more important than precision, depending on the problem under discussion (Hand and Christen, 2018): one may accept a somewhat higher rate of false positives in exchange for more true positives if the trade-off is considered worthwhile. The LSTM model without Word2vec achieves higher recall but lower precision, while combining Word2vec with LSTM yields a clear gain in overall performance.
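To make this combination concrete, the listing below is a minimal sketch of a Word2vec-plus-LSTM pipeline built with gensim and Keras, the libraries cited in the references (Rehurek and Sojka, 2010; Chollet et al., 2015). All hyperparameters, layer sizes, and the toy comments are illustrative assumptions, not the exact configuration used in our experiments.

# Minimal sketch: Word2vec embeddings feeding an LSTM classifier.
# Hyperparameters, layer sizes, and the toy data are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.metrics import Precision, Recall

comments = ["TODO: refactor this hack later", "update the user record"]  # code comments
labels = np.array([1, 0])          # 1 = self-admitted technical debt, 0 = not
MAX_LEN, EMB_DIM = 100, 300

# Train (or load) Word2vec on the tokenized comments (gensim 4.x parameter names).
tokenized = [text_to_word_sequence(c) for c in comments]
w2v = Word2Vec(tokenized, vector_size=EMB_DIM, window=5, min_count=1)

# Convert comments to padded index sequences and build the embedding matrix.
tok = Tokenizer()
tok.fit_on_texts(comments)
X = pad_sequences(tok.texts_to_sequences(comments), maxlen=MAX_LEN)
vocab_size = len(tok.word_index) + 1
emb_matrix = np.zeros((vocab_size, EMB_DIM))
for word, i in tok.word_index.items():
    if word in w2v.wv:
        emb_matrix[i] = w2v.wv[word]

# LSTM classifier on top of the (frozen) Word2vec embedding layer.
emb_layer = Embedding(vocab_size, EMB_DIM, trainable=False)
model = Sequential([Input(shape=(MAX_LEN,)), emb_layer,
                    LSTM(64), Dense(1, activation="sigmoid")])
emb_layer.set_weights([emb_matrix])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=[Precision(), Recall()])
model.fit(X, labels, epochs=3, batch_size=32)

Freezing the embedding layer keeps the Word2vec vectors fixed during training; setting trainable=True would instead fine-tune them together with the LSTM, which is a variant of the same pipeline rather than the configuration sketched here.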
In future work, other neural network and deep learning architectures can be evaluated in this context. Previous results show that combining convolutional neural networks with LSTM achieves good results in text classification (Zhou et al., 2015); a sketch of such an architecture appears below. In addition, more in-depth research should be conducted to find ways to reduce the number of false negatives produced by the LSTM model on this dataset. Overall, we found that the combination of LSTM and Word2vec can outperform the other models.
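As a pointer for that future work, the following sketch shows the general shape of a convolution-plus-LSTM text classifier in the spirit of C-LSTM (Zhou et al., 2015); the layer sizes and hyperparameters are assumptions chosen for illustration, not a reproduction of their model.

# Sketch of a convolution + LSTM classifier for comment text (C-LSTM style).
# All layer sizes and hyperparameters are illustrative assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     LSTM, Dropout, Dense)

VOCAB_SIZE, EMB_DIM, MAX_LEN = 20000, 300, 100

model = Sequential([
    Input(shape=(MAX_LEN,)),
    Embedding(VOCAB_SIZE, EMB_DIM),        # could be initialized with Word2vec as above
    Conv1D(128, 5, activation="relu"),     # local n-gram features over the comment
    MaxPooling1D(pool_size=2),             # downsample the convolved feature maps
    LSTM(64),                              # sequence model over the pooled features
    Dropout(0.5),
    Dense(1, activation="sigmoid"),        # SATD vs. non-SATD
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()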
REFERENCES
Cunningham, W. (1992). The WyCash portfolio
management system. ACM SIGPLAN OOPS
Messenger, 4(2), 29-30.
Seaman, C., & Guo, Y. (2011). Measuring and monitoring
technical debt. In Advances in Computers (Vol. 82,
pp. 25-46). Elsevier.
Potdar, A., & Shihab, E. (2014, September). An
exploratory study on self-admitted technical debt. In
2014 IEEE International Conference on Software
Maintenance and Evolution (pp. 91-100). IEEE.
Guo, Y., & Seaman, C. (2011, May). A portfolio approach
to technical debt management. In Proceedings of the
2nd Workshop on Managing Technical Debt (pp. 31-
34).
Codabux, Z., & Williams, B. (2013, May). Managing
technical debt: An industrial case study. In 2013 4th
International Workshop on Managing Technical Debt
(MTD) (pp. 8-15). IEEE.
Nord, R. L., Ozkaya, I., Kruchten, P., & Gonzalez-Rojas,
M. (2012, August). In search of a metric for managing
architectural technical debt. In 2012 Joint Working
IEEE/IFIP Conference on Software Architecture and
European Conference on Software Architecture (pp.
91-100). IEEE.
Marinescu, R. (2012). Assessing technical debt by
identifying design flaws in software systems. IBM
Journal of Research and Development, 56(5), 9:1-9:13.
Alves, N. S., Ribeiro, L. F., Caires, V., Mendes, T. S., &
Spínola, R. O. (2014, September). Towards an
ontology of terms on technical debt. In 2014 Sixth
International Workshop on Managing Technical Debt
(pp. 1-7). IEEE.
Maldonado, E. D. S., & Shihab, E. (2015, October).
Detecting and quantifying different types of self-
admitted technical debt. In 2015 IEEE 7th
International Workshop on Managing Technical Debt
(MTD) (pp. 9-15). IEEE.
da Silva Maldonado, E., Shihab, E., & Tsantalis, N.
(2017). Using natural language processing to
automatically detect self-admitted technical debt.
IEEE Transactions on Software Engineering, 43(11),
1044-1062.
Wattanakriengkrai, S., Maipradit, R., Hata, H.,
Choetkiertikul, M., Sunetnanta, T., & Matsumoto, K.
(2018, December). Identifying design and requirement
self-admitted technical debt using N-gram IDF. In 2018
9th International Workshop on Empirical Software
Engineering in Practice (IWESEP) (pp. 7-12). IEEE.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep
learning. Nature, 521(7553), 436-444.
Zhou, C., Sun, C., Liu, Z., & Lau, F. (2015). A C-LSTM
neural network for text classification. arXiv preprint
arXiv:1511.08630.
Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., & Xu, B.
(2016). Text classification improved by integrating
bidirectional LSTM with two-dimensional max
pooling. arXiv preprint arXiv:1611.06639.
Hand, D., & Christen, P. (2018). A note on using the F-
measure for evaluating record linkage algorithms.
Statistics and Computing, 28(3), 539-547.
Huang, Q., Shihab, E., Xia, X., Lo, D., & Li, S. (2018).
Identifying self-admitted technical debt in open source
projects using text mining. Empirical Software
Engineering, 23(1), 418-451.
Liu, Z., Huang, Q., Xia, X., Shihab, E., Lo, D., & Li, S.
(2018, May). SATD detector: A text-mining-based
self-admitted technical debt detection tool. In
Proceedings of the 40th International Conference on
Software Engineering: Companion Proceedings (pp. 9-12).
Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018).
Recent trends in deep learning based natural language
processing. IEEE Computational Intelligence Magazine,
13(3), 55-75.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C.,
Regnell, B., & Wesslén, A. (2012). Experimentation in
software engineering. Springer Science & Business
Media.
Basili, V. R., & Weiss, D. M. (1984). A methodology for
collecting valid software engineering data. IEEE
Transactions on Software Engineering, SE-10(6), 728-738.
Chollet, F., et al. (2015). Keras. https://keras.io
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Vanderplas, J., et al. (2011).
Scikit-learn: Machine learning in Python. Journal of
Machine Learning Research, 12, 2825-2830.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8), 1735-1780.
Mikolov, T., Chen, K., Corrado, G. S., & Dean, J. A.
(2015). U.S. Patent No. 9,037,464. Washington, DC:
U.S. Patent and Trademark Office.
Rehurek, R., & Sojka, P. (2010). Software framework for
topic modelling with large corpora. In Proceedings of
the LREC 2010 Workshop on New Challenges for NLP
Frameworks.