Table 5: NLP performance after merging the information
from the HeaderParser module. All results are the average
of 5 runs with different seeds.
Task  Dataset Train  Dataset Eval    Accuracy
I/O   io             cv io           93.2 %
I/O   io             cv io balanced  90.1 %
Performance after Merging. The I/O classification
task is special in that it currently has two knowledge
sources: the ASG and the NLP analysis of the
parameter comment. Thus, the knowledge from the
ASG can be merged with the result of the NLP
analysis. In this experiment, the extra information
is used to minimize the reliance on NLP.
The current implementation of the Merger module
uses the following weights: (1) HeaderParser: 4,
(2) DocParser: 2, (3) NLPDoc: 1. This means that NLP
is only used on parameters that could not be classified
with certainty by the HeaderParser module, i.e.,
parameters that are not qualified with the const
qualifier and are not of a fundamental type. As shown
in Table 5, this method improves the final accuracy.
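As a rough illustration of how such weights could be applied, the following Python sketch implements a simple weighted vote over the three sources. Only the weights (4, 2, 1) are taken from the text. The function name merge_io_labels, the label strings, and the voting rule itself are assumptions made for this example and are not the actual Merger implementation.

```python
from collections import defaultdict
from typing import Dict, Optional

# Weights as stated in the text. How the Merger combines them is not
# specified there; a summed weighted vote is assumed for this sketch.
WEIGHTS = {"HeaderParser": 4, "DocParser": 2, "NLPDoc": 1}


def merge_io_labels(votes: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the label with the highest summed weight.

    `votes` maps a source name to its label (hypothetical strings such as
    "input" or "output"), or to None if that source could not decide.
    Returns None if no source produced a label.
    """
    scores: Dict[str, int] = defaultdict(int)
    for source, label in votes.items():
        if label is not None:
            scores[label] += WEIGHTS[source]
    if not scores:
        return None
    return max(scores, key=scores.get)


# HeaderParser is certain, so the other sources cannot outvote it (4 > 2 + 1).
certain = merge_io_labels(
    {"HeaderParser": "input", "DocParser": "output", "NLPDoc": "output"}
)

# HeaderParser and DocParser are undecided, so the NLP result is used.
fallback = merge_io_labels(
    {"HeaderParser": None, "DocParser": None, "NLPDoc": "output"}
)
```

Under this assumed rule, a HeaderParser verdict always wins because its weight exceeds the sum of the other two, which is consistent with the statement that NLP only matters for parameters the HeaderParser cannot classify with certainty.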
5 CONCLUSION
In this paper, we presented an approach to automatically
extract abstract semantic models of C++ libraries,
which we evaluated on the perception domain
with two popular computer vision libraries. We also
outlined how the extracted models of the perception
libraries can be used within a planner to further
automate the creation of perception pipelines. The
approach therefore lowers the engineering barriers to
developing robotics and automation solutions that can
adapt to new tasks automatically via planning. Our
approach is based on the combination of static source
code analysis and NLP, which is used to interpret the
corresponding documentation. Because we did not make
any domain-specific assumptions, we expect our
approach to perform similarly on other domains.
Our evaluation shows the benefits of additional
semantic information for the planning performance. The
required semantic information can be extracted with
a heuristics-based parser if machine-readable
documentation is provided (as is the case for HALCON).
More generally, however, it is necessary to use NLP to
extract semantic knowledge. Therefore, we fine-tuned
a state-of-the-art LM on two classification tasks to
extract semantic information. Our results show that
this approach works well in the training domain.
Unfortunately, applying the trained model to another
library showed mixed results. While it worked well for
the input/output classification task, the semantic type
classification task showed the limits of the LM used.
Future work could extend the static analysis described
here with dynamic program analysis to validate
the extracted labels. Additionally, an interesting
research direction would be to take more information
into account, such as the other parameters of the
function or the function's description. Finally, a more
diverse training data set could improve the transfer
performance.
ICINCO 2020 - 17th International Conference on Informatics in Control, Automation and Robotics