2) We revealed that the initialization process of
modules, which may not be essential to the source
code under analysis, can dominate the DA data.
3) The system-call sequence should be preprocessed
to remove unnecessary parts. For example, we
should exclude failed invocations and compress
runs in which the same system-call is invoked
multiple times. Furthermore, the arguments and
return values of the system-calls provide useful
information for implementing this preprocessing.
4) The system-call subsequence corresponding to a
single line of source code contains many less
important system-calls. This observation should be
reflected in the design of the preprocessing.
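The preprocessing described in items 3) and 4) could be sketched as follows. This is a minimal illustration, not the implementation used by our DA system: it assumes strace-like text lines of the form `name(args) = return_value`, treats a return value of -1 as a failed invocation, and uses hypothetical names (`parse`, `preprocess`, `SAMPLE`).

```python
import re
from itertools import groupby

# Assumption: each trace line looks like strace output,
# e.g. 'close(3) = 0' or 'access("/x", R_OK) = -1 ENOENT (...)'.
LINE_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\w+)")


def parse(line):
    """Return (name, args, return_value) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("args"), m.group("ret")


def preprocess(lines):
    """Drop failed invocations, then compress runs of the same system-call.

    Returns a list of (system_call_name, run_length) pairs in temporal order.
    """
    calls = [c for c in map(parse, lines) if c is not None]
    # Assumption: a return value of -1 marks a failed invocation.
    succeeded = [c for c in calls if c[2] != "-1"]
    return [
        (name, sum(1 for _ in run))
        for name, run in groupby(succeeded, key=lambda c: c[0])
    ]


# Hypothetical trace fragment used only for illustration.
SAMPLE = [
    'access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)',
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'read(3, "...", 832) = 832',
    'read(3, "...", 832) = 832',
    'read(3, "", 832) = 0',
    'close(3) = 0',
]

if __name__ == "__main__":
    print(preprocess(SAMPLE))
```

Keeping the run length alongside the name preserves how often a system-call was repeated, which may still be informative after compression.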
In future work, we will discuss the use of DA data
to select better code when there are multiple proper
translation candidates, that is, when TransCoder
generates multiple functions with 100% computational
accuracy. For instance, the smaller the time and space
complexities, the better the candidate. Our current
DA system cannot compute or predict the approximate
empirical computational efficiency (the time required
to execute the code) or memory consumption
(the amount of memory allocated by the process).
However, by obtaining the arguments of the system-calls,
it may be feasible to improve our system to calculate
the approximate space and time complexities.
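As a rough illustration of how system-call arguments might yield a memory estimate, the sketch below sums the length argument of successful mmap invocations. It assumes strace-like text lines, and the names `MMAP_RE`, `requested_mmap_bytes`, and `TRACE` are hypothetical. It ignores munmap, brk, and page rounding, so the result is only a coarse proxy for the memory requested by the process.

```python
import re

# Assumption: strace-like lines such as
# 'mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f...'
# The second argument of mmap is the requested length in bytes.
MMAP_RE = re.compile(
    r"^mmap\(\s*[^,]+,\s*(?P<length>\d+)\s*,.*\)\s*=\s*(?P<ret>\S+)"
)


def requested_mmap_bytes(lines):
    """Sum the length argument over mmap calls that did not return -1."""
    total = 0
    for line in lines:
        m = MMAP_RE.match(line.strip())
        if m and m.group("ret") != "-1":
            total += int(m.group("length"))
    return total


# Hypothetical trace fragment used only for illustration.
TRACE = [
    "mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000000000",
    "mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = -1 ENOMEM (Cannot allocate memory)",
    "brk(NULL) = 0x55d000000000",
    "mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000100000",
]

if __name__ == "__main__":
    print(requested_mmap_bytes(TRACE))
```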
Code 2 shows the generated and reference source
codes for the test ADD 1 TO A GIVEN NUMBER (Roziere
et al., 2020). In the reference source code, the
condition of the while statement performs a bitwise
AND operation between x and m. The generated source
code performs the same operation but additionally
casts the result to an integer and verifies whether
it is one or greater. Therefore, the generated source
code performs additional operations and is more time
consuming in terms of empirical computational
efficiency, a difference that the improved DA approach
should capture.
Code 2: Differences in conditional statements.
# generated translation
while int( x & m ) >= 1:

# reference source code
while x & m:
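The extra cost of the generated condition can be observed empirically with a small timing harness. The sketch below is illustrative only: the loop bodies are an assumption (a common bit-flipping way to add one to a non-negative integer) because only the conditions are given, and the function names are hypothetical. Absolute timings depend on the machine and interpreter.

```python
import timeit


def add_one_generated(x):
    """Condition in the style of the generated translation."""
    m = 1
    while int(x & m) >= 1:  # bitwise AND, integer cast, then comparison
        x ^= m              # assumed loop body: clear the trailing 1 bit
        m <<= 1
    return x ^ m            # set the first 0 bit


def add_one_reference(x):
    """Condition in the style of the reference source code."""
    m = 1
    while x & m:            # bitwise AND only
        x ^= m
        m <<= 1
    return x ^ m


def time_both(x=2**16 - 1, number=20000):
    """Return (generated_seconds, reference_seconds) for `number` calls each.

    x must be non-negative; the default has 16 trailing 1 bits so the loop
    condition is evaluated many times per call.
    """
    t_gen = timeit.timeit(lambda: add_one_generated(x), number=number)
    t_ref = timeit.timeit(lambda: add_one_reference(x), number=number)
    return t_gen, t_ref


if __name__ == "__main__":
    t_gen, t_ref = time_both()
    print(f"generated: {t_gen:.4f} s, reference: {t_ref:.4f} s")
```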
Another issue is the choice between simple and
advanced modules during library selection. For
example, for mathematical operations in Python,
we can choose between the math and NumPy modules.
Code 3 shows the reference source code (“f_gold,”
using the math module) and the generated source code
(“f_filled,” using the NumPy module) for a test called
PROGRAM FOR SURFACE AREA OF OCTAHEDRON. The
math module has a minimal implementation, whereas
the NumPy module implements advanced parallel
processing. Therefore, although NumPy is effective
for large datasets, its processing overhead is
relatively large for small datasets. DA data can
address these issues because such differences
influence the manner in which the interpreter or
compiler triggers system-calls. Our DA system can
capture a sequence of system-calls while maintaining
their temporal order, regardless of the PID or thread.
Therefore, the DA data gathered by our system should
be used for better code selection.
Code 3: Differences Between math and NumPy.
import math
import numpy as np

def f_gold(side):
    return 2 * (math.sqrt(3)) * (side * side)

def f_filled(side):
    return 2 * (np.sqrt(3)) * (side * side)
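The per-call overhead of the two variants on a scalar input can be compared with a simple timing sketch. This is an illustration only: the helper `compare` and its parameters are hypothetical, NumPy is treated as an optional dependency, and the measured times depend on the environment. It measures wall-clock time and does not capture system-calls.

```python
import math
import timeit

try:
    import numpy as np  # optional: the NumPy variant is skipped if unavailable
except ImportError:
    np = None


def f_gold(side):
    """Reference variant using the math module."""
    return 2 * math.sqrt(3) * (side * side)


def f_filled(side):
    """Generated variant using NumPy (requires np to be importable)."""
    return 2 * np.sqrt(3) * (side * side)


def compare(side=2.0, number=100000):
    """Return a dict of total seconds for `number` scalar calls per variant."""
    timings = {"math": timeit.timeit(lambda: f_gold(side), number=number)}
    if np is not None:
        timings["numpy"] = timeit.timeit(lambda: f_filled(side), number=number)
    return timings


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```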
REFERENCES
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
BLEU: a method for automatic evaluation of machine
translation. In Proceedings of the 40th annual meeting
on association for computational linguistics, pages
311–318. Association for Computational Linguistics.
Puri, R., Kung, D. S., Janssen, G., Zhang, W., Domeniconi,
G., Zolotov, V., Dolby, J., Chen, J., Choudhury, M.,
Decker, L., Thost, V., Buratti, L., Pujar, S., Ramji, S.,
Finkler, U., Malaika, S., and Reiss, F. (2021).
CodeNet: A large-scale AI for code dataset for
learning a diversity of coding tasks.
Roziere, B., Lachaux, M.-A., Chanussot, L., and Lample,
G. (2020). Unsupervised translation of programming
languages. In Larochelle, H., Ranzato, M., Hadsell,
R., Balcan, M., and Lin, H., editors, Advances in
Neural Information Processing Systems, volume 33, pages
20601–20611. Curran Associates, Inc.
Szafraniec, M., Roziere, B., Leather, H. J., Labatut, P.,
Charton, F., and Synnaeve, G. (2023). Code
translation with compiler representations. In The
Eleventh International Conference on Learning
Representations.
Yoneda, N. (2023). langMorphDA.
https://github.com/naru-99/langMorphDA.git.
System-Call-Level Dynamic Analysis for Code Translation Candidate Selection