Paper

Authors: Dániel Horváth (1) and László Vidács (1, 2)

Affiliations: (1) Department of Software Engineering, University of Szeged, Dugonics tér 13., Szeged, Hungary; (2) MTA-SZTE Research Group on Artificial Intelligence, Hungary

Keyword(s): Automated Program Repair, Data Mining, Bug-Fixing.

Abstract: Automated program repair (APR) has gained increasing attention over the years, from both an academic and an industrial point of view. The overall goal of APR is to reduce the cost of development and maintenance by automatically finding and fixing common bugs, typos, or errors in code. A successful and highly researched approach is to use deep-learning (DL) techniques to accomplish this task. DL methods are known to be very data-hungry, yet readily available online data is hard to find, which poses a challenge to the development of such solutions. In this paper, we address this issue by providing a new dataset consisting of 371,483 bug-fixing code examples, while also introducing a method that other researchers could use as a feature in their mining software. We extracted code from 5,273 different repositories and 250,090 different commits. Our work contributes to related research by providing a publicly accessible dataset on which DL models can be trained or fine-tuned, and a method that integrates easily with almost any code mining tool as a language-independent feature that gives more granular choices when extracting code parts from a specific bug-fix commit. The dataset also includes the summary and message of each commit, and the training data covers multiple programming languages, including C, C++, Java, JavaScript, and Python.

CC BY-NC-ND 4.0

Paper citation in several formats:
Horváth, D. and Vidács, L. (2024). Yet Another Miner Utility Unveiling a Dataset: CodeGrain. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-707-8; ISSN 2184-285X, SciTePress, pages 338-345. DOI: 10.5220/0012760100003756

@conference{data24,
author={Dániel Horváth and László Vidács},
title={Yet Another Miner Utility Unveiling a Dataset: CodeGrain},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - DATA},
year={2024},
pages={338-345},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012760100003756},
isbn={978-989-758-707-8},
issn={2184-285X},
}

TY - CONF
JO - Proceedings of the 13th International Conference on Data Science, Technology and Applications - DATA
TI - Yet Another Miner Utility Unveiling a Dataset: CodeGrain
SN - 978-989-758-707-8
IS - 2184-285X
AU - Horváth, D.
AU - Vidács, L.
PY - 2024
SP - 338
EP - 345
DO - 10.5220/0012760100003756
PB - SciTePress