Subject Classification of Software Repository

Abdelhalim Dahou, Brigitte Mathiak

2023

Abstract

Software categorization involves organizing software into groups based on its behavior or domain. Traditionally, categorization has been crucial for software maintenance, helping programmers locate programs, identify features, and find similar programs within large code repositories. Manual categorization is expensive, tedious, and labor-intensive, which makes automatic categorization approaches increasingly important. However, existing datasets primarily focus on technical categorization for the most common programming language, leaving a gap in other areas. This paper addresses the problem of classifying software repositories that contain R code. The objective is to develop a classification model that accurately and efficiently assigns these repositories to predefined classes using limited training data. The contribution of this research is twofold. First, we propose a model that categorizes software repositories focused on R programming, even with a limited amount of training data. Second, we conduct a comprehensive empirical evaluation of how repository features and data augmentation affect automatic repository categorization. This research aims to advance the field of software categorization and to facilitate better use of software repositories in research across diverse domains.

Paper Citation


in Harvard Style

Dahou A. and Mathiak B. (2023). Subject Classification of Software Repository. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN 978-989-758-671-2, SciTePress, pages 30-38. DOI: 10.5220/0012159600003598


in Bibtex Style

@conference{kdir23,
author={Abdelhalim Dahou and Brigitte Mathiak},
title={Subject Classification of Software Repository},
booktitle={Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2023},
pages={30-38},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012159600003598},
isbn={978-989-758-671-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - Subject Classification of Software Repository
SN - 978-989-758-671-2
AU - Dahou A.
AU - Mathiak B.
PY - 2023
SP - 30
EP - 38
DO - 10.5220/0012159600003598
PB - SciTePress