Similarity of Software Libraries: A Tag-based Classification Approach

Maximilian Auch; Maximilian Balluff; Peter Mandl; Christian Wolff

Topics: Data Mining; Data Science; Deep Learning; Predictive Modeling

In Proceedings of the 10th International Conference on Data Science, Technology and Applications DATA - Volume 1, 17-28, 2021

Authors: Maximilian Auch ¹ ; Maximilian Balluff ¹ ; Peter Mandl ¹ and Christian Wolff ²

Affiliations: ¹ University of Applied Sciences Munich, Lothstraße 34, 80335 Munich, Germany ; ² University of Regensburg, Universitätsstraße 31, 93053 Regensburg, Germany

Keyword(s): Software Libraries, Classification, Tags, Similarity, Naïve Bayes, Logistic Regression, Random Forest, Neural Network.

Abstract: The number of software libraries has increased over time, so grouping them into classes according to their functionality simplifies repository management and analyses. With the large number of software libraries, the task of categorization requires automation. Using a crawled dataset of Java software libraries from Apache Maven repositories, together with tags and categories from the indexing platform MvnRepository.com, we show how the data in this set is structured and point out an imbalance of classes. We introduce a class mapping relevant for the procedure, which maps the libraries from very specific, technical classes into more generic classes. Using this mapping, we investigate supervised machine learning techniques that classify software libraries from the dataset based on their available tags. We show that a tag-based approach using neural networks can classify libraries with an accuracy of 97.46%. Overall, we found techniques such as neural networks and naïve Bayes more suitable in this use case than logistic regression or random forest.
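
The tag-based idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naïve Bayes model with Laplace smoothing written in plain Python, and the category names, the generic classes in CLASS_MAP, and the toy samples in TRAIN are all invented for illustration rather than taken from the paper's dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from specific, technical categories to generic classes
# (illustrative only; the paper's actual mapping is not reproduced here).
CLASS_MAP = {
    "JSON Libraries": "Data Formats",
    "XML Processing": "Data Formats",
    "Logging Frameworks": "Logging",
    "Logging Bridges": "Logging",
}

# Toy training data, invented for this sketch: (set of tags, specific category).
TRAIN = [
    ({"json", "parser"}, "JSON Libraries"),
    ({"json", "serialization"}, "JSON Libraries"),
    ({"xml", "parser"}, "XML Processing"),
    ({"logging", "slf4j"}, "Logging Bridges"),
    ({"logging", "api"}, "Logging Frameworks"),
]


def train(samples, class_map):
    """Count libraries per generic class and tag occurrences per class."""
    class_counts = Counter()
    tag_counts = defaultdict(Counter)
    vocab = set()
    for tags, category in samples:
        label = class_map[category]  # map specific category to generic class
        class_counts[label] += 1
        tag_counts[label].update(tags)
        vocab.update(tags)
    return class_counts, tag_counts, vocab


def predict(tags, model):
    """Return the generic class with the highest log-posterior for the tags."""
    class_counts, tag_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(tag_counts[label].values()) + len(vocab)
        for tag in tags:
            if tag in vocab:  # tags never seen in training are ignored
                # Laplace-smoothed log likelihood of the tag given the class
                score += math.log((tag_counts[label][tag] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN, CLASS_MAP)
print(predict({"json", "parser"}, model))
print(predict({"logging"}, model))
```

On this toy data the two calls print "Data Formats" and "Logging". Mapping categories to generic classes before counting means that sparse, specific categories pool their tag statistics, which is one plausible way to soften the class imbalance the abstract mentions.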

CC BY-NC-ND 4.0

Paper citation in several formats:

Auch, M., Balluff, M., Mandl, P. and Wolff, C. (2021). Similarity of Software Libraries: A Tag-based Classification Approach. In Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-521-0; ISSN 2184-285X, SciTePress, pages 17-28. DOI: 10.5220/0010521600170028

@conference{data21,
author={Maximilian Auch and Maximilian Balluff and Peter Mandl and Christian Wolff},
title={Similarity of Software Libraries: A Tag-based Classification Approach},
booktitle={Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA},
year={2021},
pages={17-28},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010521600170028},
isbn={978-989-758-521-0},
issn={2184-285X},
}

TY - CONF
JO - Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA
TI - Similarity of Software Libraries: A Tag-based Classification Approach
SN - 978-989-758-521-0
IS - 2184-285X
AU - Auch, M.
AU - Balluff, M.
AU - Mandl, P.
AU - Wolff, C.
PY - 2021
SP - 17
EP - 28
DO - 10.5220/0010521600170028
PB - SciTePress