
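Paper: Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description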

Authors: Chuanyi Li 1 ; Jidong Ge 2 ; Victor Chang 3 and Bin Luo 2

Affiliations: 1 State Key Laboratory for Novel Software Technology, Software Institute, Nanjing University, Nanjing, China, and State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China ; 2 State Key Laboratory for Novel Software Technology, Software Institute, Nanjing University, Nanjing, China ; 3 School of Computing and Digital Technologies, Teesside University, Middlesbrough, U.K.

ISBN: 978-989-758-426-8

Keyword(s): Information Retrieval, Project Similarity, Big Text Data, Learn to Rank.

Abstract: The rise of the open-source community has greatly promoted the reuse of software resources in all phases of the software process, such as requirements engineering, design, coding, and testing. However, efficiently and accurately locating reusable resources on large-scale open-source websites remains an open problem. At present, most open-source websites provide text-matching-based search mechanisms that ignore the semantics of project descriptions. To enable requirements engineers to quickly find software similar to the one to be developed at the very beginning of a project, we propose a search framework based on constructing semantic embeddings for software projects with machine learning techniques. In the proposed approach, both Type Distribution and Document Vector, learnt through different neural network language models, are used as project representations. In addition, we integrate the search results of the different representations with a ranking model. To evaluate our approach, we manually compare the search results of different search strategies using an evaluation system. Experimental results on a data set of 24,896 projects show that the proposed search framework, i.e., combining results derived from Inverted Index, Type Distribution, and Document Vector, is significantly superior to the text-matching-based one.
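The idea of merging search results from several project representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says a ranking model integrates the results, whereas this sketch swaps in reciprocal rank fusion as a simple, unlearned stand-in. The project ids, vectors, keyword ranking, and function names are all hypothetical toy values chosen for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec, project_vecs):
    """Return project ids ordered from most to least similar to the query vector."""
    return sorted(
        project_vecs,
        key=lambda pid: cosine(query_vec, project_vecs[pid]),
        reverse=True,
    )


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each list adds 1 / (k + position) to a project's score.

    Assumption: this replaces the paper's ranking model, which is not described here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (k + position)
    # Highest fused score first; ties broken by project id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical toy data: three projects with two kinds of representation.
type_dist = {"A": [0.7, 0.2, 0.1], "B": [0.1, 0.8, 0.1], "C": [0.6, 0.3, 0.1]}
doc_vecs = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
keyword_ranking = ["C", "B", "A"]  # stand-in for an inverted-index result list

query_type = [0.7, 0.2, 0.1]  # hypothetical type distribution of the query
query_doc = [0.8, 0.2]        # hypothetical document vector of the query

type_ranking = rank_by_cosine(query_type, type_dist)
doc_ranking = rank_by_cosine(query_doc, doc_vecs)
fused = reciprocal_rank_fusion([keyword_ranking, type_ranking, doc_ranking])
print(fused)
```

With this toy data, project C is ranked first by two of the three lists, so it also leads the fused list. A learned ranking model would instead weight each source according to training data.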

CC BY-NC-ND 4.0


Paper citation in several formats:
Li, C.; Ge, J.; Chang, V. and Luo, B. (2020). Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description. In Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-426-8, pages 296-303. DOI: 10.5220/0009400002960303

@conference{iotbds20,
author={Chuanyi Li and Jidong Ge and Victor Chang and Bin Luo},
title={Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description},
booktitle={Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2020},
pages={296-303},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009400002960303},
isbn={978-989-758-426-8},
}

TY - CONF
JO - Proceedings of the 5th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Retrieving Similar Software from Large-scale Open-source Repository by Constructing Representation of Project Description
SN - 978-989-758-426-8
AU - Li, C.
AU - Ge, J.
AU - Chang, V.
AU - Luo, B.
PY - 2020
SP - 296
EP - 303
DO - 10.5220/0009400002960303
