A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph

Andrea Calabrese, Lorenzo Cardone, Salvatore Licata, Marco Porro, Stefano Quer

2023

Abstract

The Maximum Common Subgraph problem, a generalization of subgraph isomorphism, is a well-known problem in computer science. Although it is NP-complete, finding Maximum Common Subgraphs has countless practical applications, and researchers continue to explore scalable heuristic approaches. One of the state-of-the-art algorithms for this problem is a recursive branch-and-bound procedure called McSplit. The algorithm exploits an intelligent invariant to pair vertices with the same label and adopts an effective bound prediction to prune the search space. However, the original version of McSplit uses a simple heuristic to pair vertices and to build larger subgraphs. As a consequence, a few researchers have already focused on improving the sorting heuristics to converge faster. This paper concentrates on these aspects and presents a collection of heuristics to improve McSplit and its state-of-the-art variants. We present a sorting strategy based on the well-known PageRank algorithm and then combine it with other approaches. We compare all the heuristics with the original McSplit procedure and against each other, distinguishing between heuristics based on node degree and novel ones based on PageRank. Our experimental section shows that PageRank can significantly improve both McSplit and its variants in terms of convergence speed and solution size.
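The abstract contrasts degree-based vertex ordering with a PageRank-based one but does not say exactly how the scores are plugged into McSplit's vertex-selection step. The following is only a minimal sketch under stated assumptions: it computes PageRank by power iteration on an undirected graph (adjacency list), assumes the common damping factor 0.85, and breaks ties by vertex index. The function names `pagerank`, `degree_order`, and `pagerank_order` are hypothetical and do not come from the paper or from McSplit's code.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on an undirected graph given as an adjacency list.

    Assumption: damping=0.85 is a conventional default, not a value from the paper.
    """
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by isolated (dangling) vertices is redistributed uniformly.
        dangling = sum(rank[v] for v in range(n) if not adj[v])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for v in range(n):
            if adj[v]:
                # Each vertex splits its damped rank evenly among its neighbours.
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
        diff = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if diff < tol:
            break
    return rank


def degree_order(adj):
    """Vertices sorted by decreasing degree, ties broken by vertex index."""
    return sorted(range(len(adj)), key=lambda v: (-len(adj[v]), v))


def pagerank_order(adj, damping=0.85):
    """Vertices sorted by decreasing PageRank score, ties broken by vertex index."""
    rank = pagerank(adj, damping)
    return sorted(range(len(adj)), key=lambda v: (-rank[v], v))


if __name__ == "__main__":
    # Hub 0 with leaves 1 and 2; vertex 3 links the hub to leaf 4.
    adj = [[1, 2, 3], [0], [0], [0, 4], [3]]
    print(degree_order(adj))    # [0, 3, 1, 2, 4]
    print(pagerank_order(adj))  # [0, 3, 4, 1, 2]
```

In this small example the three leaves 1, 2, and 4 all have degree 1, so the degree ordering cannot distinguish them, whereas PageRank ranks leaf 4 higher because it receives the whole outgoing share of vertex 3 rather than a third of the hub's. This illustrates why a PageRank-based ordering can behave differently from a degree-based one, without claiming to reproduce the authors' heuristic.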



Paper Citation


in Harvard Style

Calabrese A., Cardone L., Licata S., Porro M. and Quer S. (2023). A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph. In Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT; ISBN 978-989-758-665-1, SciTePress, pages 197-206. DOI: 10.5220/0012130800003538


in Bibtex Style

@conference{icsoft23,
author={Andrea Calabrese and Lorenzo Cardone and Salvatore Licata and Marco Porro and Stefano Quer},
title={A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph},
booktitle={Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2023},
pages={197-206},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012130800003538},
isbn={978-989-758-665-1},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT
TI  - A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph
SN  - 978-989-758-665-1
AU  - Calabrese A.
AU  - Cardone L.
AU  - Licata S.
AU  - Porro M.
AU  - Quer S.
PY  - 2023
SP  - 197
EP  - 206
DO  - 10.5220/0012130800003538
PB  - SciTePress
ER  - 