Connection Is all You Need! Mining and Linking Disparate Data Sources for Collaboration Network Analysis

Benjamin Vehmeyer and Michaela Geierhos

Research Institute CODE, University of the Bundeswehr Munich, Neubiberg, Germany

{benjamin.vehmeyer, michaela.geierhos}@unibw.de

Keywords: Institutional Networks, Knowledge Discovery, Community Detection.

Abstract:

Business networks are a key driver of innovation and economic growth. However, a major challenge is how to discover these network relationships in heterogeneous data sources. In this paper, we present an IT artifact that integrates different data types, including patent, research funding, and publication information, into a unified graph database. This allows a comprehensive analysis of cooperation patterns. Community detection algorithms are used to identify research clusters, while centrality measures reveal key players. Visualizations facilitate the interpretation of research results and provide a user-friendly way to display data about research communities and institutional behavior. A prototype visualization of these results provides a proof of concept for the practicality of the method. The proposed design provides a robust framework for understanding the dynamics of collaborative networks.

1 INTRODUCTION

Collaboration networks between companies, research

institutions, and other entities are crucial drivers of

innovation, knowledge sharing, and economic growth

(Ozcan and Islam, 2017). While analyzing these

networks provides valuable insights into cooperation

patterns and helps identify key players (Long et al.,

2014), institutions face signiﬁcant challenges in ﬁnd-

ing research partners.

Essential information is scattered across heteroge-

neous data sources, such as patent databases, project

funding records, and publication repositories (Wang,

2017). Each source provides only a partial view of

the capabilities of potential partners, making it dif-

ﬁcult to gain a comprehensive understanding of the

collaborative landscape (Angles et al., 2017). Impor-

tant relationships between institutions often go unde-

tected because they are only visible when multiple

data sources are analyzed together. Research insti-

tutions seeking to establish collaborations face chal-

lenges with these hidden networks (Schwartz et al.,

2012). Traditional methods for collecting and analyzing collaboration data are labor-intensive and inefficient (Hogan et al., 2021). While current tools such as Patsnap and istari.ai automate certain processes by using patent or web data (Huang et al., 2022), a comprehensive system for analyzing complete collaboration networks based on disparate data sources has yet to be developed.

https://orcid.org/0009-0009-7974-8479

https://orcid.org/0000-0002-8180-5606

Therefore, we propose a system design for analyz-

ing complete collaboration networks by linking three

main data sources (i.e., patents, projects, and publica-

tions) characterizing an R&D-oriented entity to rep-

resent collaborations at the institutional level.

Figure 1: Representation of an institution by its individual

patent, project, and publication portfolio.

The information needed to describe the institu-

tions, as shown in Figure 1, is derived from three pri-

mary sources:

https://www.patsnap.com/ (Last accessed 2024-10-29)

https://www.istari.ai/ (Last accessed 2024-10-29)


Vehmeyer, B. and Geierhos, M.

Connection Is all You Need! Mining and Linking Disparate Data Sources for Collaboration Network Analysis.

DOI: 10.5220/0013255200003929

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 27th International Conference on Enterprise Information Systems (ICEIS 2025) - Volume 1, pages 210-217

ISBN: 978-989-758-749-8; ISSN: 2184-4992

• PATSTAT for Patents. The European Patent Office's global patent database containing bibliographic data on patent applications and grants. It allows the identification of technological collaborations through joint patent applications and citations (European Patent Office, 2024).

• Förderkatalog for Projects. The German government's database for publicly funded research, providing insight into institutional partnerships through joint projects and research initiatives (Bundesministerium für Bildung und Forschung, 2024).

• Scopus for Publications. Elsevier's bibliometric database indexing scholarly publications, authors, and institutional affiliations, revealing academic partnerships through co-authorship and cross-institutional publications (Elsevier, 2024).

Through this work, we provide a practical proof

of concept (PoC) for a system design to identify re-

search collaborations. Our initial focus is on identi-

fying potential partnerships with German institutions.

For this reason, we have included the funding catalog (Förderkatalog) and conducted a corresponding showcase, but the system is open to integrating any other resource covering project portfolios.

The remaining sections are organized as follows:

Section 2 reviews related work, Section 3 presents our

methodology and system design, Section 4 demon-

strates practical value through a showcase, and Sec-

tion 5 closes with achievements and future potential.

2 RELATED WORK

In this section, we review related work on collabora-

tion network architecture, community detection, and

data linkage strategies.

2.1 Collaboration Network Architecture

Collaboration networks have long been a subject of

interest in various ﬁelds, including scientiﬁc research,

business, and social sciences. These networks rep-

resent the interconnected relationships between en-

tities, such as companies, research institutions, or

individuals, and can provide valuable insights into

the ﬂow of knowledge, resources, and innovation

(Long et al., 2014). Choosing the right data archi-

tecture is crucial for effectively representing and ana-

lyzing these complex collaboration networks. Tradi-

tional databases are often constrained by predeﬁned

schemas, making it difﬁcult to ﬂexibly integrate het-

erogeneous data sources (Ahuja et al., 2012). Ex-

isting approaches to collaboration network analysis

often rely on traditional databases. However, these

systems struggle with the complexity and intercon-

nectedness of data from multiple sources. In con-

trast, graph databases model data as nodes (entities)

and relationships (edges), which are well suited to

capturing the intricate connections within collabora-

tion networks (Angles et al., 2017). By using graph

databases, researchers can integrate and represent col-

laboration data from multiple sources within a uni-

ﬁed graph structure, enabling the exploration of cross-

domain relationships and patterns. Recent studies

have explored the use of patent data to analyze co-

inventor collaboration networks (Huang et al., 2022),

highlighting the potential of integrating different data

sources for a more comprehensive analysis. Despite

the advantages of graph databases for representing

and analyzing highly connected data, there has been

limited research on using this technology to integrate

and analyze heterogeneous collaboration data from

multiple sources. Most existing studies focus on spe-

ciﬁc domains or data sources and fail to provide a

comprehensive and scalable solution for integrating

and exploring collaboration networks across multiple

data sources.

2.2 Community Detection in Networks

In the context of collaboration network analysis, the

identiﬁcation of communities, or densely connected

groups of nodes, is a key aspect. These communities

may represent research clusters, industry sectors, or

other meaningful groupings that provide researchers

with valuable insights into the structure and dynamics

of collaboration networks (Javed et al., 2018). Community detection algorithms are widely used in network analysis to discover such communities. Because these algorithms operate on nodes and edges, the input data must be structured accordingly, which makes graphs a natural representation for this task. Commonly used algorithms

include modularity-based methods, label propagation, and graph partitioning approaches. Modularity-based algorithms, such as the Louvain method or the Girvan-Newman algorithm, use a measure called modularity, which quantifies the density of edges within communities relative to the density between communities, to score candidate partitions (Kumar and Hanot, 2021; Que et al., 2015). These algorithms iteratively merge or split communities and keep the partition with the highest modularity found, which is not guaranteed to be globally optimal. Label

propagation algorithms, such as the one proposed by

Raghavan et al. (2007), assign unique labels to nodes

and iteratively update these labels based on the labels


of neighboring nodes. Over time, densely connected

groups of nodes converge on the same label, form-

ing communities. Graph partitioning algorithms di-

vide the network into disjoint partitions or commu-

nities based on various criteria, such as minimizing

edge cuts or maximizing intra-community connectiv-

ity (Kumar and Hanot, 2021). While these commu-

nity detection algorithms have been widely applied

in various domains, their effectiveness in the context

of heterogeneous collaboration network analysis re-

mains unexplored.
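For reference, the modularity score that these methods evaluate is commonly defined as follows for an undirected graph. This is the standard textbook formulation, not notation taken from the works cited above; all symbols are introduced here for illustration only.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Here $A_{ij}$ is the weight of the edge between nodes $i$ and $j$ (0 if there is none), $k_i$ is the degree of node $i$, $m$ is the total edge weight of the graph, $c_i$ is the community assigned to node $i$, and $\delta(c_i, c_j)$ is 1 when both nodes share a community and 0 otherwise. Higher values of $Q$ indicate more intra-community edges than would be expected if edges were placed at random with the same node degrees.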

2.3 Data Linking and Enrichment

Data linking identiﬁes connections between related

entities across different data sources, while data en-

richment combines complementary data from mul-

tiple sources to enhance information about entities

(Benjelloun et al., 2009). This is essential for col-

laboration network analysis because relevant data is

fragmented across sources such as patent databases,

project records, and publication data – each of which

provides only partial entity and relationship informa-

tion. A more comprehensive view of the collaboration

landscape is obtained by linking and enriching data

from disparate sources. Several linkage approaches

have been proposed for data enrichment, including:

Record Linkage and Entity Resolution. These tech-

niques identify and match records that refer to the

same real-world entity across different data sources

(Benjelloun et al., 2009). This involves comparing

attribute values, such as names, addresses, and iden-

tiﬁers, and using similarity measures and machine

learning models to determine potential matches.

Ontology Matching and Semantic Integration.

These methods align and integrate data from dis-

parate sources using semantic technologies (Shvaiko

and Euzenat, 2013). By mapping concepts and rela-

tionships across ontologies, entities and their proper-

ties are linked and enriched with additional informa-

tion.

Knowledge Graph Construction. Knowledge

graphs provide a structured representation of entities,

their attributes, and relationships (Hogan et al., 2021;

Noy et al., 2019). Techniques like knowledge graph

embedding and link prediction can be used to discover

new connections and enrich the graph with additional

information. While these approaches have been ex-

plored in various domains, their application to het-

erogeneous collaboration network analysis remains

largely unexplored. Challenges include the diversity

of data sources, the complexity of collaboration re-

lationships, and the need for efﬁcient, scalable solu-

tions. There are opportunities to integrate domain-

speciﬁc knowledge and heuristics, such as common

collaboration patterns, organizational structures, and

research areas, to improve the accuracy and relevance

of linking and enrichment for this domain.
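The record linkage idea described above can be sketched with a simple string-similarity comparison. This is a minimal illustration using only the Python standard library, not the matching procedure of any cited work; the function names, the example records, and the threshold of 0.85 are assumptions made for this sketch.

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def link_records(source_a, source_b, threshold=0.85):
    """Return (id_a, id_b, score) for record pairs whose names look alike.

    Each source is a list of (record_id, institution_name) tuples.
    The threshold is an illustrative value, not one taken from the paper.
    """
    links = []
    for (id_a, name_a), (id_b, name_b) in product(source_a, source_b):
        score = name_similarity(name_a, name_b)
        if score >= threshold:
            links.append((id_a, id_b, round(score, 2)))
    return links


# Hypothetical records from two sources (identifiers are placeholders).
patent_applicants = [("pat-1", "MTU Aero Engines AG"), ("pat-2", "Example Robotics GmbH")]
project_partners = [("prj-1", "MTU AERO ENGINES AG"), ("prj-2", "Another Institute e.V.")]

links = link_records(patent_applicants, project_partners)
```

Comparing every pair is quadratic in the number of records, so a realistic pipeline would likely add a blocking step (for example, comparing only names that share a first token) before scoring.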

3 RESEARCH DESIGN

This section describes the research methodology and

the design of an IT artifact for a system that integrates

and analyzes data from patents, research funding, and

scientiﬁc publications.

3.1 Research Methodology

The research methodology follows the principles of

design science research according to Hevner et al.

(2004) and delivers a collaboration network analy-

sis and visualization tool as an IT artifact. Further-

more, we provide a methodological contribution to

improve the exploration and analysis of research net-

works in Germany. The IT artifact addresses the prob-

lem of fragmented and heterogeneous data sources

that complicate comprehensive collaboration analy-

sis. We plan to evaluate our tool on its ability to unify

disparate data sources and reveal complex network

structures. The development is currently a work-

in-progress, with evaluation planned as future work.

Since design science requires methodical approaches

to both design and evaluation, we see our contribu-

tion as a response to the existing state of research,

proceeding iteratively until we ﬁnd an appropriate so-

lution. The primary output is a proof-of-concept im-

plementation that demonstrates the feasibility of our

proposed solution, serving as a validation of our de-

sign approach while providing practical insights into

collaboration network analysis.

3.2 IT Artifact

The design of our IT artifact addresses the challenge

of identifying and analyzing research collaboration

networks by providing a comprehensive tool that in-

tegrates data from multiple sources. The core con-

cept follows a straightforward approach: Users enter

the name of an institution, and the system reveals not

only its research activities, but also its position within

various collaboration networks. In this way, we trans-

form fragmented data on patents, publicly funded

projects, and scientiﬁc publications into actionable in-

sights about research communities and potential col-

laboration opportunities. Conceptually, the IT artifact

serves three main purposes. First, it provides a uniﬁed


Figure 2: System architecture showing the main components and data ﬂow.

view of an institution’s research activities by com-

bining data from patents, scientiﬁc publications, and

publicly funded projects. Second, it reveals collab-

oration networks by analyzing connections between

institutions across these different data sources, iden-

tifying research communities and industry clusters.

Third, it enables interactive exploration of these net-

works, allowing users to discover potential research

partners based on shared interests, technological ex-

pertise, and existing cooperation patterns. The sys-

tem architecture consists of three main components,

as shown in Figure 2: The uniﬁed data integra-

tion component forms the foundation by integrating

heterogeneous data sources from scholarly databases

(e.g., Scopus), patent databases (e.g., PATSTAT), and

project funding databases (e.g., Förderkatalog) into

a uniﬁed structure, ensuring consistent entity reso-

lution and data quality. The automated network

analysis component serves as the central processing

unit, generating comprehensive institutional proﬁles

and performing multiple analyses. It processes patent

portfolios, maps technology areas, creates publica-

tion summaries, and identiﬁes research communities

through network analysis. The interactive explo-

ration component provides the user interface for in-

stitution queries, analysis and visualization tools for

exploring network structures, and interactive displays

of proﬁles and connections. After selecting an institu-

tion, the user is presented with a comprehensive pro-

ﬁle page (see Figure 4) that presents three key aspects:

1. Institutional Context. This section provides ba-

sic metadata about the institution, including loca-

tion and contact information, primary technology

areas, and key research areas.

2. Research Output. A structured outline presents

the institution’s patents (which indicate techno-

logical innovation capability), scientiﬁc publica-

tions (which show academic research strength),

and federally funded projects (which indicate re-

search priorities and funding success).

3. Collaboration Networks. Through visual and

interactive representations, users can explore di-

rect collaborators across all data sources, research

communities to which the institution belongs,

and potential collaboration opportunities based on

shared interests and indirect connections.

Through this integrated approach, our IT artifact

transforms the traditionally complex task of analyz-

ing multiple databases into a streamlined process for

identifying and evaluating potential research collabo-

rations. The technical implementation details of these

components are described in the following sections.
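The idea of suggesting partners through indirect connections can be illustrated with a common-neighbors heuristic: institutions two hops away are ranked by how many collaborators they share with the queried institution. The paper does not state how such candidates are scored, so the heuristic, the function name, and the example network below are assumptions made for this sketch.

```python
def suggest_partners(adjacency, institution, top_n=3):
    """Rank institutions two hops away by their number of shared collaborators."""
    direct = adjacency.get(institution, set())
    shared = {}
    for collaborator in direct:
        for candidate in adjacency.get(collaborator, set()):
            # Skip the institution itself and anyone it already works with.
            if candidate != institution and candidate not in direct:
                shared[candidate] = shared.get(candidate, 0) + 1
    # Sort by shared-collaborator count (descending), then by name for stable output.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical collaboration graph: institution -> set of direct collaborators.
network = {
    "Inst A": {"Inst B", "Inst C"},
    "Inst B": {"Inst A", "Inst D", "Inst E"},
    "Inst C": {"Inst A", "Inst D"},
    "Inst D": {"Inst B", "Inst C"},
    "Inst E": {"Inst B"},
}

suggestions = suggest_partners(network, "Inst A")
```

In this toy network, "Inst D" ranks first because it is reachable through two of the direct collaborators of "Inst A", while "Inst E" is reachable through only one.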

3.2.1 Data Preparation

The proposed design integrates data from three main

sources: PATSTAT (bibliographic and legal patent

data from the European Patent Office), Förderkatalog

(over 140,000 records of federal project funding), and

Scopus (comprehensive scientiﬁc literature database).

As part of this integration, data preparation tech-

niques enhance the quality of the combined datasets.

These include data linking (Benjelloun et al., 2009) to

link related entities across sources, and data enrich-

ment (Shvaiko and Euzenat, 2013) to combine com-

plementary information. We applied entity resolu-

tion techniques (Getoor and Machanavajjhala, 2012)

to reconcile representations of researchers, patents,


projects, and publications, along with data cleaning

methods (Rahm and Do, 2000) to handle missing values and inconsistencies (Côté et al., 2023). Initial assessments indicate significant improvements in data

consistency and reliability, with detailed validation

processes planned for the next phase.
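A minimal sketch of how such preparation might look is given below: institution names are normalized, records with a missing name are skipped, and the remaining records from the three sources are grouped under one key. The normalization rules, the list of legal-form suffixes, and all record identifiers are assumptions for illustration; the actual preparation steps are not specified in this paper.

```python
import re
from collections import defaultdict

# Legal-form tokens to strip; an illustrative list, not an exhaustive one.
LEGAL_FORMS = {"ag", "gmbh", "se", "kg", "ev", "inc", "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation and legal-form tokens, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def build_profiles(records):
    """Group (source, name, item_id) records under one normalized institution key.

    source must be one of "patent", "project", or "publication".
    """
    profiles = defaultdict(lambda: {"patent": [], "project": [], "publication": []})
    for source, name, item_id in records:
        if not name or not name.strip():
            continue  # data cleaning: skip records with a missing institution name
        profiles[normalize_name(name)][source].append(item_id)
    return dict(profiles)


# Hypothetical records; the item identifiers are placeholders.
records = [
    ("patent", "MTU Aero Engines AG", "P1"),
    ("project", "MTU Aero Engines A.G.", "F1"),
    ("publication", "mtu aero engines", "S1"),
    ("project", "", "F2"),
]

profiles = build_profiles(records)
```

Exact matching on a normalized key is deliberately strict; it would typically be combined with a similarity-based comparison to catch spelling variants that normalization alone does not resolve.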

3.2.2 Graph Database Modeling

Neo4j serves as our primary graph database, chosen for its robust capabilities and Cypher query language (Francis et al., 2018), which enables efficient exploration of complex relationship patterns (Angles et al., 2017). The model implements nodes representing

Figure 3: Graph model representation of an institution.

institutions, patents, projects, and publications, with

edges representing relationships. Figure 3 shows the

core components with seven relationship types:

(a) Direct Entity Relationships

(1) HAS_PATENT relates institutions to their patents

(2) PARTNER_IN links institutions to their projects

(3) PUBLISHED_BY connects institutions to publications

(b) Collaborative Relationships

(4) COLLABORATES_ON_PATENT for joint patent activities

(5) COLLABORATES_ON_PROJECT for project collaborations

(6) COLLABORATES_ON_PUBLICATION for co-authorship

(7) COLLABORATES for general institutional collaboration
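One plausible way to relate the two groups of relationships is to derive the collaborative ones from the direct ones: two institutions attached to the same patent, project, or publication are treated as collaborators. The sketch below illustrates this in plain Python rather than Cypher. The derivation rule, the underscore spelling of the relationship type names, and all identifiers are assumptions for illustration and are not confirmed by the paper.

```python
from collections import defaultdict
from itertools import combinations

# (institution, relationship_type, item) triples for the direct relationships.
# Institution and item identifiers are placeholders.
direct_edges = [
    ("Inst A", "PARTNER_IN", "project-1"),
    ("Inst B", "PARTNER_IN", "project-1"),
    ("Inst C", "PARTNER_IN", "project-1"),
    ("Inst A", "HAS_PATENT", "patent-1"),
    ("Inst B", "HAS_PATENT", "patent-1"),
    ("Inst A", "PUBLISHED_BY", "paper-1"),
]

# Assumed mapping from each direct relationship to its collaborative counterpart.
DERIVED_TYPE = {
    "HAS_PATENT": "COLLABORATES_ON_PATENT",
    "PARTNER_IN": "COLLABORATES_ON_PROJECT",
    "PUBLISHED_BY": "COLLABORATES_ON_PUBLICATION",
}


def derive_collaborations(edges):
    """Count shared items per institution pair and derived relationship type."""
    members = defaultdict(set)  # (rel_type, item) -> institutions attached to it
    for institution, rel_type, item in edges:
        members[(rel_type, item)].add(institution)

    counts = defaultdict(int)
    for (rel_type, _item), institutions in members.items():
        for a, b in combinations(sorted(institutions), 2):
            counts[(a, b, DERIVED_TYPE[rel_type])] += 1
            counts[(a, b, "COLLABORATES")] += 1  # aggregate over all sources
    return dict(counts)


collaborations = derive_collaborations(direct_edges)
```

The counts can serve as edge weights, so that institution pairs with many joint items are connected more strongly than pairs with a single shared project.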

This structure enables complex analysis such as community detection, centrality calculations, and pattern discovery in collaboration networks.

https://www.neo4j.com/ (Last accessed 2024-11-08)

3.2.3 Community Detection

The system uses community detection algorithms to identify densely connected groups of nodes within the graph database that represent research communities, industry clusters, or other meaningful groupings. To identify communities within the collaboration network, we implemented the Louvain algorithm

(Kumar and Hanot, 2021), which was chosen for its

fast processing capabilities and high quality results

on large datasets. This method efﬁciently identiﬁes

densely connected groups while allowing examina-

tion of community structures at different levels, criti-

cal for detailed analysis of cooperation patterns (Sat-

tar and Arifuzzaman, 2022). Once communities are

identiﬁed, network analysis techniques explore coop-

eration patterns and identify key players. The IT ar-

tifact uses betweenness centrality to identify inﬂuen-

tial entities and potential knowledge brokers (Valente

et al., 2008). This measure identiﬁes nodes that fre-

quently appear on the shortest paths between other

nodes, highlighting their role as information brokers.

It was chosen for its reliability and scalability, requir-

ing no predeﬁned parameters, and its ability to iden-

tify key entities that connect disparate groups (Kumar

and Hanot, 2021).
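To make the betweenness measure concrete, the following sketch computes it with Brandes' algorithm for an unweighted, undirected graph using only the Python standard library. It is an illustration of the measure itself, not the implementation used in the IT artifact; the example network and node names are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbors.
    Returns unnormalized betweenness scores.
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, starting with the nodes farthest from source.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical network: two triangles joined through a single "Broker" node.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "Broker"},
    "Broker": {"C", "D"},
    "D": {"Broker", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}

scores = betweenness_centrality(graph)
```

In this example the "Broker" node receives the highest score because every shortest path between the two triangles passes through it, which matches the knowledge-broker interpretation described above.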

In addition, the integration of heterogeneous data

sources enables the exploration of cross-domain re-

lationships between patent activity, research collab-

orations, and scientiﬁc publications, potentially pro-

viding insights into research-to-application transla-

tion and academic-industry collaborations.

4 SHOWCASE

To validate the practical application of our PoC, this

showcase examines the identiﬁcation of potential re-

search partnerships in the aircraft manufacturing sec-

tor. The scenario demonstrates how our IT artifact

can be used to discover collaboration opportunities

through network analysis, using a speciﬁc example in

materials research and manufacturing processes.

The showcase uses a systematic approach to part-

ner identiﬁcation with the following steps:

1. Initial partner analysis using the system’s search

functionality

2. Investigation of existing collaboration networks

3. Detailed analysis of potential partner proﬁles

4. Evaluation of collaboration potential based on his-

torical project data

4.1 Initial Search

The starting point for this scenario is MTU Aero En-

gines AG, a German aero engine manufacturer that

develops, manufactures, and services military and

commercial aircraft engines. When the institution is


Figure 4: Institutional proﬁle of MTU Aero Engines AG.

selected, the system displays a comprehensive institutional profile, as shown in Figure 4.

At the top of the profile is a structured overview of the company's institutional data, such as location, annual revenue, industry classification, and core technology areas. Below the company profile, the interface transitions to a collaboration network section.

4.2 Network Analysis

The Collaboration Network section in Figure 4 pro-

vides structured access to collaboration data through

specialized tabs for Communities, Patents, Projects

and Scientiﬁc Publications. The Communities tab

shows institutions that have signiﬁcant collabora-

tive relationships with MTU Aero Engines. Sys-

tematic analysis of the network data identiﬁed sev-

eral institutions as potential collaboration partners.

Among these, BCT Steuerungs- und DV-Systeme

GmbH stood out due to its signiﬁcant involvement in

research-oriented manufacturing projects with MTU

Aero Engines. As shown in the collaboration proﬁle

in Figure 5, BCT and MTU Aero Engines have collab-

orated on three major research projects, all supported

Figure 5: Overview of joint research projects between BCT

Steuerungs- und DV-Systeme GmbH and MTU Aero En-

gines AG.

by Technology and Innovation funding. The projects

cover manufacturing-related topics: optimized man-

ufacturing processes for advanced compressors, pro-

duction research (ProLMD) and the industrializa-

tion of digital engineering and additive manufactur-

ing (IDEA). While the collaboration shows strong

project-based ties, there are no joint patents or sci-

entiﬁc publications, indicating a focus on applied re-

search and development. These collaborative activi-

ties demonstrate BCT’s extensive experience in man-

ufacturing processes and digital transformation in the

aerospace industry. The nature and scope of these


projects suggest that BCT has relevant expertise for

potential research partnerships, especially given its

successful track record with MTU Aero Engines.

4.3 Discussion

The showcase demonstrates several key capabilities

of our developed IT artifact, while also highlighting

certain limitations and areas for future development.

The PoC successfully facilitated the identiﬁcation of

potential research partners through its structured ap-

proach to data analysis and visualization. Of par-

ticular note is the ability to trace collaboration net-

works across different types of relationships – patents,

projects, and publications – providing a multifaceted

view of institutional capabilities and cooperation pat-

terns. The underlying graph database model is partic-

ularly advantageous here, as it allows efﬁcient traver-

sal of relationship chains and the matching of patterns

that would be difﬁcult to achieve with conventional

relational databases. The case of BCT Steuerungs-

und DV-Systeme GmbH illustrates how the system

can uncover non-obvious opportunities for collabora-

tion by analyzing existing network relationships that

only become visible when multiple data sources are

analyzed together. While the PoC is effective in iden-

tifying collaborations, there are several areas where

the current implementation could be improved. The

current version lacks mechanisms for qualitative eval-

uation of partnerships and temporal analysis capabil-

ities to track how relationships evolve over time. A

major limitation is the lack of automated complemen-

tarity analysis between institutions – future versions

should be able to identify synergies in research capa-

bilities and technological expertise between potential

partners. In addition, the current focus on German re-

search institutions limits the usefulness of the system

for international research partnerships.

5 CONCLUSION

Our approach provides researchers, policymakers,

and industry leaders with deeper insights into col-

laboration landscapes, potentially informing strate-

gic decisions about research partnerships and innova-

tion strategies. While previous approaches have typi-

cally focused on single data sources, our IT artifact

uniquely integrates patent, project, and publication

information into a comprehensive analytical frame-

work. This holistic approach distinguishes our work

from existing solutions that often provide only partial

views of the collaboration landscape. Some of the key

features of our IT artifact include

1. Uniﬁed Data Integration Framework. Our

approach integrates heterogeneous data from

patents, projects, and publications into a compre-

hensive knowledge base, with a ﬂexible architec-

ture that allows seamless integration of additional

data sources.

2. Advanced Analysis Capabilities. The system

implements sophisticated network analysis tech-

niques, including

• Community detection using the Louvain

method to identify research clusters

• Betweenness centrality measures to reveal in-

ﬂuential institutions

• Interactive visualizations to explore collabora-

tion networks

3. Graph-based Architecture. By modeling col-

laboration networks as graphs, our IT artifact en-

ables sophisticated network analysis capabilities

that would be difﬁcult to achieve with traditional

relational databases.

While our PoC demonstrates the feasibility of this

approach, future work will focus on comprehensive

evaluation and enhancement. This will include quan-

titative evaluation of data integration accuracy, user

studies with research institutions, and integration of

additional data sources. We also aim to conduct com-

prehensive scalability analyses across different net-

work scales and domains, including comparative per-

formance evaluations with existing systems and in-

vestigation of integration challenges with real-world

IT infrastructures. We plan to develop more sophis-

ticated partnership recommendation algorithms and

temporal analysis capabilities to better understand the

evolution of research networks over time. This will

include a complementarity analysis component that

examines synergies between institutions to identify

partnerships with the greatest potential for mutual sci-

entiﬁc progress. As the complexity and importance of

research collaborations continues to grow in our inter-

connected world, tools for understanding and foster-

ing these partnerships become increasingly important.

Our work provides a foundation for transforming the

way institutions discover and evaluate potential re-

search partnerships, ultimately contributing to more

effective collaborative research ecosystems.

ACKNOWLEDGEMENTS

This work was funded by the German Federal Min-

istry of Education and Research under grant no.

01IO2208E.


REFERENCES

Ahuja, G., Soda, G., and Zaheer, A. (2012). The genesis

and dynamics of organizational networks (march, pg

434, 2012). Organization Science, 23:1211–1211.

Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J., and Vrgoč, D. (2017). Foundations of modern query languages for graph databases. ACM Computing Surveys, 50(5):68:1–68:40.

Benjelloun, O., Garcia-Molina, H., Menestrina, D., Su, Q.,

Whang, S. E., and Widom, J. (2009). Swoosh: a

generic approach to entity resolution. The VLDB Jour-

nal, 18(1):255–276.

Bundesministerium für Bildung und Forschung (2024). Förderkatalog - Database of Public Research Funding in Germany. Federal database of publicly funded research projects in Germany.

Côté, P.-O., Nikanjam, A., Ahmed, N., Humeniuk, D., and Khomh, F. (2023). Data cleaning and machine learning: A systematic literature review.

Elsevier (2024). Scopus. Abstract and citation database of

peer-reviewed literature.

European Patent Ofﬁce (2024). PATSTAT - Worldwide

Patent Statistical Database. Global patent database

maintained by the European Patent Ofﬁce.

Francis, N., Green, A., Guagliardo, P., Libkin, L., Lin-

daaker, T., Marsault, V., Plantikow, S., Rydberg, M.,

Selmer, P., and Taylor, A. (2018). Cypher: An evolv-

ing query language for property graphs. Proceedings

of the 2018 International Conference on Management

of Data.

Getoor, L. and Machanavajjhala, A. (2012). Entity resolu-

tion: Theory, practice & open challenges. Proceed-

ings of the VLDB Endowment, 5(12):2018–2019.

Hevner, A. R., March, S. T., Park, J., and Ram, S. (2004). Design science in information systems research. Management Information Systems Quarterly, 28:75.

Hogan, A., Blomqvist, E., Cochez, M., D’amato, C., Melo,

G. D., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Nav-

igli, R., Neumaier, S., Ngomo, A.-C. N., Polleres, A.,

Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda,

J., Staab, S., and Zimmermann, A. (2021). Knowledge

graphs. ACM Computing Surveys, 54(4):71:1–71:37.

Huang, Y., Li, R., Zou, F., Jiang, L., Porter, A. L., and

Zhang, L. (2022). Technology life cycle analysis:

From the dynamic perspective of patent citation net-

works. Technological Forecasting and Social Change,

181:121760.

Javed, M. A., Younis, M. S., Latif, S., Qadir, J., and Baig,

A. (2018). Community detection in networks: A mul-

tidisciplinary review. Journal of Network and Com-

puter Applications, 108:87–111.

Kumar, S. and Hanot, R. (2021). Community detection al-

gorithms in complex networks: A survey. In Thampi,

S. M., Krishnan, S., Hegde, R. M., Ciuonzo, D.,

Hanne, T., and Kannan R., J., editors, Advances in Sig-

nal Processing and Intelligent Recognition Systems,

pages 202–215. Springer.

Long, J. C., Cunningham, F. C., Carswell, P., and Braith-

waite, J. (2014). Patterns of collaboration in complex

networks: the example of a translational research net-

work. BMC Health Services Research, 14(1):225.

Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., and Taylor, J. (2019). Industry-scale knowledge graphs: Lessons and challenges: Five diverse technology companies show how it's done. Queue, 17(2):48–75.

Ozcan, S. and Islam, N. (2017). Patent information re-

trieval: approaching a method and analysing nan-

otechnology patent collaborations. Scientometrics,

111(2):941–970.

Que, X., Checconi, F., Petrini, F., and Gunnels, J. A. (2015). Scalable community detection with the Louvain algorithm. In 2015 IEEE International Parallel and Distributed Processing Symposium, pages 28–37.

Raghavan, U. N., Albert, R., and Kumara, S. (2007).

Near linear time algorithm to detect community struc-

tures in large-scale networks. Physical Review E,

76(3):036106.

Rahm, E. and Do, H. H. (2000). Data cleaning: Problems

and current approaches. IEEE Data Engineering Bul-

letin, 23(4):3–13.

Sattar, N. S. and Arifuzzaman, S. (2022). Scalable distributed Louvain algorithm for community detection in large graphs. The Journal of Supercomputing, 78(7):10275–10309.

Schwartz, M., Peglow, F., Fritsch, M., and Günther, J.

(2012). What drives innovation output from subsi-

dized r&d cooperation?—project-level evidence from

germany. Technovation, 32(6):358–369.

Shvaiko, P. and Euzenat, J. (2013). Ontology matching: State of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, 25(1):158–176.

Valente, T. W., Coronges, K., Lakon, C., and Costenbader,

E. (2008). How correlated are network centrality mea-

sures? Connections (Toronto, Ont.), 28(1):16–26.

Wang, L. (2017). Heterogeneous data and big data analytics. Automatic Control and Information Sciences, 3(1):8–15.
