Fast Document Similarity Computations using GPGPU

Parijat Shukla; Arun K. Somani

Topics: Data Analytics; Data Reduction and Quality Assessment; Mining Text and Semi-Structured Data; Pre-Processing and Post-Processing for Data Mining

In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR, pages 323-331, 2018, Seville, Spain

Authors: Parijat Shukla ¹ and Arun K. Somani ²

Affiliations: ¹ Xilinx, Inc., HITEC City, Hyderabad, India; ² Dept. of Electrical and Computer Engineering, Iowa State University, Ames, Iowa, U.S.A.

Keyword(s): Deduplication, Semi-structured Data, NoSQL, Big Data, Parallel Processing, GPGPU, Data Shaping.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Business Analytics; Data Analytics; Data Engineering; Data Reduction and Quality Assessment; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Pre-Processing and Post-Processing for Data Mining; Symbolic Systems

Abstract: Several Big Data problems involve computing similarities between entities, such as records and documents, in a timely manner. Recent studies indicate that similarity-based deduplication techniques are efficient for document databases. Delta encoding-like techniques are commonly used to compute similarities between documents, and operational requirements impose low-latency constraints. Previous research does not consider parallel computing to deliver low-latency delta encoding solutions. This paper makes a two-fold contribution in the context of the delta encoding problem in document databases: (1) we develop a parallel processing-based technique to compute similarities between documents, and (2) we design a GPU-based document cache framework to accelerate the delta encoding pipeline. We experiment with real datasets and achieve a throughput of more than 500 similarity computations per millisecond. The similarity computation framework achieves a throughput of 237-312 MB per second, which is up to 10X higher than that of hashing-based approaches.

CC BY-NC-ND 4.0

Paper citation in several formats:

Shukla, P. and Somani, A. K. (2018). Fast Document Similarity Computations using GPGPU. In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR; ISBN 978-989-758-330-8; ISSN 2184-3228, SciTePress, pages 323-331. DOI: 10.5220/0006960303230331

@conference{kdir18,
author={Parijat Shukla and Arun K. Somani},
title={Fast Document Similarity Computations using GPGPU},
booktitle={Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR},
year={2018},
pages={323-331},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006960303230331},
isbn={978-989-758-330-8},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - KDIR
TI - Fast Document Similarity Computations using GPGPU
SN - 978-989-758-330-8
IS - 2184-3228
AU - Shukla, P.
AU - Somani, A.
PY - 2018
SP - 323
EP - 331
DO - 10.5220/0006960303230331
PB - SciTePress