ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM

Yaël Champclaux; Taoufiq Dkaki; Josiane Mothe

Topics: Information Engineering Methodologies; Systems Engineering Methodologies

In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS, 279-285, 2009, Milan, Italy

Authors: Yaël Champclaux; Taoufiq Dkaki; Josiane Mothe

Affiliation: Université de Toulouse, France

Keyword(s): Information Retrieval, Structural similarity, Graph comparison, Term weighting.

Related Ontology Subjects/Areas/Topics: Enterprise Information Systems; Information Engineering Methodologies; Information Systems Analysis and Specification; Methodologies, Processes and Platforms; Model-Driven Software Development; Software Engineering; Systems Engineering

Abstract: In this paper, we present a new similarity measure in the context of Information Retrieval (IR). The main objective of an IR system is to select, from a collection of documents, those that are relevant to a user's information need. Traditional approaches to document/query comparison use surface similarity, i.e. the comparison engine relies on surface attributes (indexing terms). We propose a new method that combines surface and structural similarities with the aim of enhancing the precision of the top retrieved documents. In previous work, we showed that structural similarity in combination with cosine improves on a bare cosine ranking. In this paper, we compare our method to Okapi BM25 on the Cranfield collection. We show that structural similarities improve average precision and precision at the top 10 retrieved documents by about 50%. Experiments also address the influence of term weighting on system performance.

In this paper, we present a graph-based model which belongs to the vector space family. A vector space model considers each document as a vector in the term space. Each coordinate of a vector is a value representing the importance of an indexing term in a document or in a query. The vector space is defined by the set of terms that the system collects during the indexing phase. Many similarity measures, such as Cosine, Jaccard and Dice, are used to determine how well a document corresponds to a query. Such measures determine local similarities between a document and a query on the basis of the terms they have in common.

Our goal is to exploit another type of similarity, called structural similarity. Structural similarities identify resemblances between elements on the basis of the relationships they have. The structural relationship that we use originates from the fact that documents contain words and that words are contained in documents. The idea is to compare documents through the similarities between the words they contain, while the similarities between words themselves depend on the similarities between the documents that contain them. In a previous paper, we showed that the use of structural similarities alone was not sufficient to improve the performance of an IR system (IRS).

In this paper, we present a different method that combines structural and surface similarities with the aim of enhancing high precision. Surface similarity is computed as an Okapi measure. The selected documents are then stored in a graph and sorted using a SimRank-based score. We call this two-stage method OkaSim. We have performed experiments with different term weightings on the Cranfield corpus and show that structural similarities can improve an Okapi ranking. We show that these similarities can improve the average precision of an Okapi ranking by more than 50% and its precision at the top 10 retrieved documents by about 50%. Tests and experiments also address the influence of term weighting on system performance.
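
The abstract does not spell out how the two stages fit together, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a standard BM25 formula with the conventional defaults `k1=1.2` and `b=0.75`, an unweighted bipartite document-term graph, the bipartite SimRank recurrence of Jeh and Widom, and that the query is added to the graph as one extra node. The function names (`bm25_scores`, `bipartite_simrank`, `two_stage_rank`) and the parameters `decay`, `iterations` and `top_k` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Stage 1: score each tokenised document against the query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in set(query):
            if tf[t] == 0:
                continue
            # Non-negative idf variant (assumption: the paper may use another).
            idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def bipartite_simrank(node_terms, decay=0.8, iterations=5):
    """SimRank on a bipartite node-term graph; returns node-node similarities.

    node_terms maps each node (document or query) to the set of terms it
    contains. Two nodes are similar if they contain similar terms, and two
    terms are similar if they are contained in similar nodes.
    """
    nodes = list(node_terms)
    terms = sorted({t for ts in node_terms.values() for t in ts})
    term_nodes = {t: [n for n in nodes if t in node_terms[n]] for t in terms}
    sim_n = {(a, c): float(a == c) for a in nodes for c in nodes}
    sim_t = {(s, t): float(s == t) for s in terms for t in terms}

    def update(items, neighbours, other_sim):
        new = {}
        for x, y in product(items, repeat=2):
            if x == y:
                new[x, y] = 1.0
                continue
            nx, ny = neighbours[x], neighbours[y]
            if not nx or not ny:
                new[x, y] = 0.0
                continue
            total = sum(other_sim[i, j] for i in nx for j in ny)
            new[x, y] = decay * total / (len(nx) * len(ny))
        return new

    for _ in range(iterations):
        # Both sides are updated from the previous iteration's values.
        new_n = update(nodes, node_terms, sim_t)
        new_t = update(terms, term_nodes, sim_n)
        sim_n, sim_t = new_n, new_t
    return sim_n


def two_stage_rank(query, docs, top_k=3):
    """Select top_k documents by BM25, then re-sort them by SimRank to the query."""
    bm25 = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:top_k]
    graph = {i: set(docs[i]) for i in candidates}
    graph["query"] = set(query)  # assumption: the query is one more graph node
    sim = bipartite_simrank(graph)
    # sorted() is stable, so SimRank ties keep their BM25 order.
    return sorted(candidates, key=lambda i: sim["query", i], reverse=True)
```

Calling `two_stage_rank(["wing", "pressure"], docs)` on a list of tokenised documents returns the indices of the BM25-selected documents in their SimRank order. The dictionary-based SimRank is quadratic in the number of nodes and terms, which is acceptable here only because the graph is restricted to the documents selected in the first stage.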

CC BY-NC-ND 4.0

Paper citation in several formats:

Champclaux, Y., Dkaki, T. and Mothe, J. (2009). ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-8111-86-9; ISSN 2184-4992, SciTePress, pages 279-285. DOI: 10.5220/0002017202790285

@conference{iceis09,
author={Yaël Champclaux and Taoufiq Dkaki and Josiane Mothe},
title={ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2009},
pages={279-285},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002017202790285},
isbn={978-989-8111-86-9},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM
SN - 978-989-8111-86-9
IS - 2184-4992
AU - Champclaux, Y.
AU - Dkaki, T.
AU - Mothe, J.
PY - 2009
SP - 279
EP - 285
DO - 10.5220/0002017202790285
PB - SciTePress