Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data

Dawid Plaskowski; Szymon Skwarek; Dominika Grajewska; Maciej Niemir; Agnieszka Ławrynowicz

Topics: AI and Creativity; Data Mining; Natural Language Processing

In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, 681-688, 2024, Rome, Italy

Authors: Dawid Plaskowski ¹; Szymon Skwarek ¹; Dominika Grajewska ¹; Maciej Niemir ¹ and Agnieszka Ławrynowicz ²

Affiliations: ¹ Łukasiewicz - Poznań Institute of Technology, Poznań, Poland; ² Poznań University of Technology, Poznań, Poland

Keyword(s): Language Models, Information Extraction, Opinion Mining.

Abstract: We employ encoder-decoder transformer models to extract opinions from semi-structured webpages such as blog posts and product rankings. We enhance the models’ performance by generating synthetic data with large language models such as GPT-3.5 and GPT-4, diversified through prompts featuring various text styles, personas, and product characteristics. We experiment with different fine-tuning strategies: training with and without domain-adapted instructions, as well as training on synthetic customer reviews, targeting tasks such as extracting product names, pros, cons, and opinion sentences. Our evaluation shows a significant improvement in the models’ performance on both product characteristic and opinion extraction tasks, which validates the effectiveness of synthetic data for fine-tuning and signals the potential of pretrained language models to automate web scraping across diverse web sources.
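
The prompt diversification and instruction formatting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute pools, the prompt wording, and the function names `build_generation_prompts` and `to_instruction_example` are all hypothetical, and no language model is called. The sketch only shows how style, persona, and product-characteristic combinations might be turned into generation prompts, and how a generated text might be paired with an extraction instruction.

```python
import itertools
import random

# Hypothetical attribute pools; the values used in the paper are not listed here.
STYLES = ["blog post", "product ranking"]
PERSONAS = ["budget-conscious student", "professional reviewer"]
CHARACTERISTICS = ["battery life", "build quality", "price"]


def build_generation_prompts(product, n, seed=0):
    """Sample up to n distinct (style, persona, characteristic) prompts."""
    rng = random.Random(seed)
    combos = list(itertools.product(STYLES, PERSONAS, CHARACTERISTICS))
    chosen = rng.sample(combos, k=min(n, len(combos)))
    return [
        f"Write a {style} about the {product} from the perspective of a "
        f"{persona}. Discuss its {trait}, including pros and cons."
        for style, persona, trait in chosen
    ]


def to_instruction_example(text, task, target):
    """Pair a synthetic text with an extraction instruction and its label."""
    return {
        "input": f"Extract the {task} from the text.\n\n{text}",
        "target": target,
    }
```

Each prompt would be sent to a generator model, and the returned text, together with its known product name, pros, and cons, would be passed through `to_instruction_example` to form one fine-tuning record per extraction task.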

CC BY-NC-ND 4.0

Paper citation in several formats:

Plaskowski, D., Skwarek, S., Grajewska, D., Niemir, M. and Ławrynowicz, A. (2024). Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4; ISSN 2184-433X, SciTePress, pages 681-688. DOI: 10.5220/0012384900003636

@conference{icaart24,
author={Dawid Plaskowski and Szymon Skwarek and Dominika Grajewska and Maciej Niemir and Agnieszka Ławrynowicz},
title={Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={681-688},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012384900003636},
isbn={978-989-758-680-4},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data
SN - 978-989-758-680-4
IS - 2184-433X
AU - Plaskowski, D.
AU - Skwarek, S.
AU - Grajewska, D.
AU - Niemir, M.
AU - Ławrynowicz, A.
PY - 2024
SP - 681
EP - 688
DO - 10.5220/0012384900003636
PB - SciTePress