Paper

Authors: Dawid Plaskowski 1 ; Szymon Skwarek 1 ; Dominika Grajewska 1 ; Maciej Niemir 1 and Agnieszka Ławrynowicz 2

Affiliations: 1 Łukasiewicz - Poznań Institute of Technology, Poznań, Poland ; 2 Poznań University of Technology, Poznań, Poland

Keyword(s): Language Models, Information Extraction, Opinion Mining.

Abstract: To address the challenge of extracting opinions from semi-structured webpages such as blog posts and product rankings, encoder-decoder transformer models are employed. We enhance the models’ performance by generating synthetic data using large language models like GPT-3.5 and GPT-4, diversified through prompts featuring various text styles, personas and product characteristics. We experiment with different fine-tuning strategies, training both with and without domain-adapted instructions as well as on synthetic customer reviews, targeting tasks such as extracting product names, pros, cons, and opinion sentences. Our evaluation shows a significant improvement in the models’ performance in both product characteristic and opinion extraction tasks, validating the effectiveness of using synthetic data for fine-tuning and signaling the potential of pretrained language models to automate web scraping from diverse web sources.
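
Illustration (not from the paper): the prompt diversification mentioned in the abstract could be sketched roughly as follows. The attribute pools, the template wording and the function name `build_prompts` are all hypothetical, since the abstract does not list the actual styles, personas or product characteristics used; the sketch only shows one way to combine such attributes into varied generation prompts.

```python
import itertools
import random

# Hypothetical attribute pools; the abstract does not give the real ones.
STYLES = ["blog post", "product ranking", "forum comment"]
PERSONAS = ["budget-conscious student", "professional reviewer"]
CHARACTERISTICS = ["battery life", "build quality", "price"]

# Hypothetical template; the paper's actual prompt wording is not given.
PROMPT_TEMPLATE = (
    "Write a {style} about a product from the point of view of a {persona}. "
    "Mention its {characteristic} and include clear pros and cons."
)


def build_prompts(n, seed=0):
    """Sample up to n distinct (style, persona, characteristic) combinations
    and render each one as a text prompt."""
    combos = list(itertools.product(STYLES, PERSONAS, CHARACTERISTICS))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    chosen = rng.sample(combos, k=min(n, len(combos)))
    return [
        PROMPT_TEMPLATE.format(style=s, persona=p, characteristic=c)
        for s, p, c in chosen
    ]


if __name__ == "__main__":
    for prompt in build_prompts(3):
        print(prompt)
```

Each rendered prompt would then be sent to a text-generation model to obtain one synthetic document; that step is omitted here because the abstract does not describe how the models were called.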

CC BY-NC-ND 4.0

Paper citation in several formats:
Plaskowski, D.; Skwarek, S.; Grajewska, D.; Niemir, M. and Ławrynowicz, A. (2024). Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4; ISSN 2184-433X, SciTePress, pages 681-688. DOI: 10.5220/0012384900003636

@conference{icaart24,
author={Dawid Plaskowski and Szymon Skwarek and Dominika Grajewska and Maciej Niemir and Agnieszka Ławrynowicz},
title={Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={681-688},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012384900003636},
isbn={978-989-758-680-4},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Automating Opinion Extraction from Semi-Structured Webpages: Leveraging Language Models and Instruction Finetuning on Synthetic Data
SN - 978-989-758-680-4
IS - 2184-433X
AU - Plaskowski, D.
AU - Skwarek, S.
AU - Grajewska, D.
AU - Niemir, M.
AU - Ławrynowicz, A.
PY - 2024
SP - 681
EP - 688
DO - 10.5220/0012384900003636
PB - SciTePress