Authors:
Dawid Plaskowski¹; Szymon Skwarek¹; Dominika Grajewska¹; Maciej Niemir¹ and Agnieszka Ławrynowicz²
Affiliations:
¹ Łukasiewicz - Poznań Institute of Technology, Poznań, Poland; ² Poznań University of Technology, Poznań, Poland
Keyword(s):
Language Models, Information Extraction, Opinion Mining.
Abstract:
To address the challenge of extracting opinions from semi-structured webpages such as blog posts and product rankings, we employ encoder-decoder transformer models. We enhance the models’ performance by generating synthetic data with large language models such as GPT-3.5 and GPT-4, diversified through prompts featuring various text styles, personas, and product characteristics. We experiment with different fine-tuning strategies, training both with and without domain-adapted instructions, as well as training on synthetic customer reviews, targeting tasks such as extracting product names, pros, cons, and opinion sentences. Our evaluation shows a significant improvement in the models’ performance on both product characteristic and opinion extraction tasks, validating the effectiveness of synthetic data for fine-tuning and signaling the potential of pretrained language models to automate web scraping across diverse web sources.
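The prompt-diversification idea mentioned in the abstract — varying text styles, personas, and product characteristics when generating synthetic data — can be sketched as below. This is a minimal illustration, not the authors' actual pipeline: the style, persona, and characteristic lists and the prompt template are hypothetical placeholders, and the resulting prompts would be sent to an LLM such as GPT-3.5 or GPT-4 in a separate step.

```python
from itertools import product

# Hypothetical lists of prompt ingredients; the paper's actual
# styles, personas, and product characteristics are not specified here.
STYLES = ["casual blog post", "formal product ranking", "short customer review"]
PERSONAS = ["budget-conscious student", "tech enthusiast", "first-time buyer"]
CHARACTERISTICS = ["battery life", "build quality", "price-to-performance"]

# Placeholder template combining the three axes of diversification.
TEMPLATE = (
    "Write a {style} as a {persona}, focusing on the product's {feature}. "
    "Include the product name, pros, cons, and opinion sentences."
)

def build_prompts():
    """Return one generation prompt per (style, persona, feature) combination."""
    return [
        TEMPLATE.format(style=s, persona=p, feature=f)
        for s, p, f in product(STYLES, PERSONAS, CHARACTERISTICS)
    ]

prompts = build_prompts()
print(len(prompts))  # 3 x 3 x 3 = 27 distinct prompts
```

Each generated prompt could then be passed to the LLM to produce one synthetic review, yielding a training set diversified along all three axes.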