Investigating the Configurability of LLMs for the Generation of Knowledge Work Datasets

Desiree Heim; Christian Jilek; Adrian Ulges; Andreas Dengel

doi:10.5220/0013184200003890

2025

Abstract

The evaluation of support tools designed for knowledge workers is challenging due to the lack of publicly available, extensive, and complete data collections. Existing data collections have inherent problems such as incompleteness due to privacy-preserving methods and a lack of contextual information. Hence, generating datasets can be a good alternative. In particular, Large Language Models (LLMs) offer a simple way to generate textual artifacts. We therefore recently proposed a knowledge work dataset generator called KnoWoGen. So far, the adherence of generated knowledge work documents to parameters such as document type, involved persons, or topics has not been examined. This aspect is crucial, however, since generated documents should properly reflect the given parameters, as they could serve as highly relevant ground truth information for training or evaluation purposes. In this paper, we address this missing evaluation aspect by conducting user studies. These studies assess the documents’ adherence to multiple parameters and, specifically, to a given domain parameter as an important, representative parameter. We base our experiments on documents generated with KnoWoGen and use the Mistral-7B-Instruct model as the LLM. We observe that, in the given setting, the generated documents showed high quality regarding adherence to parameters in general and specifically to the parameter specifying the document’s domain. In the parameter-related experiments, 75% of the given ratings were the highest or second-highest quality score, which is a promising outcome for the feasibility of generating high-quality knowledge work documents based on given configurations.

Paper Citation

in Harvard Style

Heim D., Jilek C., Ulges A. and Dengel A. (2025). Investigating the Configurability of LLMs for the Generation of Knowledge Work Datasets. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 821-828. DOI: 10.5220/0013184200003890

in Bibtex Style

@conference{icaart25,
author={Desiree Heim and Christian Jilek and Adrian Ulges and Andreas Dengel},
title={Investigating the Configurability of LLMs for the Generation of Knowledge Work Datasets},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={821-828},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013184200003890},
isbn={978-989-758-737-5},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Investigating the Configurability of LLMs for the Generation of Knowledge Work Datasets
SN - 978-989-758-737-5
AU - Heim D.
AU - Jilek C.
AU - Ulges A.
AU - Dengel A.
PY - 2025
SP - 821
EP - 828
DO - 10.5220/0013184200003890
PB - SciTePress