Variability-Driven User-Story Generation Using LLM and Triadic Concept Analysis

Alexandre Bazin¹, Alain Gutierrez¹, Marianne Huchard¹, Pierre Martin²,³ and Yulin (Huaxi) Zhang⁴

¹ LIRMM, Univ. Montpellier, CNRS, Montpellier, France
² CIRAD, UPR AIDA, F-34398 Montpellier, France
³ AIDA, Univ. Montpellier, CIRAD, Montpellier, France
⁴ EPROAD, Université de Picardie Jules Verne, Amiens, France
Keywords: Software Product Line, Variability, User-Story, Requirements, Agile Product Backlog, LLM, Formal Concept Analysis, Triadic Concept Analysis.
Abstract: A widely used Agile practice for requirements is to produce a set of user stories (also called an “agile product backlog”), which roughly amounts to a list of pairs (role, feature), where the role handles the feature for a certain purpose. In the context of Software Product Lines, the requirements for a family of similar systems are thus a family of user-story sets, one per system, leading to a 3-dimensional dataset composed of sets of triples (system, role, feature). In this paper, we combine Triadic Concept Analysis (TCA) and Large Language Model (LLM) prompting to suggest the user-story set required to develop a new system, relying on the variability logic of an existing system family. This process consists of 1) computing the 3-dimensional variability expressed as a set of TCA implications, 2) providing the designer with intelligible design options, 3) capturing the designer’s selection of options, 4) proposing a first user-story set corresponding to this selection, 5) consolidating its validity according to the implications identified in step 1, while completing it if necessary, and 6) leveraging the LLM to make the resulting website more comprehensive. This process is evaluated with a dataset comprising the user-story sets of 67 similar-purpose websites.
1 INTRODUCTION
At the requirements stage, a widespread practice of the Agile paradigm is to provide a set of user-stories (also called an “agile product backlog”), where a user-story is a brief sentence expressing the fact that a ‘persona’ (or role) wants to perform an ‘action’ (or have access to a feature) with a certain ‘purpose’ (Lucassen et al., 2016). In the context of Software Product Lines (SPL, (Pohl et al., 2005)), the requirements for a family of similar systems are therefore a family of similar user-story sets, one set per system. User-story sets are usually stored to support product line requirements documentation and to guide the development, and they are connected to the source code.
In this paper, we address the issue of building the user-story set for a new system based on the variability logic of an existing system family, according to design options provided by the system designer. We investigate a process that combines Triadic Concept Analysis (TCA) (Lehmann and Wille, 1995) and Large Language Model (LLM) prompting with the system designer's input to suggest the user-story set for the new system. The design options of the new system are provided at an intermediate level of description (e.g. e-commerce), rather than at the feature level (e.g. pay), to alleviate the configuration work.
Our approach operates at the two stages of the traditional SPL framework (Pohl et al., 2005). It contributes to the domain engineering stage by building a variability model for requirements, composed of (1) a set of triadic implications (Ganter and Obiedkov, 2004) and (2) an intelligible set of design options provided by the LLM. At the application engineering stage, a selection of design options, made by the software designer, leads the LLM to propose an initial user-story set. The LLM then uses the triadic implication set to consolidate the validity of the proposed user-story set, completing it if necessary to get a nearly valid configuration. Finally, we leverage the LLM to propose user-stories related to the current user-story set, in order to obtain an even more comprehensive website.
The process is evaluated through a case study using a dataset from the literature (Bazin et al., 2024). This dataset is composed of the user-story sets of 67 similar websites in several domains (mangas and derived products, martial arts equipment, board games, and video games). Results are encouraging and indicate that combining the rigor of TCA with the knowledge brought by the LLM is beneficial.
Section 2 presents Triadic Concept Analysis (TCA) and the difficulty, for a software designer, of leveraging its outputs. Section 3 presents the approach: it outlines the process and describes the material and method adopted to address the case study. Section 4 presents and discusses the results. Related work is presented in Section 5, and we conclude in Section 6 with a summary and future work.
2 TRIADIC CONCEPT ANALYSIS
TCA in a Nutshell. Formal Concept Analysis (FCA) (Ganter and Wille, 1999) is a mathematical framework that aims at structuring the information found in data taking the form of binary relations. It starts from a binary relation in which objects are described by attributes (see Table 1).
Table 1: A relation between systems as objects and features as attributes, inspired from (Bazin et al., 2024).

             search   view comment   manage cart
MyManga        ×                         ×
MangaStore     ×
MangaHome                 ×              ×
In binary relations, an implication is a pattern of the form A → B, where A and B are attribute sets such that every object described by the attributes of A is also described by the attributes of B. For instance, the implication {view comment} → {manage cart} holds in Table 1, as all the systems offering the view comment feature (only MangaHome) also offer the manage cart feature.
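For illustration, such a check can be written in a few lines of Python (a minimal sketch, not the authors' tooling; the relation of Table 1 is encoded as sets of features per system, with feature names abbreviated using underscores):

    context = {
        "MyManga":    {"search", "manage_cart"},
        "MangaStore": {"search"},
        "MangaHome":  {"view_comment", "manage_cart"},
    }

    def implication_holds(premise, conclusion, context):
        # A -> B holds iff every object described by all of A
        # is also described by all of B.
        return all(conclusion <= attrs
                   for attrs in context.values()
                   if premise <= attrs)

    # Only MangaHome offers view_comment, and it also offers
    # manage_cart, so the implication holds.
    print(implication_holds({"view_comment"}, {"manage_cart"}, context))  # True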
The user-stories we consider are ternary relations
between systems, roles and features (see Table 2).
TCA (Lehmann and Wille, 1995) has been developed
in order to exploit the more complex information they
contain. In Table 2, a final user can search in all sys-
tems, and view comments only in MangaHome. A
product manager can manage cart in MyManga and
MangaHome, and view comments in MangaHome.
Implications in a triadic setting are more diversified than their dyadic counterparts (Ganter and Obiedkov, 2004). Indeed, one can be interested in implications between features, between roles, or between the allocations of specific features to specific roles, i.e. pairs (feature,role) or, symmetrically, pairs (role,feature).
Table 2: A ternary relation between systems (MyManga, MangaStore, MangaHome), features (search s, view comment vc, manage cart mc) and roles (FinalUser, Administrator, ProductManager) (Bazin et al., 2024).

              FinalUser     Administrator   ProductManager
              s   vc  mc    s   vc  mc      s   vc  mc
MyManga       ×             ×                         ×
MangaStore    ×             ×
MangaHome     ×   ×                              ×    ×
To obtain these latter rules, triadic data are brought back to a dyadic view: a binary relation is created by taking the Cartesian product of the dimensions of interest as attributes and the Cartesian product of the remaining dimensions as objects. For instance, Table 3 depicts a binary relation between systems and the pairs (feature,role) they offer. The implication {(s,A)} → {(s,FU)} holds and means that all the systems that offer the search feature to administrators also offer it to final users. Two systems (MyManga and MangaStore) offer the implication premise (s,A); this number of systems is called the support of the implication.
Table 3: A binary relation between systems and pairs composed of a feature (search s, view comment vc, manage cart mc) and a role (FinalUser FU, Administrator A, ProductManager PM).

           (s,FU) (vc,FU) (mc,FU) (s,A) (vc,A) (mc,A) (s,PM) (vc,PM) (mc,PM)
MyManga      ×                      ×                                   ×
MangaStore   ×                      ×
MangaHome    ×      ×                                           ×       ×
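The flattening itself is straightforward to implement. The following Python sketch (an assumed encoding, not the authors' code) builds the dyadic view of Table 3 from the triples of the running example that are stated in the text, and computes the support of the premise {(s,A)}:

    triples = [
        ("MyManga", "FinalUser", "s"),
        ("MyManga", "Administrator", "s"),
        ("MyManga", "ProductManager", "mc"),
        ("MangaStore", "FinalUser", "s"),
        ("MangaStore", "Administrator", "s"),
        ("MangaHome", "FinalUser", "s"),
        ("MangaHome", "FinalUser", "vc"),
        ("MangaHome", "ProductManager", "vc"),
        ("MangaHome", "ProductManager", "mc"),
    ]

    dyadic = {}  # system -> set of (feature, role) attribute pairs
    for system, role, feature in triples:
        dyadic.setdefault(system, set()).add((feature, role))

    # Support of the premise {(s, Administrator)}: number of systems
    # whose description contains it (here MyManga and MangaStore).
    premise = {("s", "Administrator")}
    support = [s for s, attrs in dyadic.items() if premise <= attrs]
    print(len(support), support)  # 2 ['MyManga', 'MangaStore']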
In this paper, we use only implications between pairs (feature,role) whose premise is a singleton, to keep the computational burden on the LLM manageable in this first study.
Handling Implications to Design a New System.
Let us illustrate the limits of handling implications to design a new system, using a small dataset taken from (Bazin et al., 2024), which introduces the user-story sets of four manga-related websites. Two implications between pairs (role, feature) are shown below:
<4> => (user;search)
<2> (communityManager;moderateComment)
=> (user;viewComment)
(...)
In this set, an implication is expressed as <n> (r1;f1) => (r2;f2), where n is the support, i.e. the number of websites that provide the premise (r1;f1) of the implication. Such an implication thus means that in these n websites, when role r1 can perform feature f1 (premise), then role r2 can perform feature f2 (conclusion).
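A hypothetical parser for this syntax is sketched below in Python (the exact file format in the companion material may differ; here both pairs of an implication are assumed to appear on a single line):

    import re

    # "<n> (r1;f1) => (r2;f2)", where the premise may be empty.
    PATTERN = re.compile(r"<(\d+)>\s*(?:\((\w+);(\w+)\))?\s*=>\s*\((\w+);(\w+)\)")

    def parse_implication(line):
        m = PATTERN.match(line.strip())
        if not m:
            return None
        support = int(m.group(1))
        premise = (m.group(2), m.group(3)) if m.group(2) else None
        conclusion = (m.group(4), m.group(5))
        return support, premise, conclusion

    print(parse_implication("<4> => (user;search)"))
    # (4, None, ('user', 'search'))
    print(parse_implication(
        "<2> (communityManager;moderateComment) => (user;viewComment)"))
    # (2, ('communityManager', 'moderateComment'), ('user', 'viewComment'))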
User-story sets and binary implications capture a large part of the variability logic of the Manga-related website family. They fix the vocabulary of role names (e.g. ‘registeredUser’ corresponds to ‘premium user’, ‘subscriber’, etc.) and feature names (e.g. ‘CRUDproducts’ corresponds to ‘manageProductsDB’). The implications indicate, for instance, that all websites provide search to users (1st implication, held by all 4 systems), and that when a community manager can moderate comments, users can view the comments (2nd implication, held by 2 systems).

Figure 1: Overview of the process within the software product line framework. In this UML activity diagram, colors indicate the actor: orange, violet, blue, and grey refer respectively to the website designer, the LLM, TCA, and the prompt designer.
Because implications are numerous, the information is difficult for a software designer to grasp. Nor is it directly usable, as it gives no synthetic view of the high-level options available (such as e-commerce or community management) and of the logical dependencies to be respected when developing a new website. This is where the LLM comes into play, with its ability to summarize and to leverage knowledge in order to recommend features and dependencies in a more general setting.
3 APPROACH
Process Overview. Figure 1 outlines the process within the SPL framework (Pohl et al., 2005). The first activities take place at the Domain engineering stage, which focuses, in this work, on identifying commonalities and variability in the requirements of the systems provided as input. In a first step, the family of user-story sets is extracted from the system family storage and sent to the LLM with a prompt (Prompt step 1). In return, the LLM computes the main design options and provides a design summary. In parallel, the family of user-story sets is processed with TCA, producing a set of implications that express logical dependencies between user-stories.
A second group of activities takes place in the Application engineering stage, which aims to produce a set of user-stories for the new system to be developed. As a second step of the process, the designer selects design options in the list proposed in the design summary; the LLM then provides the initial user-story set corresponding to this selection using Prompt step 2. Prompt step 3 asks the LLM to extend the initial user-story set using the TCA implications. Finally, using Prompt step 4, the LLM is requested to add or remove user-stories to get a more comprehensive system. All files of the case study are available online¹.
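For readers who prefer code to diagrams, the four-step exchange can be sketched as follows (a hedged outline only: ask is a hypothetical helper standing for any chat interface, whereas the study itself used the ChatGPT web UI with file uploads):

    def run_process(ask, user_stories_csv, implications_txt, selected_options):
        # Domain engineering: summarize high-level design options (step 1).
        summary = ask("Categorize these role-feature pairs into design options.",
                      attachment=user_stories_csv)
        # Application engineering: user-stories for the selection (step 2).
        initial = ask(f"List the (role;feature) pairs needed for {selected_options}.")
        # Consolidation with the TCA implications (step 3).
        consolidated = ask("Apply these implications to extend the set.",
                           attachment=implications_txt)
        # Comprehensiveness pass (step 4).
        final = ask("Would you add or remove any (role;feature) pairs?")
        return summary, initial, consolidated, final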
Tool and Dataset. The LLM adopted to conduct the case study is the general ChatGPT 4.0 model, to benefit from its latest enhancements. The purpose of using this general model is to allow anyone to use or reproduce our results. Regarding the dataset, we used the file ALL_System_Role_ActingVerb.csv reported in (Bazin et al., 2024). This dataset gathers the user-story sets extracted in 2023 by students from 67 similar-purpose websites within the domains of mangas and derived products, sports equipment for martial arts, board games, and video games. The extraction was supervised, and the result was reviewed and standardized. The dataset contains 1546 triples (system, role, action verb), involving 67 systems, 17 roles and 30 action verbs, and giving 91 distinct user-stories. From this dataset, TCA computed 687 implications for the relation system × (role;feature), given in the companion repository².
Prompt Design. To write the prompt, we complied with the recommendations of OpenAI³ and of the literature (Schulhoff et al., 2024; Mondal et al., 2024; White et al., 2023). (i) A role (persona) is assigned to the LLM in order to clarify its position in relation to the tasks it has to perform. (ii) A context is given as a key part of the prompt to guide the LLM towards a relevant outcome. This context outlines the aim and the framework required by the LLM to understand the data and the tasks. (iii) The chat is decomposed into several tasks. (iv) The LLM is asked to review its answers; this often helps it correct mistakes and achieve a better result. (v) The syntax is explained and illustrated, notably when it is complex, as for implications, i.e. a mathematical logic formulation preceded by the implication support. Clearly identifying the syntax information in the prompt greatly improves the results. Finally, (vi) using tags makes it possible to differentiate parts within the prompt (e.g. <Context>, <Syntax of data>) and to refer to data and results (e.g. <User story data>, <Result step2>).

¹ https://doi.org/10.18167/DVN1/GNJMAV/
² https://doi.org/10.18167/DVN1/BWCC71
³ https://platform.openai.com/docs/guides/prompt-engineering
The final prompt and the associated answers are presented below in a representative chat. As the aim of this section is to present the prompt, long answers are cut short and the removed parts are replaced by an ellipsis (...). To facilitate data transmission to the LLM, two files are uploaded in the chat, UserStoriesInitial.csv and ImplicationsInitial.txt, which contain respectively the family of user-story sets for step 1 and the implication set computed by TCA for step 3.
Prompt step 1 asks the LLM to provide a summary of high-level design options. This prompt is presented below.
Step 1:
<Role>: Your role is to assist software engineers
in designing websites using the principles of soft-
ware product lines. (...)
<Context>: (...) you will be guided by a provided
description of various similar websites in the tar-
geted business domain (Manga websites) to iden-
tify the high-level design options. (...)
<Syntax of data>: Each line in the raw data represents a user story of a website, expressed as a triple: “website, role, feature”.
<User story Data>: UserStoriesInitial.csv
<Task 1>: Based on <User story Data>, can you
categorize these role-features into logical groups
that represent distinct website feature modules (...)
<Task 2>: Please review the groups to ensure ac-
curacy, (...)
An excerpt of the answer to Prompt step 1 is as follows:
(...) Proposed Feature Modules:
(...)
Payment & Subscription Management
Roles: Finaluser, Subscriber, FinancialManager
Features: pay, subscribe, unsubscribe, cancel
(...)
Promotions & Marketing
Roles: MarketingManager, SocialMediaInfluencer
Features: promote, advertise, campaign, evaluate
(...)
Prompt step 2 is then completed with the selected design options.
Step 2:
<Context>: In this step 2, you will identify the set
of needed (role;feature) pairs to implement a set of
chosen high-level design options (logical groups).
<Task 1>: Based on result of Task 2 of Step
1, can you list the (role;feature) pairs (i.e. the
user-stories) that are needed to implement (...)
[Payment & Subscription Management and
Promotions & Marketing].
<Task 2>: Review your result.
<Formatted result step 2> Show the plain list of
pairs (role;feature) from these groups (...)
In its answer, as presented below, the LLM lists the relevant user-stories for the selected options.
(Finaluser;pay)
(Financialmanager;validate)
(Intermediateseller;subscribe) (...)
In Prompt step 3, the LLM is asked to consider the TCA implications in order to obtain a user-story set that respects the logical dependencies of the dataset:
Step 3:
<Context>: In this step, you will leverage logi-
cal dependencies (...) Using them, you will recom-
mend an initial set of user stories (...)
<Binary Implications>:
In the general case, a binary implication follows the format “<n> (r1;f1) => (r2;f2)”, where (...)
<Implications derived for Input Data>: Implica-
tionsInitial.txt
<Task 1>: The result of step 2 (<Formatted result
step 2>) lists the pairs (role;feature) correspond-
ing to the user stories associated to the high-level
options chosen by the software engineer. Can you
apply the <Implications derived for input data>,
to recommend the set of user stories (pairs (r;f))
that describe this website? (...)
<Task 2>: Review your result
<Result 1 step 3> Show the plain list of applied
implications (...)
<Result 2 step 3> Show the final plain list of
(role;feature) pairs (i.e. user stories). (...)
In its answer (see below), the LLM lists the applied implications and updates the user-story list.
Result 1 step 3 (Applied Implications):
(Finaluser;unsubscribe) => (Finaluser;add)
(Finaluser;makeanoffer) => (Finaluser;add) (...)
Result 2 step 3 (Final Set of (Role;Feature) Pairs):
(Administrator;login) (Administrator;manage)
(Contentcreator;add) (...)
Prompt step 4 asks the LLM to enlarge the proposal:
Step 4:
<Context> In this step, we would like to refine the
previous result (...)
<Task> (...) would you add or remove any (role;
feature) pairs in <Result 2 step 3>? (...)
<Result step 4> Updated list of user-stories (...)
The final answer of the LLM completes the website design, with explanations:
(...) Here are the adjustments made:
Additions:
(Finaluser;download): Users might want the ability
to download Manga chapters or entire volumes for
offline reading, (...)
(...)
Removals: None; (..)
Here’s the final set of user stories for a comprehen-
sive Manga website:
Result Step 4:
Administrator;login
Administrator;manage
(...)
Investigated Questions / Aim of the Evaluation.
The case study aims to assess the LLM's ability to produce a relevant set of user-stories for a new system in the SPL framework, and to combine its knowledge with logical dependencies extracted from existing systems using a logic-based method, i.e. TCA. We thus focused on these main research questions:
(Q1) Is the LLM able to properly summarize the shared or specific high-level design options of the existing system family?
(Q2) Is the LLM able to leverage logical dependencies to derive a nearly valid set of user-stories, starting from user-stories selected by the software engineer?
(Q3) Is the LLM able to extend the set of user-stories with proposals that make it more comprehensive, while avoiding straying too far from the initial requirements?
4 EVALUATION: FINDINGS AND
DISCUSSION
In this section, we present and discuss the results obtained over twenty representative conversations with the LLM, and then we discuss threats to validity.
Step 1. Design options summary. At step 1 of each of the 20 conversations, the LLM answers with a list of high-level options (the design options summary). The design summaries contain from four to eight options, with an average of 6.3 options. To assess the stability and content relevance of these 20 summaries, we developed a 2-prompt conversation launched five times. In the first prompt, we asked the LLM to generate a report about the similarity of the 20 summaries. To evaluate this similarity, the LLM has to identify common elements, based on identical names, synonyms, or terms with close semantics. In each of the five launched analyses, we observed that six to eight options appear in more than half of the summaries, revealing that most of the summaries share nearly all their options. In the second prompt, we asked the LLM to analyze the four most frequent options more deeply. The two most frequent ones are User Management/Account Management and Content Management. The following options then appear, in varying orders: either Interaction/Browsing or Support/Communication (never both), and always Subscription/Financial/Payment. This reveals a stability in the summaries built by the LLM. These elements support a positive answer to (Q1).
Step 2. List of user-stories for the selected design options. The result of this step is rather straightforward for the LLM to deliver, as it consists of enumerating the user-stories corresponding to the selection of one or more design options that the LLM itself created by grouping user-stories. Nevertheless, Table 4 shows that the number of user-stories grouped by the LLM into a design option (at step 1) varies from one conversation to another, even for conversations conducted the same day (e.g. conversations Id 16, 17, and 18). This mitigates the positive answer to (Q1): even if nearly identical option names are presented to the system designer, these options may correspond to different user-story groups.
Step 3. Application of the implications computed by TCA. Tables 4 and 5 report figures about the two results of step 3, i.e. the applied implications and the obtained user-stories, respectively. Entrusting the LLM with the task of applying the implications means that we are sufficiently confident in its ability to follow the application procedure described in the prompt and to enrich the user-story set. The set of binary implications we use has the property of being “direct”, meaning that taking the premises as input and applying the implications all at once provides all the user-stories that can be inferred. This is an important property that eases the LLM's task. In order to assess our confidence, we developed a rule engine (RuleEng⁴) that applies the TCA implications whose premise appears in the step 2 result. The output of the rule engine is the set of deduced user-stories. We then compare the implications applied by RuleEng with those applied by the LLM (Table 4), and the user-stories computed by RuleEng with those provided by the LLM (Table 5).

⁴ https://gite.lirmm.fr/gutierre/expeimplications
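As an indication of what this single-pass application entails, a RuleEng-like procedure can be sketched in Python (an illustrative re-implementation under the assumptions above, not the actual RuleEng code; implications are (support, premise, conclusion) triples, with a None premise when it is empty):

    def apply_direct_implications(seed, implications):
        # seed: set of (role, feature) pairs produced at step 2;
        # implications: list of (support, premise_or_None, conclusion).
        derived = set(seed)
        applied = []
        for support, premise, conclusion in implications:
            # Directness: testing premises against the seed only, in a
            # single pass, already yields every derivable user-story.
            if premise is None or premise in seed:
                if conclusion not in derived:
                    applied.append((premise, conclusion))
                    derived.add(conclusion)
        return derived, applied

    seed = {("Finaluser", "pay"), ("Financialmanager", "validate")}
    implications = [
        (4, None, ("user", "search")),
        (2, ("Finaluser", "pay"), ("Finaluser", "add")),
    ]
    user_stories, applied = apply_direct_implications(seed, implications)
    print(sorted(user_stories))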
Results show a difference in seven conversations between the implications applied by RuleEng and those effectively applied by the LLM (in boldface in Table 4). Among these seven conversations, in conversations Id 11 and 14, the LLM applies some new implications in addition to those applied by RuleEng. In the other five cases, there is a significant difference between the implications applied by RuleEng and those applied by the LLM, e.g. conversation Id 4, in which the LLM applied only ten implications, of which seven were among the 95 applied by RuleEng. Identifying the cause of such behavior raises questions (e.g. misunderstanding of the prompt or larger use of knowledge). The same evaluation was carried out for the computed user-stories (Table 5). This table shows 6 conversations where the numbers differ; these conversations also present a difference between implications, with a similar trend, i.e. when fewer implications are applied, fewer user-stories are computed, and conversely. Regarding the conversation that presents a difference in implications but not in user-stories (conversation Id 11), we suppose that the LLM did not apply some implications it declared to have applied. This result gives us relative confidence in the way the LLM applies the implications and derives the user-stories, and supports a partly positive answer to (Q2). We observe a significant number of conversations (about 1/3) with low-quality implication application by the LLM. A lesson learned is that, at this stage of LLM development, after applying step 3 in real practice, it is recommended to compare the number of implications applied by the LLM with the number applied by RuleEng. When the difference is significant, the designer can either discard the conversation or try to redirect the LLM.
Step 4. Upgrade of the user-stories using the LLM. Table 6 reports the user-story improvements made by the LLM on the results from step 3. In four conversations (Id 8, 15, 16, and 17), user-stories were removed from the step 3 result, meaning that the LLM possibly considered some of them semantic duplicates. Of these four conversations, only one (Id 17) presents differences in both Tables 4 and 5. For all the conversations, we note that the LLM adds user-stories. The increase ranges between 2% and 136%, and is 38% on average; these figures can be recomputed from Table 6, as sketched below. A human review confirmed the added value of the additions, which remain within the expected scope of the website domain; this fully justifies the use of the LLM. For these conversations, we can answer (Q3) positively.
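The growth figures above can be checked directly from the step 3 and step 4 columns of Table 6, e.g. with the following few lines of Python:

    pairs = [(25, 37), (41, 55), (36, 45), (14, 23), (43, 52), (41, 53), (25, 33),
             (44, 52), (16, 24), (11, 18), (31, 38), (28, 35), (11, 17), (50, 57),
             (28, 35), (49, 50), (33, 78), (29, 38), (28, 37), (31, 42)]
    increases = [(after - before) / before * 100 for before, after in pairs]
    print(f"min {min(increases):.0f}%, max {max(increases):.0f}%, "
          f"mean {sum(increases) / len(increases):.1f}%")
    # min 2%, max 136%, mean 38.2%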
Table 4: Implications (Implicat.) applied to obtain the set of user-stories in step 3, per conversation. Selected design options are expressed by their acronym (e.g. T stands for Transaction); US stands for user-stories. Values in boldface highlight the differences between the implications applied by the LLM and those applied by RuleEng. Columns: conversation Id, computation date, selected options, number of initial US from step 2, number of implications applied by RuleEng, by the LLM, and by both.

Id  Date   Options  #US step 2  #Impl. RuleEng  #Impl. LLM  #Impl. both
1 10/31 T 6 36 36 36
2 10/31 T/F 11 90 90 90
3 10/31 SP 8 69 69 69
4 10/31 SN/SI 13 95 10 7
5 10/31 PSM/PM 14 114 114 114
6 10/31 PSM/PM 22 104 104 104
7 11/02 FT 6 35 35 35
8 11/02 SPM 18 78 78 78
9 11/02 FT 7 67 12 3
10 11/02 MPF 8 48 4 3
11 11/02 FO 5 42 53 42
12 11/02 SPP 8 66 66 66
13 11/02 PSM 11 67 8 6
14 11/03 UIF 14 46 112 46
15 11/03 PT 8 64 64 64
16 11/03 TM 10 94 94 94
17 11/03 TM 10 94 71 0
18 11/03 TM 5 38 38 38
19 11/03 TF 9 65 65 65
20 11/03 SPM 6 63 63 63
Table 5: User-stories (US) computed at step 3, per conversation. Values in boldface highlight differences between the US computed by the LLM and those computed by RuleEng. Columns: conversation Id, computation date, selected options, number of initial US from step 2, number of US computed by RuleEng, by the LLM, and by both.

Id  Date   Options  #US step 2  #US RuleEng  #US LLM  #US both
1 10/31 T 6 25 25 25
2 10/31 T/F 11 41 41 41
3 10/31 SP 8 36 36 36
4 10/31 SN/SI 13 40 14 14
5 10/31 PSM/PM 14 43 43 43
6 10/31 PSM/PM 22 41 41 41
7 11/02 FT 6 25 25 25
8 11/02 SPM 18 44 44 44
9 11/02 FT 7 30 16 14
10 11/02 MPF 8 27 11 8
11 11/02 FO 5 31 31 31
12 11/02 SPP 8 28 28 28
13 11/02 PSM 11 33 11 11
14 11/03 UIF 14 27 50 27
15 11/03 PT 8 28 28 28
16 11/03 TM 10 49 49 49
17 11/03 TM 10 49 33 27
18 11/03 TM 5 29 29 29
19 11/03 TF 9 28 28 28
20 11/03 SPM 6 31 31 31
Threats to Validity. Internal validity deals with dataset and tool quality. We refer the reader to the paper introducing the dataset we used (Bazin et al., 2024), which discusses the concerns related to its construction.
The uncontrolled elements of this process are the LLM computation (e.g. summarizing), the fact that the LLM parameters cannot be set in the version we used, and the knowledge the LLM can bring. This corresponds to plausible current working conditions for many software designers. In order to assess the ability of the LLM to apply implications, we developed, apart from the LLM, a rule engine named RuleEng that systematically applies the implications and obtains the expected resulting user-stories. In addition, a systematic human review of the LLM results ensured their coherence with the task and the input data (e.g. user-stories, implications). This systematic review also allowed us to identify abnormal results, corresponding to a temporary loss of quality in ChatGPT 4.0 answers caused by a change of the underlying model. By nature, this tool shows randomness, so we cannot have a perfect guarantee on the stability of the results and their repeatability.
Table 6: User-stories (US) per conversation obtained in steps 3 and 4. Values in boldface highlight differences between the US obtained by the LLM in step 3 and those obtained in step 4. Columns: conversation Id, computation date, selected options, number of initial US from step 2, number of US listed in step 3, in step 4, and in both steps.

Id  Date   Options  #US step 2  #US step 3  #US step 4  #US both
1 10/31 T 6 25 37 25
2 10/31 T/F 11 41 55 41
3 10/31 SP 8 36 45 36
4 10/31 SN/SI 13 14 23 14
5 10/31 PSM/PM 14 43 52 43
6 10/31 PSM/PM 22 41 53 41
7 11/02 FT 6 25 33 25
8 11/02 SPM 18 44 52 40
9 11/02 FT 7 16 24 16
10 11/02 MPF 8 11 18 11
11 11/02 FO 5 31 38 31
12 11/02 SPP 8 28 35 28
13 11/02 PSM 11 11 17 11
14 11/03 UIF 14 50 57 50
15 11/03 PT 8 28 35 24
16 11/03 TM 10 49 50 44
17 11/03 TM 10 33 78 32
18 11/03 TM 5 29 38 29
19 11/03 TF 9 28 37 28
20 11/03 SPM 6 31 42 31
We proposed various ways to assess the steps: a similarity study between the delivered design summaries for step 1, a comparison between the results of the rule engine and those of the LLM for step 3, and a check that the updates are of reasonable size and do not fall outside the domain scope for step 4. Designing more in-depth assessments remains a task for the future.
The case study deserves to be extended in several directions before generalizing (external validity), using the richer user-story descriptions included in (Bazin et al., 2024) and considering other SPL domains. Nevertheless, the study allows us to expect that the approach is relevant for datasets of the same size and nature (commercial and community websites). We could also have considered other LLMs, but the objective was not to determine whether one model is better than another, rather to demonstrate the feasibility of using an LLM.
5 RELATED WORK
LLMs provide many opportunities for achieving soft-
ware engineering tasks, as it has been reported in a
recent systematic literature review (Hou et al., 2024).
Two works at the requirement stage are worth men-
tioning. An approach for synthesizing specifications
of software configurations from natural language texts
has been proposed in (Mandal et al., 2023). Here we
do not rely on identifying specifications, as we dis-
pose of user-stories, which are formatted expressions
of specifications. LLM is used to evaluate the quality
of user-stories in (Ronanki et al., 2024). In our present
work, we do not evaluate the user-story sets and we
consider they have a sufficient quality level to serve
as a reference basis for building a new user-story set.
A comparison of two approaches (rules versus LLM)
to derive UML sequence diagrams from user stories
is presented in (Jahan et al., 2024). Here, we do not
aim to derive diagrammatic representations.
Domain models have been derived from user-stories using approaches including LLM interaction in (Arulmohan et al., 2023; Bragilovski et al., 2024). In (Bragilovski et al., 2024), examples of extracted domain concepts are personas, actions, or entities. The authors used the reference dataset of (Dalpiaz, 2018), which contains user-story sets for single systems on different topics and was introduced in (Dalpiaz et al., 2019). In our step 1, we do not extract a domain model; rather, we ask the LLM to categorize the roles and features, thus to operate on this domain model to give a synthetic view of high-level design options. The dataset we use contains a family of user-story sets.
To our knowledge, few works integrate SPL and LLM. One direction consists in applying Software Product Line Engineering (SPLE) principles to construct composite LLMs (Gomez-Vazquez and Cabot, 2024). In another direction, LLMs are used to achieve or assist with certain tasks of the SPLE life cycle, as we do in this paper; e.g. ChatGPT was used to synthesize an SPL from a set of variants in (Acher and Martinez, 2023). In this latter paper, different types of system variants are considered: Java, UML, GraphML, state charts, and PNG. We follow this line of research with a few differences. Variability is identified using an exact method (i.e. TCA). When the LLM is asked to identify design options that group roles and features, the design options are a way to annotate the user-stories, which can be considered part of the product line to a certain extent. As suggested in the discussion of (Acher and Martinez, 2023), our proposal combines the use of an LLM with a deterministic approach.
6 CONCLUSION
In this paper, we investigated the combination of an LLM with a logical analysis method (TCA), applied to a family of user-story sets, in order to assist software engineers in building a new user-story set. The method uses (1) the knowledge extracted from the family of user-story sets to frame the scope and guide towards valid configurations, and (2) the knowledge of the LLM to overcome the limitations inherent to the existing system family.
This work can be extended in several directions. First, TCA provides additional kinds of implications, not considered in this study, from which other types of logical dependencies (e.g. mutual exclusions) can be inferred; they can be used to fine-tune the software's final configuration. To address higher dimensions, such as the purpose or the version, Polyadic Concept Analysis (Voutsadakis, 2002) can be used. Second, the process can be refined to better match designers' needs: for instance, the LLM can propose options at various abstraction levels, or the implications provided by the rule engine can be used directly, without requiring the LLM to apply them. This may reduce the sensitivity of the configuration to the randomness of the LLM.
ACKNOWLEDGEMENTS
This work was supported by the ANR SmartFCA
project, Grant ANR-21-CE23-0023 of the French Na-
tional Research Agency.
REFERENCES
Acher, M. and Martinez, J. (2023). Generative AI for
reengineering variants into software product lines: An
experience report. In Proc. of the 27th ACM Int. Sys-
tems and Software Product Line Conf. - Volume B,
SPLC 2023, pages 57–66. ACM.
Arulmohan, S., Meurs, M., and Mosser, S. (2023). Extract-
ing domain models from textual requirements in the
era of large language models. In 5th Ws. on Artificial
Intelligence and Model-driven Eng. @ ACM/IEEE
MODELS 2023, pages 580–587. IEEE.
Bazin, A., Georges, T., Huchard, M., Martin, P., and Tiber-
macine, C. (2024). Exploring the 3-dimensional vari-
ability of websites’ user-stories using triadic concept
analysis. Int. J. Approx. Reason., 173:109248.
Bragilovski, M., van Can, A. T., Dalpiaz, F., and Sturm,
A. (2024). Deriving domain models from user stories:
Human vs. machines. In 32nd IEEE Int. Requirements
Engineering Conf., RE 2024, pages 31–42. IEEE.
Dalpiaz, F. (2018). Requirements data sets (user stories).
Mendeley Data, V1, doi: 10.17632/7zbk8zsd8y.1.
Dalpiaz, F., Schalk, I. V. D., Brinkkemper, S., Aydemir,
F. B., and Lucassen, G. (2019). Detecting terminolog-
ical ambiguity in user stories: Tool and experimenta-
tion. Inf. Softw. Technol., 110:3–16.
Ganter, B. and Obiedkov, S. A. (2004). Implications in
triadic formal contexts. In Conceptual Structures at
Work: 12th ICCS 2004, volume 3127 of LNCS, pages
186–195. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis -
Mathematical Foundations. Springer.
Gomez-Vazquez, M. and Cabot, J. (2024). Exploring the
use of software product lines for the combination of
machine learning models. In Proc. of the 28th ACM
Int. Systems and Software Product Line Conference,
SPLC ’24, page 26–29. ACM.
Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L.,
Luo, X., Lo, D., Grundy, J., and Wang, H. (2024).
Large language models for software engineering: A
systematic literature review. ACM Trans. Softw. Eng.
Methodol., 33(8).
Jahan, M., Hassan, M. M., Golpayegani, R., Ranjbaran,
G., Roy, C., Roy, B., and Schneider, K. (2024).
Automated Derivation of UML Sequence Diagrams
from User Stories: Unleashing the Power of Genera-
tive AI vs. a Rule-Based Approach. In Proc. of the
ACM/IEEE 27th Int. Conf. on Model Driven Engi-
neering Languages and Systems, MODELS ’24, page
138–148. ACM.
Lehmann, F. and Wille, R. (1995). A triadic approach to
formal concept analysis. In 3rd Int. Conf. on Concep-
tual Structures, ICCS ’95, volume 954 of LNCS, pages
32–43. Springer.
Lucassen, G., Dalpiaz, F., Van der Werf, J. M., and
Brinkkemper, S. (2016). Improving agile require-
ments: the quality user story framework and tool. Re-
quirements Engineering, 21.
Mandal, S., Chethan, A., Janfaza, V., Mahmud, S. M. F.,
Anderson, T. A., Turek, J., Tithi, J. J., and Muza-
hid, A. (2023). Large language models based au-
tomatic synthesis of software specifications. CoRR,
abs/2304.09181.
Mondal, S., Bappon, S. D., and Roy, C. K. (2024). En-
hancing user interaction in chatgpt: Characterizing
and consolidating multiple prompts for issue resolu-
tion. In 21st IEEE/ACM Int. Conf. on Mining Software
Repositories, pages 222–226. ACM.
Pohl, K., Böckle, G., and van der Linden, F. (2005). Software Product Line Engineering - Foundations, Principles, and Techniques. Springer.
Ronanki, K., Cabrero-Daniel, B., and Berger, C. (2024).
Chatgpt as a tool for user story quality evaluation:
Trustworthy out of the box? In Agile Processes in
Software Engineering and Extreme Programming
Workshops, pages 173–181, Cham. Springer Nature
Switzerland.
Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu,
A., Si, C., Li, Y., Gupta, A., Han, H., Schul-
hoff, S., Dulepet, P. S., Vidyadhara, S., Ki, D.,
Agrawal, S., Pham, C., Kroiz, G., Li, F., Tao, H.,
Srivastava, A., Costa, H. D., Gupta, S., Rogers,
M. L., Goncearenco, I., Sarli, G., Galynker, I.,
Peskoff, D., Carpuat, M., White, J., Anadkat, S.,
Hoyle, A., and Resnik, P. (2024). The prompt re-
port: A systematic survey of prompting techniques.
https://arxiv.org/abs/2406.06608.
Voutsadakis, G. (2002). Polyadic concept analysis. Order,
19(3):295–304.
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert,
H., Elnashar, A., Spencer-Smith, J., and Schmidt,
D. C. (2023). A Prompt Pattern Catalog to En-
hance Prompt Engineering with ChatGPT. CoRR,
abs/2302.11382.