Extensive Conformance Testing and Validation of FHIR Data Exchange Variabilities

Abderrazek Boufahja (1, a) and Tanmay Verma (2, b)

(1) GE HealthCare, Strasbourg, France

(2) GE HealthCare, Bellevue, Washington, U.S.A.

{abderrazek.boufahja, tanmay.verma}@gehealthcare.com

Keywords: HL7, FHIR, Conformance, Validation, Testing, API.

Abstract:

The emergence of the FHIR standard in recent years has been accompanied by the development of many FHIR servers, some commercial and many open-source, with wide deployment in production. The FHIR standard defines a complete RESTful API that allows accessing and sharing the clinical resources involved in dozens of healthcare workflows. The defined API comes with a long list of variations in CRUD operations and in search queries. For instance, every search parameter comes with multiple search flavours, which makes implementing the hundreds of search parameters complex and makes server capability claims hard for FHIR clients to verify, especially for clients that rely on edge search capabilities. In this paper, we use a method to exhaustively test the large number of variabilities in the RESTful FHIR API that a FHIR server can implement, by generating thousands of test scripts directly from the formal description of the FHIR standard. The method validates the different search variabilities and gives a deep view of the capabilities of the tested FHIR servers. We implemented the method and ran the generated scripts against multiple FHIR servers. The testing showed that most of them conform to the FHIR standard, although some discrepancies between the claims of some FHIR servers and their current implementations were observed and analysed. We conclude the paper with an analysis of the search variabilities, covering commonly found behaviours and limitations. The overall work highlights the importance of a complete and strong testing strategy for better integration and patient care.

1 INTRODUCTION

The Fast Healthcare Interoperability Resources (FHIR) standard (Benson and Grieve, 2016; Ayaz et al., 2021) is widely adopted today across the globe as the new healthcare standard for exchanging clinical data (Braunstein, 2022). It defines clinical concepts as resources, and defines an API to create, manipulate and search these resources. Dozens of resources are defined, and hundreds of search parameters can be used to query the FHIR API (HL7, 2023). In addition, every search parameter may have many variabilities, depending on its type. The FHIR API thus counts thousands of variabilities and query possibilities. Because of this complexity and richness, FHIR servers that implement the FHIR standard as a façade for FHIR resources usually expose different supported variabilities, and usually implement only the capabilities needed for the use cases they target. Many FHIR servers are instead full implementations, meant to be used as a repository of FHIR resources. These servers implement many capabilities of the FHIR API and try to cover as many use cases as possible while staying agnostic to use case variations. FHIR server adopters can only rely on the capability statement declared by FHIR server providers. Many tools allow testing the capabilities of FHIR servers, but many of them only test the high-level capabilities, without going deeper into the test variations. In this paper, we describe a method to perform extensive conformance testing of the FHIR API, as well as reverse generation and validation of the FHIR server capability statement.

(a) https://orcid.org/0000-0002-6481-2185

(b) https://orcid.org/0009-0000-2337-7239

Boufahja, A. and Verma, T.

Extensive Conformance Testing and Validation of FHIR Data Exchange Variabilities.

DOI: 10.5220/0013370000003911

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 297-307

ISBN: 978-989-758-731-3; ISSN: 2184-4305


2 STATE OF THE ART

2.1 FHIR Standard

Fast Healthcare Interoperability Resources (FHIR) (Benson and Grieve, 2016) is a new-generation standard developed by the HL7 (Health Level Seven) organization in response to the technology evolution of recent years. The first version of the standard was proposed in 2011, and multiple versions have been released since. The latest version available during this work was R5 (5.0.0). FHIR defines two principal components: resources and APIs. Resource types define the data elements, constraints, and relationships with other resource types; they model real-world clinical information. FHIR defines nearly 150 resource types, covering most of the known entities used in healthcare. The FHIR standard keeps a large degree of flexibility to cover healthcare use cases. It follows the Reuse and Composability principle and the 80/20 rule: the standard models the most commonly used elements and leaves specific use cases free to extend the core resource types. FHIR also defines APIs to access and manipulate these resources. The FHIR API follows Level 2 of the REST Maturity Model (Ozdemir, 2020). The APIs defined by FHIR have three levels of interaction: the resource instance level, the resource type level, and the system level. Instance-level interactions mainly allow CRUD operations, as well as compartment access for the compartments defined by HL7. Type-level interactions mainly allow search operations, and system-level APIs allow batch and transaction queries, as well as a few other capabilities. The most complicated API capability is search at the resource type level, as a resource can have dozens of search parameters, and every search parameter can have many search variabilities.

The FHIR REST API is described on the “RESTful API” page of the FHIR standard, and the search variabilities are described on the “Search” page (HL7, 2023). The FHIR API variabilities supported by an implementation are usually described in a CapabilityStatement resource, exposed through the capabilities interaction. A CapabilityStatement can describe a FHIR client or a FHIR server. In our study, we focus on server capabilities. The CapabilityStatement resource has three flavours: instance, capability, and requirements. A CapabilityStatement can describe different aspects of a FHIR server, such as the security of the implementation, the system-level interactions, the supported resources, the operations for every resource, and the supported search parameters for every resource.
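
As an illustration of how the declared search parameters can be read from a server-side CapabilityStatement, the following minimal Python sketch walks the `rest`, `resource` and `searchParam` elements of the resource. The sample statement and the helper name `declared_search_params` are assumptions made for this example; only the element names come from the FHIR CapabilityStatement structure.

```python
# Invented sample data: only the element names (rest, mode, resource,
# interaction, searchParam) follow the FHIR CapabilityStatement structure.
capability_statement = {
    "resourceType": "CapabilityStatement",
    "kind": "instance",
    "fhirVersion": "4.0.1",
    "rest": [{
        "mode": "server",
        "resource": [{
            "type": "Organization",
            "interaction": [{"code": "read"}, {"code": "search-type"}],
            "searchParam": [
                {"name": "type", "type": "token"},
                {"name": "address", "type": "string"},
            ],
        }],
    }],
}


def declared_search_params(statement):
    """Map each resource type to {search parameter name: search parameter type}.

    Only REST entries declared with mode "server" are considered.
    """
    declared = {}
    for rest in statement.get("rest", []):
        if rest.get("mode") != "server":
            continue
        for resource in rest.get("resource", []):
            params = {p["name"]: p.get("type")
                      for p in resource.get("searchParam", [])}
            declared.setdefault(resource["type"], {}).update(params)
    return declared


print(declared_search_params(capability_statement))
```

In practice the statement would be retrieved from the server through the capabilities interaction rather than defined inline.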

2.2 Search Parameters Variabilities

Search parameter variabilities are described on the “Search” page of the FHIR standard. A search parameter can be described minimally in the CapabilityStatement, by exposing only its name, or completely, with a link to a SearchParameter resource that describes the supported search variabilities. Every search parameter can have multiple variabilities, many of which can be documented in the SearchParameter resource. For instance, modifiers and comparators can both be described as part of the SearchParameter resource. Comparators are also called prefixes. All the variabilities related to modifiers and comparators are described in the “RESTful API Search” documentation of the FHIR standard, and most of them depend on the type of the search parameter. FHIR defines nine search parameter types: date, number, quantity, reference, string, token, uri, composite, and special. Every search parameter type has many variations, such as comparators, modifiers, the structure of the searched value, and its precision.

Every resource has a specific set of search parameters, defined in the FHIR standard (HL7, 2023). Every resource that extends DomainResource can have an extra search parameter: ‘_text’. In FHIR R4, all resources extend DomainResource, except Bundle, Parameters, and Binary. However, all resources extend the Resource type, directly or through inheritance. Thus, all FHIR R4 resources can implement any of the search parameters from the base Resource definition: ‘_content’, ‘_id’, ‘_lastUpdated’, ‘_profile’, ‘_query’, ‘_security’, ‘_source’, and ‘_tag’.

Figure 1 shows the number of search parameters for FHIR R2, R3, R4 and R5. We counted the search parameters that FHIR resources can support, excluding the common search parameters described above, which are inherited from DomainResource and Resource. The number of resources rose between R2 and R5 and has been mostly stable since R4. However, the number of search parameters continues to rise with every release. For R4, there are more than 1600 search parameters.

Figure 1: Number of search parameters per FHIR version, excluding common search parameters.

Some common search parameters are quite complicated and can have dozens of variations. For example, the search parameter ‘_has’ implements the notion of reverse chaining: every other resource that can reference the current resource through a search parameter can be used to search on the current instance of the resource. The ‘_has’ search parameter can also be nested, which leads to complex search parameter structures.
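
To make the structure of reverse chaining concrete, the sketch below builds simple and nested ‘_has’ queries in Python. The helper names `has_param` and `has_query` are hypothetical, the example values are placeholders, and the resulting strings are not URL-encoded; the sketch only illustrates the `_has:<ResourceType>:<reference parameter>:<parameter>` naming pattern.

```python
def has_param(chain, param):
    """Build a `_has` parameter name.

    `chain` is a list of (resource type, reference search parameter) links,
    ordered from the searched resource outwards; `param` is the search
    parameter evaluated on the last resource type of the chain.
    """
    links = "".join(f"_has:{rtype}:{ref}:" for rtype, ref in chain)
    return f"{links}{param}"


def has_query(target, chain, param, value):
    """Return the relative search URL (not URL-encoded) for a `_has` search."""
    return f"{target}?{has_param(chain, param)}={value}"


# Patients referenced by an Observation (through `patient`) with a given code.
print(has_query("Patient", [("Observation", "patient")], "code", "1234-5"))

# Nested reverse chaining: one more `_has` link on top of the previous one.
print(has_query("Patient",
                [("Observation", "patient"), ("AuditEvent", "entity")],
                "agent", "MyUserId"))
```

Each additional link multiplies the number of possible parameter names, which is why nesting was left out of the counts reported in this section.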

To better understand the FHIR API variations, we carried out a quantitative identification of all of them. For every search parameter of every resource, we identified the possible variations that a FHIR server API can support, based on the type of the search parameter. For search parameters of type reference, we identified the chained search parameters as well. For some search parameter types, we went beyond the variations defined in the SearchParameter resource. For example, for string-based search parameters, equality to a value was refined into three variations: strict equality, equality with a case-sensitivity variation, and equality with a begins-with behaviour. In the FHIR standard, string matching without modifiers returns the resources whose elements equal or start with the searched value, ignoring case; however, FHIR servers may implement only one of these behaviours, on purpose or by error. For date-based search parameters, although the FHIR standard states that any precision level can be used in search values, some FHIR servers may not support all the variabilities in searching by year, month, day, minutes, or seconds. Treating each of these capabilities as a unit of variability within a FHIR server gives a deep understanding of the server’s capabilities, one that goes beyond what FHIR defines. For instance, we identified more than 50 variabilities for every date-based search parameter: every precision can be coupled with the different prefixes used on dates, and every combination of date precision and prefix is a search variation. For token-based search parameters, the search token can be structured as (code), (system|code), (|code), or (system|). Token parameters can also have multiple modifiers, such as :text, :not, :above, :below, :in and :not-in. Besides chained parameters, reference-based search parameters can have three different search structures: searching by (id), by (ResourceType/id), or by the full URL. They can also support modifiers such as :identifier, or an explicit resource type as a modifier.
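
The combinatorial nature of these variations can be sketched as follows. This Python example enumerates candidate queries for date, token and reference parameters; it is an illustration under simplifying assumptions and does not reproduce the exact variability list or counts used in this study. The function names, sample values and base URL are hypothetical, every modifier is given the same placeholder value (in a real query, `:in` and `:not-in` expect a value set reference and `:text` expects display text), and the queries are not URL-encoded.

```python
from itertools import product

# Prefixes applicable to date search parameters.
DATE_PREFIXES = ["eq", "ne", "gt", "lt", "ge", "le", "sa", "eb", "ap"]

# One sample value per precision level named in the text.
DATE_PRECISIONS = {
    "year": "2024",
    "month": "2024-03",
    "day": "2024-03-15",
    "minutes": "2024-03-15T10:30Z",
    "seconds": "2024-03-15T10:30:00Z",
}

TOKEN_MODIFIERS = ["text", "not", "above", "below", "in", "not-in"]


def date_queries(resource, param):
    """One query per (prefix, precision) pair, e.g. Patient?birthdate=ge2024-03."""
    return [f"{resource}?{param}={prefix}{value}"
            for prefix, value in product(DATE_PREFIXES, DATE_PRECISIONS.values())]


def token_queries(resource, param, system, code):
    """The four token value structures, followed by one query per modifier."""
    structures = [code, f"{system}|{code}", f"|{code}", f"{system}|"]
    queries = [f"{resource}?{param}={value}" for value in structures]
    queries += [f"{resource}?{param}:{modifier}={code}" for modifier in TOKEN_MODIFIERS]
    return queries


def reference_queries(resource, param, target_type, target_id, base_url):
    """The three reference value structures, plus a resource type modifier."""
    return [
        f"{resource}?{param}={target_id}",
        f"{resource}?{param}={target_type}/{target_id}",
        f"{resource}?{param}={base_url}/{target_type}/{target_id}",
        f"{resource}?{param}:{target_type}={target_id}",
    ]


print(len(date_queries("Patient", "birthdate")))  # 9 prefixes x 5 precisions
for query in token_queries("Organization", "type", "http://example.org/cs", "prov"):
    print(query)
```

Even this reduced enumeration yields dozens of queries for a single date parameter, before modifiers such as `:missing` or unprefixed values are added.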

More than 3 500 000 variabilities were identified that FHIR R4 servers can support. This number does not include iterating on the parameters ‘_has’, ‘_include’, and ‘_revinclude’. It includes only the first level of chained parameters, and it does not include combinations of search parameters. This estimate does not include Compartment-based variabilities either. ‘_has’ variabilities represent 80% of the overall variabilities, and more than 98% are related to the parameters ‘_has’, ‘_revinclude’ and chained parameters. The remainder is nearly 60 000 variabilities, which is still a substantial number to test and to cover for a FHIR server.

2.3 CRUD Operations Variabilities

The FHIR API supports Level 2 of the REST Maturity Model (Ozdemir, 2020; Webber, 2010). In FHIR R4, interactions with FHIR servers are divided into three categories: six instance-level interactions, three type-level interactions, and four system-level interactions. Every interaction has multiple variabilities. For example, the instance-level ‘update’ interaction can be described with two variabilities: update of an existing resource, and update-as-create behaviour. The ‘patch’ interaction can be refined into 15 different variations. HL7 defines two major patching methods, JSON patch and FHIR patch, and each method has many variations: adding, copying, moving, etc. In addition, conditional operations refine every CRUD operation into multiple variations and use cases, based on the condition used and its applicability. We identified 27 variations of CRUD operations for every FHIR resource, which represents nearly 4000 variations in total.
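
A few of these CRUD variations can be written down as plain HTTP request descriptions. The Python sketch below is only illustrative: it covers six variations rather than the 27 identified in this study, the function name `crud_requests` and the sample values are hypothetical, and no request is actually sent.

```python
import json


def crud_requests(base, rtype, rid, body, criteria):
    """Return (label, method, url, headers, payload) tuples for some CRUD variations.

    `criteria` is a search expression used by the conditional variations.
    """
    fhir_json = {"Content-Type": "application/fhir+json"}
    # A JSON Patch document with a single "replace" operation.
    patch_ops = [{"op": "replace", "path": "/active", "value": False}]
    return [
        ("create", "POST", f"{base}/{rtype}", fhir_json, json.dumps(body)),
        # Conditional create: the server creates the resource only if no
        # existing resource matches the criteria in the If-None-Exist header.
        ("conditional create", "POST", f"{base}/{rtype}",
         {**fhir_json, "If-None-Exist": criteria}, json.dumps(body)),
        # The same PUT acts as update-as-create when `rid` does not exist yet
        # and the server supports that behaviour.
        ("update", "PUT", f"{base}/{rtype}/{rid}", fhir_json,
         json.dumps({**body, "id": rid})),
        ("conditional update", "PUT", f"{base}/{rtype}?{criteria}", fhir_json,
         json.dumps(body)),
        ("json patch", "PATCH", f"{base}/{rtype}/{rid}",
         {"Content-Type": "application/json-patch+json"}, json.dumps(patch_ops)),
        ("delete", "DELETE", f"{base}/{rtype}/{rid}", {}, None),
    ]


requests_to_send = crud_requests(
    "http://example.org/fhir", "Patient", "p1",
    {"resourceType": "Patient", "active": True},
    "identifier=urn:example|123",
)
for label, method, url, _headers, _payload in requests_to_send:
    print(f"{label}: {method} {url}")
```

Describing each variation as data keeps the list easy to extend, for example with the remaining patch operations or with conditional delete.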

2.4 Validation Methods and Tools

Multiple FHIR validation tools and methods exist, many of them open-source. The FHIR standard and community have tried to document most of these resources in the FHIR standard itself, through the Implementation Support module and the Validating Resources page, or in related HL7 Confluence pages (HL7, 2024a; HL7, 2024b). FHIR testing tools can be divided into two categories: resource validation tools and API exchange testing tools.


2.4.1 Resources Validation Tools

Resource validation principles and methods are described in the FHIR standard on the Validating Resources page. There are mainly six methods, checking eight different aspects. The FHIR standard comes with XSD schemas and Schematron rules, which can be used by many XSD-based and Schematron-based validation tools. The FHIR standard also provides a JSON schema, which can be used with many JSON validation tools. The FHIR community maintains a FHIR validation tool that can be used from the command line to validate FHIR resources against standard FHIR requirements, or against extended requirements related to FHIR profiles. The FHIR community also provides a web-based validation tool that can be used online to validate FHIR resources (Otasek, 2024). Some studies have also validated the RDF representation of FHIR resources using Shape Expressions (ShEx) (Solbrig et al., 2017). This ShEx-based method can also be used to validate XML and JSON representations of FHIR resources through transformation to the RDF representation. Many providers offer open-source or online FHIR content validation tools. Here is a non-exhaustive and unordered list of online content validation tools: Simplifier validation tool, Firely validation tool, Inferno validation service, Gazelle FHIR validation tool, Infoway FHIR Validator, Aidbox FHIR Schema Validator and clinFHIR resource validator. Some FHIR sandboxes also provide the $validate operation, such as the Aegis sandbox or the HAPI FHIR online server. We can also note the FHIR Notepad++ Plugin and the FHIR tools plugin for Visual Studio Code (Lagger, 2023), which provide a validation capability for FHIR resources.

The existence of all these tools confirms the maturity of FHIR content validation tools, and their wide usage amongst providers of FHIR-based products.

2.4.2 FHIR API Exchange Testing Tools

FHIR resources are usually exchanged through the REST API defined in the FHIR standard, even if other exchange mechanisms can be used. Most FHIR servers provide testing sandboxes to accelerate integration with FHIR applications. For instance, most EMRs provide testing sandboxes with a connectivity validation process, which are particularly useful for FHIR clients to reach a high level of interoperability. Some open-source FHIR servers can be configured and installed in a private network and used to test FHIR client implementations. Here is a non-exhaustive and unordered list of open-source FHIR servers: HAPI FHIR server (Hussain et al., 2018), LinuxForHealth FHIR Server (Opie, 2024), Microsoft FHIR server (Opie, 2024), FHIR Candle (Canessa, 2024), Pascal FHIR Server (HealthIntersections, 2024), and Spark (Kramer, 2024).

The FHIR standard defines the TestScript resource, which provides an agnostic and interoperable method to share test designs and test definitions in an executable format, with computable actions and interpretable verification instructions. FHIR also defines the TestReport resource, which can be used to share a summarized testing report following the execution of a TestScript. Other methods exist as well, defining generic structures for tests and test execution steps (GITB, 2015; Scanlon, 2024).

Many tools allow testing the exchange of FHIR resources through the FHIR API. Some are open-source, while others are proprietary or offer hybrid access. For instance, all HTTP REST testing and automation tools, such as Postman or SOAPUI (SOAPUI, 2024), can be leveraged to test FHIR server APIs. Here is a non-exhaustive and unordered list of open-source and commercial FHIR API testing tools:

• Touchstone (Walonoski et al., 2018), developed by AEGIS, offers automated FHIR testing for server and client implementations, leveraging the FHIR TestScript resource. It can automate FHIR data exchange testing, run pre-designed TestScript resources, and serve as a testing framework for adding new test scripts.

• Crucible (Walonoski et al., 2018; Scanlon, 2024), developed by MITRE, offers a set of open-source testing tools for FHIR. It can be used to test the data exchange conformance of FHIR servers and can score patient records.

• Inferno (Kramer and Moesel, 2023) is an open-source tool that helps test conformance to the FHIR standard. Besides its resource validator, Inferno provides a web-based application to execute online testing using many available test kits. Every test kit is a list of defined tests that can be executed against a specific FHIR server endpoint. FHIR server implementers can use predefined test kits or develop their own test kits using the Inferno framework.

• Caristix Test (Caristix, 2021) makes it possible to automate the testing of FHIR implementations using its Scenario Editor.

• NIST FHIR Toolkit (NIST, 2024) is an open-source FHIR testing tool, mainly for the FHIR-based IHE MHD profile.

• The Gazelle PatientManager tool can simulate initiating actors as a FHIR client for some FHIR-based IHE profiles, like PDQm or PIXm. It can also act as a FHIR server to test some IHE profiles.

• TestScript Engine (MITRE, 2023) is an open-source testing engine. It can interpret and execute FHIR TestScript resources and generate TestReport resources following command-line execution.

This list is not exhaustive; many other proprietary or open-source FHIR testing tools and frameworks may exist. Some tools come with preconfigured test definitions, while others provide a framework to create test definitions, which may follow the TestScript resource structure or some other proprietary structure.

The vast number of variabilities makes the claims of FHIR servers difficult to check and validate, and sometimes makes the integration of clients with heterogeneous FHIR servers complex without staging and testing phases. In this paper, we use a method to generate unitary verification tests directly from the FHIR standard, to exhaustively cover all the possible variations in the FHIR API, and to provide clear knowledge of the FHIR server capabilities.

3 METHOD

To test agnostically all the API variabilities that a FHIR server can implement, a huge number of tests needs to be written, based on the variabilities analysis described above. The aim of this method is to avoid writing test definitions or test scripts by hand, and to generate them automatically from computable artifacts of the FHIR standard. Writing all the identified tests manually would be time-consuming and error-prone, which is what our method avoids.

From the FHIR standard, two main artifacts are used: the ResourceType.profile.json files, which contain the StructureDefinition of the FHIR resource types, and the search-parameters.json file, which contains a computable definition of all the search parameters in the FHIR standard, as described in Figure 2. The generator engine also takes a list of test script templates as input, containing a test definition template for every variation of every search parameter type. Every template is meant to be compiled using inputs related to the tested resources and search parameters. The generator engine uses these inputs as described in Figure 3.

Figure 2: Test scripts generation overview.

Figure 3: Test scripts generation process.

The list of resource types is extracted from the FHIR standard. For every resource type, the list of supported search parameters is extracted from the search-parameters.json file. Every search parameter is defined with information such as the target resources, the type of the search parameter, and the path in the FHIR resources to the elements that are mapped to the search parameter. The path of the search parameter is used to validate the Bundles returned by test script execution, by comparing the searched values with the content of the returned resources. For every search parameter found in the search-parameters.json file, the list of variations is identified based on the search parameter type. For every variability, we use the defined repository of test script templates to identify the template to be used for that specific variation. Then, based on that template, the resource StructureDefinition, and the search parameter attributes, the generator engine generates the final test script for the variability. For example, for the Organization resource, search-parameters.json contains multiple related search parameters, like type, partof, and address. The parameter type is a token, for which we identified 15 different possible variabilities in our study. For every variation type, a test template is defined and used to generate a test script meant for testing only a specific variability on the ‘type’ search parameter for the Organization resource type. Every variability type of every search parameter type has its own test script template, which contains testing steps and testing logic, including negative testing and a validation mechanism for the Bundle returned by a search query. One of the authors has a patent application related to this method, which contains more technical details. The generation of the CRUD variabilities for every resource followed the same pattern, with test script templates used to generate test scripts for every resource type.
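
The generation loop described above can be sketched in a few lines of Python. This is not the generator engine used in this work: the template repository, the template wording, the `<code>` and `<value>` runtime placeholders and the function name `generate_test_scripts` are all hypothetical, and the inline Bundle stands in for two abridged entries of search-parameters.json. Only the SearchParameter element names (`code`, `base`, `type`, `expression`) come from the FHIR standard.

```python
from string import Template

# Inline stand-in for two abridged entries of search-parameters.json.
search_parameters_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "SearchParameter", "code": "type",
                      "base": ["Organization"], "type": "token",
                      "expression": "Organization.type"}},
        {"resource": {"resourceType": "SearchParameter", "code": "address",
                      "base": ["Organization"], "type": "string",
                      "expression": "Organization.address"}},
    ],
}

# Hypothetical template repository, keyed by (search parameter type, variability).
# <code> and <value> are placeholders resolved later, at test execution time.
TEMPLATES = {
    ("token", "code"): Template(
        "GET $resource?$param=<code> ; every result must have <code> at $path"),
    ("token", "not"): Template(
        "GET $resource?$param:not=<code> ; no result may have <code> at $path"),
    ("string", "exact"): Template(
        "GET $resource?$param:exact=<value> ; every result must equal <value> at $path"),
}


def generate_test_scripts(bundle, templates):
    """Render one test script per (resource type, search parameter, variability)."""
    scripts = {}
    for entry in bundle["entry"]:
        search_param = entry["resource"]
        for resource in search_param["base"]:
            for (sp_type, variability), template in templates.items():
                if sp_type != search_param["type"]:
                    continue  # template written for another search parameter type
                name = f"{resource}.{search_param['code']}.{variability}"
                scripts[name] = template.substitute(
                    resource=resource,
                    param=search_param["code"],
                    path=search_param["expression"],
                )
    return scripts


for name, script in generate_test_scripts(search_parameters_bundle, TEMPLATES).items():
    print(name, "->", script)
```

Because the templates are keyed by search parameter type, adding a new variability means adding one template, after which it is generated for every matching search parameter of every resource type.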

When executed on a FHIR server, the generated test scripts identify the capabilities of the tested server through testing, instead of relying only on the declared CapabilityStatement. This enables comparison and validation of the declared capability statement, as described in Figure 4.

Figure 4: CapabilityStatement generation and validation.

The first step before executing the generated test scripts is the initialization of the testing variables. To search FHIR resources using search parameters, we need to identify existing resources in the FHIR server, or create new FHIR resources, and extract the right values for all the tested search parameters, using the path definition identified in the search-parameters.json file from the FHIR standard. When the FHIR server supports the create interaction, the server is populated with static FHIR resources as part of the initialization process.
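
As a rough illustration of this value extraction step, the following Python sketch collects the values found at a simple dotted path in a resource. It is a deliberate simplification: the paths in search-parameters.json are FHIRPath expressions, which can contain functions, unions and type filters that this sketch does not handle. The function name `extract_values` and the sample resource are assumptions made for the example.

```python
def extract_values(resource, expression):
    """Collect the values found at a simple dotted path.

    Example path: "Organization.type.coding.code". Repeating elements
    (JSON arrays) are flattened at every step. Returns an empty list when
    the resource type does not match or the path is absent.
    """
    head, *steps = expression.split(".")
    if resource.get("resourceType") != head:
        return []
    nodes = [resource]
    for step in steps:
        next_nodes = []
        for node in nodes:
            value = node.get(step) if isinstance(node, dict) else None
            if value is None:
                continue
            next_nodes.extend(value if isinstance(value, list) else [value])
        nodes = next_nodes
    return nodes


organization = {
    "resourceType": "Organization",
    "id": "org1",
    "type": [{"coding": [{"system": "http://example.org/org-type", "code": "prov"}]}],
}

print(extract_values(organization, "Organization.type.coding.code"))
print(extract_values(organization, "Organization.partOf.reference"))
```

An empty result for a given path corresponds to the situation where a collected resource cannot provide a search value for the tested parameter.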

During the execution of the generated test scripts on a FHIR server API, the test results reveal which search variabilities are supported. These results are used to generate a computed CapabilityStatement, which can be compared with the CapabilityStatement declared by the FHIR server. This comparison highlights the differences between what is claimed and what is implemented. Because of the considerable number of generated test scripts, the CapabilityStatement declared by the FHIR server can also be used to filter the tests to be executed.
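
The comparison between declared and computed capabilities can be sketched as a set difference per resource type. The Python example below assumes both statements have already been reduced to a mapping from resource type to a set of search parameter names; that reduced representation, the function name `compare_capabilities` and the sample data are assumptions for illustration, not the comparison format used in this work.

```python
def compare_capabilities(declared, computed):
    """Compare declared and computed search parameters per resource type.

    Both arguments map a resource type to a set of search parameter names.
    """
    report = {}
    for resource in sorted(set(declared) | set(computed)):
        claimed = declared.get(resource, set())
        observed = computed.get(resource, set())
        report[resource] = {
            "confirmed": sorted(claimed & observed),
            "claimed_not_observed": sorted(claimed - observed),
            "observed_not_claimed": sorted(observed - claimed),
        }
    return report


declared = {"Organization": {"type", "address", "partof"}}
computed = {"Organization": {"type", "address", "name"}, "Patient": {"birthdate"}}

for resource, result in compare_capabilities(declared, computed).items():
    print(resource, result)
```

The same structure could be refined to the variability level, for example by keying each entry on a (search parameter, variability) pair instead of the parameter name alone.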

4 APPLICATION

4.1 API Testing with Inferno

We implemented the method described above using the Inferno framework (Scanlon, 2024) to generate test scripts for the API variabilities of the FHIR R4 standard. Inferno has a method to develop test kits, and our implementation generates test files, cascaded together so that every variability maps to a generated test, which is executed using the Inferno UI or its API. The variabilities on ‘_has’, ‘_revinclude’ and chained parameters were left out of test generation. Out of the remaining 60 000 variabilities, our templates and test generation covered more than 42 000. The following types of search parameters were covered completely in our testing: string, date, number and uri. The ‘reference’ type was not fully covered, as chained parameters and the ‘identifier’ modifier were complicated to test and needed further parsing and following of references in the collected resources. Token-based search parameters were nearly fully tested with all variabilities, except subsumption with the ‘above’ and ‘below’ modifiers, as well as the ‘in’ and ‘not-in’ modifiers. This is because verifying these operations is complex and needs connectivity to a terminology service. Composite-based search parameters were not tested, but their number is quite small compared to the other types. Common search parameters were tested on all the resource types, with all their variations. The tested common parameters are _id, _lastUpdated, _tag, _profile, _security, and _source. Most CRUD operations and variabilities were tested as well. Compartment-based resource APIs were not tested during this application.

Test data were defined to populate the tested FHIR server with initial FHIR resources. Defining the test data was one of the major steps, as these data need to be heterogeneous and rich enough to create an optimal testing environment. The test data were designed to enable testing of most of the targeted search parameters. For every executed test, five statuses can be reported:

• Pass: the variability is supported.

• Fail: the test failed; the FHIR API does not support the tested variability.

• Crash: the test crashed during execution.

• Missing test data: the collected test data are not sufficient to test the targeted variability.

• No test defined: the variability cannot be tested or verified.

Tests usually fail because the assertions and verifications that are part of the tests fail. When a test crashes, it is mostly because the variability is not supported. For example, the API may return a 400 Bad Request response. Tests are usually marked as ‘missing test data’ because of missing search values. Before searching on the parameters of a specific resource, we need to collect FHIR resources from the server, and the collected resources may not contain all the information needed to exercise all the search parameters. The Inferno framework enables parameter sharing between tests, which enables search parameter initialization through create and search phases. Tests marked as ‘missing test data’ or ‘no test defined’ can neither confirm nor deny the implementation of a specific variability of the FHIR server API.

4.2 Extensive Testing Results

We executed the generated tests on many open-source FHIR servers, as well as on some available sandboxes, many of which are described in the state of the art. We anonymized the FHIR server names, as the goal of the analysis is not to compare the servers with each other, but to identify implementation variations and common behaviours. Overall, nearly 400 000 tests were executed.

4.2.1 Overall Testing Results

Testing the FHIR servers was time-consuming, as executing more than 42 000 tests per server takes several hours, and sometimes days. Every test makes many FHIR queries to exercise combinations of positive and negative test steps. The testing results are summarized in Figure 5 and Figure 6. During testing, we did not filter based on the declared CapabilityStatement; we executed the full test suite.

Figure 5: FHIR servers search variabilities coverage.

Open-source FHIR servers support a wide range of configuration parameters that may activate search and operation capabilities, such as indexing of missing elements, text search, update-as-create, acceptance of non-resolvable references in FHIR resources, and many others.

While configuring and testing the open-source FHIR servers, we tried to activate the highest level of indexing and capabilities that we could identify, but we acknowledge that some capabilities may have been missed during the configuration process.

Figure 6: FHIR servers CRUD variabilities coverage.

Most of the tested FHIR servers showed good coverage of the tested search capabilities. However, we observed that some FHIR servers produced many test failures, highlighting features that their APIs do not support. In CRUD variabilities testing, many tests crashed, mostly because the responses carried an HTTP status indicating an error, which points to non-supported capabilities.

4.2.2 Search Variabilities Analysis

For variabilities related to search parameters, we grouped the test results per variability type and per resource type. Figure 7 shows the number of failures per variability type, tagged by the type of the search parameter. It covers the variability types for which more than 200 failures were encountered during testing with the FHIR servers described above. Only tests related to search parameters supported by the tested servers are included. Every line in Figure 7 describes a tested variability, such as searching with a specific prefix on a date-based search parameter, or searching with the ‘exact’ modifier on a string-based search parameter. We grouped all the tests here per variability type.

Figure 7: Number of failures per variability type.

All the tested search parameter types show a high number of failures, except quantity-based and number-based search parameters, which is understandable because the FHIR standard defines few search parameters of these types.

The least supported variabilities concern token-based search parameters, specifically searching with the modifiers ‘above’, ‘below’, ‘text’, ‘missing’, ‘not’ and ‘of-type’. The tests show a high level of failure on these modifiers, which means that some FHIR servers either do not support the feature at all or implement the variability incorrectly. Searching with the (system|) form, which matches any code within a given system, was also failing, indicating that this variability is poorly supported for token-based search parameters.
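The token variability types named above can be illustrated with a short sketch. It only builds unencoded query expressions for a given parameter; the function name, the dictionary keys and the sample values are hypothetical, and a real client would still need to URL-encode the values.

```python
def token_variations(param: str, system: str, code: str) -> dict:
    """Return unencoded query expressions for common token search forms.

    Hypothetical helper for illustration; not taken from the paper's test kit.
    """
    return {
        "code": f"{param}={code}",                    # any system, given code
        "system|code": f"{param}={system}|{code}",    # exact system and code
        "|code": f"{param}=|{code}",                  # code with no system
        "system|": f"{param}={system}|",              # any code in the system
        ":not": f"{param}:not={system}|{code}",
        ":above": f"{param}:above={system}|{code}",
        ":below": f"{param}:below={system}|{code}",
        ":missing": f"{param}:missing=true",
    }


if __name__ == "__main__":
    for name, query in token_variations("code", "http://loinc.org", "1234-5").items():
        print(f"{name:12s} Observation?{query}")
```

Keeping one entry per variability type makes it easy to report failures per type, in the same way the results are grouped in Figure 7.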

For uri-based search parameters, the most frequent failures were searches with the modifiers ‘above’ and ‘below’. For reference-based search parameters, searching by the full URL of the reference was poorly supported. Another notable failure concerned searching references by resource ID: some FHIR servers appear to expect the Resource/ID form rather than the bare resource ID.

Many date-based search variabilities failed as well. Searches with the ‘ap’ prefix failed often (> 3000 failed tests), highlighting poor support for this prefix. The ‘ne’ prefix also failed often (∼ 2500 failed tests), as did searches with minute precision (> 2500 failed tests). More than 1000 failed tests involved the prefixes ‘le’ and ‘ge’, which is alarming: FHIR clients commonly use these prefixes, and non-compliance or misinterpretation of their meaning can have a harmful impact on patient care.

String-based search parameters were well implemented for most of the tested search parameters. Number-based and quantity-based search parameters are not shown in Figure 7, as their number was small. For a better representation, Figure 8 shows the failure rate, that is, the number of failures divided by the total number of executions, for each variability type and search parameter type. As before, only tests on search parameters supported by the tested servers are included, and the diagram contains only failure rates higher than 0.1.

Figure 8: Rate of failures per number of tests executed for each variability type.
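The rate reported in Figure 8 can be computed with a few lines of code. The sketch below assumes that each test result is available as a (search parameter type, variability, passed) tuple; this record layout and the function name are assumptions made for illustration, not the format produced by the test kit.

```python
from collections import Counter


def failure_rates(results, threshold=0.1):
    """Return failure rates above `threshold`, keyed by (param_type, variability).

    `results` is an iterable of (param_type, variability, passed) tuples
    (hypothetical layout).
    """
    total, failed = Counter(), Counter()
    for param_type, variability, passed in results:
        key = (param_type, variability)
        total[key] += 1
        if not passed:
            failed[key] += 1
    rates = {key: failed[key] / total[key] for key in total}
    # Keep only the variabilities whose failure rate exceeds the threshold.
    return {key: rate for key, rate in rates.items() if rate > threshold}


if __name__ == "__main__":
    sample = [
        ("date", "ap", False),
        ("date", "ap", True),
        ("string", "exact", True),
    ]
    print(failure_rates(sample))
```

Normalizing by the number of executions is what lets search parameter types with few tests, such as number and quantity, become visible next to the heavily tested ones.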

HEALTHINF 2025 - 18th International Conference on Health Informatics


Token-based search parameters still appear in Figure 8 with a high failure rate, related to the same variabilities highlighted above: the modifiers ‘above’, ‘below’, ‘text’, ‘missing’, ‘not’ and ‘of-type’. Uri-based search parameters also failed tests on the ‘above’ and ‘below’ variabilities. In both cases, this indicates poor support for these variabilities. Date-based and reference-based search parameters also show high failure rates, again on the variabilities described above.

Although the number of tests on number-based search parameters is low, their failure rate across the search variabilities is high. Most of the failed tests involve prefixes, which were poorly supported or implemented, especially ‘ap’, ‘eb’ and ‘sa’. The same observation holds for quantity-based search parameters, with a smaller number of identified failures. The prefixes ‘ap’, ‘eb’ and ‘sa’ can be considered edge cases.

5 DISCUSSIONS

The SearchParameter resource covers many variations for all search parameters and can be linked to the CapabilityStatement of the FHIR server to give more detail about the level of implementation. Even so, this may not be sufficient to describe every possible implementation variability, including behaviours that do not conform to the definitions and requirements of FHIR. For example, FHIR defines string search as case-insensitive, but some FHIR API implementations may support only case-sensitive search, accidentally or intentionally. This kind of variability is difficult to describe in the SearchParameter resource. Date-based search parameters are another example. FHIR server implementations may support searching at year, month, day, minute, second, or fraction-of-second precision, and for every precision the implementation may support the different prefixes defined by the standard (‘eq’, ‘gt’, ‘lt’, etc.). We counted more than 50 variations for date-based search parameters, in addition to the variations common to all search parameters, such as support for the ‘missing’ modifier and OR/AND searching.
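To see how the count of date variations grows, one can cross the precisions with the comparison prefixes. The sketch below is only one plausible way to arrive at a figure above 50; the exact enumeration used in the study is not given here, and the labels are hypothetical.

```python
from itertools import product

# Precisions listed in the text and the nine comparison prefixes of FHIR search.
PRECISIONS = ["year", "month", "day", "minute", "second", "fraction"]
PREFIXES = ["eq", "ne", "gt", "lt", "ge", "le", "sa", "eb", "ap"]


def date_variations():
    """Return one label per (prefix, precision) combination."""
    return [f"{prefix}@{precision}" for precision, prefix in product(PRECISIONS, PREFIXES)]


if __name__ == "__main__":
    variations = date_variations()
    print(len(variations), "date variations, e.g.", variations[:3])
```

Under these assumptions the cross product already yields 54 combinations, before adding the ‘missing’ modifier and the OR/AND variations.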

We described a method that automatically generates tests for all the variabilities of the FHIR API, using the formal definition of the FHIR standard. The generated tests reveal the capabilities of a FHIR API and make it possible to compare them against the capabilities claimed in the CapabilityStatement declared by the tested FHIR server. The method can also be extended to test custom resources in FHIR servers and their search variabilities (Boufahja et al., 2021). We implemented the method by writing templates that generate tests as part of a test kit integrated locally with the Inferno testing platform, which proved very efficient for implementing and executing the tests. Some tests were not generated because of their complexity. We executed the generated tests against many test servers and sandboxes, including open-source FHIR servers. The goal was not to compare FHIR server providers, but to improve general knowledge of server capabilities and of common issues and discrepancies.
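The template idea can be sketched in a few lines. This is not the Inferno test kit itself (Inferno tests are written in Ruby); it is a minimal Python illustration in which the template table, the sample values and the function name are all hypothetical. The `code`, `type` and `base` keys mirror elements of the FHIR SearchParameter resource.

```python
from string import Template

# Hypothetical query templates per search parameter type.
TEMPLATES = {
    "string": [
        Template("$base?$code=$value"),
        Template("$base?$code:exact=$value"),
        Template("$base?$code:contains=$value"),
    ],
    "date": [
        Template("$base?$code=eq$value"),
        Template("$base?$code=ge$value"),
        Template("$base?$code=le$value"),
    ],
}


def generate_queries(search_params, sample_values):
    """Expand every search parameter definition into one query per template."""
    queries = []
    for sp in search_params:
        for base in sp["base"]:
            for template in TEMPLATES.get(sp["type"], []):
                queries.append(
                    template.substitute(
                        base=base, code=sp["code"], value=sample_values[sp["type"]]
                    )
                )
    return queries


if __name__ == "__main__":
    params = [{"code": "family", "type": "string", "base": ["Patient"]}]
    print(generate_queries(params, {"string": "Smith"}))
```

Because the templates are keyed by parameter type, adding a new variability means adding one template, and every search parameter of that type is covered automatically.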

We encountered several difficulties when testing date-based search parameters. First, search parameters of type date can refer to multiple data types, such as dateTime, Period, and Timing, which makes verifying the search results complex. Time zone handling adds further complexity to date comparison. For date searches that go to the time level, we always included the time zone in the search value, to avoid server-dependent interpretation of the query. For many of the tested FHIR servers, many tests failed with the le, ge, lt and gt prefixes, which was a surprising finding. A likely explanation is the complexity of interpreting these prefixes and their temporal meaning, which we experienced ourselves when writing the verification code for the executed queries. Some of these errors can become problematic for patient care when date queries return wrong or incomplete information, which reinforces the importance of clearly identifying the variabilities claimed by a FHIR server and of having a strong testing process for all of them. Testing the ‘ap’ prefix was also complicated and hard to verify, because the interpretation of this prefix is lenient and differs between implementers. The precision of quantities and numbers was not tested; a generic relative period was considered when verifying the search results related to quantities and numbers.
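The range-based reading of the comparison prefixes can be sketched as follows. This is a simplified illustration of how a verifier might check results, not the verification code used in the study: every value is treated as a half-open range [start, end), time zones are ignored (naive datetimes), ‘ap’ is left out because its tolerance is implementation-defined, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def day_range(day):
    """Half-open range [start, end) covered by a day-precision value."""
    start = datetime.strptime(day, "%Y-%m-%d")
    return start, start + timedelta(days=1)


def second_range(instant):
    """Half-open range covered by a second-precision value."""
    start = datetime.strptime(instant, "%Y-%m-%dT%H:%M:%S")
    return start, start + timedelta(seconds=1)


def matches(prefix, search, target):
    """Check a target range against a search range for one prefix."""
    s_lo, s_hi = search
    t_lo, t_hi = target
    contained = s_lo <= t_lo and t_hi <= s_hi
    rules = {
        "eq": contained,                   # search range contains the target
        "ne": not contained,
        "gt": t_hi > s_hi,                 # target extends above the search range
        "lt": t_lo < s_lo,                 # target extends below the search range
        "ge": t_hi > s_hi or contained,
        "le": t_lo < s_lo or contained,
        "sa": t_lo >= s_hi,                # target starts after the search range
        "eb": t_hi <= s_lo,                # target ends before the search range
    }
    return rules[prefix]


if __name__ == "__main__":
    search = day_range("2013-01-14")
    period = (datetime(2013, 1, 13), datetime(2013, 1, 16))
    for prefix in ("eq", "gt", "lt", "sa", "eb"):
        print(prefix, matches(prefix, search, period))
```

A Period that straddles the search day matches both ‘gt’ and ‘lt’ but neither ‘sa’ nor ‘eb’, which shows why a verifier must compare ranges rather than single instants.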

We also tested combinations of AND and OR values for all search parameters. The tests confirmed that servers may implement these variations for only a subset of their search parameters. FHIR clients should therefore be vigilant about such variations within the same FHIR server.
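In FHIR search, OR is expressed by comma-separated values within one parameter, and AND by repeating the parameter. A minimal sketch of building both query forms, with hypothetical function names:

```python
from urllib.parse import urlencode


def or_query(param, values):
    """OR: a single parameter whose values are joined by commas."""
    return urlencode([(param, ",".join(values))], safe=",")


def and_query(param, values):
    """AND: the same parameter repeated once per value."""
    return urlencode([(param, value) for value in values])


if __name__ == "__main__":
    print("Patient?" + or_query("given", ["Ann", "Bob"]))
    print("Patient?" + and_query("given", ["Ann", "Bob"]))
```

The comma is kept unescaped here because it is the OR separator; a comma that is part of a value would need to be escaped instead.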

Common search parameters were tested for every resource. FHIR servers showed strong support for these common search parameters. However, some edge-case variations failed on some servers, such as the _id:missing, _id:not and _lastUpdated:missing variations.

Many FHIR servers and sandboxes support some search parameters only in combination with others. Because of such requirements, results from testing with sandboxes need to be handled carefully: a test may fail not because the tested server lacks the tested variability, but because the server requires search parameters to be combined.

Most of the tested FHIR servers did not fully support all CRUD operation variabilities, and many tests failed. For example, following a DELETE operation, some FHIR servers responded to a request for the deleted resource with a 404 status code instead of 410. For the update-as-create operation, some FHIR servers responded with a 200 status code instead of 201.
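The two status code checks mentioned above can be expressed as a small lookup. The scenario names and the function are hypothetical; only the expected codes (410 for a read after delete, 201 for update-as-create) come from the text.

```python
# Expected HTTP status per CRUD scenario (scenario names are illustrative).
EXPECTED_STATUS = {
    "read_after_delete": 410,   # Gone, rather than 404 Not Found
    "update_as_create": 201,    # Created, rather than 200 OK
}


def check_status(scenario, actual):
    """Return 'pass' or a short failure message for one CRUD scenario."""
    expected = EXPECTED_STATUS[scenario]
    if actual == expected:
        return "pass"
    return f"fail: expected {expected}, got {actual}"


if __name__ == "__main__":
    print(check_status("read_after_delete", 404))
    print(check_status("update_as_create", 201))
```

Reporting the expected and received codes together makes it easier to tell an unsupported capability from a near-miss such as 404 versus 410.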

Since the search parameters used in the tests are generated from an initial collection and initialization of the FHIR resources, the returned Bundles can differ between the tested servers. Even within the same sandbox, executing the generated tests twice may result in different FHIR queries, because the content of the FHIR servers and their hosted resources varies. Because of this behaviour, a test can sometimes pass because the test data happened to be favourable and led to consistent queries and results, even though we enforced negative testing and result verification in all the generated test scripts. We therefore consider passing tests to be only a strong indication that the search variabilities may be well implemented.

Although all the executed tests were generated automatically from the FHIR standard, and we verified the relevance and design of the templates, edge cases may not be fully tested. Not all failed tests were verified manually to confirm the relevance of the failure, although dozens of failed tests were manually checked and confirmed.

Comparing the test results with the claimed capabilities of the FHIR servers revealed discrepancies. For instance, some FHIR servers claim support for patch operations without specifying which variabilities are supported, and testing their APIs showed only partial support. Conversely, some FHIR servers support extra search parameters that they do not declare in their FHIR conformance statement.
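This kind of comparison can be sketched as a set difference between declared and observed search parameters. The sketch reads the `rest.resource.searchParam` elements of a CapabilityStatement given as a parsed JSON dictionary; the function names and the shape of the `observed` set are assumptions made for illustration.

```python
def declared_search_params(capability_statement):
    """Collect (resource type, parameter name) pairs from a CapabilityStatement dict."""
    declared = set()
    for rest in capability_statement.get("rest", []):
        for resource in rest.get("resource", []):
            for search_param in resource.get("searchParam", []):
                declared.add((resource["type"], search_param["name"]))
    return declared


def compare_claims(capability_statement, observed):
    """Compare declared parameters with those observed to work during testing."""
    declared = declared_search_params(capability_statement)
    return {
        "undeclared_but_working": observed - declared,
        "declared_but_failing": declared - observed,
    }


if __name__ == "__main__":
    statement = {
        "rest": [
            {"resource": [{"type": "Patient", "searchParam": [{"name": "family"}]}]}
        ]
    }
    observed = {("Patient", "family"), ("Patient", "birthdate")}
    print(compare_claims(statement, observed))
```

Both directions of the difference matter: undeclared parameters are extra capabilities a client cannot discover, while declared but failing ones are the discrepancies that can mislead a client.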

This extensive testing method and its application highlighted the importance of a strong testing strategy and mechanism covering all implemented variabilities, which should start by documenting every supported variability. A weak testing mechanism leads to API interpretation errors and can, in some cases, become dangerous for patient care.

6 CONCLUSIONS

The FHIR standard defines a complete API for data access and search, with nearly 150 resources and more than 1500 search parameter combinations. Every search parameter is defined with many variabilities and usages. We identified more than 3,500,000 possible variations in the FHIR API, 98% of which relate to only three kinds of search variations. Such variability highlights the complexity of setting up a complete campaign to test all the API variations. In this paper, we described a method to generate test scripts from the formal description of the FHIR standard in order to cover all possible variabilities. The method uses test templates defined from the analysis of the search variabilities. The report produced by executing the generated test scripts can be used to validate the claims that a FHIR server declares in its CapabilityStatement. We applied the method with the Inferno framework by implementing and generating a substantial number of the identified test script variabilities. The generated test scripts were executed against many available FHIR servers. The results showed many differences in the implementation and coverage of the tested FHIR servers. Many variabilities were poorly supported, so developers of FHIR-based applications should be aware of such limitations in FHIR servers. Some capabilities claimed by FHIR servers were not fully supported, and executing the generated tests made it possible to identify clearly every supported and unsupported variation. Some commonly used variabilities failed occasionally, such as date searches with common prefixes. Such failures highlight the importance of a complete test suite and a testing strategy for FHIR servers, to provide a better integration experience and better patient care.

ACKNOWLEDGEMENTS

We acknowledge the strong support of GE HealthCare during this study, in particular the Science and Technology Organization personnel, whose feedback and support helped in formulating the conclusions.


REFERENCES

Ayaz, M., Pasha, M. F., Alzahrani, M. Y., Budiarto, R., and Stiawan, D. (2021). The Fast Health Interoperability Resources (FHIR) Standard: Systematic Literature Review of Implementations, Applications, Challenges and Opportunities. JMIR Medical Informatics, 9(7).

Benson, T. and Grieve, G. (2016). Principles of Health Interoperability: SNOMED CT, HL7 and FHIR. Health Information Technology Standards. Springer International Publishing, Cham.

Boufahja, A., Nichols, S., and Pangon, V. (2021). Custom FHIR Resources Definition of Detailed Radiation Information for Dose Management Systems. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies. SCITEPRESS - Science and Technology Publications.

Braunstein, M. (2022). Health Informatics on FHIR: How HL7’s API is Transforming Healthcare. Springer.

Canessa, G. (2024). fhir-candle When you need a small FHIR. Open-Source Tooling, Webinar Series.

Caristix (2021). Caristix Test Scenario Editor. https://caristix.com/tutorials/testing/fhir-server-capabilities/. Accessed: 2024-09-05.

GITB (2015). CEN Workshop Agreement Global eBusiness Interoperability Test Beds GITB Phase 3 | Joinup. Technical report.

HealthIntersections (2024). Pascal FHIR Server Reference Implementation. https://github.com/HealthIntersections/fhirserver.

HL7 (2023). HL7 FHIR Standard. https://hl7.org/fhir/.

HL7 (2024a). FHIR Tools Registry. https://confluence.hl7.org/display/FHIR/FHIR+Tools+Registry. Accessed: 2024-12-14.

HL7 (2024b). Public FHIR Validation Services. https://confluence.hl7.org/display/FHIR/Public+FHIR+Validation+Services. Accessed: 2024-12-14.

Hussain, M. A., Langer, S. G., and Kohli, M. (2018). Learning HL7 FHIR Using the HAPI FHIR Server and Its Use in Medical Imaging with the SIIM Dataset. Journal of Digital Imaging, 31(3).

Kramer, E. (2024). Firely’s FHIR .NET SDK. Open-Source Tooling, Webinar Series.

Kramer, M. A. and Moesel, C. (2023). Interoperability with multiple Fast Healthcare Interoperability Resources (FHIR®) profiles and versions. JAMIA Open, 6(1).

Lagger, Y. (2023). Vscode fhir tools. https://github.com/laggery/vscode-fhir-tools. Accessed: 2024-12-14.

MITRE (2023). Ruby fhir testscript execution engine. https://github.com/fhir-crucible/testscript-engine.

NIST (2024). NIST FHIR Toolkit. https://github.com/usnistgov/asbestos/wiki/Introduction. Accessed: 2024-09-05.

Opie, C. A. (2024). Exploring security vulnerabilities in fhir server Implementations: a case study on ibm’s fhir server in the context of the 21st century cures act.

Otasek, D. (2024). The FHIR Validator. Open-Source Tooling, Webinar Series.

Ozdemir, E. (2020). A General Overview of RESTful Web Services. In Applications and Approaches to Object-Oriented Software Design: Emerging Research and Opportunities. IGI Global Scientific Publishing.

Scanlon, R. (2024). Inferno – FHIR Conformance Testing. Open-Source Tooling, Webinar Series.

SOAPUI (2024). Getting Started with REST Testing in SoapUI | SoapUI. https://www.soapui.org/docs/rest-testing/. Accessed: 2024-09-05.

Solbrig, H. R., Prud’hommeaux, E., Grieve, G., McKenzie, L., Mandel, J. C., Sharma, D. K., and Jiang, G. (2017). Modeling and validating HL7 FHIR profiles using semantic web Shape Expressions (ShEx). Journal of Biomedical Informatics, 67:90–100.

Walonoski, J., Scanlon, R., Dowling, C., Hyland, M., Ettema, R., and Posnack, S. (2018). Validation and Testing of Fast Healthcare Interoperability Resources Standards Compliance: Data Analysis. JMIR Medical Informatics, 6(4).

Webber, J. (2010). REST in Practice. O’Reilly Media.
