Extensive Conformance Testing and Validation of FHIR® Data Exchange Variabilities
Abderrazek Boufahja (1, a) and Tanmay Verma (2, b)
(1) GE HealthCare, Strasbourg, France
(2) GE HealthCare, Bellevue, Washington, U.S.A.
{abderrazek.boufahja, tanmay.verma}@gehealthcare.com
Keywords:
HL7®, FHIR®, Conformance, Validation, Testing, API.
Abstract:
The emergence of the FHIR® standard in recent years has been accompanied by the development of many FHIR® servers, some commercial and many open-source, with wide deployment in production. The FHIR® standard defines a complete RESTful API for accessing and sharing clinical resources that participate in dozens of healthcare workflows. The API comes with a long list of variations in CRUD operations and in search queries. For instance, every search parameter comes with multiple search flavours, which makes implementing the hundreds of search parameters complex and makes server capability claims hard for FHIR® clients to verify, especially clients that rely on edge search capabilities. In this paper, we use a method to exhaustively test the large number of variabilities in the RESTful FHIR® API that a FHIR® server can implement, by generating thousands of test scripts directly from the formal description of the FHIR® standard. The method validates the different search variabilities and gives a deep view of the capabilities of the tested FHIR® servers. We implemented the method and ran the generated scripts against multiple FHIR® servers. The tests showed that most of the servers conform to the FHIR® standard, although some discrepancies between the claims of some FHIR® servers and their current implementations were observed and analysed. We conclude the paper with an analysis of the search variabilities, with commonly found behaviours and limitations. The overall work highlights the importance of a complete and strong testing strategy for better integration and patient care.
1 INTRODUCTION
Fast Healthcare Interoperability Resources (FHIR®) (Benson and Grieve, 2016; Ayaz et al., 2021) is widely adopted across the globe today as the new healthcare standard for exchanging clinical data (Braunstein, 2022). It defines clinical concepts as resources, and defines an API to create, manipulate and search these resources. Dozens of resources are defined, and hundreds of search parameters can be used to query the FHIR® API (HL7, 2023). In addition, every search parameter can have many variabilities, depending on its type. The FHIR® API thus counts thousands of variabilities and query possibilities. Because of this complexity and richness, FHIR® servers that implement the FHIR® standard as a façade for FHIR® resources usually expose different sets of supported variabilities, and usually implement only the capabilities needed for the use cases they target. Many FHIR® servers are also full FHIR® server implementations, meant to be used as repositories of FHIR® resources. These servers implement many capabilities of the FHIR® API and try to cover as many use cases as possible while staying agnostic to use case variations. FHIR® server adopters can only rely on the capability statement declared by FHIR® server providers. Many tools allow testing the capabilities of FHIR® servers, but many of them only test the high-level capabilities, without going deeper into the test variations. In this paper, we describe a method to perform extensive conformance testing of the FHIR® API, and reverse generation and validation of the FHIR® server capability statement.
a https://orcid.org/0000-0002-6481-2185
b https://orcid.org/0009-0000-2337-7239
Boufahja, A. and Verma, T.
Extensive Conformance Testing and Validation of FHIR Data Exchange Variabilities.
DOI: 10.5220/0013370000003911
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 297-307
ISBN: 978-989-758-731-3; ISSN: 2184-4305
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
2 STATE OF THE ART
2.1 FHIR® Standard
Fast Healthcare Interoperability Resources (FHIR®) (Benson and Grieve, 2016) is a new-generation standard developed by the HL7® (Health Level Seven) organization in response to the technology evolution of recent years. The first version of the standard was proposed in 2011, and multiple versions have been released since. The latest version available during this work is R5 (5.0.0). FHIR® defines two principal components: resources and APIs. Resource types define the data elements, constraints, and relationships with other resource types. They model real-world clinical information. FHIR® defines nearly 150 resource types, covering most of the known entities used in healthcare. The FHIR® standard keeps a large degree of flexibility to cover healthcare use cases. It follows the Reuse and Composability principle and the 80/20 rule: the standard models the most commonly known elements and lets specific use cases extend the core resource types. FHIR® defines APIs to access and manipulate these resources. The FHIR® API follows Level 2 of the REST Maturity Model (Ozdemir, 2020). The APIs defined by FHIR® have three levels of interaction: the resource instance level, the resource type level, and the system level. Instance level interactions mainly allow CRUD operations, and compartment access for defined HL7® compartments. The resource type level mainly allows search operations, and the system level APIs allow batch and transaction queries, as well as a few other capabilities. The most complicated API capability is search at the resource type level, as a resource can have dozens of search parameters, and every search parameter can have many search variabilities.
The FHIR® REST API is described in the “RESTful API” page of the FHIR® standard, and the search variabilities are described in the “Search” page of the FHIR® standard (HL7, 2023). The FHIR® API variabilities are usually described in a CapabilityStatement resource, exposed using the capabilities interaction. A CapabilityStatement can describe a FHIR® client or a server of resources. In our study, we focus on server capabilities. The CapabilityStatement resource has three flavours: instance, capability, and requirements. A CapabilityStatement can describe different behaviours of a FHIR® server, such as the security of the implementation, the system level interactions, the supported resources, the operations for every resource, and the supported search parameters for every resource.
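As an illustration of how a declared CapabilityStatement can be read programmatically, the following minimal Python sketch extracts the search parameters declared per resource type. The embedded statement is a hypothetical, heavily trimmed example, and the helper name `declared_search_params` is an assumption made for this sketch, not part of any tool discussed in this paper.

```python
import json

# Hypothetical, heavily trimmed CapabilityStatement (real ones are much larger).
CAPABILITY_STATEMENT = json.loads("""
{
  "resourceType": "CapabilityStatement",
  "kind": "instance",
  "fhirVersion": "4.0.1",
  "rest": [{
    "mode": "server",
    "resource": [
      {"type": "Patient",
       "interaction": [{"code": "read"}, {"code": "search-type"}],
       "searchParam": [{"name": "name", "type": "string"},
                       {"name": "birthdate", "type": "date"}]},
      {"type": "Organization",
       "interaction": [{"code": "read"}],
       "searchParam": [{"name": "type", "type": "token"}]}
    ]
  }]
}
""")


def declared_search_params(statement):
    """Map each resource type to the {name: type} search parameters a server declares."""
    declared = {}
    for rest in statement.get("rest", []):
        if rest.get("mode") != "server":
            continue  # only server capabilities are of interest here
        for resource in rest.get("resource", []):
            params = {p["name"]: p["type"] for p in resource.get("searchParam", [])}
            declared.setdefault(resource["type"], {}).update(params)
    return declared


declared = declared_search_params(CAPABILITY_STATEMENT)
print(declared)
```

Such a declaration only lists parameter names and types; it says nothing about which prefixes, modifiers or value structures actually work, which is the gap the tested variabilities aim to fill.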
2.2 Search Parameters Variabilities
Search parameter variabilities are described in the “Search” page of the FHIR® standard. A search parameter can be declared minimally in the CapabilityStatement, by exposing only its name, or it can be described completely with a link to a SearchParameter resource that describes the supported search variabilities. Every search parameter can have multiple variabilities, many of which can be documented in the SearchParameter resource. For instance, modifiers and comparators can be described as part of the SearchParameter resource. Comparators are also called prefixes. All the variabilities related to modifiers and comparators are described in the “RESTful API Search” documentation of the FHIR® standard, and most of them depend on the type of the search parameter. FHIR® defines nine search parameter types: date, number, quantity, reference, string, token, uri, composite, and special. Every search parameter type has many variations, such as comparators, modifiers, the structure of the searched value, and its precision.
Every resource has a specific number of search parameters, defined in the FHIR® standard (HL7, 2023). Every resource of type DomainResource can have an extra search parameter: ‘_text’. In FHIR® R4, all the resources extend DomainResource, except Bundle, Parameters, and Binary. However, all the resources extend the Resource type, directly or through inheritance. Thus, all the FHIR® R4 resources can implement any search parameter from the base Resource definition: ‘_content’, ‘_id’, ‘_lastUpdated’, ‘_profile’, ‘_query’, ‘_security’, ‘_source’, and ‘_tag’.
Figure 1 shows the number of search parameters for FHIR® R2, R3, R4 and R5. We counted the search parameters that FHIR® resources can support, excluding the common search parameters described above, which are inherited from DomainResource and Resource. The number of resources rose between R2 and R5 and remained roughly stable after R4. However, the number of search parameters continues to rise with every release. For R4, there are more than 1600 search parameters.
Some common search parameters are quite complicated and can have dozens of variations. For example, the search parameter ‘_has’ implements the notion of reverse chaining. Every other resource that can reference the current resource through a search parameter can be used to search on the current instance of the resource. The ‘_has’ search parameter can also be nested, which can lead to complex search parameter structures.
HEALTHINF 2025 - 18th International Conference on Health Informatics
Figure 1: Number of search parameters per FHIR® version, excluding common search parameters.
For a better understanding of the FHIR® API variations, we quantified all the variations. For every resource and every search parameter, we identified the possible variations that a FHIR® server API can implement. For every search parameter, based on its type, we identified the variations related to that type. For search parameters of type reference, we identified the chained search parameters as well. For some search parameter types, we went beyond the definitions and variations given in the SearchParameter resource. For example, for string-based search parameters, equality to a value was refined into three variations: strict equality, equality with a case variation, and equality with a begins-with behaviour. In the FHIR® standard, string matching without modifiers returns results whose elements are equal to or start with the searched value, ignoring case; however, FHIR® servers may implement one or the other variability, on purpose or by error. For date-based search parameters, even though the FHIR® standard states that any precision level can be used in search values, some FHIR® servers may not support all the variabilities in searching by year, month, day, minutes, or seconds. Treating each of these capabilities as a unit of variability within a FHIR® server gives a deep understanding of the capabilities of the server, which goes beyond what FHIR® defines. For instance, we identified more than 50 variabilities for every date-based search parameter. Every precision can be coupled with the different prefixes used on dates, and every combination of date precision and prefix is a search variation. For token-based search parameters, the structure of the search token can be (code), (system|code), (|code), or (system|). A token parameter can also have multiple modifiers, such as :text, :not, :above, :below, :in and :not-in. Besides chained parameters, reference-based search parameters can have three different search structures: searching by (id), by (ResourceType/id), or by the full URL. They can also support modifiers such as :identifier, or an explicit resource type as a modifier.
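The combinatorial nature of these variations can be made concrete with a small sketch. The Python code below enumerates prefix and precision combinations for a date parameter, and the value structures and modifiers for a token parameter. It is only an illustration under stated assumptions: the sample values, the function names and the example system URL are hypothetical, the queries are left unencoded, and the resulting counts are not meant to reproduce the exact variability counts reported in this paper.

```python
from itertools import product

# Date prefixes defined by the FHIR search specification.
DATE_PREFIXES = ["eq", "ne", "gt", "lt", "ge", "le", "sa", "eb", "ap"]

# One illustrative sample value per precision level named in the text.
DATE_SAMPLES = {
    "year": "2013",
    "month": "2013-01",
    "day": "2013-01-14",
    "minutes": "2013-01-14T10:00Z",
    "seconds": "2013-01-14T10:00:30Z",
}

TOKEN_MODIFIERS = ["text", "not", "above", "below", "in", "not-in"]


def date_variations(param):
    """Return (label, query) pairs for every prefix x precision combination."""
    return [
        (f"{prefix}/{precision}", f"{param}={prefix}{value}")
        for prefix, (precision, value) in product(DATE_PREFIXES, DATE_SAMPLES.items())
    ]


def token_variations(param, system, code):
    """Return (label, query) pairs for the four token structures and the modifiers."""
    structures = {
        "code": code,
        "system|code": f"{system}|{code}",
        "|code": f"|{code}",
        "system|": f"{system}|",
    }
    variations = [(label, f"{param}={value}") for label, value in structures.items()]
    for modifier in TOKEN_MODIFIERS:
        # The value is a placeholder: real values differ per modifier
        # (for example, :in expects a ValueSet URL and :text a display text).
        variations.append((f":{modifier}", f"{param}:{modifier}={code}"))
    return variations


date_queries = date_variations("birthdate")
token_queries = token_variations("type", "http://example.org/org-type", "prov")
print(len(date_queries), len(token_queries))  # prints: 45 10
```

Multiplying such per-parameter lists by the number of search parameters of each type gives an idea of how quickly the number of unit variabilities grows.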
More than 3500000 variabilities were identified that can be supported by FHIR® R4 servers. This number does not include iterating on the parameters ‘_has’, ‘_include’, and ‘_revinclude’. It includes only the first level of chained parameters, and it does not include combinations of search parameters. The estimate does not include compartment-based variabilities either. ‘_has’ variabilities represent 80% of the overall variabilities, and more than 98% are related to the parameters ‘_has’, ‘_revinclude’ and chained parameters. The remainder is nearly 60000 variabilities, which is still a substantial number to test and cover for a FHIR® server.
2.3 CRUD Operations Variabilities
The FHIR® API supports Level 2 of the REST Maturity Model (Ozdemir, 2020; Webber, 2010). In FHIR® R4, interactions with FHIR® servers are divided into three categories: six instance level interactions, three type level interactions, and four system level interactions. Every interaction has multiple variabilities. For example, the instance level ‘update’ interaction has two variabilities: update of an existing resource, and update-as-create behaviour. The ‘patch’ interaction can be refined into 15 different variations. HL7® defines two major patching methods, JSON Patch and FHIR® patch, and every method has many variations: adding, copying, moving, etc. In addition, conditional operations refine every CRUD operation into multiple variations and use cases, based on the condition used and its applicability. We identified 27 variations of CRUD operations for every FHIR® resource, which represents nearly 4000 variations in total.
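To make a few of these CRUD variabilities concrete, the following Python sketch builds the HTTP request shapes for five of them without sending anything. The base URL, the function name `crud_variations`, the sample search condition and the patched element are assumptions chosen for illustration; the sketch covers only a small subset of the 27 variations mentioned above.

```python
BASE = "https://fhir.example.org/r4"  # hypothetical server base URL
FHIR_JSON = {"Content-Type": "application/fhir+json"}


def crud_variations(resource_type, resource_id, body, condition):
    """Return (label, method, url, headers, payload) tuples for a few CRUD variabilities."""
    type_url = f"{BASE}/{resource_type}"
    instance_url = f"{type_url}/{resource_id}"
    unknown_id = f"new-{resource_id}"  # an id assumed to be unknown to the server
    without_id = {k: v for k, v in body.items() if k != "id"}
    json_patch = [{"op": "replace", "path": "/active", "value": False}]
    return [
        ("update existing", "PUT", instance_url, FHIR_JSON, body),
        # Same request shape as an update, but targeting an id the server has never seen.
        ("update-as-create", "PUT", f"{type_url}/{unknown_id}", FHIR_JSON,
         {**body, "id": unknown_id}),
        # The condition is a search query that selects the resource to update.
        ("conditional update", "PUT", f"{type_url}?{condition}", FHIR_JSON, without_id),
        # The condition travels in the If-None-Exist header for conditional create.
        ("conditional create", "POST", type_url,
         {**FHIR_JSON, "If-None-Exist": condition}, without_id),
        ("JSON Patch replace", "PATCH", instance_url,
         {"Content-Type": "application/json-patch+json"}, json_patch),
    ]


patient = {"resourceType": "Patient", "id": "123", "active": True}
variations = crud_variations("Patient", "123", patient, "identifier=urn:example:ids|123")
for label, method, url, _headers, _payload in variations:
    print(f"{label}: {method} {url}")
```

Each tuple would correspond to one unit test, with its own expected status code and follow-up read to verify the effect.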
2.4 Validation Methods and Tools
Multiple FHIR® validation tools and methods exist, many of them open-source. The FHIR® standard and community tried to document most of these tools in the FHIR® standard itself, through the Implementation Support module and the Validating Resources page, or in related HL7® Confluence pages (HL7, 2024a; HL7, 2024b). FHIR® testing tools can be divided into two categories: resource validation tools and API exchange testing tools.
2.4.1 Resources Validation Tools
Resource validation principles and methods are described in the Validating Resources page of the FHIR® standard. There are mainly six methods checking eight different aspects. The FHIR® standard comes with XSD schemas and Schematron rules, which can be used by many XSD-based and Schematron-based validation tools. The FHIR® standard also provides a JSON schema, which can be used with many JSON validation tools. The FHIR® community maintains a validation tool that can be used from the command line to validate FHIR® resources against standard FHIR® requirements, or against extended requirements related to FHIR® profiles. The FHIR® community also provides a web-based validation tool that can be used online to validate FHIR® resources (Otasek, 2024). Some studies have also validated the RDF representation of FHIR® resources using Shape Expressions (ShEx) (Solbrig et al., 2017). This ShEx-based method can also be used to validate XML and JSON representations of FHIR® resources through transformation to the RDF representation. Many providers offer open-source or online FHIR® content validation tools. Here is a non-exhaustive and unordered list of online content validation tools: Simplifier validation tool, Firely validation tool, Inferno validation service, Gazelle FHIR® validation tool, Infoway FHIR® Validator, Aidbox FHIR® Schema Validator and clinFHIR resource validator. Some FHIR® sandboxes also provide the $validate operation, such as the Aegis sandbox or the HAPI FHIR® online server. We can also note the FHIR® Notepad++ Plugin and the FHIR® tools plugin for Visual Studio Code (Lagger, 2023), which provide a validation capability for FHIR® resources.
The existence of all these tools confirms the maturity of FHIR® content validation tools, and their wide usage amongst providers of FHIR-based products.
2.4.2 FHIR® API Exchange Testing Tools
FHIR® resources are usually exchanged through the REST API defined in the FHIR® standard, even if other exchange mechanisms could be used. Most FHIR® servers provide testing sandboxes to accelerate integration with FHIR® applications. For instance, most EMRs provide testing sandboxes with a connectivity validation process, which are particularly useful for FHIR® clients to reach a high level of interoperability. Some open-source FHIR® servers can be installed and configured in a private network and used to test FHIR® client implementations. Here is a non-exhaustive and unordered list of open-source FHIR® servers: HAPI FHIR® server (Hussain et al., 2018), LinuxForHealth FHIR® Server (Opie, 2024), Microsoft FHIR® server (Opie, 2024), FHIR® Candle (Canessa, 2024), Pascal FHIR® Server (HealthIntersections, 2024), and Spark (Kramer, 2024).
The FHIR® standard defines the TestScript resource, which provides an agnostic and interoperable way to share test designs and test definitions in an executable format, with computable actions and interpretable verification instructions. FHIR® also defines the TestReport resource, which can be used to share a summarized report following the execution of a TestScript. Other methods exist as well, defining generic structures for tests and test execution steps (GITB, 2015; Scanlon, 2024).
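To give an idea of what a computable test definition looks like, the following Python sketch builds a minimal TestScript-shaped structure for one search followed by two assertions. It is a simplified illustration, not a validated resource: the canonical URL, the naming scheme and the function name `search_test_script` are assumptions, and a real TestScript usually carries fixtures, variables and richer assertions.

```python
import json

OPERATION_CODES = "http://terminology.hl7.org/CodeSystem/testscript-operation-codes"


def search_test_script(resource_type, param, value):
    """Build a minimal TestScript-shaped dict: one search operation, then two assertions."""
    return {
        "resourceType": "TestScript",
        "url": f"http://example.org/TestScript/{resource_type}-{param}",  # hypothetical URL
        "name": f"{resource_type}_{param}_search".replace("-", "_"),
        "status": "draft",
        "test": [{
            "name": f"Search {resource_type} by {param}",
            "action": [
                {"operation": {
                    "type": {"system": OPERATION_CODES, "code": "search"},
                    "resource": resource_type,
                    "encodeRequestUrl": True,
                    "params": f"?{param}={value}",
                }},
                {"assert": {"description": "The server answers 200 OK",
                            "response": "okay", "warningOnly": False}},
                {"assert": {"description": "A Bundle is returned",
                            "resource": "Bundle", "warningOnly": False}},
            ],
        }],
    }


script = search_test_script("Organization", "type", "prov")
print(json.dumps(script, indent=2))
```

Each action holds either an operation or an assertion, so a test reads as a sequence of requests and checks that an engine can execute and report on.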
Many tools allow testing the exchange of FHIR® resources through the FHIR® API. Some of them are open-source, others are proprietary or have hybrid access. For instance, any HTTP REST testing or automation tool, such as Postman or SOAPUI (SOAPUI, 2024), can be leveraged to test FHIR® server APIs. Here is a non-exhaustive and unordered list of open-source and commercial FHIR® API testing tools:
Touchstone (Walonoski et al., 2018), developed by AEGIS, offers automated FHIR® testing for server and client implementations, leveraging the FHIR® TestScript resource. It can be used to automate FHIR® data exchange testing, either with pre-designed TestScript resources or as a testing framework to add new test scripts.
Crucible (Walonoski et al., 2018; Scanlon, 2024), developed by MITRE, offers a set of open-source testing tools for FHIR®. It can be used to test the data exchange conformance of FHIR® servers and can score patient records.
Inferno (Kramer and Moesel, 2023) is an open-source tool that helps test conformance to the FHIR® standard. Besides its resource validator, Inferno provides a web-based application to execute online testing using many available test kits. Every test kit is a list of defined tests that can be executed against a specific FHIR® server endpoint. FHIR® server implementers can use predefined test kits or develop their own test kits using the Inferno framework.
Caristix Test (Caristix, 2021) makes it possible to automate the testing of FHIR® implementations using its Scenario Editor.
NIST FHIR® Toolkit (NIST, 2024) is an open-source FHIR® testing tool, mainly for the FHIR®-based IHE® MHD profile.
Gazelle PatientManager can simulate some initiating actors as a FHIR® client for some FHIR-based IHE® profiles, like PDQm or PIXm. It can also act as a FHIR® server to test some IHE® profiles.
TestScript Engine (MITRE, 2023) is an open-source testing engine. It can interpret and execute FHIR® TestScript resources and generate TestReport resources following command line execution.
This list is not exhaustive; many other proprietary or open-source FHIR® testing tools and frameworks may exist. Some of the tools come with preconfigured test definitions, and some provide a framework to create test definitions, which may follow the TestScript resource structure or other proprietary structures.
The vast number of variabilities makes FHIR® server claims difficult to check and validate, and sometimes makes the integration of clients with heterogeneous FHIR® servers complex without staging and testing phases. In this paper, we use a method to generate unit verification tests directly from the FHIR® standard, to cover exhaustively all the possible variations in the FHIR® API, and to provide clear knowledge of the FHIR® server capabilities.
3 METHOD
To test agnostically all the API variabilities that a FHIR® server can implement, a huge number of tests needs to be written, based on the variability analysis described above. The aim of this method is to avoid writing test definitions or test scripts by hand, and to generate them automatically from computable artifacts of the FHIR® standard. Writing all the identified tests manually would be time consuming and error prone, which our method avoids.
From the FHIR® standard, two main artifacts are used: the ResourceType.profile.json files containing the StructureDefinition of the FHIR® resource types, and the file search-parameters.json, containing a computable definition of all the search parameters in the FHIR® standard, as described in Figure 2. The generator engine also takes a list of test script templates as input, containing a test definition template for every variation of every search parameter type. Every template is meant to be compiled using inputs related to the tested resources and the tested search parameters. The generator engine uses these inputs as described in Figure 3.
Figure 2: Test scripts generation overview.
The list of resource types is extracted from the FHIR® standard. For every resource type, the list of supported search parameters is extracted from the search-parameters.json file. Every search parameter is defined with some information, such as the target resources, the type of the search parameter, and the path in the FHIR® resources to the elements mapped to the search parameter. The path of the search parameter is used to validate the bundles returned by the test scripts: it is used to compare the searched values with the content of the returned resources. For every search parameter found in the search-parameters.json file, the list of variations is identified based on the search parameter type. For every variability, we look up the repository of test script templates to find the template to use for that variation. Then, based on that template, the resource StructureDefinition, and the search parameter attributes, the generator engine generates the final test script for the variability. For example, for the Organization resource, search-parameters.json defines multiple related search parameters, like type, partof, and address. The parameter type is a token, for which we identified 15 different possible variabilities in our study. For every variation type, a test template is defined and used to generate a test script that tests only that specific variability on the ‘type’ search parameter for the Organization resource type. Every variability type for every search parameter type has its own test script template, which contains testing steps and testing logic, including negative testing and a validation mechanism for the Bundle returned by a search query. Note that one of the authors has a patent application related to this method, with more technical details. The generation of the CRUD variabilities for every resource follows the same pattern, with test script templates used to generate test scripts for every resource type.
Figure 3: Test scripts generation process.
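The generation loop described above can be sketched in a few lines of Python. This is not the authors' engine: the two-entry bundle stands in for search-parameters.json, the `TEMPLATES` repository and its one-line "test definitions" are hypothetical, and the `{code}`, `{system}` and `{value}` markers are assumed placeholders to be filled at run time from collected test data.

```python
from string import Template

# Tiny stand-in for search-parameters.json, which is a Bundle of SearchParameter resources.
SEARCH_PARAMETERS = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "SearchParameter", "code": "type",
                      "base": ["Organization"], "type": "token",
                      "expression": "Organization.type"}},
        {"resource": {"resourceType": "SearchParameter", "code": "name",
                      "base": ["Organization"], "type": "string",
                      "expression": "Organization.name | Organization.alias"}},
    ],
}

# Hypothetical template repository: one template per (parameter type, variability).
TEMPLATES = {
    ("token", "code"): Template("GET $resource?$param={code}  # verify $path"),
    ("token", "system|code"): Template("GET $resource?$param={system}|{code}  # verify $path"),
    ("string", "exact"): Template("GET $resource?$param:exact={value}  # verify $path"),
}


def generate_tests(bundle, templates):
    """Yield (test id, rendered test definition) for every parameter x variability."""
    for entry in bundle["entry"]:
        sp = entry["resource"]
        for (param_type, variability), template in templates.items():
            if param_type != sp["type"]:
                continue  # this template targets another search parameter type
            for resource in sp["base"]:
                test_id = f"{resource}.{sp['code']}.{variability}"
                yield test_id, template.substitute(
                    resource=resource, param=sp["code"], path=sp["expression"])


tests = dict(generate_tests(SEARCH_PARAMETERS, TEMPLATES))
for test_id, definition in tests.items():
    print(test_id, "->", definition)
```

The expression of the search parameter is carried into each generated test so that the returned resources can later be checked against the searched value.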
When executed on a FHIR® server, the generated test scripts identify the capabilities of the tested server through testing, instead of relying only on its CapabilityStatement declaration. This enables comparison and validation of the declared capability statement, as described in Figure 4.
The first step before executing the generated test scripts is the initialization of the testing variables. To search FHIR® resources using search parameters, we need to identify existing resources in the FHIR® server, or create new FHIR® resources, and extract the right values for all tested search parameters, using the path definition given in the search-parameters.json file of the FHIR® standard. When the FHIR® server supports the create interaction, the server is initialized with static FHIR® resources as part of the initialization process.
Figure 4: CapabilityStatement generation and validation.
During the execution of the generated test scripts on a FHIR® server API, the test results reveal which search variabilities are supported. These results are used to generate a computed CapabilityStatement, which can be compared with the CapabilityStatement declared by the FHIR® server. This comparison highlights the differences between what is claimed and what is implemented. Because of the considerable number of generated test scripts, the CapabilityStatement declared by the FHIR® server can also be used to filter the tests to be executed.
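The comparison between declared and computed capabilities can be sketched as a set difference per resource type. In the Python sketch below, the function names, the result labels and the rule that a parameter counts as observed when at least one of its variability tests passes are all assumptions made for illustration; the paper does not prescribe a specific aggregation rule.

```python
def computed_from_results(results):
    """Keep a search parameter when at least one of its variability tests passed (assumed rule)."""
    computed = {}
    for (resource, param, _variability), status in results.items():
        if status == "pass":
            computed.setdefault(resource, set()).add(param)
    return computed


def compare_capabilities(declared, computed):
    """Compare two {resource type: set of search parameter names} maps."""
    report = {}
    for resource in sorted(set(declared) | set(computed)):
        claimed = declared.get(resource, set())
        observed = computed.get(resource, set())
        report[resource] = {
            "confirmed": sorted(claimed & observed),
            "claimed_but_not_observed": sorted(claimed - observed),
            "observed_but_undeclared": sorted(observed - claimed),
        }
    return report


# Hypothetical declared capabilities and test results.
declared = {"Patient": {"name", "birthdate"}, "Organization": {"type"}}
results = {
    ("Patient", "name", "exact"): "pass",
    ("Patient", "birthdate", "ge/day"): "fail",
    ("Patient", "gender", "code"): "pass",
}
report = compare_capabilities(declared, computed_from_results(results))
print(report)
```

A finer-grained report could keep the variability level as well, so that a parameter supported only for some prefixes or modifiers is not reported as fully confirmed.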
4 APPLICATION
4.1 API Testing with Inferno
We implemented the method described above using the Inferno framework (Scanlon, 2024) to generate test scripts for the API variabilities of the FHIR® R4 standard. Inferno has a method to develop test kits, and our implementation generates test files, cascaded together to map every variability to a generated test, executed using the Inferno UI or its API. Tests related to the variabilities of ‘_has’, ‘_revinclude’ and chained parameters were not generated in our testing. Of the remaining 60000 variabilities, our templates and test generation covered more than 42000. The following types of search parameters were covered completely in our testing: string, date, number and uri. The ‘reference’ type was not fully covered, as chained parameters and the ‘identifier’ modifier were complicated to test and needed further parsing and following of references on the collected resources. Token-based search parameters were nearly fully tested with all variabilities, except subsumption with the ‘above’ and ‘below’ modifiers, as well as the ‘in’ and ‘not-in’ modifiers. This is because of the complexity of verifying these operations, which requires connectivity to a terminology service. Composite-based search parameters were not tested, but their number is quite small compared to the other types. Common search parameters were tested on all the resource types, with all their variations. The tested common parameters are _id, _lastUpdated, _tag, _profile, _security, and _source. Most CRUD operations and variabilities were tested as well. Compartment-based resource APIs were not tested during this application.
Test data were defined to fill the tested FHIR® server with initial FHIR® resources. Defining the test data was one of the major steps, as these data need to be heterogeneous and rich enough to create an optimal testing environment. The test data were designed to enable testing of most of the targeted search parameters. For every executed test, one of five statuses can be reported:
Pass: the variability is supported.
Fail: the test failed; the FHIR® API does not support the tested variability.
Crash: the test crashed during execution.
Missing test data: the collected test data are not sufficient to test the targeted variability.
No test defined: the variability cannot be tested or verified.
Tests usually fail because the asserts and verifications that are part of the test failed. When a test crashes, it is mostly because the variability is not supported. For example, the API may return a 400 Bad Request response. Tests marked as ‘missing test data’ usually lack search values. Before searching on the parameters of a specific resource, we need to collect FHIR® resources from the server. The collected FHIR® resources may not contain all the information needed to exercise all the search parameters. The Inferno framework supports sharing parameters between tests, which enables the initialization of search parameters through create and search phases. Tests marked as ‘missing test data’ or ‘no test defined’ can neither confirm nor deny that a specific variability is implemented by the FHIR® server API.
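The status logic described above can be summarized by a small decision function. The Python sketch below is an assumed simplification: the `classify` signature, the order of the checks and the treatment of any HTTP status of 400 or above as a crash are illustrative choices, not the behaviour of the Inferno framework or of our test kit.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    CRASH = "crash"
    MISSING_TEST_DATA = "missing test data"
    NO_TEST_DEFINED = "no test defined"


def classify(test_defined, search_value, http_status, assertions_ok):
    """Map the outcome of one variability test to a reported status."""
    if not test_defined:
        return Status.NO_TEST_DEFINED
    if search_value is None:
        # No usable value could be extracted from the collected resources.
        return Status.MISSING_TEST_DATA
    if http_status >= 400:
        # For example 400 Bad Request when the variability is not supported.
        return Status.CRASH
    return Status.PASS if assertions_ok else Status.FAIL


def is_conclusive(status):
    """Only pass, fail and crash say something about the implementation."""
    return status in {Status.PASS, Status.FAIL, Status.CRASH}


print(classify(True, "prov", 200, True))
print(classify(True, None, 200, True))
```

Keeping the inconclusive statuses separate avoids counting a variability as unsupported when it simply could not be exercised.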
4.2 Extensive Testing Results
We executed the generated tests on many open-source FHIR® servers, as well as on some available sandboxes, many of which are described in the state of the art. We anonymized the FHIR® server names, as the goal of the analysis is not to compare the servers but to identify implementation variations and common behaviours. Overall, nearly 400 000 tests were executed.
4.2.1 Overall Testing Results
Testing the FHIR® servers was time consuming, as executing more than 42000 tests per server takes several hours, and sometimes days. Every test makes many FHIR® queries to cover combinations of positive and negative test steps. The testing results are summarized in Figure 5 and Figure 6. During testing, we did not filter on the declared CapabilityStatement; we executed the full test suite.
Figure 5: FHIR® servers search variabilities coverage.
Different open-source FHIR® servers support a wide range of configuration parameters that may activate search and operation capabilities, such as indexing of missing elements, text search, update-as-create, acceptance of non-resolvable references in FHIR® resources, and many other configurations. While configuring the tested open-source FHIR® servers, we tried to activate the highest level of indexing and capabilities that we could identify, but we acknowledge that some capabilities may have been missed during the configuration process.
Figure 6: FHIR® servers CRUD variabilities coverage.
Most of the tested FHIR® servers showed good coverage of the tested search capabilities. However, some FHIR® servers showed many test failures, highlighting features not supported by their APIs. In CRUD variability testing, many tests crashed, mainly because the received responses carried an HTTP status indicating an error, highlighting unsupported capabilities.
4.2.2 Search Variabilities Analysis
For variabilities related to search parameters, we grouped the test results per variability type and per resource type. Figure 7 shows the number of failures per variability type, tagged by the type of the search parameter. It covers the variability types with more than 200 failures encountered during testing with the FHIR® servers described above. Only tests related to search parameters supported by the tested servers are included. Every line in Figure 7 describes a tested variability, for example searching with a specific prefix on a date-based search parameter, or searching with the ‘exact’ modifier on a string-based search parameter. We grouped all the tests here per variability type.
Figure 7: Number of failures per variability type.
We observe that all the tested search parameter types show a high number of failures, except quantity-based and number-based search parameters, which is understandable as the number of such search parameters is low in the FHIR® standard.
The least supported variabilities concern token-based search parameters, specifically searching with the modifiers ‘above’, ‘below’, ‘text’, ‘missing’, ‘not’ and ‘of-type’. The performed tests show a high failure level on these modifiers, which indicates either a completely unsupported feature in some FHIR® servers or an incorrect implementation of the variability. Searching for equality with the (system|) variability type was also failing, which indicates low support for this variability in token-based search parameters.
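To make the token variabilities concrete, the following minimal sketch checks a single (system, code) pair against the four value forms that the FHIR® standard allows for token parameters: [code], [system]|[code], |[code] and [system]|. It is an illustration only and not the verification code used in this study; the function name token_matches is hypothetical, and modifiers such as ‘text’ or ‘of-type’ are out of scope.

```python
from typing import Optional


def token_matches(search_value: str, system: Optional[str], code: Optional[str]) -> bool:
    """Check one (system, code) pair against a token search value.

    Hypothetical helper for illustration; it covers only the four basic
    value forms and no modifiers.
    """
    if "|" not in search_value:
        # [code]: match the code regardless of the system
        return code == search_value
    search_system, search_code = search_value.split("|", 1)
    if search_system and search_code:
        # [system]|[code]: both parts must match
        return system == search_system and code == search_code
    if search_code:
        # |[code]: the code must match and the element must have no system
        return system is None and code == search_code
    # [system]|: any code within the given system
    return system == search_system
```

Under this reading, a query such as identifier=http://loinc.org| should return every element whose system is http://loinc.org, whatever its code.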
Searching with the modifiers ‘above’ and ‘below’ produced the most frequent failures for uri-based search parameters. For reference-based search parameters, searching by the full URL of the reference was poorly supported. An interesting failure was related to searching references by resource ID: some FHIR® servers seem to prefer searching by the Resource/ID structure instead of searching directly by the resource ID.
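The three reference forms discussed above can be sketched as follows. This is a minimal illustration, assuming a hypothetical base URL and a hypothetical helper name; a test generator could issue one query per form and compare the returned Bundles.

```python
def reference_search_values(resource_type: str, resource_id: str, base_url: str) -> dict:
    """Return the three value forms usable for a reference search parameter.

    Hypothetical helper for illustration; base_url is an assumed server address.
    """
    base = base_url.rstrip("/")
    return {
        "id": resource_id,                                  # bare resource ID
        "type/id": f"{resource_type}/{resource_id}",        # Resource/ID structure
        "url": f"{base}/{resource_type}/{resource_id}",     # full URL of the reference
    }


values = reference_search_values("Patient", "123", "https://example.org/fhir/")
queries = [f"Observation?subject={value}" for value in values.values()]
```

A server that answers the "type/id" query but returns nothing for the "id" query would show the behaviour described above.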
Many date-based search variabilities were failing
as well. Searching with ‘ap’ prefix was often failing
(> 3000 tests failed), highlighting poor support of this
prefix. Testing with the prefix ‘ne’ was often failing
( 2500 tests failed). Variabilities related to search-
ing with minutes precision were often failing (> 2500
tests failed). More than 1000 failing tests were related to the search prefixes ‘le’ and ‘ge’, which is an alarming observation: FHIR® clients commonly use these prefixes, and non-compliance or misinterpretation of their meaning can have a harmful impact on patient care.
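One way to see why these prefixes are easy to misinterpret is to write the comparison out explicitly. The sketch below is not the verification code used in this study. It assumes one reading of the FHIR® search rules, in which every date value stands for an implicit interval, and it is deliberately limited: only year, month and day precision are handled, time zones are ignored, and ‘ap’ is left out. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def implicit_range(value: str):
    """Half-open [start, end) interval implied by a year, year-month or full date."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return datetime(parts[0], 1, 1), datetime(parts[0] + 1, 1, 1)
    if len(parts) == 2:
        year, month = parts
        start = datetime(year, month, 1)
        end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
        return start, end
    start = datetime(*parts)
    return start, start + timedelta(days=1)


def date_matches(prefix: str, search: str, target: str) -> bool:
    """Evaluate a prefixed date search against a target value (assumed semantics)."""
    s_lo, s_hi = implicit_range(search)
    t_lo, t_hi = implicit_range(target)
    contained = s_lo <= t_lo and t_hi <= s_hi   # search range fully contains target
    above = t_hi > s_hi                         # target reaches beyond the search range
    below = t_lo < s_lo                         # target starts before the search range
    rules = {
        "eq": contained,
        "ne": not contained,
        "gt": above,
        "lt": below,
        "ge": above or contained,
        "le": below or contained,
        "sa": t_lo >= s_hi,                     # target starts after the search range
        "eb": t_hi <= s_lo,                     # target ends before the search range
    }
    return rules[prefix]
```

Under these assumptions, a target recorded only as 2013-01 satisfies both le2013-01-14 and ge2013-01-14 but not eq2013-01-14, which illustrates how different readings of the same prefix can lead to different result sets.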
We observe that string-based search parameters were well implemented for most of the tested search parameters. Number-based and quantity-based search parameters are not shown in Figure 7, as their number was small. To give a better representation, Figure 8 describes the failure rate, i.e. the number of failures over the total number of executions, of a specific variability type for a specific search parameter type. Only tests for search parameters supported by the tested servers are included. The diagram contains only variabilities with a failure rate higher than 0.1.
Figure 8: Rate of failures per number of tests executed for
each variability type.
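The failure rate used for Figure 8 can be computed with a few lines of code. The sketch below assumes that each test result is available as a (search parameter type, variability, passed) tuple; this record layout and the function name are assumptions made for illustration, not the format of the actual test reports.

```python
from collections import Counter


def failure_rates(results, threshold=0.1):
    """Return the failure rate per (parameter type, variability) above a threshold.

    results: iterable of (param_type, variability, passed) tuples (assumed layout).
    """
    executed, failed = Counter(), Counter()
    for param_type, variability, passed in results:
        key = (param_type, variability)
        executed[key] += 1
        if not passed:
            failed[key] += 1
    rates = {key: failed[key] / executed[key] for key in executed}
    # Keep only the variabilities whose failure rate exceeds the threshold
    return {key: rate for key, rate in rates.items() if rate > threshold}
```

Normalising by the number of executions is what lets a sparsely tested parameter type appear in the diagram even when its absolute number of failures is low.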
HEALTHINF 2025 - 18th International Conference on Health Informatics
Token-based search parameters still appear in Figure 8 with a significant level of failures, related to the same variabilities highlighted above: the modifiers ‘above’, ‘below’, ‘text’, ‘missing’, ‘not’ and ‘of-type’. Uri-based search parameters were also failing tests related to the ‘above’ and ‘below’ variabilities, which in both cases indicates poor support for these variabilities. Date-based and reference-based search parameters also show a high failure rate, still related to the same variabilities described above.
Although the number of tests related to number-based search parameters is low, the failure rate on the different search variabilities is high. Most of the failed tests are related to the use of prefixes, which were poorly supported or implemented, especially the prefixes ‘ap’, ‘eb’ and ‘sa’. The same observation applies to quantity-based search parameters, with a smaller number of identified failures. The prefixes ‘ap’, ‘eb’ and ‘sa’ can be considered edge cases.
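The ‘ap’ prefix illustrates why such edge cases are hard to verify. The FHIR® standard recommends an approximation of 10% of the stated value but lets systems choose other values, so a verifier has to fix a tolerance. A minimal sketch, in which the function name and the default tolerance are assumptions:

```python
def approximately_equal(search_value: float, target_value: float, tolerance: float = 0.1) -> bool:
    """Check an 'ap' search on a number using a relative tolerance.

    The 10% default follows the recommendation in the standard; a server is
    free to apply another margin, so a mismatch is not necessarily an error.
    """
    margin = abs(search_value) * tolerance
    return abs(target_value - search_value) <= margin
```

With the default tolerance, a search for ap100 accepts a stored value of 109 and rejects 111; a server that applies a 20% margin would return both, and the test would need to account for that.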
5 DISCUSSIONS
The SearchParameter resource covers many variations for all search parameters and can be linked to the CapabilityStatement of the FHIR® server to provide more details about the level of implementation. Even so, this may not be sufficient to describe all the possible implementation variabilities, some of which may not conform to the definitions and requirements of FHIR®. For example, searching on string parameters is not case sensitive, but some FHIR® API implementations may implement only case-sensitive searching, accidentally or intentionally. This kind of variability may be difficult to describe in the SearchParameter resource. Another example is date-based search parameters. FHIR® server implementations may support searching with year, month, date, minute, second, or fraction-of-second precision, and for every precision type, the implementation can support the different prefixes that the standard defines (‘eq’, ‘gt’, ‘lt’, etc.). The number of variations for date-based search parameters was more than 50, besides some variations common to all search parameters, like support for the ‘missing’ modifier and OR/AND searching variabilities.
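The combinatorics of date-based variations can be illustrated by crossing the precision levels with the prefixes. The sketch below is only one possible enumeration: the sample values and the function name are hypothetical, and the paper does not state that its count of more than 50 is derived exactly this way. Six precisions combined with the nine prefixes defined by the standard give 54 query variants for a single date parameter.

```python
from itertools import product

# The nine prefixes defined by the FHIR search specification
PREFIXES = ["eq", "ne", "gt", "lt", "ge", "le", "sa", "eb", "ap"]

# One hypothetical sample value per precision level; time-level values carry a time zone
SAMPLE_VALUES = {
    "year": "2024",
    "month": "2024-03",
    "date": "2024-03-15",
    "minutes": "2024-03-15T10:30Z",
    "second": "2024-03-15T10:30:45Z",
    "fraction": "2024-03-15T10:30:45.123Z",
}


def date_variations(param_name: str) -> list:
    """Enumerate prefix x precision query fragments for one date parameter."""
    return [
        f"{param_name}={prefix}{value}"
        for prefix, value in product(PREFIXES, SAMPLE_VALUES.values())
    ]


variants = date_variations("birthdate")
```

Each fragment can then be appended to a resource type, for example Patient?birthdate=ge2024-03, to obtain one executable query per variability.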
We described a method that automatically generates tests for all the variabilities of the FHIR® API, using the formal definition of the FHIR® standard. The generated tests allow us to understand the capabilities of the FHIR® API, and to compare and validate the capabilities claimed in the CapabilityStatement declared by the tested FHIR® server. The method described in this study can also be extended to test custom resources in FHIR® servers and their search variabilities (Boufahja et al., 2021). We implemented this method by writing templates to generate tests as part of a test kit integrated locally with the Inferno testing platform. Some tests were not generated as part of the application due to their complexity. We executed the generated tests against many testing servers and sandboxes, including open-source FHIR® servers. The goal was not to compare FHIR® server providers, but to enhance generic knowledge of server capabilities and of common issues and discrepancies. The implementation of the method used the Inferno testing framework, which was very efficient for implementing and executing the different tests.
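The idea of generating tests from the formal definitions can be sketched independently of any testing framework. The example below is not the Inferno test kit used in this study: it is a simplified Python illustration in which the modifier table is a partial, assumed subset and the function name is hypothetical. It takes search parameter definitions shaped like FHIR® SearchParameter resources (code, type, base) and emits one query template per applicable modifier.

```python
# Partial, illustrative mapping of search parameter types to modifiers
MODIFIERS_BY_TYPE = {
    "string": ["exact", "contains", "missing"],
    "token": ["text", "not", "above", "below", "of-type", "missing"],
    "uri": ["above", "below", "missing"],
}


def generate_test_cases(search_parameters):
    """Expand SearchParameter-like dicts into query templates, one per modifier."""
    cases = []
    for sp in search_parameters:
        for resource_type in sp["base"]:
            for modifier in MODIFIERS_BY_TYPE.get(sp["type"], []):
                cases.append({
                    "id": f"{resource_type}-{sp['code']}-{modifier}".lower(),
                    # {value} is filled in later from data found on the tested server
                    "query": f"{resource_type}?{sp['code']}:{modifier}={{value}}",
                })
    return cases


cases = generate_test_cases(
    [{"code": "identifier", "type": "token", "base": ["Patient"]}]
)
```

Leaving the value as a placeholder reflects the fact that search values depend on the resources hosted by each tested server.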
Many difficulties were observed while testing date-based search parameters. First, search parameters of type date can refer to multiple data types, like dateTime, Period, and Timing, which makes the verification of the search operation complex. Date comparison also involves some complexity in handling the time zone. For date searches that go down to the time level, we always include the time zone as part of the search parameter, to avoid server-based interpretation of the meaning of the query. For many of the tested FHIR® servers, many tests were failing with the le, ge, lt and gt prefixes, which was a surprising finding. This can be explained by the complexity of interpreting these prefixes and their temporal meaning, a complexity that we also experienced while writing the verification code for the executed queries. Some of the errors found can become problematic for patient care when date search queries return wrong or incomplete information, which reinforces the importance of clearly identifying the variabilities claimed by a FHIR® server and of having a strong testing process for all of them. Testing the ‘ap’ prefix was complicated as well and hard to verify, especially because of the lenient interpretation of this prefix, which differs between implementers. The precision of quantities and numbers was not tested. A generic relative period was considered during the verification of the search results related to quantities and numbers.
We also tested combinations of AND and OR values for all search parameters. The performed tests confirmed that servers may implement these variations for only a subset of the search parameters. FHIR® clients should be vigilant regarding these kinds of variations within the same FHIR® server.
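In the FHIR® search syntax, OR is expressed with comma-separated values inside one parameter, and AND is expressed by repeating the parameter. A minimal sketch of a query builder follows; the function name and the input layout are assumptions, and values that themselves contain commas would need additional escaping that is not handled here.

```python
from urllib.parse import urlencode


def build_search_query(criteria: dict) -> str:
    """Build a query string from {param: [[or_value, ...], ...]}.

    The outer list is combined with AND (repeated parameter), the inner
    list with OR (comma-separated values). Hypothetical helper for illustration.
    """
    pairs = []
    for name, and_groups in criteria.items():
        for or_values in and_groups:
            pairs.append((name, ",".join(or_values)))
    # Keep the comma unescaped so that it is read as the OR separator
    return urlencode(pairs, safe=",")


query = build_search_query({
    "gender": [["male", "female"]],               # gender is male OR female
    "date": [["ge2020-01-01"], ["le2020-12-31"]],  # date >= ... AND date <= ...
})
```

Running the same builder over every search parameter of a resource makes it straightforward to detect parameters for which a server accepts single values but ignores or rejects the combined forms.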
Common search parameters were tested for every resource. We observed strong adherence of FHIR® servers to these common search parameters. However, some edge case variations were failing in some servers, like the _id:missing, _id:not and _lastUpdated:missing variations.
Many FHIR® servers and sandboxes sometimes support search parameters only when they are combined or mixed with other search parameters. Because of this kind of requirement, results from testing with sandboxes need to be handled carefully: a failed test may not mean that the tested server does not support the tested variability, but that mixed search parameters are mandatory.
Most of the tested FHIR® servers did not fully support all CRUD operation variabilities, and many tests were failing. For example, following a DELETE operation, some FHIR® servers responded with a 404 status code for a deleted resource, instead of a 410 status code. For the update-as-create operation, some FHIR® servers responded with a 200 status code, instead of 201.
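The two status code checks mentioned above can be expressed as a small lookup. The sketch covers only these two scenarios; the scenario names and the function are hypothetical, and the actual HTTP calls are left out so that the logic stays self-contained.

```python
# Expected HTTP status codes for the two scenarios discussed in the text
EXPECTED_STATUS = {
    "read_after_delete": 410,   # reading a deleted resource: 410, not 404
    "update_as_create": 201,    # PUT that creates a new resource: 201, not 200
}


def check_status(scenario: str, actual_status: int) -> dict:
    """Compare the status returned by a server with the expected one."""
    expected = EXPECTED_STATUS[scenario]
    return {
        "scenario": scenario,
        "expected": expected,
        "actual": actual_status,
        "passed": actual_status == expected,
    }


report = [
    check_status("read_after_delete", 404),
    check_status("update_as_create", 201),
]
```

A 404 on a deleted resource is still an error status, so a lenient client would behave correctly; the test fails only because the server does not distinguish a deleted resource from one that never existed.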
Since the search parameters are generated from an initial collection and initialization of the FHIR® resources, the returned Bundles can differ between the tested servers. Even within the same sandbox, executing the generated tests twice may result in different FHIR® queries, due to variations in the content of the FHIR® servers and their hosted resources. Because of this behaviour, tests can sometimes pass because the tested data happened to be favourable and produced consistent queries and results, even though we enforced negative testing and result verification in all the generated test scripts. Thus, we consider passing tests to be only a strong indication that the search variabilities may be well implemented.
Even though all the executed tests were generated automatically from the FHIR® standard, and we verified the relevance and design of the templates, edge cases may not be fully tested. The failed tests were not all verified manually to check the relevance of the encountered failure, although dozens of failed tests were verified manually and confirmed.
The comparison between the test results and the claimed capabilities of the FHIR® servers highlighted discrepancies. For instance, some FHIR® servers claim support for patch operations without stating which variabilities are supported, and testing their APIs revealed only partial support. Also, some FHIR® servers support extra search parameters that they do not declare in their FHIR® conformance statement.
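Such a comparison can be automated once the test results are available. The sketch below assumes that the CapabilityStatement has been loaded as a Python dictionary and that the set of search parameters observed to work comes from the executed tests; both function names are hypothetical.

```python
def declared_search_params(capability_statement: dict, resource_type: str) -> set:
    """Collect the search parameter names declared for one resource type."""
    names = set()
    for rest in capability_statement.get("rest", []):
        for resource in rest.get("resource", []):
            if resource.get("type") == resource_type:
                names.update(p["name"] for p in resource.get("searchParam", []))
    return names


def compare_claims(capability_statement: dict, resource_type: str, observed_working) -> dict:
    """Contrast declared search parameters with those observed to work in tests."""
    declared = declared_search_params(capability_statement, resource_type)
    working = set(observed_working)
    return {
        "declared_but_failing": declared - working,
        "undeclared_but_working": working - declared,
    }


statement = {"rest": [{"resource": [
    {"type": "Patient", "searchParam": [{"name": "name"}, {"name": "birthdate"}]}
]}]}
diff = compare_claims(statement, "Patient", ["name", "gender"])
```

In this toy example, birthdate is declared but was not observed to work, while gender works without being declared; these are the two kinds of discrepancy described above.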
This extensive testing method and its application highlighted the importance of a strong testing strategy and mechanism for all the implemented variabilities, which should start by documenting all the supported variabilities. A weak testing mechanism leads to API interpretation errors and can become dangerous for patient care in some cases.
6 CONCLUSIONS
The FHIR® standard defines a complete API for data access and search, with nearly 150 resources and more than 1500 search parameter combinations. Every search parameter is defined with many variabilities and usages. We identified more than 3,500,000 possible variations in the FHIR® API, 98% of which are related to only three kinds of search variations. Such variability highlights the complexity of putting in place a complete campaign for testing all the API variations. We described in this paper a method to generate test scripts using the formal description of the FHIR® standard in order to cover all possible variabilities. The method uses test templates defined from the analysis of the search variabilities. The report from executing the generated test scripts can be used to validate the claims that a FHIR® server declares in its CapabilityStatement. We applied the method using the Inferno framework by implementing and generating a substantial number of the identified test script variabilities. The generated test scripts were executed against many available FHIR® servers. The execution results showed many differences in the implementation and coverage of the tested FHIR® servers. Many variabilities were poorly supported, highlighting that FHIR® clients should be aware of such limitations in FHIR® servers when developing FHIR®-based applications. Some capabilities claimed by FHIR® servers were not fully supported, and the execution of the generated tests made it possible to identify clearly every supported and non-supported variation. Several commonly used variabilities were failing occasionally, like searching on dates with some common prefixes. Such testing failures highlight the importance of having a complete test suite and a testing strategy for FHIR® servers, to provide a better integration experience and better patient care.
ACKNOWLEDGEMENTS
We acknowledge the strong support of GE HealthCare during this study, in particular the Science and Technology Organization personnel, whose feedback and support helped in formulating the conclusions.
REFERENCES
Ayaz, M., Pasha, M. F., Alzahrani, M. Y., Budiarto, R., and
Stiawan, D. (2021). The Fast Health Interoperabil-
ity Resources (FHIR) Standard: Systematic Literature
Review of Implementations, Applications, Challenges
and Opportunities. JMIR Medical Informatics, 9(7).
Benson, T. and Grieve, G. (2016). Principles of Health In-
teroperability: SNOMED CT, HL7 and FHIR. Health
Information Technology Standards. Springer Interna-
tional Publishing, Cham.
Boufahja, A., Nichols, S., and Pangon, V. (2021). Custom
FHIR Resources Definition of Detailed Radiation In-
formation for Dose Management Systems. In Pro-
ceedings of the 14th International Joint Conference on
Biomedical Engineering Systems and Technologies.
SCITEPRESS - Science and Technology Publications.
Braunstein, M. (2022). Health Informatics on FHIR: How
HL7’s API is Transforming Healthcare. Springer.
Canessa, G. (2024). fhir-candle When you need a small
FHIR. Open-Source Tooling, Webinar Series.
Caristix (2021). Caristix Test Scenario Editor. https://
caristix.com/tutorials/testing/fhir-server-capabilities/.
Accessed: 2024-09-05.
GITB (2015). CEN Workshop Agreement Global eBusi-
ness Interoperability Test Beds GITB Phase 3 | Joinup.
Technical report.
HealthIntersections (2024). Pascal FHIR Server
Reference Implementation. https://github.com/
HealthIntersections/fhirserver.
HL7 (2023). HL7 FHIR Standard. https://hl7.org/fhir/.
HL7 (2024a). FHIR Tools Registry. https://confluence.hl7.
org/display/FHIR/FHIR+Tools+Registry. Accessed:
2024-12-14.
HL7 (2024b). Public FHIR Validation Services.
https://confluence.hl7.org/display/FHIR/Public+
FHIR+Validation+Services. Accessed: 2024-12-14.
Hussain, M. A., Langer, S. G., and Kohli, M. (2018). Learn-
ing HL7 FHIR Using the HAPI FHIR Server and Its
Use in Medical Imaging with the SIIM Dataset. Jour-
nal of Digital Imaging, 31(3).
Kramer, E. (2024). Firely’s FHIR .NET SDK. Open-Source
Tooling, Webinar Series.
Kramer, M. A. and Moesel, C. (2023). Interoperability with
multiple Fast Healthcare Interoperability Resources
(FHIR®) profiles and versions. JAMIA Open, 6(1).
Lagger, Y. (2023). Vscode fhir tools. https://github.com/
laggery/vscode-fhir-tools. Accessed: 2024-12-14.
MITRE (2023). Ruby fhir testscript execution engine. https:
//github.com/fhir-crucible/testscript-engine.
NIST (2024). NIST FHIR Toolkit. https://github.
com/usnistgov/asbestos/wiki/Introduction. Accessed:
2024-09-05.
Opie, C. A. (2024). Exploring Security Vulnerabilities in
FHIR Server Implementations: A Case Study on IBM’s FHIR
Server in the Context of the 21st Century Cures Act.
Otasek, D. (2024). The FHIR Validator. Open-Source Tool-
ing, Webinar Series.
Ozdemir, E. (2020). A General Overview of RESTful Web
Services. In Applications and Approaches to Object-
Oriented Software Design: Emerging Research and
Opportunities. IGI Global Scientific Publishing.
Scanlon, R. (2024). Inferno – FHIR Conformance Testing.
Open-Source Tooling, Webinar Series.
SOAPUI (2024). Getting Started with REST Testing
in SoapUI | SoapUI. https://www.soapui.org/docs/
rest-testing/. Accessed: 2024-09-05.
Solbrig, H. R., Prud’hommeaux, E., Grieve, G., McKen-
zie, L., Mandel, J. C., Sharma, D. K., and Jiang, G.
(2017). Modeling and validating HL7 FHIR profiles
using semantic web Shape Expressions (ShEx). Jour-
nal of Biomedical Informatics, 67:90–100.
Walonoski, J., Scanlon, R., Dowling, C., Hyland, M., Et-
tema, R., and Posnack, S. (2018). Validation and
Testing of Fast Healthcare Interoperability Resources
Standards Compliance: Data Analysis. JMIR Medical
Informatics, 6(4).
Webber, J. (2010). REST in Practice. O’Reilly Media.