and error” guided formulation design, supported ex-
clusively by the experience, expertise and knowledge
of drug development scientists (Aguilar, 2013).
In the last fifteen years, considerable effort has
gone into developing computational methods for the
prediction/simulation of DPs. Both mechanistic and
data-driven (phenomenological) models have been used
for this purpose (Siepmann and Siepmann, 2013;
Mendyk et al., 2015).
Mechanistic approaches are a more elegant and ac-
curate way to model the dynamic interaction among
the variables. However, they include many parame-
ters that are difficult to estimate and require a deep
understanding of every law governing the interaction
among all the variables involved in the dissolution
process, many of which are still unknown
(Aguilar, 2013). On the other hand, computational
intelligence methods, and more specifically machine
learning (ML) techniques, are able to generate models
through a data-driven paradigm, with the advantage
that no a priori knowledge about the interactions
among the variables is required (Ibrić et al., 2012).
Most of the ML-based approaches for DP prediction
focus on the use of Artificial Neural Networks (ANN)
with different topologies. For example, a comparison
between neurofuzzy logic and a basic ID3 decision
tree approach is presented in (Shao et al., 2007).
A total of 14 variables were included (4 formulation
variables, 2 process variables, and 8 tablet proper-
ties). No information about the number of AIs in-
cluded in the experiments is given by the authors. The
paper concluded that both models are able to provide
useful knowledge about the cause-effect relationships
among the variables and the quality of the product.
In another study (Ibrić et al., 2012), a review
of the application of ANNs in the formulation and
evaluation of modified release dosage forms is
presented. According to that review, multi-layer
perceptron and Elman neural networks are the most
frequently employed methods. In all the cases cited,
the models are used to predict DPs in highly
controlled environments, i.e., the data contain only
one AI and few design variables, especially features
related to formula composition. A more recent
approach, presented in (Mendyk et al., 2015), compares
the performance of ANN and Genetic Programming (GP)
in the modeling of drug dissolution from the dosage
form. The data set contained results of dissolution
tests carried out for five different formulations of
lipid extrudates. Only two variables were included in
the analysis. The authors found GP to be the more
robust model for DP prediction.
Bearing this in mind, the main limitation that
keeps computer-aided DP prediction systems from
becoming useful tools that really support the design
of a wider range of SOPFs is that they often focus on
specific dissolution phenomena, include very few
features, and are trained using data from only one
to at most three different AIs. Therefore, they
serve a limited purpose, and their use in real
development environments can be considered narrow.
Building an ML-based tool able to simulate DPs of
different SOPFs requires considering a larger number
of variables involved in the dissolution process,
because the dynamic response (DP) of some drugs
can change drastically with variations in features
that do not affect other kinds of drugs. Nevertheless,
as the number of variables considered by complex
data-driven models increases (such as ANN, whose
number of parameters grows with the number of input
variables), the trained model is likely to be
affected by the curse of dimensionality and to
overfit the training data.
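As a rough illustration of the curse of dimensionality mentioned above, the following sketch counts how many cells a regular grid needs to cover the input space as variables are added, and the largest fraction of those cells that a fixed data set could occupy. The function names, the choice of 5 bins per variable, and the 200-sample data set are illustrative assumptions, not values taken from the studies cited.

```python
def grid_cells(n_variables, bins_per_variable=5):
    """Number of cells in a regular grid over the input space."""
    return bins_per_variable ** n_variables


def max_coverage(n_samples, n_variables, bins_per_variable=5):
    """Upper bound on the fraction of cells that n_samples can occupy."""
    return min(1.0, n_samples / grid_cells(n_variables, bins_per_variable))


# Hypothetical data set of 200 formulations, with a growing number of variables.
for d in (2, 5, 10, 20):
    print(d, grid_cells(d), max_coverage(200, d))
```

With a fixed number of samples, the occupied fraction of the input space collapses quickly as variables are added, which is one way to see why a flexible model can end up memorising the training data.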
From a statistical point of view, the prediction
of a DP from a set of formulation variables
corresponds to a functional regression problem known
as Function-on-Scalar Regression (FoSR) (Reiss et al.,
2010), i.e., a regression problem where the responses
are functions and the predictors are scalars. In
order to overcome the curse of dimensionality in the
DP prediction problem, a method for dimensionality
reduction on FoSR is required; however, this is an
almost unexplored field in the state of the art,
especially for cases where the sampling times are not
uniform among the samples. This is precisely the case
of DPs, since the sampling times typically used in
dissolution tests are not uniform through time and
depend on the duration or desired effect of the
specific drug being designed.
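For reference, the usual linear form of a function-on-scalar regression model can be written as follows. The notation is an assumption introduced here for illustration: y_i(t) is the DP of sample i, x_{ij} are its p scalar formulation variables, β_j(t) are coefficient functions, and t_{ik} are the sampling times of sample i, which may differ from sample to sample.

```latex
y_i(t) = \beta_0(t) + \sum_{j=1}^{p} x_{ij}\,\beta_j(t) + \varepsilon_i(t),
\qquad t \in \{t_{i1}, \dots, t_{i m_i}\}, \quad i = 1, \dots, n
```

Under this form, dimensionality reduction amounts to deciding which of the p coefficient functions β_j(t) can be dropped, and the sample-specific time grids are what make standard multivariate techniques hard to apply directly.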
Bearing this in mind, an alternative way to address
the DP prediction problem is to use a multi-output
ML-based approach, where the different variables
involved in the drug design and test, along with the
target dissolution times, are used as inputs, and the
percentages of dissolution at the same target times
are considered as outputs. This alternative is
suggested in (Contia and O’Hagan, 2010) for complex
non-uniformly sampled dynamic models. Such a model
can be used as a wrapper criterion for feature
selection techniques, in order to reduce the number
of variables analyzed, avoid overfitting of the
prediction model, and select the most relevant
features for the DP prediction problem.
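The idea described above can be sketched in a few lines: each (formulation, sampling time) pair becomes one training row whose inputs are the formulation variables plus the time, and a predictor's cross-validated error serves as the wrapper criterion for a greedy forward search. Everything below is a minimal sketch under stated assumptions: the data are synthetic, the k-nearest-neighbour predictor is only a stand-in for whatever multi-output model is actually used, and all function and variable names are hypothetical.

```python
import math
import random


def to_rows(features, profiles):
    """Flatten profiles: one row per (formulation, time) pair.

    Each row is (formulation index, inputs = features + [time], % dissolved).
    """
    rows = []
    for idx, (x, profile) in enumerate(zip(features, profiles)):
        for t, pct in profile:
            rows.append((idx, list(x) + [t], pct))
    return rows


def knn_predict(train, query, cols, k=3):
    """Average the targets of the k nearest rows, using only columns `cols`."""
    dist = lambda a, b: math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))
    nearest = sorted(train, key=lambda r: dist(r[1], query))[:k]
    return sum(r[2] for r in nearest) / len(nearest)


def cv_error(rows, cols):
    """Leave-one-formulation-out mean absolute error (the wrapper criterion)."""
    errors = []
    for g in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != g]
        test = [r for r in rows if r[0] == g]
        errors += [abs(knn_predict(train, r[1], cols) - r[2]) for r in test]
    return sum(errors) / len(errors)


def forward_select(rows, n_features, time_col):
    """Greedy forward selection; the time column is always kept as an input."""
    selected = [time_col]
    best = cv_error(rows, selected)
    improved = True
    while improved:
        improved = False
        for c in range(n_features):
            if c in selected:
                continue
            err = cv_error(rows, selected + [c])
            if err < best:
                best, best_c, improved = err, c, True
        if improved:
            selected.append(best_c)
    return selected, best


# Synthetic example (assumption): only variable 0 drives the dissolution rate,
# and every formulation is sampled at its own, non-uniform set of times (hours).
random.seed(0)
grid = [0.05, 0.1, 0.17, 0.25, 0.33, 0.5, 0.75, 1.0]
features, profiles = [], []
for _ in range(20):
    x = [random.random() for _ in range(3)]
    rate = 1.0 + 5.0 * x[0]
    times = sorted(random.sample(grid, 5))
    features.append(x)
    profiles.append([(t, 100.0 * (1.0 - math.exp(-rate * t))) for t in times])

rows = to_rows(features, profiles)
time_col = 3  # index of the time input appended after the 3 variables
selected, best_err = forward_select(rows, n_features=3, time_col=time_col)
print("selected columns:", selected, "CV error:", round(best_err, 2))
```

Because time is an input, a full profile for a new formulation is obtained by querying the model at each target time, so non-uniform sampling needs no resampling step. Holding out whole formulations, not individual rows, keeps points from the same profile from leaking between training and test sets.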
In this sense, the present work explores the use
of heuristic-based methods to address dimensionality
reduction in the SOPFs’ DP prediction problem. All
the drugs considered correspond to rapid release
SOPFs, which have similar pharmacokinetics and are
the most frequent type of SOPFs