to the spectrum size was not reasonable here, as q is a
size-independent variable. Splitting the task into sub-
tasks for semicircular (q < 5) and non-semicircular
spectra (q > 5), on the other hand, was tested but did
not solve the problem (cf. Table 2).
As an intuitive alternative, a logarithmic split of
the target domain into subdomains of varying range
was tested. This approach is based on the rationale
that spectra with very large q are less frequently ob-
served in practice as well as in the modeled datasets.
Compared to the initial results (Figure 2), significantly
lower error rates were observed when using decision
trees (Table 3), and reasonable error rates were obtained
in 3 out of 5 subtasks (q < 1000) when using random
forests (Tables 5 and 6). For the lowest target range
(1 < q < 10), random forests showed a satisfactory mean
error of mostly less than ten percent and, when applied
to subset C, of even less than five percent.
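The logarithmic split can be illustrated with a minimal sketch. It assumes five decade-wide subdomains starting at q = 1 (1 to 10, 10 to 100, and so on); only the boundaries q < 10 and q < 1000 are stated in the text, so the remaining boundaries are an assumption, and the names `EDGES`, `subdomain_index` and `split_by_subdomain` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed upper bounds of the first four subdomains (decades of q);
# the fifth subdomain collects everything from 10**4 upwards.
EDGES = [10.0, 100.0, 1000.0, 10000.0]


def subdomain_index(q):
    """Return the index (0..4) of the logarithmic subdomain containing q."""
    if q < 1.0:
        raise ValueError("q is assumed to be >= 1 in this sketch")
    return bisect_right(EDGES, q)


def split_by_subdomain(samples):
    """Group (features, q) pairs by subdomain so that one model
    can be trained and tested per group."""
    groups = defaultdict(list)
    for features, q in samples:
        groups[subdomain_index(q)].append((features, q))
    return dict(groups)
```

Each resulting group would then serve as the training and test data of one subtask.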
4.2 Predictiveness of Feature Subsets
Evaluation of the 13 potentially predictive feature
subsets identified by the three distinct approaches
showed that five features (subsets A, D.5, E.5) are
likely not enough for precise predictions. For ANNs
as well as for Random Forests, subsets with ten or
more features yielded lower mean errors (cf. Tables
4-8). We take this, again, as an indicator that the
given task is complex in nature.
When Random Forests are trained with subset C, a
relatively high precision of predictions is observed for
the target range 1 < q < 10. Not only is the mean error
less than five percent, but the third quartile is also
less than ten percent (Table 6). Similarly to observations
on other feature subsets, however, the maximum error
was considerably larger (here: 93.4 percent). While this
deviation is currently too extreme for practical
applications, we are convinced that such extreme errors
can be reduced in future work.
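The three error statistics discussed above (mean, third quartile, maximum) can be computed as in the following sketch. It assumes that the error of a prediction is the absolute relative deviation from the true q in percent, which the text does not state explicitly; the function names are hypothetical.

```python
import statistics


def relative_errors_percent(q_true, q_pred):
    """Absolute relative deviation of each prediction in percent
    (assumed error definition)."""
    return [abs(p - t) / abs(t) * 100.0 for t, p in zip(q_true, q_pred)]


def summarize_errors(errors):
    """Return mean, third quartile and maximum of a list of errors.
    Needs at least two values."""
    _, _, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(errors),
        "third_quartile": q3,
        "max": max(errors),
    }
```

A low mean together with a low third quartile but a high maximum, as reported for subset C, points to a small number of outlier predictions rather than a generally poor fit.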
An obvious trend among all predictions was that
Random Forests generally performed better than ANNs.
While some improvement might be possible here with
more complex ANN architectures, we take the obtained
results as indicating an advantage of Random Forests.
4.3 Target-specific versus Feature
Subset-specific Task Differentiation
Clustering based on statistical features of impedance
spectra is possible for subsets A, B, C, D, E (Figure 3a)
as well as for the initial feature sets S (Figure 3b).
While the variance of the within-groups sum of squares
is in general greater among the subsets than among the
sets, even the best-performing subset (the top-ranked
15 features of subset D, D.15) shows a relatively large
within-groups sum of squares.
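For reference, the within-groups sum of squares of a given clustering is the sum of squared Euclidean distances of all points to the centroid of their own cluster. The sketch below computes it for already assigned cluster labels; the clustering algorithm itself is not shown, and the function name is hypothetical.

```python
def within_groups_sum_of_squares(points, labels):
    """Sum of squared Euclidean distances of each point to the
    centroid of the cluster it is assigned to."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = component-wise mean of the cluster members.
        centroid = [sum(m[d] for m in members) / len(members)
                    for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2
                     for m in members for d in range(dim))
    return total
```

A smaller value indicates more compact clusters; a value that stays large as the number of clusters grows suggests that the feature vectors do not separate into well-defined groups.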
For subset D.15, training and testing individually
for each cluster yielded considerably larger mean er-
rors than using logarithmic subdomains of q (Figure
3c). This effect does not seem to depend on the number
of clusters and was observed similarly when training
individually for each cluster of the best-performing
feature set (median-normalized S_∆φ, Figure 3d). For
the given regression task, deriving subtasks based on
the target q therefore appears to be more fruitful than
deriving subtasks from the feature domain. In prac-
tice, however, matching measured spectra to such
subtasks would require a previous classification step.
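The two-step procedure implied here can be sketched as follows: a classifier first assigns a measured spectrum to a target subdomain, and the regressor trained for that subdomain then estimates q. This is only an outline under the assumption that such a classifier is available; `predict_q`, `toy_classifier` and `toy_regressors` are hypothetical names, and the toy functions are stand-ins, not fitted models.

```python
def predict_q(features, classify_subdomain, regressors):
    """Step 1: choose a subdomain; step 2: apply that subdomain's regressor."""
    k = classify_subdomain(features)
    if k not in regressors:
        raise KeyError(f"no regressor available for subdomain {k}")
    return regressors[k](features)


# Toy stand-ins, purely illustrative.
def toy_classifier(features):
    # Pretend the first feature alone separates two subdomains.
    return 0 if features[0] < 0.5 else 1


toy_regressors = {
    0: lambda f: 1.0 + 9.0 * f[0],     # hypothetical model for 1 <= q < 10
    1: lambda f: 10.0 + 90.0 * f[0],   # hypothetical model for 10 <= q < 100
}
```

In such a setup, the overall error depends on both steps, since a misclassified spectrum is passed to a regressor trained on a different range of q.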
5 CONCLUSIONS
With the present study, we aimed at understanding the
nature of the relation between the τ quotient q and
the shape of an impedance spectrum obtained from
the given five-parameter electric circuit (Figure 1b).
Based on ideally modeled spectra, we found that the
task of predicting q from statistical features inherent
in each spectrum is so complex that it has to be further
differentiated. Our results imply that deriving subtasks
by splitting the target domain is more effective than
deriving subtasks with respect to feature subset clusters.
When dividing the target domain into logarithmic
subdomains, we found that a relatively small num-
ber of statistical features is sufficient for reasonable
predictions of q values < 1000. Moreover, we could
show for values < 10 that q can be estimated with a
satisfactory mean error of less than five percent. As q
indicates in this particular range whether the spectrum
possesses a semicircular or a non-semicircular shape,
this result provides a basis for an automated discrim-
ination between these two spectrum types.
Automated Quantification of the Relation between Resistor-capacitor Subcircuits from an Impedance Spectrum