Automated Quantiﬁcation of the Relation between Resistor-capacitor

Subcircuits from an Impedance Spectrum

Thomas Schmid

, Dorothee G

unzel

and Martin Bogdan

Department of Computer Engineering, Universit

at Leipzig, Augustusplatz 10, D-04109 Leipzig, Germany

Institute of Clinical Physiology, Campus Benjamin Franklin, Charit

e, D-12220 Berlin, Germany

Keywords:

Impedance Spectroscopy, Epithelia, HT-29/B6, IPEC-J2, Machine Learning, Feature Selection, Decision

Trees, Artiﬁcial Neural Networks, Random Forests.

Abstract:

In epithelial physiology, it is common to use an equivalent electric circuit with two resistor-capacitor (RC)

subcircuits in series as a model for the electrical behavior of body cells. The relation between these two

subcircuits can be quantiﬁed by a quotient of their time constants τ. While this quotient is a direct indicator

of the shape of impedance spectra, its value cannot be determined directly. Here, we suggest a machine

learning-based approach to predict the τ quotient from impedance spectra. We perform systematic extraction of

statistical features, algorithmic feature ranking and dimension reduction on model impedance spectra derived

from tissue-equivalent electric circuits. Our results demonstrate that this quotient can be predicted reliably

enough from implicit features to discriminate semicircular against non-semicircular impedance spectra.

1 INTRODUCTION

Characterization of current through a given sample is

of interest not only in electric engineering, but also in

many biomedical applications. Often, like in physi-

ological analyses of epithelial cells, this is achieved

by applying alternating current (AC) and measur-

ing the opposition to this current, called impedance.

As the applied frequencies are varied across a given

spectrum, this concept is commonly refered to as

impedance spectroscopy.

When applied to epithelial cell layers, this allows

to discriminate alternative current pathways (Krug

et al., 2009). While these layers form barriers be-

tween compartments of an organism, they also reg-

ulate exchange of ions, water and nutrients. At their

apical side, epithelia are joined by the tight junction

(TJ), which regulates ion transport between neigh-

bouring cells (paracellularly). Alternatively, ions may

be transported through the cells (transcellularly), i.e.

across the apical and basolateral cell membrane. Both

pathways may be altered under physiological and

pathophysiological conditions.

For simple epithelia that consist of a single layer

of cells, electrical properties can be described by an

equivalent electric circuit as depicted in Figure 1a

unzel et al., 2012). Within this 5-parameter circ-

uit, the TJ is represented by a resistor R

, whereas

each side of the epithelium is characterized by an

RC subcircuit, a or b, respectively. Each subcircuit

is readily described by its time constant τ

= R

·C

and τ

= R

·C

respectively. The ratio between the

numerically larger time constant and the numerically

smaller time constant establishes a good quantiﬁca-

tion of differing electrical properties of the apical and

basolateral membrane, as e.g. caused by the activa-

tion of ion channels. With known resistor and capaci-

tor values, this relation is given by the quotient q =

where τ

,τ

∈ {τ

,τ

} and τ

> τ

For measured impedance spectra, however, nei-

ther the time constants nor q are known. Due to the

symmetric organization of the subcircuits and the am-

bivalence of the resulting spectra, values of the under-

lying resistors and capacitors cannot be determined

reliably from a single spectrum. And while q serves

as a direct indicator whether the spectrum is of semi-

circular or non-semicircular shape, this exact relation

has not been understood so far.

In previous work, we have modeled realistic

impedance spectra for two distinct epithelial cell lines

(Schmid et al., 2013a). Thereby, relevant electrical

properties were determined by predicting x-axis in-

tercepts from explicitly measured data. In a next step,

we extracted implicit statistical features of ideal spec-

tra, applied algorithmic feature selection and eval-

uated feature subsets with artiﬁcial neural networks

141

Schmid T., Günzel D. and Bogdan M..

Automated Quantiﬁcation of the Relation between Resistor-capacitor Subcircuits from an Impedance Spectrum.

DOI: 10.5220/0004746501410148

In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS-2014), pages 141-148

ISBN: 978-989-758-011-6

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

(a) (b)

Figure 1: (a) Equivalent electric circuit discriminating between apical (τ

= R

) and basolateral (τ

= R

) properties of

an epithelial cell layer. (b) An impedance spectrum reﬂecting AC application at 42 frequencies between 1.3 and 16,000 Hz on

an epithelial cell layer with low resistance R = R

+ R

)/(R

+ R

). In contrast to physiological conditions (where

≈τ

), here R

is decreased considerably by drug application. Thus, τ

decreases and a non-semicircular shape is obtained.

Impedances Z can be displayed as complex numbers (ℜ,ℑ) or in polar coordinates (magnitude r, phase φ).

(ANNs) (Schmid et al., 2013b). Here, we continue to

further systemize this approach. For the more com-

plex task of predicting q, we extract more implicit

features, perform dimension reduction and task dif-

ferentiation, and compare concurring machine learn-

ing techniques. As no conventional way to estimate q

exists, we use decision trees as baseline method.

2 METHODS

2.1 Modeling Impedance Spectra and

Extracting Statistical Features

The complex impedance Z of the tissue-equivalent

electric circuit (Figure 1a) at an angular frequency

ω can be derived by Kirchhoff’s laws from the

impedances of its components R

, C

, R

, C

and R

Z(ω) =

+ R

) +iω[R

+ R

)]

+ R

(1 −ω

) + iω[R

(τ

+ τ

) + R

+ R

]

(1)

where i =

√

−1, and τ

= R

and τ

= R

In measurements, a spectrum of n impedances

Z(ω

),...,Z(ω

n−1

), i.e. n tupels of real and imaginary

parts ((ℜ(ω

),ℑ(ω

)),... , (ℜ(ω

n−1

), ℑ(ω

n−1

))), is

obtained by applying AC at n frequencies. Alterna-

tively, the complex impedances can be transformed

into polar coordinates, i.e. into phase φ and magni-

tude r (((φ(ω

),r(ω

)),... , (φ(ω

n−1

),r(ω

n−1

)))).

In the following, phases and magnitudes of a spec-

trum are handled as separate feature sets S

and S

= {φ(ω

),...,φ(ω

n−1

)} (2)

= {r(ω

),...,r(ω

n−1

)} (3)

Analogously for real and imaginary parts:

ℜ

= {ℜ(ω

),... , ℜ(ω

n−1

)} (4)

ℑ

= {ℑ(ω

),... , ℑ(ω

n−1

)} (5)

From these four feature sets, new sets of inherent

features are created that represent the n −1 distances

between two consecutive features of the original sets:

∆φ

= {∆φ|∆φ

= φ(ω

i+1

) −φ(ω

),0 ≤i < n −1} (6)

∆r

= {∆r|∆r

= r(ω

i+1

) −r(ω

),0 ≤i < n −1} (7)

∆ℜ

= {∆ℜ|∆ℜ

= ℜ(ω

i+1

) −ℜ(ω

),0 ≤i < n −1} (8)

∆ℑ

= {∆ℑ|∆ℑ

= ℑ(ω

i+1

) −ℑ(ω

),0 ≤i < n −1} (9)

Additionally, forward (

−→

δ ) and backward (

←−

δ )

differential quotients were calculated from real and

imaginary parts of neighbouring impedances:

−→



−→

δ |

−→

ℑ(ω

i+1

) −ℑ(ω

)

ℜ(ω

i+1

) −ℜ(ω

)

,0 ≤i < n −1, ω

< ω

i+1



←−



←−

δ |

←−

ℑ(ω

i+1

) −ℑ(ω

)

ℜ(ω

i+1

) −ℜ(ω

)

,0 ≤i < n −1, ω

> ω

i+1



(10)

(11)

For each of these sets of explicit spectrum fea-

tures, a total of 16 univariate statistical parameters

were calculated (see Appendix A for details). While

the explicit features were discarded at this point, the

extracted sets of implicit features were used in three

variants: in absolute values, normalized to the respec-

tive mean, and normalized to the respective median;

due to the fact that differential quotients are already

BIOSIGNALS2014-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

142

Table 1: Characteristics of the modeled datasets.

Dataset Cases Target q

Minimum 1st Quantile Median Mean 3rd Quantile Maximum

HT-29/B6 unaltered 299,112 2.02 15.64 33.00 2983.09 109.19 89310.1

drugged 151,043 1.55 15.17 31.88 1988.03 101.85 88425.9

IPEC J2 unaltered 458,326 1.00 2.18 3.32 4.38 5.35 31.74

drugged 279,268 1.01 2.18 3.31 4.38 5.34 31.39

Combined 1,187,750 1.00 2.79 5.60 1006.77 20.68 89310.1

Training 791,834 1.00 2.79 5.61 1005.00 20.78 89310.0

Test 395,916 1.01 2.79 5.58 1011.00 20.50 86220.0

relative parameters by nature, datasets S

−→

and S

←−

were only used in absolute values. This yielded a to-

tal of 26 extracted feature sets, consisting of a total of

416 implicit spectrum features.

By varying the underlying ﬁve circuit parame-

ters, measurements on distinct epithelial cell lines un-

der a variety of experiment conditions can be mim-

icked (Schmid et al., 2013b; Schmid et al., 2013a).

Here, we imitated the epithelial cell lines HT-29/B6

and IPEC-J2, for which we have described electrical

properties before and after application of parameter-

altering drugs previously (Schmid et al., 2013a).

These distinct data sets from each cell line under each

condition were combined before the following analy-

ses in order to gain cell line-independet results (Table

1). Note that for simplicity, the scatter resulting from

real-world measurements is not modeled here.

2.2 Assessing the Complexity of the

Regression Task

As baseline method for predicting q and to obtain a

ﬁrst glance at the complexity of regression task, we

applied decision trees

. The combined data was split

into a training dataset (66 percent) and a test dataset

(33 percent), where both datasets showed compara-

ble statistical characteristics (Table 1). For complex-

ity analysis, the number of cases from the training

dataset used for building the tree was increased step-

wise (starting at 1 percent of the training dataset)

while the test dataset was left unaltered.

Further, we investigated whether complexity of

the regression task can be reduced by splitting it into

intuitive subtasks. Based on the target domain, two

splittings were tested. In the one variant, we dis-

criminated between semicircular (q < 5) and non-

semicircular (q > 5) spectra (Schmid et al., 2013b).

In the other variant, we divided the target domain into

ﬁve distinct logarithmic target domains.

All decision tree tasks were performed with R and the

standard package tree by B. Ripley.

2.3 Searching for Predictive Feature

Subsets

As no ideal feature selection approach is known for

this speciﬁc task, three alternatives were tested:

• Variables used by decision trees were taken as fea-

ture subsets. For a tree built without splitting, this

implied a set of 5 features (termed subset A). For

trees built for the two-fold splitting, both feature

subsets were merged, yielding a set of 13 unique

features (subset B). For trees built for the ﬁve-fold

splitting, merging all feature subsets yielded a set

of 23 unique features (subset C).

• All 416 features were ranked by a nearest

neighbor-based algorithm. In contrast to ﬁlter

methods, which rank features individually, the

here used ”regression gradient feature selection”

evaluated the performance of a feature within a

feature subset (Navot et al., 2005). In order to de-

termine an optimal subset size, subsets of the 5,

10, 15, 20 and 25 top-ranked features were evalu-

ated. These features are referred to as subset D.

• Random forests were used to determine impor-

tance of all 416 variables

. Aggregating output of

a large number of decision trees, Random Forests

provide relatively unbiased predictions (Breiman,

2001). Again, subsets of the 5, 10, 15, 20 and 25

top-ranked features were evaluated (subset E).

For all candidate feature subsets (A, B, C, D, E),

the combined dataset (Table 1) was reduced to the

respective features, split into logarithmic target sub-

domains and assessed by decision trees, multi-layer

perceptrons (MLPs)

, and random forests.

All RGS rankings were performed with MATLAB and

MATLAB code provided by the algorithm authors online at

www.cs.huji.ac.il/labs/learning/code/fsr/

All random forest tasks were performed with R and the

package randomForest (Liaw and Wiener, 2002).

For all MLPs, we use a n-2-1 architecture with one hid-

den layer consisting of two hidden units. Training was per-

formed by backpropagation and using the FORWISS Arti-

ﬁcial Neural Network Toolbox (Arras and Mohraz, 1996).

AutomatedQuantificationoftheRelationbetweenResistor-capacitorSubcircuitsfromanImpedanceSpectrum

143

2.4 Searching for Subtasks by Feature

Subset-Speciﬁc Clustering

As an alternative to relying on q for task differentia-

tion (Section 2.2), we also tested splitting the regres-

sion task into subtasks solely with respect to individ-

ual feature subsets. Due to the multi-dimensionality

of the feature domains, however, such splittings can-

not be deﬁned intuitively or by domain knowledge.

Therefore, for each feature subset, we performed k-

means clustering based on the respective features

. In

all clusterings, the within groups sum of squares was

determined for n = {2,...,15} clusters.

In parallel, such analyses were also performed for

those initial feature sets S ∈ {S

ℜ

ℑ

,...} that

could be considered relevant based on the previous

feature selection and ranking. Feature sets were as-

sumed to be relevant for predicting q if two or more

of its features did appear in any of the previously iden-

tiﬁed feature subsets (Section 2.3).

For the feature subset and feature set that showed

the lowest group sums of squares, clusterings were as-

sessed in more detail. Analogously to previous eval-

uations, spectra of each cluster were separated into

training (66 percent) and test (33 percent) data and

evaluated by Random Forests. By this, mean test er-

rors of the various clusterings could be compared.

3 RESULTS

3.1 Complexity Analysis

With increasing number of training cases, no consid-

erable improvement of the test error was observed. In

particular, even the mean absolute derivation from the

target q, was measured throughout to be considerably

larger than a hundred percent of the target (Figure 2).

Complexity is here measured as number of variables

used by the respective decision tree. While no de-

crease of the complexity was observed relative to us-

ing ﬁve percent of the training datasets, the actually

used variables changed with increasing percentage.

When discriminating between semicircular and

non-semicircular spectra, a much lower mean abso-

lute derivation from the target was observed for semi-

circular spectra (Table 2); the same holds true for the

maximum derivation. Logarithmically splitting the

target domain yielded mean absolute derivations be-

tween 18 and 49 percent (Table 3); maxium deriva-

tions, however, lay between 215 and 420 percent.

All k-means clusterings were performed with R and its

built-in function kmeans.

100

200

300

400

500

600

700

10 20 30 40 50 60 70 80 90 100

Mean +/- error [%]

Complexity [# of used variables]

Fraction of training dataset [%]

Error

Complexity

Figure 2: Development of mean error [±%] and complexity

(number of used variables) by size of training data.

Table 2: Error for shape-based splitting (target domain).

Target q Test Error [+/- %]

Range Minimum Mean Maximum

1.0 - 5.0 0.0 16.3 164.9

5.0 - 89,310 0.0 531.0 333,400.0

Table 3: Error for logarithmic splitting (target domain).

Target q Test Error [+/- %]

Range Minimum Mean Maximum

1 - 10 0.0 23.2 238.1

10 - 100 0.0 26.7 419.5

100 - 1,000 0.0 19.0 272.7

1,000 - 10,000 0.0 48.7 410.3

10,000 - 89,310 0.0 37.9 216.5

Note that splitting was applied to the combined

data (Table 1); the results were, again, divided into

training (66 percent) and test (33 percent) data.

3.2 Predictiveness of Feature Subsets

As described, a total of 13 differing potential predic-

tors were tested with decision trees, ANNs and ran-

dom forests. Tests were performed separately for each

of the ﬁve logarithmic ranges of the target q (cf. Ta-

ble 3). As decisions trees, however, did in all applica-

tions not perform notably better than previously with

all 416 features (cf. Table 3), we only show ANN

and random forest results for feature subsets A, B, C

and the best D and best E subset (Tables 4-8). Error

of the predictions is given as the absolute derivation

from the target relative to the respective target.

Neither MLPs nor random forests reached test er-

ror of less than ten percent for the full target range,

i.e. not under all ﬁve conditions tested. Using MLPs,

a mean error of less than ten percent could not be

achieved with any of the feature subsets; best mean

errors were observed for feature subset B in the target

range 100 < q < 1000 (16.4 percent) and for feature

subset C in the target range 100 < q < 1000 (14.6 per-

cent).

BIOSIGNALS2014-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

144

Table 4: Absolute deviation (+/-) from the target q for feature subset A [%].

Target ANN Random Forest

Range Min. 1.Qrt. Med. Avg. 3.Qrt. Max. Min. 1.Qrt. Med. Avg. 3.Qrt. Max.

1-10 0.0 18.2 37.2 55.1 74.2 420.7 0.0 4.1 8.8 12.4 15.7 209.7

10-100 0.0 15.1 32.4 40.9 56.6 353.7 0.0 6.9 15.8 20.4 29.0 170.7

100-1000 0.0 8.0 18.0 30.5 32.5 477.4 0.0 3.1 6.7 9.3 12.1 261.9

1000-10000 0.0 15.4 33.5 59.7 72.7 511.5 0.0 12.8 30.8 44.9 55.0 459.7

>10000 0.0 15.5 31.8 41.5 56.9 188.4 0.0 14.3 29.0 36.1 46.0 337.8

Table 5: Absolute deviation (+/-) from the target q for feature subset B [%].

Target ANN Random Forest

Range Min. 1.Qrt. Med. Avg. 3.Qrt. Max. Min. 1.Qrt. Med. Avg. 3.Qrt. Max.

1-10 0.0 7.7 17.9 29.6 37.9 435.2 0.0 1.5 3.4 5.1 6.7 181.5

10-100 0.0 12.6 26.5 39.8 49.2 558.7 0.0 4.3 10.0 14.1 19.7 156.4

100-1000 0.0 6.3 13.0 16.4 21.5 140.4 0.0 0.4 1.0 3.9 2.6 212.9

1000-10000 0.0 14.7 34.1 59.3 70.8 428.6 0.0 12.2 29.7 44.7 55.3 575.6

>10000 0.0 14.9 28.4 37.2 45.5 361.4 0.0 14.2 28.8 36.6 46.2 339.0

Table 6: Absolute deviation (+/-) from the target q for feature subset C [%].

Target ANN Random Forest

Range Min. 1.Qrt. Med. Avg. 3.Qrt. Max. Min. 1.Qrt. Med. Avg. 3.Qrt. Max.

1-10 0.0 8.2 18.7 30.5 38.1 525.3 0.0 1.5 3.3 4.9 6.4 93.4

10-100 0.0 9.7 21.5 29.0 38.1 878.8 0.00 3.9 9.2 13.5 18.9 127.0

100-1000 0.0 5.5 11.3 14.6 19.0 163.8 0.0 0.6 1.4 4.1 3.3 178.1

1000-10000 0.0 13.7 32.9 56.4 63.8 399.6 0.0 11.9 28.6 43.5 52.9 569.6

>10000 0.0 14.6 28.9 37.1 44.9 413.6 0.0 13.2 27.1 34.0 43.3 365.6

Table 7: Absolute deviation (+/-) from the target q for the top 15 features of subset D (D.15) [%].

Target ANN Random Forest

Range Min. 1.Qrt. Med. Avg. 3.Qrt. Max. Min. 1.Qrt. Med. Avg. 3.Qrt. Max.

1-10 0.0 9.1 19.6 32.3 36.1 507.6 0.0 2.4 5.1 6.6 9.1 230.4

10-100 0.0 10.0 22.5 32.0 41.5 383.6 0.0 4.6 10.3 14.6 20.1 155.6

100-1000 0.0 5.2 10.6 14.5 17.3 162.8 0.0 0.4 0.9 3.7 2.4 205.1

1000-10000 0.0 13.9 33.0 57.6 65.4 385.4 0.0 12.1 29.7 44.9 54.7 557.9

>10000 0.0 15.1 29.2 37.7 45.8 354.3 0.0 14.2 28.6 36.6 46.1 352.4

Table 8: Absolute deviation (+/-) from the target q for the top 20 features of subset E (E.20) [%].

Target ANN Random Forest

Range Min. 1.Qrt. Med. Avg. 3.Qrt. Max. Min. 1.Qrt. Med. Avg. 3.Qrt. Max.

1-10 0.0 11.5 24.6 37.1 47.8 466.5 0.0 2.5 5.3 7.4 9.8 277.6

10-100 0.0 9.8 21.9 28.4 38.2 463.8 0.0 4.5 10.5 14.4 20.0 184.1

100-1000 0.0 6.9 14.1 17.4 21.8 222.9 0.0 1.0 2.1 4.9 4.5 167.2

1000-10000 0.0 14.0 32.9 57.7 66.3 550.3 0.0 12.3 29.0 43.7 53.2 553.2

>10000 0.0 15.0 29.1 37.6 45.9 276.1 0.0 13.6 27.2 34.1 43.2 374.5

While random forests performed comparably to

MLPs in the two largest target ranges (q > 1000), test

error was constantly lower for the remaining ranges

(q < 1000). Except for feature subset A, mean er-

ror for these three subtasks lay constantly below 15

percent. Best overall mean errors were observed for

subset C with 4.9 percent (target range 1 < q < 10),

13.5 percent (10 < q < 100) and 4.1 percent (100 <

q < 1000).

3.3 Regression Subtasks by Clustering

Feature Subsets

Cluster analysis of the feature domain showed that

for all feature subsets (Figure 3a) and relevant fea-

ture sets (Figure 3b) the within-group-sum-of-squares

decreases rapidly (Figure 3a, 3b). Using more than

ten clusters, this did not decrease notably further for

any of the subsets or sets. Among subsets, the low-

est within-group-sum-of-squares is observed for sub-

set D.5, containing the 5 features top-ranked by the

AutomatedQuantificationoftheRelationbetweenResistor-capacitorSubcircuitsfromanImpedanceSpectrum

145

5e+06

1e+07

1.5e+07

2e+07

2.5e+07

3e+07

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Within Groups Sum of Squares

Number of Clusters

D.5

D.10

D.15

D.20

D.25

E.5

E.10

E.15

E.20

E.25

(a) (b)

100

200

300

400

500

600

700

800

900

0 1 2 3 4 5 6 7 8 9 10

Mean Absolute Deviation from Target [%]

Number of Clusters

(c)

100

200

300

400

500

600

700

800

900

0 1 2 3 4 5 6 7 8 9 10

Mean Absolute Deviation from Target [%]

Number of Clusters

(d)

Figure 3: (a) Within group sum of squares for clusters of feature subsets A, B, C, D and E plotted against the number of

clusters n. (b) Within group sum of squares for clusters of initial feature sets S where at least two features were part of the

feature subsets A, B or C, or were top-ranked by RGS (subset D) or Random Forests (subset E). (c) Error of predictions when

clustering training data consisting of the ﬁve top-ranked features of subset D (D.5) into n clusters. Each cluster is trained and

tested individually by Random Forests. Mean deviations from target q [±%] are displayed per cluster. (d) Error of predictions

when clustering training data based consisting of the median-normalized features of feature set S

∆φ

into n clusters. Each

cluster is trained and tested individually by Random Forests. Mean deviations from target q [±%] are displayed per cluster.

RGS algorithm. Among sets, none gained a value as

low as feature subset D.5. The best-performing set

was median-normalized S

∆φ

Training and predicting q based on individual

clusters of feature subset D.5 (Figure 3c) or of

median-normalized feature set S

∆φ

(Figure 3d) re-

spectively, showed no convergence of the test errors

per clustering. In particular, for all tested clusterings

at least one of the clusters exhibited a mean test er-

ror of more than hundred percent. Maximum error

of an individual cluster was 845.5 percent for D.5

with n = 10 clusters and 225.5 percent for median-

normalized feature set S

∆φ

with n = 9 clusters.

4 DISCUSSION

4.1 Complexity of the Regression Task

Using an increasing portion of the training data to

predict q with decision trees showed that even with

a very large number of training vectors, q can not

be predicted with satisfying precision (Figure 2). As

we had to split similar regression tasks on impedance

data into size-dependent (Schmid et al., 2013a) or

shape-dependent (Schmid et al., 2013b) subtasks in

previous work, this is not particularily surprising. For

the present task, however, neither of these approaches

were adequate. On the one hand, splitting according

BIOSIGNALS2014-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

146

to the spectrum size was not reasonable here, as q is a

size-independent variable. Splitting the task into sub-

tasks for semicircular (q < 5) and non-semicircular

spectra (q > 5), on the other hand, was tested but did

not solve the problem (cf. Table 2).

As an intuitive alternative, a logarithmic split of

the target domain into subdomains of varying range

was tested. This approach is based on the rationale

that spectra with very large q are less frequently ob-

served in practice as well as in the modeled datasets.

Compared to initial results (Figure 2), signiﬁcantly

lower error rates were observed when using decision

trees (Table 3) and reasonable error rates in 3 out of 5

subtasks (q < 1000) when using random forests (Ta-

ble 5 and 6). For the lowest target range (1 < q < 10),

random forests showed a satisfying mean error of

mostly less than ten percent, when applied to subset

C of even less than ﬁve percent.

4.2 Predictiveness of Feature Subsets

Evaluation of the 13 potentially predictive feature

subsets identiﬁed by the three distinct approaches

showed that at a number of ﬁve features (subsets A,

D.5, E.5) is likely not enough for precise predictions.

For ANNs as well as for Random Forests, using sub-

sets with ten or more features showed lesser mean er-

ror (cf. Table 4-8). We take this, again, as an indicator

that the given task is of complex nature.

Training Random Forests with subset C, a rela-

tively high precision of predictions is observed for the

target range 1 < q < 10. Not only is the mean error

lesser than ﬁve percent, but also is the third quantile

lesser than ten percent (Table 6). Similarily to obser-

vations on other feature subsets, however, the max-

imum error was considerably larger than that (here:

93.4 percent). While this deviation is currently too ex-

treme for practical applications, we are convinced that

such extreme errors can be reduced in future work.

An obvious trend among all predictions was the

fact that Random Forests did perform generally better

than ANNs. While some improvement might be pos-

sible here with more complex ANN architectures, we

take the obtained results as a trend indicating advan-

tages in using Random Forests.

4.3 Target-speciﬁc versus Feature

Subset-speciﬁc Task Differentiation

Clustering based on statistical features of impedance

spectra is possible for subsets A, B, C, D, E (Fig-

ure 3a) as well as for the initial feature sets S (Figure

3b). While the variance of the within groups sum of

squares is in general greater among the subsets than

among the sets, even the best performing subset (top-

ranked 15 features of subset D, D.15) shows a rela-

tively large within groups sum of squares.

For subset D.15, training and testing individually

for each cluster yielded considerably larger mean er-

rors than using logarithmic subdomains of q (Figure

3c). This effect seems not to dependent on the number

of clusters and was observed similarly when training

individually for each cluster of the best-performing

feature set (median-normalized S

∆φ

, Figure 3d). For

the given regression task, deriving subtasks based on

the target q therefore appears to be more fruitful than

deriving subtasks from the feature domain. In prac-

tice, however, matching measured spectra to such

subtasks would require a previous classiﬁcation step.

5 CONCLUSIONS

With the present study, we aimed at understanding the

nature of the relation between the τ quotient q and

the shape of an impedance spectrum obtained from

the given ﬁve-parameters electric circuit (Figure 1b).

Based on ideally modeled spectra, we found that the

task of predicting q from statistical features inherent

in each spectrum is of such complex nature that is has

to be further differentiated. Our results imply that de-

riving substasks by splitting the target domain is more

effective than deriving subtasks with respect to fea-

ture subset clusters.

When dividing the target domain into logarithmic

subdomains, we found that a relatively small num-

ber of statistical features is sufﬁcient for reasonable

predictions of q values <1000. Moreover, we could

show for values <10 that q can be estimated with a

satisfying mean error of less than ﬁve percent. As q

indicates in this particular range whether the spectrum

possesses a semicircular or a non-semicircular shape,

this result provides a basis for an automated discrim-

ination between these two spectrum types.

REFERENCES

Arras, M. K. and Mohraz, K. (1996). FORWISS Artiﬁ-

cial Neural Network Simulation Toolbox v.2.2. Bay-

erisches Forschungszentrum f

ur wissensbasierte Sys-

teme, Erlangen, Germany.

Breiman, L. (2001). Random forests. Machine learning,

45(1):5–32.

unzel, D., Zakrzewski, S. S., Schmid, T., Pangalos, M.,

Wiedenhoeft, J., Blasse, C., Ozboda, C., and Krug,

S. M. (2012). From ter to trans-and paracellular resis-

tance: lessons from impedance spectroscopy. Annals

AutomatedQuantificationoftheRelationbetweenResistor-capacitorSubcircuitsfromanImpedanceSpectrum

147

of the New York Academy of Sciences, 1257(1):142–

151.

Krug, S. M., Fromm, M., and G

unzel, D. (2009). Two-

path impedance spectroscopy for measuring paracel-

lular and transcellular epithelial resistance. Biophys J,

97(8):2202–2211.

Liaw, A. and Wiener, M. (2002). Classiﬁcation and regres-

sion by randomforest. R news, 2(3):18–22.

Navot, A., Shpigelman, L., Tishby, N., and Vaadia, E.

(2005). Nearest neighbor based feature selection

for regression and its application to neural activity.

Advances in Neural Information Processing Systems

(NIPS), 19.

Schmid, T., Bogdan, M., and G

unzel, D. (2013a). Discern-

ing apical and basolateral properties of ht-29/b6 and

ipec-j2 cell layers by impedance spectroscopy, mathe-

matical modeling and machine learning. PLOS ONE,

8(7).

Schmid, T., G

unzel, D., and Bogdan, M. (2013b). Efﬁcient

prediction of x-axis intercepts of discrete impedance

spectra. In Proceedings of the 21st European Sym-

posium on Artiﬁcial Neural Networks, Computational

Intelligence and Machine Learning (ESANN).

APPENDIX

A: Univariate Statistical Parameters

As described in the methods section, for each of the

initial feature sets (S

, S

ℜ

, S

ℑ

) and therefrom de-

rived feature sets (S

∆φ

, S

∆r

, S

∆ℜ

, S

∆ℑ

, S

) 16 uni-

variate parameters were calculated:

1. minimum

2. ﬁrst quartile

3. median

4. average

5. third quartile

6. maximum

7. standard deviation

8. variance

9. range

10. distance between median and average

11. interquartile distance

12. ﬁrst percentile

13. ninth percentile

14. interpercentile

15. geometric mean

16. harmonic mean

B: Features of Subset C

In order to complete our report, we list the 23 features

of subset C, which performed best among all subsets

for the target range 1 to 10:

1. ﬁrst quartile of S

ℜ

2. maximum of S

∆ℜ

3. maximum of S

ℑ

4. range of S

5. third quartile of S

∆φ

6. maximum of S

∆φ

7. variance of S

∆φ

8. range of S

∆φ

9. distance between mean and median of S

∆φ

10. distance between ﬁrst and ninth percentile of S

∆φ

11. maximum of mean-normalized S

12. ninth percentile of mean-normalized S

13. maximum of mean-normalized S

∆φ

14. interquartile range of mean-normalized S

∆φ

15. range of median-normalized S

∆φ

16. minimum of S

←−

17. ﬁrst quartile of S

←−

18. third quartile of S

←−

19. maximum of S

←−

20. standard deviation of S

←−

21. range of S

←−

22. ﬁrst percentile of S

←−

23. interquartile range of S

←−

BIOSIGNALS2014-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

148