ECG Biometrics Using a Dissimilarity Space Representation

Francisco Marques

, Carlos Carreiras

, Andr

e Lourenc¸o

2,3

, Ana Fred

1,2

and Rui Ferreira

Instituto Superior T

ecnico, Av. Rovisco Pais 1, Lisboa, Portugal

Instituto de Telecomunicac¸

oes, Av. Rovisco Pais 1, Lisboa, Portugal

Instituto Superior de Engenharia de Lisboa, R. Cons. Em

ıdio Navarro 1, Lisboa, Portugal

Hospital de Santa Marta, R. de Santa Marta 50, Lisboa, Portugal

Keywords:

ECG, Segmentation, Heartbeat, Feature Space, Dissimilarity Space, Dissimilarity Representation, Nearest

Neighbor, Authentication, Identiﬁcation, Biometrics.

Abstract:

Electrocardiogram (ECG) biometrics are a relatively recent trend in biometric recognition, with at least 13

years of development in peer-reviewed literature. Most of the proposed biometric techniques perform classiﬁ-

cation on features extracted from either heartbeats or from ECG based transformed signals. The best represen-

tation is yet to be decided. This paper studies an alternative representation, a dissimilarity space, based on the

pairwise dissimilarity between templates and subjects’ signals. Additionally, this representation can make use

of ECG signals sourced from multiple leads. Conﬁgurations of three leads will be tested and contrasted with

single-lead experiments. Using the same k-NN classiﬁer the results proved superior to those obtained through

a similar algorithm which does not employ a dissimilarity representation. The best Authentication EER went

as low as 1.53% for a database employing 503 subjects. However, the employment of extra leads did not prove

itself advantageous.

1 INTRODUCTION

1.1 ECG Basics

Humans possess an intrinsic ability to recognize pat-

terns and observations as part of a certain class. Often

the class-assigning process is immediate and deemed

obvious. However, when asked to explain how that

conclusion was achieved, the human observer is un-

able to detail the iterations that led him to it.

A biometric system attempts to replicate this be-

haviour and pursue an even more demanding goal –

subject identiﬁcation. For that purpose a speciﬁc pro-

cedure must be designed which will employ sensors

to measure the data requested by it.

In what concerns the ﬁeld of Electrocardiogram

(ECG) Biometrics the sensors record the heart’s elec-

trical activity – the ECG itself – and build a proce-

dure for feature extraction and classiﬁcation. An ECG

heartbeat is composed by three main components: P

wave, QRS complex, and T wave, illustrated in Figure

The measurement of an ECG is obtained via cor-

rect placement of a number of electrodes in speciﬁc

parts of the body and by tracing voltages between

Figure 1: A labeled ECG waveform.

them. Speciﬁc linear combinations between electrode

potentials result in different leads. For instance, lead

I corresponds to the voltage between the electrodes in

the right and left arms, as stated by Einthoven’s Tri-

angle (Conover, 2003) shown in Figure 2.

An ECG biometric system works just like any

other biometric system or any supervised learning

algorithm. Its functioning can be divided into two

modes of operation – enrolment and classiﬁcation.

350

Marques F., Carreiras C., Lourenço A., Fred A. and Ferreira R..

ECG Biometrics Using a Dissimilarity Space Representation.

DOI: 10.5220/0005289303500359

In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS-2015), pages 350-359

ISBN: 978-989-758-069-7

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 2: Figure representing Einthoven’s Triangle and re-

spective leads. The lead voltages are taken from the plus to

the minus signs.

During enrolment the system is given a user’s ID

and extracts his/her information – the training data.

From this data the classiﬁer generates the templates

which best represent the user and build the system’s

database. More enrolled subjects imply greater dif-

ﬁculty for identiﬁcation as the number of possible

choices increases (and the probability of having an

individual with similar data also increases). This ef-

fect can be countered by storing more templates for a

given subject.

The second mode, classiﬁcation extracts a current

user’s data and compares it to the registered training

data. It then outputs the best match, i.e. the correct

user in the case of a biometric system. Classiﬁca-

tion can be divided into two procedures, authentica-

tion and identiﬁcation.

Authentication

The system is provided an ID and will conﬁrm

or deny that a given user is who he/she claims to

be. For that purpose the classiﬁer performs only

comparisons with respect to the given ID’s stored

templates.

Identiﬁcation

In this case, the classiﬁer is not given any ID

by the user. As a result, it must search its en-

tire database for the subject whose templates best

match the input data.

1.2 ECG Biometrics

All the studies on ECG biometrics are based on one-

lead, two-lead, three-lead, or 12-lead ECG signals,

mirroring what is most commonly employed in clin-

ical situations (Odinaka et al., 2012). The most

widespread lead conﬁguration used in ECG biomet-

rics is by far the one-lead conﬁguration as (Biel et al.,

2001) demonstrated that a signal from one lead con-

tains enough information to form the basis for a bio-

metric system. Nonetheless, some studies have re-

sorted to multiple-lead ECG signals in their biometric

strategy. (Ye et al., 2010) utilized two leads (Fang and

Chan, 2009) (W

ubbeler et al., 2007) and (Labati et al.,

2013) opted for three leads and (Biel et al., 2001) and

(Agraﬁoti and Hatzinakos, 2008) used 12 leads. The

technique proposed in this article will be tested in a

single and triple-lead scenario, employing the limb-

lead I or all the limb-leads I, II and III respectively.

With an ECG signal in hand, regardless of the

lead, there are several existing techniques for carrying

out the feature extraction procedure. These fall into

three different categories in what concerns the type of

features they employ (Odinaka et al., 2012).

Fiducial Features

These methods use characteristic points of an

ECG heartbeat and/or relationships between them

as features – the ﬁducial features. Characteristic

points include the peak of the R-wave, while a re-

lationship between points can be the temporal du-

ration of a QRS impulse. Several combinations of

the four types of ﬁducial features have been used

in the literature (Singla and Sharma, 2010) – tem-

poral, amplitude, angle and dynamic (R-R inter-

vals) (Odinaka et al., 2012).

Non-Fiducial Features

Techniques based on non-ﬁducial features do

not use characteristic points as features. De-

spite that, most of them rely on some charac-

teristic points for heartbeat segmentation (Chan

et al., 2008) (Fatemian and Hatzinakos, 2009)

while others simply create windows from the

ECG recording (Wang et al., 2013) (Plataniotis

et al., 2006) (Agraﬁoti and Hatzinakos, 2008) (Li

and Narayanan, 2010). Afterwards these seg-

ments/windows are transformed into another do-

main so as to extract features from the resulting

signals.

Hybrid Features

Algorithms in this group resort to both ﬁducial

and non-ﬁducial features for their biometric sys-

tem. Some use a combination of them as features

(Wang et al., 2008) (Silva et al., 2007). Others

design two classiﬁers where the ﬁrst uses non-

ﬁducial features to reduce the match set and the

second outputs the classiﬁcation while being fed

with the ﬁducial features (Shen and Hu, 2011)

(Shen et al., 2002).

Having assembled the feature space some algo-

rithms additionally perform its dimensionality reduc-

tion. Afterwards, the reduced or raw feature space is

then directly used for the training of classiﬁers. Re-

gardless, it must be emphasised that the ideal feature

representation is yet to be found – all the existing ones

present pros and cons in relation to one another.

ECGBiometricsUsingaDissimilaritySpaceRepresentation

351

It is on these grounds that the current article intro-

duces the concept of a feature space based on a Dis-

similarity Representation for ECG biometrics by re-

sorting to comparisons between signals as inspired by

the study in (Duin and Pe¸kalska, 2011). The current

study explores this possibility by designing a multi-

lead conﬁguration and comparing it with the single-

lead version. Dissimilarity Representations will be

explained in Section 3.

Moreover Section 2 describes the notation used in

this paper. Section 4 then outlines the pre-processing

feature extraction and template generation carried out

on the ECG signals, as well as the utilized classiﬁer.

Section 5 outlines the executed experiments and their

results. This paper is concluded in Section 6 which

draws the main ﬁndings and conclusions.

2 NOTATION

In order to better understand the methodology pro-

posed in this paper as well as the process of building

the dissimilarity representation, the notation that will

be employed throughout the paper is here presented.

The basic element behind the proposed method’s

biometric system is the ECG heartbeat. Heartbeats

will be employed as templates and dissimilarities will

be calculated through comparisons between them.

The current work utilized a Sampling Frequency of

500 Hz. With this in mind we shall consider:

• A population of S existing subjects;

• A percentage p which is considered sufﬁcient to

represent the whole population variation. From

this percentage, a number of subjects S

will be

randomly chosen from S. For this work p is set to

15%.

• A set C of leads. For the scope of this work, C’s

elements will be either the single-lead [I] or the 3

limb-leads [I, II, III]. The variable L is deﬁned as

L = |C|.

• N

as the number of extracted heartbeats from sub-

ject i’s ECG, i = 1, . . . ,S.

• N =

∑

i=1

as the number of all extracted heart-

beats over all subjects.

• A heartbeat belonging to subject i is denoted by

with j = 1, . . . , N

and l ∈ C. The variable h

implies a heartbeat originating from lead I. Beats

are represented by a 600 ms window which is

built having the R-peak as a reference at position

200 ms. Thus, for the employed Sampling Fre-

quency, they are composed of 300 samples.

• A reference lead, R, which for the scope of this

work will be lead I. Note that h

will correspond

to signals taken from lead R.

• A Feature Space F, represented by an N × M ma-

trix. M consists of the 300 samples present in a

heartbeat.

Two metrics have been employed in various steps

of the proposed methodology – the euclidean distance

and the cosine similarity. For the sake of clarity they

are respectively deﬁned here in Equations 1 and 2.

D(h

, h

) =

− h

)

− h

) (1)

D(h

, h

) = 1 −

· h

kkh

(2)

where k.k represents the euclidean norm.

3 DISSIMILARITY

REPRESENTATION

A Dissimilarity Representation is built on the fact that

similarity between objects plays a crucial role in class

formation, i.e. a class is a set of similar objects (Duin

and Pe¸kalska, 2012). A universal object similarity,

however, does not exist and always depends on the

classiﬁcation context, procedure and/or the domain of

study. Moreover, the presence of other classes will in-

ﬂuence the degree to which an object should or should

not be assigned to a particular class.

This paper puts forward the notion of dissimilar-

ity between ECG elements. Calculating a dissimilar-

ity is simply comparing elements, pairwise, according

to some pre-deﬁned rules (Duin and Pe¸kalska, 2012).

Metrics, for instance, ﬁt this criteria. The present

study will explore two different ones – euclidean dis-

tance and cosine similarity.

A dissimilarity-based representation can be con-

structed from any type of elements which can be any

kind of feature array possible. Also, they can be built

from as many comparisons between elements as one

wishes. Consequently, note that this process is eas-

ily extensible to using more than one-lead by simply

comparing one lead’s elements with another lead’s el-

ements.

A dissimilarity space intends to take the original

feature space F and output another one, F

, by taking

pairwise distances between ECG elements i, where

i = 1, . . . , N. This paper proposes two different ap-

proaches for deﬁning the dissimilarity space. Subsec-

tion 4.4 details their development.

BIOSIGNALS2015-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

352

4 PROPOSED METHODOLOGY

The whole methodology for the proposed technique

is summarized in Figure 3.

Figure 3: Block diagram of the proposed ECG biometric

system. Note that lead C simply refers to an arbitrary lead.

Also, the suspension points underline the possibility of us-

ing additional leads.

4.1 Pre-processing

Due to the presence of several noise sources dur-

ing measurement – power line interference, electrode

contact loss, baseline drift due to respiration and mo-

tion artefacts, for example (Friesen et al., 1990)– it is

imperative to ﬁlter the signal to facilitate the proce-

dure of feature extraction.

In order to remove baseline drift from raw ECG

signals we apply the method by (de Chazal et al.,

2004). It applies two median ﬁlters (of 200 ms and

600 ms respectively) to the raw ECG signal and then

proceeds to subtract the resulting waveform to the raw

ECG.

Afterwards, an additional ﬁltering stage elimi-

nates the effect of other noise sources through band-

pass ﬁltering. The ﬁlter consists of a 5 → 20 Hz band-

pass FIR ﬁlter with order 150. This frequency band

removes a substantial part of the information present

in heartbeats but compensates by successfully clean-

ing the signal of any motion artefacts and power line

interference as well as reducing electromyographic

noise effects.

4.2 Segmentation

In order to segment the ECG signal into heartbeats,

R-peak detection is ﬁrst carried out as in (Hamil-

ton, 2002). Beat detection is performed over a trans-

formed signal which is obtained according to the dia-

gram in Figure 4. Its basic detection rules are as fol-

lows:

Figure 4: Transformation process applied to the ECG signal

so as to perform beat detection.

1. Ignore all peaks that precede or follow larger

peaks by less than 200 ms.

2. If the peak occurred within 360 ms of a previ-

ous detection and had a maximum slope less than

0.7×the maximum slope of the previous detection

assume it is a T-wave.

3. If the peak is larger than the detection threshold

– DT – call it a QRS complex; otherwise call it

noise.

4. If an interval equal to 1.5 times the average R-to-

R interval has elapsed since the most recent detec-

tion, check for a peak larger than

within that

interval. If the peak followed the preceding detec-

tion by at least 360 ms then classify that peak as a

QRS complex.

The detection threshold – DT mentioned in

rules 3 and 4 – resorts to the following data

structures:

QRS-peak buffer

Stores the 8 most recent R-peak values. Its entries

are used in the detection threshold – DT – calcula-

tion. It is initialized with the highest-valued peaks

in one second intervals for 8 seconds;

Noise-peak buffer

Behaves like the previous structure but stores the 8

most recent noise-peak values instead. It is, how-

ever, initialized at 0;

RR-interval buffer

Stores the 8 most recent interval between R-peaks.

These are initialized at a value corresponding to a

1 s interval.

The DT threshold is computed as:

DT = Noise Peak Bu f f

med

+ T H(QRS Peak Bu f f

med

− Noise Peak Bu f f

med

) (3)

where Noise

Peak Bu f f

med

and QRS Peak Bu f f

med

are the median of the Noise-peak buffer and QRS-peak

buffer arrays respectively. TH = 0.45 was empirically

found most suitable for the database used in testing.

ECGBiometricsUsingaDissimilaritySpaceRepresentation

353

However, the peaks detected on the transformed

signal do not map directly to the R-peaks in the ECG

signal. A post-processing phase is necessary so as

to perform adequate R-peak detection. The points

found are then mapped to the exact R-peak location

by searching within ±150 ms and choosing the high-

est sloped peak of the two largest ones found in that

interval. This process is executed only for lead I –

the beats are then matched on the other utilized leads

by once again looking for the highest sloped peak of

the two largest ones, but this time in a smaller interval

consisting of ±60 ms.

Having the location of the R-peaks for all leads,

the segments are constructed simply by taking the

ECG window from 0 ms to 600 ms, where 200 ms

correspond to the R-peak. As mentioned in Section 2,

all segments have a ﬁxed length of 600 ms.

4.2.1 Outlier Detection

Outlier removal is performed as in (Lourenc¸o et al.,

2013). It will only be executed for lead I – the beats

here discarded will be discarded for the other em-

ployed leads as well.

This algorithm receives as input the N

heartbeats

for subject i and begins by calculating the average

beat via:

∑

j=1

, (4)

Heartbeats which stray away from the average beat

are discarded according to the following procedure:

1. For each h

, compute its distance D(h

, h

) to

the mean waveform h

2. Compute the 1

and 2

order statistical moments

of the distances D(·, h

); µ

D(·,h

)

corresponds to

the mean value and σ

D(·,h

)

to the standard devi-

ation.

3. Compute the median of the minimum and maxi-

mum values over all templates h

, denoted as h

med

min

and h

med

max

respectively.

4. Verify the conditions below for every h

. If any is

conﬁrmed, then h

is discarded as an outlier:

(a) h

min

< 1.5 × h

med

min

, h

min

is the minimum value

for beat h

;

(b) h

max

> 1.5× h

med

max

, h

max

is the maximum value

for beat h

;

, h

) > µ

D(h

)

+ 0.5 × σ

D(h

)

;

Note that D(h

, h

) can refer to any distance metric.

The present work utilizes the cosine distance in 2.

4.3 Template Generation

The problem of template selection may be posed as

follows: given a set of N heartbeats, select K tem-

plates that “best” represent the variability as well as

the typically observed patterns according to a given

similarity criterion (Lourenc¸o et al., 2014).

Clustering methods are especially adequate for

this task, and have already been used for template se-

lection in other modalities (Uludag et al., 2004; Con-

nell and Jain, 1999; Liu and Wang, 2008; Lumini and

Nanni, 2006). In this paper the K-means algorithm

was used, with K empirically set to 5, the cluster’s

centroids being used as templates (Lourenc¸o et al.,

2014).

4.4 Dissimilarity Computation

Dissimilarities can be calculated by measuring a dis-

tance between beats according to a metric. Two such

metrics have been explored in this study – euclidean

distance and cosine similarity referred respectively in

Equations 1 and 2. In Section 5 their effects can be

contrasted.

Two different dissimilarity extraction techniques

are hereby proposed.

Subject based

This ﬁrst and simplest approach computes the dis-

tance D(h

, h

) between each segment h

and the

set of h

template beats for each lead in set C.

The process is repeated for all subjects S. It is

presented in pseudo-code below.

for each subject i in S do

for each beat j in N

= [ ]

for each lead l in C do

for each template t do

.append(D(h

, h

))

Considering T templates per subject, the resulting

dissimilarity representation is then an N × T*L

matrix composed by N dissimilarity arrays with

T ∗ L components.

Inter-subject based

A second strategy computes the distance

D(h

, h

) between each segment h

and the set

of h

template beats of the randomly chosen S

subjects for each lead in set C. The following

pseudo-code outlines these steps.

for all S subjects do

for each beat j in N do

= [ ]

for each lead l in C do

BIOSIGNALS2015-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

354

for each subject s in S

for each template t do

.append(D(h

, h

))

Once again, considering T templates per sub-

ject, the obtained dissimilarity representation is an

N × T*S

*L matrix consisting of N dissimilarity

arrays composed by T ∗ S

∗ L elements.

4.5 Classiﬁcation

In classiﬁcation, both authentication and identiﬁca-

tion follow the same principles for matching. A fea-

ture space comprising dissimilarities supports a large

variety of classiﬁers (Duin et al., 2010). The current

work employs a k-Nearest Neighbours (k-NN) model

applied to dissimilarity arrays, with k set to 3.

The steps taken during the classiﬁcation process

depend on the dissimilarity representation approach

taken in Sub-section 4.4. During enrolment, for a

Subject based approach, the classiﬁer will store all the

S users’ template heartbeats h

for all C leads.

As for the Inter-subject method the algorithm

saves only the template beats belonging to the S

sub-

jects (which have been recorded prior to enrolment)

over the C used leads.

(a) Enrolment for a Subject based

Dissimilarity Representation.

(b) Enrolment for an Inter-subject

based Dissimilarity Representation.

Figure 5: Enrolment for both approaches.

Both methods store their N respective template

dissimilarity arrays calculated for all the extracted

heartbeats h

i j

over S. Consider v

as one such tem-

plate. See Figure 5 for an illustration of this mode.

As for authentication and identiﬁcation, for each dis-

similarity representation approach:

Subject based

Authentication or identiﬁcation determine the set

of retrieved h

templates – in the former they only

originate from the requested subject while in the

latter they are obtained from the S subjects.

Inter-subject based

In this case, the classiﬁer calculates one single dis-

similarity representation with respect to the tem-

plates originating from the S

determined subjects

– regardless of the procedure taken.

Figure 6: Block diagram illustrating the identiﬁcation pro-

cedure for the Subject based dissimilarity approach.

The obtained dissimilarity arrays are called v

Then, the template dissimilarity arrays v

chosen for

the determination of the 3-NN are either sourced from

the input subject – for authentication – or obtained

from all subjects – for identiﬁcation.

Distances between dissimilarity vectors D(v

, v

)

are once again measured according to both euclidean

distance and cosine similarity referred in Equations 1

and 2. In Section 5 their effect in classiﬁcation is com-

pared as well. See Figures 6 and 7 for the classiﬁca-

tion procedure for the Subject based and Inter-subject

based dissimilarity approaches respectively.

After calculating all the D(v

, v

), the 3 small-

est distances are taken as the 3-Nearest Neighbours.

They will then be compared with a threshold, th

auth

for authentication or th

for identiﬁcation. This

threshold will validate the distances’ votes according

to:

<= th (5)

where d

is one of the 3 resulting distances and th

is either th

auth

or th

according to the chosen mode.

Distances d

not respecting Equation 5 are not consid-

ered in the ﬁnal classiﬁcation. If at least 2 d

distances

have been validated, then for authentication the input

user is conﬁrmed as valid. In identiﬁcation the most

voted user corresponding to those d

is provided as

ECGBiometricsUsingaDissimilaritySpaceRepresentation

355

Figure 7: Block diagram illustrating the identiﬁcation pro-

cedure for the Inter-subject based dissimilarity approach.

output. However, for this mode, if no majority exists

for the valid d

, then identiﬁcation fails.

5 RESULTS

In order to provide a thorough evaluation of the pro-

posed method, we applied it to a database provided by

a local cardiac hospital, Hospital de Santa Marta, that

has been previously validated in terms of biometric

performance in (Carreiras et al., 2014).

The used ECG records were acquired during nor-

mal hospital operation, encompassing scheduled ap-

pointments, emergency cases, and bedridden patients.

This study focuses on signals originating from healthy

individuals. All signals were acquired using Philips

PageWriter Trim III devices, following the standard

12-lead placement, with a sampling rate of 500 Hz

and 16 bit resolution. Each record has a duration of

10 s. 832 records were then employed belonging to

618 subjects.

The various results are compared through Re-

ceiver Operating Characteristic (ROC) and corre-

sponding Equal Error Rates (EER) for the authentica-

tion mode of operation while the identiﬁcation mode

is characterized by its Identiﬁcation Error (EID) as in:

+ F

(6)

where F

and T

correspond to the number of incor-

rect and correct identiﬁcations respectively.

All of the designed experiments were executed ac-

cording to a simple random approach. The existing

valid heartbeats were randomly partitioned into two

sets, a training set containing 75% of those beats and

a test set composed by the remaining 25%, whose

beats are individually evaluated. This procedure was

repeated 10 times and the average EER and EID were

calculated from all these 10 runs. The minimum num-

ber of training heartbeats is 5, an amount demanded

by the template generation block. Some subjects did

not satisfy this requirement and were thus not consid-

ered for testing. The employed number of subjects

thus becomes 503.

The following experimental set-ups were tested.

The effect of the metric choice over both dissimilarity

computation in Section 4.4 and classiﬁcation in Sec-

tion 4.5 was tested. The chosen metrics for both steps

were the cosine similarity and euclidean distance. To

test all the possible metric combination, 4 different

scenarios are possible. Given the existence of two

methods for dissimilarity computation, the number of

experiments rises to 8 for each lead conﬁguration C.

Experiments over all metrics were only carried out

with C = [I, II, III]. Due to time constraints, the set

of metrics corresponding to the best results for each

dissimilarity method were employed for C = [I]. The

total number of experiments is then 10. Tags are given

to these experiments so as to facilitate the observation

of results.

Subject based (B)

This approach is given the tag B. The employed

metric scenario for a given experiment will be de-

scribed by two letters C or E whether the em-

ployed metric is a cosine similarity or euclidean

distance respectively. The ﬁrst letter will corre-

spond to the metric utilized in the dissimilarity

computation process while the second refers to the

metric used in the classiﬁcation procedure. The

utilized set C is shown subscript to the letter B.

As a result, a tag of B

I,II,III

− CE pinpoints the

use of cosine similarity for dissimilarity calcula-

tion and the use of euclidean distance to get the

distances between dissimilarity arrays, for a set

C = [I, II, III]. As a result, for approach B there

are:

• B

I,II,III

−CC

• B

I,II,III

−CE

• B

I,II,III

− EC

• B

I,II,III

− EE

• B

−CE

Inter-subject based (I)

The tag I was attributed to experiments following

this approach. The notation associated to them

follows the same as in the previous method. As

such:

BIOSIGNALS2015-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

356

• I

I,II,III

−CC

• I

I,II,III

−CE

• I

I,II,III

− EC

• I

I,II,III

− EE

• I

−CC

Additionally, results obtained from the technique

in (Carreiras et al., 2014) are contrasted with the ones

hereby obtained. This technique is similar to the one

presented in this paper but ditches the dissimilarity

computation phase. Instead, it uses heartbeats as fea-

ture arrays and then applies k-NN directly to them.

Also this method is exclusively single-lead. It shall

be tagged with the expression O

. Table 1 summa-

rizes results obtained for all experiments in the form

of averaged EERs and EIDs and respective standard

deviations.

Table 1: Results all experiments’ EER & EID rates.

EER [%] EID [%]

8.51 ± 0.30 12.04 ± 0.68

I,II,III

CC 21.55 ±0.36 87.97 ± 2.21

CE 4.45 ± 0.13 8.89 ± 1.11

EC 30.25 ±0.34 92.59 ± 1.78

EE 1.53 ± 0.09 23.46 ± 1.96

CE 4.45 ± 0.09 9.92 ± 0.76

I,II,III

CC 2.46 ± 0.09 5.48 ± 0.33

CE 4.76 ± 0.24 6.90 ± 0.38

EC 10.75 ±0.06 13.02 ± 0.68

EE 3.83 ± 0.11 19.74 ± 0.50

CC 2.46 ± 0.09 5.37 ± 0.48

The ROC curve for the experiment B

I,II,III

− EE

is shown in Figure 8(a). A ROC curve comparing all

the experiments in authentication mode is shown in

Figure 8(b).

The results in Table 1 suggest the following obser-

vations.

• Classiﬁers whose training computes dissimilari-

ties based on a cosine metric proved much supe-

rior to their euclidean distance equivalents. This

was expected due to the large initial feature size of

M = 300. The exception concerns authentication

results for CE vs EE, in which the latter is better.

• For a Subject-based approach, experiments em-

ploying a cosine similarity metric during classi-

ﬁcation present very high EER and EID indices.

This was expectable due to the very small dissim-

ilarity array size (T ∗L = 15) rendering the cosine

similarity metric unable to extract viable informa-

tion. Precisely the opposite can be said for an

(a) ROC curve and obtained EER for the exper-

iment B

− EE.

(b) ROC curve for all the executed experiments.

Figure 8: ROC curves outlining obtained results for authen-

tication.

Inter-subject approach which presents a dissimi-

larity array size of (T ∗ S

∗ L = 1125).

• Both single-lead experiments present better re-

sults than those obtained from O

. From here it

is possible to conclude that a dissimilarity based

representation can originate a more effective bio-

metric system.

• Authentication EERs did not vary signiﬁcantly

by changing the lead-set C. However, EIDs for

a triple-lead set improved in a subject based ap-

proach and deteriorated for an inter-subject based

one. The following are the possible explanations

for what happened.

– For a subject based approach B, the single-lead

dissimilarity array size is very small (T ∗L = 5).

The three extra leads raise this dimension to 15

which is a signiﬁcant increase in information.

ECGBiometricsUsingaDissimilaritySpaceRepresentation

357

– For an inter-subject based approach I, the

single-lead dissimilarity array size is already

relatively large (T ∗ S

∗ L = 375). Increasing

it further does not contribute to the rise in in-

formation. Thus the EID degrades slightly.

• Lastly, the lowest EER for authentication origi-

nated from B

I,II,III

−EE. However it also presents

a high EID. The lowest EID is shown by I

I,II,III

−

CC which also gave a good and second lowest

EER value.

6 CONCLUSIONS AND FUTURE

WORK

The number of possible ECG representations is end-

less and so far none has managed to stand out at the

expense of all the others. In this paper a new ECG rep-

resentation space is developed and integrated into an

already existing biometric system. This feature space

is built through dissimilarity computation, where the

new features are a direct and pairwise comparison be-

tween those present in two signals, which here were

taken via metrics.

Moreover, the computation of this novel repre-

sentation can be extended to various types of ECG

conﬁgurations or signals, underlining its versatility.

The current study extended its usage to multi-lead

ECG signals, where an EER rate of 1.53% has been

achieved, for authentication over a database of 503

subjects as well as an EID rate of 5.65%. It should

be emphasized that for authentication/identiﬁcation a

single heartbeat was used. The usage of a larger num-

ber of beats for classiﬁcation will likely lead to better

results. When contrasting with the original technique,

which does not compute a dissimilarity representa-

tion, this feature space returns better results, proving

the usefulness of such a representation. However, as

in previous work, the usage of more than one lead did

not signiﬁcantly improve results.

ACKNOWLEDGEMENTS

This work was partially funded by Fundac¸

ao para a

encia e Tecnologia (FCT) under grants PTDC/EEI-

SII/2312/2012, and SFRH/PROTEC/49512/2009,

whose support the authors gratefully acknowledge.

We would also like to thank Joana Santos for her hard

work labelling the ECG records.

REFERENCES

Agraﬁoti, F. and Hatzinakos, D. (2008). Fusion of ECG

sources for human identiﬁcation. In Communications,

Control and Signal Processing, 2008. ISCCSP 2008.

3rd International Symposium on, pages 1542–1547.

Biel, L., Pettersson, O., Philipson, L., and Wide, P. (2001).

ECG analysis: a new approach in human identiﬁca-

tion. Instrumentation and Measurement, IEEE Trans-

actions on, 50(3):808–812.

Carreiras, C., Lourenc¸o, A., Fred, A., and Ferreira, R.

(2014). ECG signals for biometric applications - are

we there yet? In 11th Int. Conf. on Informatics in

Control, Automation and Robotics, pages 765–772.

Scitepress.

Chan, A. D. C., Hamdy, M., Badre, A., and Badee, V.

(2008). Wavelet distance measure for person identiﬁ-

cation using electrocardiograms. Instrumentation and

Measurement, IEEE Transactions on, 57(2):248–253.

Connell, S. D. and Jain, A. K. (1999). Template-based on-

line character recognition. Pattern Recognition, 34:1–

14.

Conover, M. (2003). Understanding Electrocardiography.

Mosby.

de Chazal, P., O’Dwyer, M., and Reilly, R. (2004). Auto-

matic classiﬁcation of heartbeats using ECG morphol-

ogy and heartbeat interval features. Biomedical Engi-

neering, IEEE Transactions on, 51(7):1196–1206.

Duin, R., Loog, M., Pe¸kalska, E., and Tax, D. (2010).

Feature-based dissimilarity space classiﬁcation. In

Unay, D., C¸ taltepe, Z., and Aksoy, S., editors, Recog-

nizing Patterns in Signals, Speech, Images and Videos,

volume 6388 of Lecture Notes in Computer Science,

pages 46–55. Springer Berlin Heidelberg.

Duin, R. P. and Pe¸kalska, E. (2011). The dissimilarity

representation for structural pattern recognition. In

San Martin, C. and Kim, S.-W., editors, Progress in

Pattern Recognition, Image Analysis, Computer Vi-

sion, and Applications, volume 7042 of Lecture Notes

in Computer Science, pages 1–24. Springer Berlin

Heidelberg.

Duin, R. P. and Pe¸kalska, E. (2012). The dissimilar-

ity space: Bridging structural and statistical pattern

recognition. Pattern Recognition Letters, 33(7):826 –

832.

Fang, S.-C. and Chan, H.-L. (2009). Human identiﬁ-

cation by quantifying similarity and dissimilarity in

electrocardiogram phase space. Pattern Recognition,

42(9):1824 – 1831.

Fatemian, S. and Hatzinakos, D. (2009). A new ECG fea-

ture extractor for biometric recognition. In Digital

Signal Processing, 2009 16th International Confer-

ence on, pages 1–6.

Friesen, G., Jannett, T., Jadallah, M., Yates, S., Quint, S.,

and Nagle, H. (1990). A comparison of the noise sen-

sitivity of nine qrs detection algorithms. Biomedical

Engineering, IEEE Transactions on, 37(1):85–98.

Hamilton, P. (2002). Open source ECG analysis. In Com-

puters in Cardiology, 2002, pages 101–104.

BIOSIGNALS2015-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

358

Labati, R. D., Sassi, R., and Scotti, F. (2013). ECG biomet-

ric recognition: permanence analysis of qrs signals for

24 hours continuous authentication. In IEEE Interna-

tional Workshop on Information Forensics and Secu-

rity (WIFS), pages 31–36. Institute of electrical and

electronics engineers.

Li, M. and Narayanan, S. (2010). Robust ECG biomet-

rics by fusing temporal and cepstral information. In

Pattern Recognition (ICPR), 2010 20th International

Conference on, pages 1326–1329.

Liu, N. and Wang, Y. (2008). Template selection for on-line

signature veriﬁcation. In Proc. of the 19th Int. Conf.

on Pattern Recognition (ICPR), pages 1–4.

Lourenc¸o, A., Carreiras, C., Silva, H., and Fred, A. (2014).

ECG biometrics: A template selection approach. In

Medical Measurements and Applications (MeMeA),

2014 IEEE International Symposium on, pages 1–6.

Lourenc¸o, A., Silva, H., Carreiras, C., et al. (2013). Out-

lier detection in non-intrusive ECG biometric sys-

tem. In Image Analysis and Recognition, pages 43–52.

Springer Berlin Heidelberg.

Lumini, A. and Nanni, L. (2006). A clustering method

for automatic biometric template selection. Pattern

Recognition, 39(3):495–497.

Odinaka, I., Lai, P.-H., Kaplan, A., O’Sullivan, J., Sirevaag,

E., and Rohrbaugh, J. (2012). ECG biometric recog-

nition: A comparative analysis. Information Forensics

and Security, IEEE Transactions on, 7(6):1812–1824.

Plataniotis, K., Hatzinakos, D., and Lee, J. (2006). ECG

biometric recognition without ﬁducial detection. In

Biometric Consortium Conference, 2006 Biometrics

Symposium: Special Session on Research at the, pages

1–6.

Shen, T. W., Tompkins, W., and Hu, Y. H. (2002). One-

lead ECG for identity veriﬁcation. In Engineering in

Medicine and Biology, 2002. 24th Annual Conference

and the Annual Fall Meeting of the Biomedical Engi-

neering Society EMBS/BMES Conference, 2002. Pro-

ceedings of the Second Joint, volume 1, pages 62–63

vol.1.

Shen, Tsu-Wang, T. W. J. and Hu, Y. H. (2011). Implemen-

tation of a one-lead ECG human identiﬁcation system

on a normal population. Journal of Engineering and

Computer Innovations, 2(1):12–21.

Silva, H., Gamboa, H., and Fred, A. (2007). One lead

ECG based personal identiﬁcation with feature sub-

space ensembles. In Perner, P., editor, Machine Learn-

ing and Data Mining in Pattern Recognition, volume

4571 of Lecture Notes in Computer Science, pages

770–783. Springer Berlin Heidelberg.

Singla, S. and Sharma, A. (2010). ECG as biometric in the

automated world. International Journal of Computer

Science & Communication, 1:281–283.

Uludag, U., Ross, A., and Jain, A. (2004). Biometric tem-

plate selection and update: a case study in ﬁngerprints.

Pattern Recognition, 37(7):1533–1542.

Wang, J., She, M., Nahavandi, S., and Kouzani, A. (2013).

Human identiﬁcation from ECG signals via sparse

representation of local segments. Signal Processing

Letters, IEEE, 20(10):937–940.

Wang, Y., Agraﬁoti, F., Hatzinakos, D., and Plataniotis, K.

(2008). Analysis of human electrocardiogram for bio-

metric recognition. EURASIP Journal on Advances in

Signal Processing, 2008(1):148658.

ubbeler, G., Stavridis, M., Kreiseler, D., Bousseljot, R.-

D., and Elster, C. (2007). Veriﬁcation of humans using

the electrocardiogram. Pattern Recognition Letters,

28(10):1172 – 1175.

Ye, C., Coimbra, M., and Kumar, B. (2010). Investigation

of human identiﬁcation using two-lead electrocardio-

gram (ECG) signals. In Biometrics: Theory Applica-

tions and Systems (BTAS), 2010 Fourth IEEE Interna-

tional Conference on, pages 1–8.

ECGBiometricsUsingaDissimilaritySpaceRepresentation

359