Personal Identification and Authentication based on
One-lead ECG using Ziv-Merhav Cross Parsing
David Pereira Coutinho¹,³, Ana L. N. Fred²,³ and Mário A. T. Figueiredo²,³
¹ Instituto Superior de Engenharia de Lisboa, Lisboa, Portugal
² Instituto Superior Técnico, Lisboa, Portugal
³ Instituto de Telecomunicações, Lisboa, Portugal
Abstract. In this paper, we propose a new data-compression-based ECG biometric method for personal identification and authentication. The ECG is an emerging biometric that does not need liveness verification. There is strong evidence that ECG signals contain sufficient discriminative information to allow the identification of individuals from a large population. Most approaches rely on ECG data and on fiducial points of different parts of the heartbeat waveform. However, non-fiducial approaches have recently proved to be effective as well, and have the advantage of not relying critically on the accurate extraction of fiducial points. We propose a non-fiducial method based on the Ziv-Merhav cross parsing algorithm for symbol sequences (strings). Our method uses a string similarity measure obtained with a data compression algorithm. We present results on real data, one-lead ECG acquired during a concentration task from 19 healthy individuals, on which our approach achieves a 100% subject identification rate and an average equal error rate of 1.1% on the authentication task.
1 Introduction
Biometrics deals with the identification of individuals based on their physiological or behavioral characteristics [1]. Traditional methods of biometric identification include those based on physiological characteristics, such as fingerprints or the iris, and those based on behavioral characteristics, such as signature or speech. Although some technologies have gained more acceptance than others, biometrics for access control plays an important role in security at airports, in industry and in corporate workplaces, for example. However, some technologies lack robustness against falsification, and others rely on characteristics that, for some groups of people, are difficult to acquire or even missing.
The electrocardiogram (ECG) is an emerging biometric measure that exploits a physiological feature present in every human, and there is strong evidence that the ECG is sufficiently discriminative to identify individuals from a large population. The ECG intrinsically allows liveness detection, personal identification and authentication, and the detection of different stress or emotional states [2]. The ECG is a behavioral
biometric trait that can be used with other biometric measures [3], as a complementary
feature, for fusion in a multimodal physiological authentication system [4, Ch. 18] and
for continuous authentication where biological signatures are continuously monitored
(easily done by using new signal acquisition technologies like the Vital Jacket [5], [6])
in order to guarantee the identity of the operator throughout the whole process [7].
Fig. 1. Example of four latency times (features) measured from the P, QRS and T complexes of an ECG heartbeat, for fiducial-based feature extraction.
A typical ECG signal of a normal heartbeat can be divided into 3 parts, as depicted
in Fig. 1: the P wave (or P complex), which indicates the start and end of the atrial
depolarization of the heart; the QRS complex, which corresponds to the ventricular
depolarization; and, finally, the T wave (or T complex), which indicates the ventricular
repolarization. It is known that the shape of these complexes differs from person to
person, a fact which has stimulated the use of the ECG as a biometric [8].
In a broad sense, one can say there are two different approaches in the literature
concerning feature extraction from ECG: fiducial [8], [9], [10], [11], and non-fiducial
[12], [13]. Fiducial methods use points of interest within a single heartbeat waveform,
such as local maxima or minima; these points are used as reference to allow the defi-
nition of latency times, as shown in Fig. 1. Several methods exist that extract different
time and amplitude features, using these reference points. Non-fiducial techniques aim
at extracting discriminative information from the ECG waveform without having to lo-
calize fiducial points. In this case, a global pattern from several heartbeat waveforms
may be used as a feature. Some methods combine these two different approaches or are
partially fiducial [14] (e.g., they use only the R peak as a reference for segmentation of
the heartbeat waveforms).
Biel et al. [8] pioneered the use of the ECG as a biometric for personal identification. They used a 12-lead ECG, but concluded that one lead was enough, since 12-lead ECG systems require meticulous placement of the electrodes on each person, which is not practical. Using proprietary SIEMENS equipment, 30 fiducial features were extracted; a feature selection algorithm showed that the best results were obtained with 10 features. Classification was based on the principal component analysis (PCA) of each class. The goal was to identify 20 subjects at rest, and an accuracy of 100% was achieved.
Recent studies have shown that non-fiducial approaches also allow successful per-
sonal identification using the ECG heartbeat signal.
Chiu et al. [13], using a one-lead ECG, introduced a system based on a three-step feature extraction method: QRS complex detection (with the So and Chan method [15]), waveform alignment in the time domain, and feature extraction based on the discrete wavelet transform. A nearest neighbor classifier based on the Euclidean distance between pairs of feature vectors is used. The purpose was to identify 35 subjects (no activity specified) from the QT database [16]. The reported results were 100% accuracy in person identification, and 0.83% false acceptance rate (FAR) and 0.86% false rejection rate (FRR) for authentication.
This paper introduces a new non-fiducial ECG biometric method that uses averaged single heartbeat waveforms and is based on data compression techniques, namely the Ziv-Merhav cross parsing (ZMCP) algorithm for sequences of symbols. We present results on real data: one-lead ECG signals acquired during a concentration task. Note that a previous study [2] on this dataset showed the existence of differentiated states in the data representing the ECG signal of each subject, due to detectable changes in the acquired signal over time. On a set of 19 healthy individuals, our method achieves a 100% subject identification (recognition) rate and an average equal error rate of 1.1% on the authentication (verification) task.
The outline of the paper is as follows. In Section 2, we review the fundamental tools underlying our approach: Lempel-Ziv string parsing and compression, and the Ziv-Merhav cross parsing algorithm. Section 3 presents the proposed classification method. Experimental results are presented in Section 4, while Section 5 concludes the paper.
2 The Lempel-Ziv and Ziv-Merhav Algorithms
The Lempel-Ziv (LZ) algorithm is a well-known tool for text compression [17], [18],
[19], [20], which in recent years has also been used for classification purposes (see [21]
and references therein). In particular, in [21], we have shown how the Ziv-Merhav (ZM)
method for measuring relative entropy [22] (which is based on Lempel-Ziv-type string
parsing) achieves state-of-the-art performance in a specific text classification task. We
will now briefly review these algorithms.
The incremental LZ parsing algorithm [18] is a self-parsing procedure that parses a sequence z into c(z) distinct phrases, such that each phrase is the shortest sequence that is not a previously parsed phrase. For example, let n = 11 and z = (01111000110); then the incremental self parsing yields (0, 1, 11, 10, 00, 110), so c(z) = 6.
The ZM cross parsing algorithm, a variant of LZ parsing, sequentially parses a sequence z with respect to another sequence x: each phrase is the longest prefix of the still unparsed portion of z that appears as a substring of x. Let c(z|x) denote the number of phrases in z with respect to x. For example, let z be as above and x = (10010100110); then, parsing z with respect to x yields (011, 110, 00110), that is, c(z|x) = 3.
Roughly speaking, we can see c(z) as a measure of the complexity of the sequence z, while c(z|x), the code length obtained when coding z using a model for x (cross parsing), can be seen as an estimate of the cross complexity [23]. One expects the cross complexity to be low when the two sequences are very similar; this is the key idea behind the use of ZM cross parsing in classification [21], which in this paper is adopted for ECG-based personal identification and authentication.
Fig. 2. The original LZ77 algorithm uses a sliding window over the input sequence to update the dictionary; in our implementation of the ZM cross parsing algorithm, the dictionary is static and only the lookahead buffer (LAB) slides over the input sequence.
An implementation of the ZM cross parsing algorithm, as a component of a ZM method for relative entropy estimation, was proposed in [21]; it is based on a modified LZ77 [17] algorithm in which the dictionary is static and only the lookahead buffer slides over the input sequence, as shown in Fig. 2 (for more details, see [21]). This very same implementation of the cross parsing, using a 64 Kbyte dictionary and a 256 byte lookahead buffer, was used in the experiments reported below.
3 Proposed Methods
To use ZM-based tools for identification or authentication, a necessary first step is the
conversion of the ECG (discrete-time analog) signal into a sequence of symbols (text).
In this paper, we propose a very simple approach based on quantization. Assuming we
are given a set of single heartbeat waveforms (resulting from a segmentation prepro-
cessing stage), we simply apply 8-bit (256-level) uniform quantization, thus obtaining a sequence of symbols (from a 256-symbol alphabet) from each single heartbeat.
Quantizers with fewer bits were considered in early experiments, but discarded because they did not perform as well as the 8-bit quantizer. Higher resolutions were not considered, for the sake of implementation simplicity and because of the good performance obtained with 8 bits.
Consider a collection of training samples partitioned into K classes (the set of subjects to be identified): X = {X_1, X_2, ..., X_K}. For each subject/class k, X_k contains n strings obtained from the same number of heartbeats using the quantization procedure described in the previous paragraph. A string x_k is formed by concatenating the n training strings of subject k; string x_k is, in some sense, a "model" representing the shape of the heartbeats of subject k.
3.1 Identification
Given a test sample z (the string representing m heartbeats) obtained from an unknown subject (assumed to be one of those from which the training set was obtained), its identity is estimated as

    k̂(z) = arg min_{k ∈ {1,...,K}} c(z|x_k),

where c(z|x_k) is computed by the ZM cross parsing (ZMCP) algorithm, as described in Section 2. In other words, the test sample is classified as belonging to the subject that leads to its shortest description. Although it uses different tools, this approach is related in spirit to the minimum incremental coding length (MICL) approach [24].
3.2 Authentication
The authentication (verification) procedure depends on a threshold level, which itself depends on the range of values of c(z|x_k). In order to limit its variation to a predefined set of values, normalization is used. Since, in the worst case, the description length produced by the ZMCP algorithm for the test sample z equals the length of z, the normalized description length c_n(z|x_k) is defined as follows:

    c_n(z|x_k) = c(z|x_k) / len(z),

where len(z) is the number of bytes in the test sample z. Notice that c_n(z|x_k) ∈ [0, 1].
Test sample verification is made by comparing the value of c_n(z|x_k), computed with the model of the claimed identity, against a threshold value in [0, 1], previously set according to a selected error rate, false acceptance rate (FAR) or false rejection rate (FRR). The decision is genuine when c_n(z|x_k) is less than or equal to the selected threshold.
4 Experiments
The architecture of the proposed ECG-based biometric system for person identification and authentication follows the model proposed by Jain et al. in [1]. Fig. 3 shows the block diagram of the implemented system for the authentication task.
Fig. 3. Block diagram of the implemented system for the authentication task, showing the five main modules of a biometric system: sensor, preprocessing, feature extraction, matcher and system database.
4.1 Data Collection
The ECG waveform dataset used was acquired using one lead, in the context of the Himotion project (https://www.it.pt/auto temp web page preview.asp?id=305). The dataset contains ECG recordings from 19 subjects, acquired during a concentration task on a computer, designed for an average completion time of 10 minutes. All the acquired ECG signals were normalized and band-pass filtered (2-30 Hz) in order to remove noise. Each heartbeat waveform was sequentially segmented
from the full recording and then all the obtained waveforms were aligned by their R
peaks. From the resulting collection of ECG heartbeat waveforms, the mean wave for
groups of 10 consecutive waveforms (without overlap) was computed. Each of these
mean waveforms is what we call a single heartbeat in Section 3.
An intra-class study [2] on this dataset, carried out in the context of the exploration of electrophysiological signals for emotional state detection, showed the existence of differentiated states in the data representing the ECG signal of a subject. To deal with these intra-class differences, the proposed method builds each "model" (as mentioned in Section 3) from single heartbeats randomly selected from the whole ECG signal sample.
The reported results are averages over 50 runs. In each run, we partition the set of heartbeats of each subject into two mutually exclusive subsets: one of these subsets is used to form the training data set X = {X_1, X_2, ..., X_K}, while the other is used to build the test waveforms. We consider several values for n (the number of waveforms concatenated into the "model" strings) as well as for m (the number of waveforms in the test samples).
4.2 Identification Results
The results for the identification experiment, which are depicted in Fig. 4, show that the
proposed method achieves 100% accuracy for m = 12 and n = 13 or n = 20. This
is better than the results reported in [11] on the same dataset. The approaches in [8], [9] and [13] were not tested on the same dataset, so the results are not directly comparable.
Notice that using only m = 5 waveforms for the test patterns, we already reach an
accuracy around 99.5%. As expected, the accuracy increases both with n and m.
Fig. 4. Mean recognition error and standard deviation intervals for subject identification, as a function of the number of test waveforms (error in %), for models of 10, 13 and 20 waveforms.
4.3 Authentication Results
Regarding verification (authentication), three different tests were made. The first test follows the model shown in Fig. 3. The results, depicted in Fig. 5 (a), show that the proposed method achieves an overall equal error rate (EER) of approximately 6%. Notice that one can lower the false acceptance rate by using lower threshold values, but then the system will reject more legitimate users. However, it is possible to use lower threshold values if a different value is used for each subject (user-tuned thresholds).
The second test also follows the model shown in Fig. 3, but now the threshold is user-tuned. An equal error rate (EER) was computed for each subject, and the average EER is reported. The test results presented in Table 1 show that the proposed method outperforms the fiducial approaches reported in [25] and [26] on the same dataset.
Table 1. Comparison of verification results of related work with our method, on the same dataset.

  Reference                 Feature                        EER
  Oliveira and Fred [26]    Fiducial (1-NN classifier)     8.0 %
  Gamboa [25]               Fiducial (user-tuned)          1.7 %
  Proposed method           Non-fiducial (user-tuned)      1.1 %
In the last verification test, we evaluate the combination of multiple source acquisition signals, classified by a bank of classifiers with the same structure as in the first test (shown in Fig. 3), with the final decision made according to the majority voting criterion. Given a test sample (of length m = 12), it was decomposed, in 64 different ways, into sub-samples of length m = 6, which were classified by a bank of 64 classifiers using the same threshold level and the same database. The results in Fig. 5 (b) show that this multiple classifier strategy does not improve the performance.
Fig. 5. ROC curves for the verification task (the solid straight line has slope 1, for reference pur-
poses). Left plot: results for different n and m values; notice the improvement with the increase
of m and n. The best equal error rate (EER) is close to 6%. Right plot: results for single classi-
fiers versus a bank of 64 classifiers with the same structure for combination of multiple source
acquisition signals.
5 Summary and Conclusions
We have presented a method for personal identification and authentication from one-lead ECG signals which involves no explicit feature extraction other than 8-bit uniform quantization of the waveforms. The classifier is based on the Ziv-Merhav cross parsing (ZMCP) algorithm, an estimator of the algorithmic cross-complexity [23], which is used to measure the similarity between the model waveforms and the test waveforms.
Experiments carried out on a dataset of 19 healthy subjects, for whom the existence of differentiated states in the ECG data has been shown [2], showed that our method achieves 100% accuracy in recognition (identification) and an average equal error rate close to 1.1% in verification (authentication). Although further experiments on other datasets are needed to assess the relative performance of the proposed method with respect to other state-of-the-art techniques, these results demonstrate the validity of our approach as a tool for personal identification and authentication, and of the ECG signal as a viable biometric. Future work will include tests with the Max-Lloyd quantizer and further evaluation of our method when used in an adaptive way for authentication purposes in continuous biometric systems [7].
Acknowledgements
We acknowledge the following financial support: the FET programme, within the EU FP7, under the SIMBAD project (contract 213250); Fundação para a Ciência e Tecnologia (FCT), under grants PTDC/EEA-TEL/72572/2006 and QREN 3475; Departamento de Engenharia de Electrónica e Telecomunicações e de Computadores, Instituto Superior de Engenharia de Lisboa.
References
1. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans.
Circuits Syst. Video Techn. 14 (2004) 4–20
2. Medina, L.A.S., Fred, A.L.N.: Genetic algorithm for clustering temporal data - application to the detection of stress from ECG signals. In: Proc. 2nd International Conference on Agents and Artificial Intelligence (ICAART). (2010) 135–142
3. Ross, A.A., Nandakumar, K., Jain, A.K.: Handbook of Multibiometrics (International Series
on Biometrics). Springer-Verlag New York, Inc., Secaucus, NJ, USA (2006)
4. Boulgouris, N.V., Plataniotis, K.N., Micheli-Tzanakou, E. (eds.): Biometrics: Theory, Methods, and Applications. Wiley-IEEE Press (2009)
5. Cunha, J., Cunha, B., Xavier, W., Ferreira, N., Pereira, A.: Vital-jacket: A wearable wireless
vital signs monitor for patients mobility. In: Proceedings of the Avantex Symposium. (2007)
6. Leonov, V.: Wireless body-powered electrocardiography shirt. In: Proceedings of the Smart
Systems Integration European Conference. (2009)
7. Damousis, I.G., Tzovaras, D., Bekiaris, E.: Unobtrusive multimodal biometric authentica-
tion: the humabio project concept. EURASIP J. Adv. Signal Process 2008 (2008) 1–11
8. Biel, L., Pettersson, O., Philipson, L., Wide, P.: ECG analysis: a new approach in human identification. IEEE Transactions on Instrumentation and Measurement 50 (2001) 808–812
9. Shen, T., Tompkins, W.J., Hu, Y.H.: One-lead ECG for identity verification. Proc. of the 2nd
Joint Conf. of the IEEE Eng. in Medicine and Biology Soc. and the 24th Annual Conf. and
the Annual Fall Meeting of the Biomedical Eng. Soc. (EMBS/BMES ’02) 1 (October 2002)
62–63
10. Israel, S.A., Irvine, J.M., Cheng, A., Wiederhold, M.D., Wiederhold, B.K.: ECG to identify
individuals. Pattern Recognition 38 (2005) 133–142
11. Silva, H., Gamboa, H., Fred, A.: One lead ECG based personal identification with feature
subspace ensembles. In: MLDM ’07: Proceedings of the 5th international conference on
Machine Learning and Data Mining in Pattern Recognition, Berlin, Heidelberg, Springer-
Verlag (2007) 770–783
12. Chan, A., Hamdy, M., Badre, A., Badee, V.: Wavelet distance measure for person identification using electrocardiograms. IEEE Transactions on Instrumentation and Measurement 57 (2008) 248–253
13. Chiu, C.C., Chuang, C.M., Hsu, C.Y.: A novel personal identity verification approach using
a discrete wavelet transform of the ECG signal. In: MUE ’08: Proceedings of the 2008 In-
ternational Conference on Multimedia and Ubiquitous Engineering, Washington, DC, USA,
IEEE Computer Society (2008) 201–206
14. Wang, Y., Agrafioti, F., Hatzinakos, D., Plataniotis, K.N.: Analysis of human electrocardio-
gram for biometric recognition. EURASIP J. Adv. Signal Process 2008 (2008) 19
15. So, H., Chan, K.L.: Development of QRS detection method for real-time ambulatory cardiac monitor. In: Proc. IEEE Engineering in Medicine and Biology Society Conference. (1997) 289–292
16. Laguna, P., Mark, R., Goldberg, A., Moody, G.: A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In: Computers in Cardiology 1997. (1997) 673–676
17. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transac-
tions on Information Theory 23 (1977) 337–343
18. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE
Transactions on Information Theory 24 (1978) 530–536
19. Nelson, M., Gailly, J.: The Data Compression Book. M&T Books, New York (1995)
20. Salomon, D.: Data Compression: The Complete Reference. 3rd edn. Springer, New York (2004)
21. Pereira Coutinho, D., Figueiredo, M.: Information theoretic text classification using the Ziv-Merhav method. In: Proc. 2nd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA 2005). (2005)
22. Ziv, J., Merhav, N.: A measure of relative entropy between individual sequences with ap-
plication to universal classification. IEEE Transactions on Information Theory 39 (1993)
1270–1279
23. Cerra, D., Datcu, M.: Algorithmic cross-complexity and relative complexity. In: DCC ’09:
Proceedings of the 2009 Data Compression Conference, Washington, DC, USA, IEEE Com-
puter Society (2009) 342–351
24. Wright, J., Ma, Y., Tao, Y., Lin, Z., Shum, H.: Classification via minimum incremental
coding length. SIAM Journal on Imaging Sciences 2 (2009) 367–395
25. Gamboa, H.: Multi-Modal Behavioral Biometrics Based on HCI and Electrophysiology. PhD thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, Portugal (2008)
26. Oliveira, C., Fred, A.L.N.: ECG-based authentication: Bayesian vs. nearest neighbour classifiers. In: Proc. International Conference on Bio-inspired Systems and Signal Processing (Biosignals), INSTICC, Porto, Portugal (2009)