X-CleLo: Intelligent Deterministic RFID Data

Transformer

Peter Darcy, Bela Stantic and Abdul Sattar

Institute for Integrated and Intelligent Systems, Grifﬁth University, Queensland, Australia

Abstract. Recently, passive Radio Frequency Identiﬁcation (RFID) systems have

received an exponential amount of attention as researchers have worked tirelessly

to implement a stable and reliable system. Unfortunately, despite vast improve-

ments in the quality of RFID technology, a signiﬁcant amount of erroneous data is

still captured in the system. Currently, the problems associated with RFID have

been addressed by cleaning algorithms to enhance the data quality. In this pa-

per, we present X-CleLo, a means to intelligently clean and enhance the dirty

data using Clausal Defeasible Logic. The extensive experimental study we have

conducted has shown that the X-CleLo method has several advantages against a

currently utilised cleaning technique and achieves a higher cleaning rate.

1 Introduction

RFID is technology which uses radio communication between tags and readers to au-

tomatically identify the locations of items. The architecture itself has the potential to

simplify the processes currently involved in capturing data and, eventually, replace the

barcode system employed in various distribution retailers. Despite signiﬁcant advances

in technology, there are still problems surrounding the implementation of the system.

These problems include: the data being at low level; the large amount of incorrect read-

ings being recorded; the exponential volume of data continually stored; and the complex

temporal and spatial aspects of the data.

Current methods employed to enhance RFID data include using deterministic rules

to improve the integrity of the observations. While this method does provide a means

of enhancing the RFID data’s accuracy, there are fatal ﬂaws within the said method

that undermine the integrity of the resulting data sets. These ﬂaws result from the rule

based approach lacking a higher level of intelligence needed when faced with an am-

biguous situation which can result in no detection of various anomalies or the creation

of artiﬁcial anomalies.

A common solution put forward within the data warehouse community is that dirty

data is extracted from the data warehouse, transformed to comply with a set of rules

determined by the user, and loaded back into the data warehouse. The rules employed

in the transformation stage may, unfortunately, be undermined due to an ambiguous

situation. A method to cope with ambiguous situations is to use Clausal Defeasible

Logic to ﬁnd the correct solution when given several facts. Clausal Defeasible Logic

is a theory of Non-Monotonic Reasoning that has been devised to seek out the correct

solution when having passed several different inputs using various levels of ambiguity.

Darcy P., Stantic B. and Sattar A.

X-CleLo: Intelligent Deterministic RFID Data Transformer.

DOI: 10.5220/0003015700030012

In Proceedings of the 4th International Workshop on RFID Technology - Concepts, Applications, Challenges (ICEIS 2010), page

ISBN: 978-989-8425-11-9

In this work, we propose X-CleLo (eXtra-CLEan-LOad), an intelligent and deter-

ministic method to enhance the integrity of the RFID observations. X-CleLo has been

designed to clean the records intelligently in order to transform the said data into highly

accurate observations worthy of being used in a commercial environment. Through an

experimental study on a simulated RFID hospital scenario, we have demonstrated how

X-CleLo houses distinct advantages that allows it to perform better than the currently

employed rule-based approach.

The remainder of this paper is organised as follows: Section 2 will discuss the fun-

damental concepts needed to understand the mechanics of X-CleLo; Section 3 reviews

some of the current state-of-the-art related work used to achieve clean RFID data; Sec-

tion 4 describes the architecture of X-CleLo and the scenario considered in experiment;

Section 5 discusses the experimental evaluation we have run on X-CleLo; In Section 6,

we analyse the experiments performed using X-CleLo and in Section 7, we conclude

this study as well as provide some directions for future work.

2 Background

Acceptance and implementation of RFID systems is still hindered due to the complexi-

ties associated with integrating the technology into the environment. Due to the nature

of the capturing process, certain anomalies such as dirty data prevent attempts to create

a ﬂawless RFID system which would usually be handled by data correction methodolo-

gies.

2.1 RFID

Radio Frequency Identiﬁcation (RFID) is technology created to identify automatically

a group of individual items. The RFID architecture is comprised of three main com-

ponents: the tag, the reader, and the middleware used to interpret the tag data found

within the proximity of readers. RFID systems have been integrated into various ap-

plications, for example, Supply-Chain situations [1] and Hospital Patient Monitoring

[2]. The advent of RFID technology has also ushered in a promise of simpliﬁcation of

time-costly tasks. However, wide-scale deployment appears to have been stalled due to

the problems in the implementation phase.

These problems are most prevalent within RFID applications which use the passive

tags (tags that do not contain power source). Passive tags, however, have several advan-

tages such as an affordable price, ease of integration into applications and, theoretically,

they will last forever. Four aspects of RFID systems have been identiﬁed as problem-

atic: low level representations of the observations, error-prone data, high information

collection volumes and complex spatial and temporal aspects of the data, [3], [1] and

[4].

Data which is captured by the readers has been found to be erroneous in providing

False Negative Readings (readings which should have been recorded but were not),

False Positive Readings (readings which were not supposed to be recorded, but they

were) and Duplicate Readings (readings in which two identical records exist where

only one should). It has been estimated that only 60%-70% of tags are read when an

RFID scan is performed [5], [6].

2.2 Clausal Defeasible Logic

Clausal Defeasible Logic is a type of Non-Monotonic Reasoning that has been devel-

oped to obtain a conclusion after given certain observations and rules [7]. For example,

in Figure 1, it has been shown that mammals do not usually lay eggs. There is an excep-

tion to this rule however, as certain creatures, such as the platypus, do lay eggs and are

still mammals [8]. It is important to note the relation represented as a semi circle giving

priority to the most clockwise entity, in this case the platypus rule, in this example. This

relation is known as a “Priority Relation”; in a typical scenario, priority is given to the

rule which is more speciﬁc and will always reach the conclusion before other rules.

Fig. 1. The Clausal Defeasible Logic map of Platypus and Mammals laying eggs.

From the rules shown in Figure 1, the Clausal Defeasible Reasoning Engine can

ascertain that platypus are mammals and, therefore, must not lay eggs. However, since

the second rule which states that for the platypus, the negative of “don’t lay eggs” is

defeasible, it may be stated that the platypus does lay eggs. Although it is true that one

conclusion must be drawn for any given situation, Clausal Defeasible Logic, has several

levels of conﬁdence represented in formulae that can be used to obtain a different correct

answer dependent on the amount of ambiguity allowed [8]. These formulae include:

– µ: This formula uses only certain information to obtain its conclusion.

– π: This formula allows conclusions in which ambiguity is propagated.

– β: This formula does not allow any ambiguity to be used to obtain its conclusion.

3 Related Work

There have been many different methods proposed in literature pertaining to how to

deal with RFID’s problems. General methodologies used to correct dirty data include

conditional functional dependencies [9] and data repairing via updates [10]. Although

these methodologies clean the majority of anomalies in the average data set, they would

struggle to clean the unique characteristics of dirty RFID observations. Some clean-

ing methods which have been speciﬁcally tailored to correct anomalous data in RFID

applications include ﬁltration algorithms [6], probabilistic approximations [11], [12],

a sampling-based approach [13] and a deferred cleansing rule-based method [14]. We

have chosen to investigate the “Deferred Cleansing Method” as it cleanses data at a

deferred stage of the capturing cycle.

3.1 Deferred Cleansing

The “Deferred Cleansing Method” is a method which uses a rewrite engine to correct

anomalies within the database after the capture process [14]. This is done by the system

referring to several rules deﬁned by the user which can be applied in a speciﬁc order in

an attempt to correct the data. An example rule is that, if there are two readings of the

same tag within one minute at the same location, the system will treat the second tag as

a duplicate entry and promptly proceed to eliminate it.

We have found a fundamental ﬂaw, however, in that the rule cleaning process does

not take into consideration ambiguous situations. In the event that one or more rule(s)

conﬂict, the order that the rules have been implemented dictates how the data set will be

cleaned. An example of this ﬂaw would be if there is a rule which states that a second

tag should be deleted. If it is detected at two different reader locations far apart, then

the rule will move on to delete one. However, in reality, the situation could arise that it

is the second case which reﬂects the true location of the tag and that the ﬁrst itself is the

wrong data.

4 Extra-Clean-Load (X-CleLo)

In this section we present the core of our method for enhancing RFID data, eXtra-

CLEan-LOad (X-CleLo) which was designed speciﬁcally to emphasise the cleaning of

data intelligently. The particular information we will focus on providing in this section

includes a detailed analysis of X-CleLo’s architecture, the Logical Engine structures

and assumptions we have made while performing experimentation. With regards to the

actual storing of the RFID readings, we have closely followed database structure intro-

duced by [15] called the Data Model for RFID Applications (DMRA).

4.1 X-CleLo Structure

As seen in Figure 2, X-CleLo is divided up into three main sections - the Extraction

processes, the Transformation processes and the Loading process. To accomplish this,

it requires three sets of input information to properly function. These inputs include the

readings, the referential data and the user’s input. The data is the recorded readings of

RFID tags. The RFID data consists of the unique tag identiﬁer, the timestamp the tag

was recorded at and the reader identiﬁer.

The referential data set contains all related data which will be used later to trans-

form the low level readings into high level events. Additionally, this data will include a

MapData table possessing a list of readers which are geographically within proximity.

These information sets may include the location of the reader, the items which the tag

is attached to etc. The User is the person who inputs various event rules into the system

through use of the Non-Monotonic Reasoning Engine.

After the input information has been received, X-CleLo runs the “Extract” phase

which consists of two information-processing function, the Data Extractor and the Rule

Extractor. The Data Extractor is the process of accepting the “RFID Readings” and the

“RFID Referential Data” and mining other tables for information related to the data.

Fig. 2. A high level diagram to represent the processes and data ﬂow of X-CleLo.

The Rule extractor receives the logic engine setup devised by the “User” and forwards

it onto the Non-Monotonic Reasoning Engine.

The next phase which occurs in X-CleLo is the “Transformation” phase in which

the information is then turned into higher, more meaningful data. The processes used

to accomplish this include the Data Cleanser and the Non-Monotonic Reasoning En-

gine. The Data cleansing process consists of an algorithm which prepares the data for

the next phase. It uses the Non-Monotonic Engine to ﬁnd the correct course of action

of cleaning when ambiguous situations arise. It employs a set of logic in the form of

Clausal Defeasible Logic.

The speciﬁc data to be cleaned is taken from the “Data Extractor” and, speciﬁcally,

is the “RFID Readings.” The Non-Monotonic Reasoning Engine houses the Clausal De-

feasible Logic program discussed in Section 2.2 which is used to determine the correct

course of action when given information. The rules are directly stated by the user. The

corrected data is then passed to the Data Loader process in the “Load” phase which

will then load the corrected information into the database.

4.2 Logic Engine Structure

X-CleLo relies on Clausal Defeasible Logic Engines to determine the correct outcome

when given observational inputs. We have set up two logic engines to deal with missing

and wrong values from the RFID readings.

The ﬁrst logic engine depicted in Figure 3 is the Wrong Value Logic Map. It is de-

vised to detect and remove any readings that have been found to be recorded incorrectly

when analysing other observations. The engine will then check if there are no readers

that can ﬁll in the extra value but still be geographically correct according to the Map-

Data table. If it cannot ﬁnd any reader to ﬁll the gap left by the extra value, the program

will promptly delete it. Before it deletes the said value, it will check to see if there are

any combinations outside the temporally closest readings.

The second engine shown in Figure 4 is the Missing Value Logic Map which has

been designed to be used when cleaning the RFID readings and makes use of the Map-

Data table. It operates by examining if the missed value (detected due to the assumption

that readers scan every minute) is between two values that, if, when an additional read-

Fig. 3. The Logic Map created with the intent of detecting and deleting wrong values.

Fig. 4. The Logic Map created with the intent of detecting and inserting missing values.

ing is inserted, would be next to each other geographically according to the MapData

table.

If the case is that the inserted value would geographically make the observation

sound, it will conclude that it needs to insert the value. It then searches to see if the

readers around the missed value are temporally next to an entry/exit reader and, if so,

it will conclude that the value will not be inserted. The logic engine then checks if the

value is between two of the same readers and, if so, concludes to insert the value, and

ﬁnally, if the value is temporally between two other missed values, it will conclude that

the value will not be inserted.

5 Experimental Evaluation

As mentioned earlier, X-CleLo makes use of the DMRA to store all relevant RFID data.

The tables found in DMRA and the procedural language PL/SQL were implemented

on Oracle 10g RDBMS. The Clausal Defeasible Non-Monotonic Reasoning Engine is

written in the C language, implemented on Cygwin Version 1.5.24-2. In the absence

of real data, we generated data to simulate a hospital scenario based on real world

applications [2]. The type of simulated data includes a unique tag id, timestamp as to

when the reading occurred, and the identiﬁer of the reader. To extensively experiment X-

CleLo, we considered four data sets changing the number of deletions in the observation

table. We have devised an algorithm to examine each record and to delete randomly the

record based on a deletion chance percentage threshold system to simulate a real RFID

situation with missing data. Due to the previously mentioned statement that only 60%-

70% RFID tags are captured, the four data sets have been given a different deletion

percentage, namely 20%, 30%, 40% and 50%.

6 Results and Analysis

For the experimental study of the effectiveness of X-CleLo, we ran two experiments

to compare X-CleLo to a naive rule based method. The naive rule based algorithm

uses rules in the order used by X-CleLo. However, it lacks defeasible logic to ascertain

high level intelligence when cleaning. Through this experiment, we intend to identify

X-CleLo’s intelligence cleaning success while compared to the rule based algorithms

such as the rules used in the Deferred Cleansing method.

Fig. 5. The results of cleaning the Observation table with wrong data present. Bars are divided up

into the amount of false positive and false negatives for each of the Rule Based and the Clausal

Defeasible Formulae (µ π and β).

6.1 Wrong Data Results

The ﬁrst experiment tests X-CleLo against the Rule Based Algorithm when faced with

the situation of having to clean false positive errors. In this experiment, a number of ran-

domly artiﬁcially produced false positive readings are introduced into the Observation

table. X-CleLo and the Rule Based Algorithm are then subjected to the dirty data set to

attempt to clean it and restore its integrity as best they can by deleting the false positive

readings. In this experimentation, we term false positives as any readings which were

not present within the original data set, and false negatives as any data which has been

deleted that was present within the original data set.

The graph of the results may be seen in Figure 5 in which the amount of correct

deletions and the amount of false positives still present are recorded for each algorithm.

It is important to note that the least performing algorithms in both false positives and

negatives were the µ Clausal Defeasible formula and the Rule Based which had the

highest false positive anomalies and false negative anomalies respectively. Both the π

and β Clausal Defeasible formulae proved to have the smallest amount of false positive

and false negative anomalies obtaining the lowest false negatives and positives each.

6.2 Missing Data Results

The second experiment was to evaluate the performance of X-CleLo against the Rule

Based Algorithm when faced with the situation of having to clean missed readings.

In this experiment, four data sets were generated each with the complete scenario but

having different percentages of original values still present. The missed values are sim-

ulated in the data sets by choosing a random number between 1% and 100%. If the

random number is below the speciﬁed threshold percentage unique to each data set, the

reading will be deleted. The thresholds used are based upon the statement that only 60%

to 70% of RFID readings are recorded. We have set the threshold to be 50%, 60%, 70%

and 80% which results in approximately 50%, 40%, 30% and 20% of the data being

deleted respectively. The X-CleLo and Rule Based Algorithms are then employed to

seek out missed readings and infer them using their respected methods. From analysing

the accuracy of the data sets after performing the cleaning in the graph in Figure 6, we

found that the rule based algorithm and X-CleLo µ performed the least successfully

while the X-CleLo π and X-CleLo β formulae provided the most thorough clean. As a

general observation, it appears that in every test the highest accuracy is achieved with

lower missing values.

Fig. 6. The results of cleaning the Observation table with missing data present. Results are divided

into the performance of the Rule Based and Clausal Defeasible Formulae (µ π and β) when

attempting to clean a database with 50%, 40%, 30% and 20% of the data missing.

6.3 Analysis

From the above results, it may be detected that there is a common element in all the

tests performed, X-CleLo’s π and β formulae provided the greatest accuracy followed

by either the X-CleLo’s µ formula or the algorithm used for comparison (rule based or

probabilistic). The reason why the π and β formulae have achieved a higher integrity

level than that of the µ formula may be explained in that the µ formula will only allow

strict rules to direct the answer to a conclusion. The logic engines used in this experi-

ment also contributed to the µ formula not functioning. If a strict rule had been in place

rather than plausible rules, the µ formula would have sought an answer.

We have found two core strengths while using X-CleLo as a cleaning method. The

ﬁrst is that we are able to obtain a higher level of accuracy when compared with other

techniques which state-of-the-art methodologies employ. We accomplish this by pro-

viding the methodology in which rules are stated by the user and employed on the data

set with high level intelligence in the form of Non-Monotonic Reasoning. The second

strength is that the utilisation of a deterministic cleaning approach is a way of ensuring

that the lowest amount of false positives are introduced into the data set via cleaning.

This has been reﬂected within our ﬁrst experiment in which there are a large amount of

false negatives introduced when the rule based approach is used. We counter the effects

of introducing false negative readings by applying Non-Monotonic Reasoning to justify

when and where it is unacceptable to delete a value that appears ambiguously placed.

7 Conclusions

In this paper we addressed the issues related to the problems associated with errors in

RFID data. Speciﬁcally, this study makes the following contributions to the ﬁeld:

– We have proposed X-CleLo, a deterministic method that emphasises the cleaning

of the data set intelligently by incorporating Clausal Defeasible Logic.

– We have demonstrated that X-CleLo can be effectively used to clean stored RFID

data.

– We have identiﬁed that the π and β Clausal Defeasible Logic formulae achieved

the highest accuracy when attempting to clean the erroneous data.

– In the experimental study, we compared the performance of X-CleLo to the state-of-

the-art Rule-Based cleaning system. Experimental results have shown that X-CleLo

obtained a higher cleaning rate.

The proposed concept is not necessarily limited to the RFID applications and may

be applied as a base principle in resolving ambiguous data anomalies, speciﬁcally, large

databases that have spatio-temporal recordings with false-positive and false-negative

data anomalies. In terms of future work, we have identiﬁed two enhancements to im-

prove the accuracy of X-CleLo. The ﬁrst is to develop a method which will allow vari-

ous rule manipulations with the logic engines in order to calibrate X-CleLo. The second

is to devise increasingly complex logic maps, which will be able to correct complex am-

biguous observations for scenarios outside the scope of our experimental evaluation.

Acknowledgements

This research is partly supported by ARC (Australian Research Council) grant number

DP0557303.

References

1. Derakhshan, R., Orlowska, M. E., Li, X.: RFID Data Management: Challenges and Oppor-

tunities. In: RFID 2007. (2007) 175 – 182

2. Swedberg, C.: Hospital Uses RFID for Surgical Patients. RFID Journal (2005) Available

from: http://www.rﬁdjournal.com/article/articleview/1714/1/1/.

3. Cocci, R., Tran, T., Diao, Y., Shenoy, P. J.: Efﬁcient Data Interpretation and Compression

over RFID Streams. In: ICDE, IEEE (2008) 1445–1447

4. Wang, F., Liu, P.: Temporal Management of RFID Data. In: VLDB. (2005) 1128–1139

5. Floerkemeier, C., Lampe, M.: Issues with RFID usage in ubiquitous computing applications.

In Ferscha, A., Mattern, F., eds.: Pervasive Computing: Second International Conference,

PERVASIVE 2004. Number 3001, Linz/Vienna, Austria, Springer-Verlag (2004) 188–193

6. Jeffery, S. ., Garofalakis, M. N., Franklin, M. J.: Adaptive Cleaning for RFID Data Streams.

In: VLDB. (2006) 163–174

7. Billington, D.: Propositional Clausal Defeasible Logic. In: European Conference on Logics

in Artiﬁcial Intelligence (JELIA). (2008) 34–47

8. Billington, D.: An Introduction to Clausal Defeasible Logic [online]. David Billington’s

Home Page (2007) Available from: http://www.cit.gu.edu.au/∼db/research.pdf.

9. Golab, L., Karloff, H., Korn, F., Srivastava, D., Yu, B.: On Generating Near-Optimal

Tableaux for Conditional Functional Dependencies. VLDB Endow. 1 (2008) 376–390

10. Wijsen, J.: Database repairing using updates. ACM Trans. Database Syst. 30 (2005) 722–768

11. R

e, C., Letchner, J., Balazinksa, M., Suciu, D.: Event Queries on Correlated Probabilistic

Streams. In: SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international confer-

ence on Management of data, New York, NY, USA, ACM (2008) 715–728

12. Tran, T., Sutton, C., Cocci, R., Nie, Y., Diao, Y., Shenoy, P.: Probabilistic Inference over

RFID Streams in Mobile Environments. In: ICDE ’09: Proceedings of the 2009 IEEE Inter-

national Conference on Data Engineering, Washington, DC, USA, IEEE Computer Society

(2009) 1096–1107

13. Xie, J., Yang, J., Chen, Y., Wang, H., Yu, P. S.: A Sampling-Based Approach to Information

Recovery. In: ICDE ’08: Proceedings of the 2008 IEEE 24th International Conference on

Data Engineering, Washington, DC, USA, IEEE Computer Society (2008) 476–485

14. Rao, J., Doraiswamy, S., Thakkar, H., Colby, L. S.: A Deferred Cleansing Method for RFID

Data Analytics. In: VLDB. (2006) 175–186

15. Wang, F., Liu, S., Liu, P.: A temporal RFID data model for querying physical objects. Per-

vasive and Mobile Computing In Press, Corrected Proof (2009)