Clothes Change Detection Using the Kinect Sensor
Dimitrios Sgouropoulos, Theodoros Giannakopoulos, Sergios Petridis,
Stavros Perantonis and Antonis Korakis
Computational Intelligence Laboratory (CIL), Institute of Informatics and Telecommunications, National Center for
Scientific Research DEMOKRITOS, Patriarchou Grigoriou & Neapoleos, Ag. Paraskevi 15310, Athens, Greece
Keywords:
Kinect, Clothes Change Detection, Activities of Daily Living.
Abstract:
This paper describes a methodology for detecting when a human has changed clothes. Changing clothes is
a basic activity of daily living which makes the methodology valuable for tracking the functional status of
elderly people, in the context of a non-contact unobtrusive monitoring system. Our approach uses Kinect and
the OpenNI SDK, along with a workflow of basic image analysis steps. Evaluation has been conducted on a
set of real recordings under various illumination conditions, which is publicly available along with the source
code of the proposed system at http://users.iit.demokritos.gr/~tyianak/ClothesCode.html.
1 INTRODUCTION
The elderly population has been growing constantly over the last decades and is expected to grow dramatically over the next few years, especially in Europe. This increase has made elderly care a rapidly growing concern and, in particular, has led to a major research effort on automatic assistive services for the elderly that facilitate independent living. In
this work, we employ image analysis techniques ap-
plied on data recorded from the Kinect sensor (Kin,
2011)(Zhang, 2012), in order to detect that a human
has changed clothes between two successive record-
ing sessions. The purpose of such a service is to mea-
sure the functional status of a person in the context
of in-home unobtrusive health monitoring. The ability to change clothes is an important self-care task taken into consideration by health professionals when monitoring a patient, especially in the case of people with disabilities or the elderly. Such functional activities are referred to as Activities of Daily Living (ADLs) (Lawton and Brody, 1969) and include self-care tasks such as bathing, personal hygiene, toilet hygiene and eating (Collin et al., 1988)(Collin and Wade, 1988).
Automatically recognizing ADLs has gained research interest over the last years. This is usually achieved through sensors such as accelerometers, Radio Frequency Identification (RFID) tags, microphones and cameras (Fleury et al., 2010)(Stikic et al., 2008). In (Fleury et al., 2010), a multi-class Support Vector Machine (SVM) has been employed to discriminate among 7 ADLs, based on several sensors: an infra-red presence sensor, wearable kinematic sensors, microphones and others. For the particular case of the dressing/undressing activity, the overall classification accuracy was found to be 75%, while the maximum confusion was observed with the “resting” and “sleeping” activities. Instead of recognizing the
(un)dressing activity among other classes of events,
in this work we focus on simply answering the binary
question: “has the person changed clothes between
two successive recordings?”. The task then simplifies to (a) detecting the clothes worn by the person, (b) modeling the clothes and (c) measuring the similarity of the clothes detected in two recordings.
Automatically recognizing apparel using visual
information can have a wide range of potential ap-
plications: surveillance, e-commerce, household au-
tomated services, etc. Depending on the field of ap-
plication and the approach of information acquisition
the respective methods has been either applied on sin-
gle color images (e.g., (Chen et al., 2012)(Liu et al.,
2012b)) or combination of color and depth images
(e.g., (Maitin-Shepard et al., 2010)). In the context
of shopping recommendation and customer profiling,
some papers have proposed adopting visual analy-
sis methods to describe clothing appearance with se-
mantic attributes. (Chen et al., 2012) proposes us-
ing SIFT descriptors and SVMs to predict 26 pre-
defined attributes concerning clothing patterns, col-
ors, gender as well as general clothing categories (e.g.
shirts). (Liu et al., 2012b) describes a method for
clothing retrieval using everyday photos of people captured in general environments (e.g., the street). In (Liu et al.,
2012a), an automatic occasion-oriented (e.g. wed-
ding) clothing recommendation system is presented.
Towards this end, Histograms of Oriented Gradient
(HOG) and color histograms have been adopted as
features, while SVMs have been used as classifiers.
In a similar context, (Bossard et al., 2013) introduces
a pipeline for recognizing and classifying people’s
clothing in natural scenes. Among others, HOGs and
color histograms are used as features, while classifi-
cation is achieved via Random Forests. (Kalantidis
et al., 2013) presents a visual analysis method that
suggests clothing results given a single image.
Visual-based clothing classification has also been used in household service robotic applications, e.g., in automated laundry. In (Willimon et al., 2011), a
robotic system which identifies and extracts items se-
quentially from a pile using only visual sensors is de-
scribed. Classification of each clothing item is con-
ducted based on a six-class hierarchy (pants, shorts,
short-sleeve shirt, long-sleeve shirt, socks, or under-
wear). Depth information is also used based on a
stereo pair of cameras. In (Maitin-Shepard et al.,
2010) an application of robotic towel folding is pre-
sented, where image (both color and depth) analysis is
adopted to detect corners that can be used for grasping
the towel. (Ramisa et al., 2012) also presents a grasp-
ing point detection method using color and depth
information from a Kinect device. That work focuses on identifying the grasping points in a single step, even when clothes are highly wrinkled, thereby avoiding multiple re-graspings. (Willimon et al.,
2013) focuses on defining mid-level features in or-
der to boost the clothing classification performance.
Again, the task here is to classify clothes from a pile
of laundry (three categories have been used: shirts,
socks and dresses).
2 METHODOLOGY
2.1 Clothes Detection
The Kinect sensor provides RGB data, depth data and skeletal tracking information, i.e., the 3D coordinates of tracked body skeletons ((Xia et al., 2011)(Shotton et al., 2013)). In particular, we have employed the OpenNI SDK (http://www.openni.org/) in order to identify the positions of key joints on the human body (hands, elbows, head, etc.), along with other human position information such as orientation estimates and distance from the sensor. The first step is to specify the bounds of the areas of interest, in order to narrow the processing down to the particular task. The Kinect middleware produces two matrices that are used for this purpose:
(a) the pixel matrix, which contains the color information of each pixel in the RGB color space, and (b) the user matrix, which indicates whether the respective pixel belongs to a user (human) or not.
Using these matrices it is possible to retain only the important information, that is, the user-related color values. Based on this information, the areas of interest for the particular task have been set. In particular, two primary areas have been defined, namely the torso area (used to model the upper clothes) and the lower body area (used to model the lower clothes). The determination of these areas of interest was based on the user data provided by Kinect and on information stemming from particular joint coordinates of the skeleton, also provided by the Kinect sensor. The upper body area is a rectangular area whose dimensions are primarily defined by the shoulder and torso joints and then enlarged based on the user's body width and height. Following the same method, we form the user's lower body area using the hip and knee joints from the skeleton estimate and performing similar adjustments.
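To make the region extraction concrete, the following Python sketch (illustrative only; the actual system is written in Processing) shows how the two areas of interest could be derived from the middleware output. The joint names, the mask format and the enlargement margin are assumptions, not values taken from our implementation.

```python
# Illustrative sketch: deriving the torso and lower-body areas of interest
# from the user matrix and skeleton joints (names/margins are assumptions).
import numpy as np

def body_regions(user_mask, joints, margin=0.15):
    """Return (torso_box, lower_box) as (x0, y0, x1, y1) pixel rectangles.

    user_mask: H x W boolean array flagging pixels that belong to the user.
    joints:    dict mapping joint names to (x, y) pixel coordinates.
    """
    ys, xs = np.nonzero(user_mask)                   # all user pixels
    body_w, body_h = np.ptp(xs), np.ptp(ys)          # rough body width/height
    pad = np.array([margin * body_w, margin * body_h])

    # Torso: rectangle spanned by the shoulder and torso joints,
    # enlarged proportionally to the user's body size.
    torso = np.array([joints["left_shoulder"], joints["right_shoulder"],
                      joints["torso"]], dtype=float)
    torso_box = np.concatenate([torso.min(0) - pad, torso.max(0) + pad])

    # Lower body: rectangle spanned by the hip and knee joints,
    # enlarged in the same way.
    lower = np.array([joints["left_hip"], joints["right_hip"],
                      joints["left_knee"], joints["right_knee"]], dtype=float)
    lower_box = np.concatenate([lower.min(0) - pad, lower.max(0) + pad])
    return torso_box.astype(int), lower_box.astype(int)
```

Only the pixels flagged by the user matrix inside each rectangle are then passed to the color representation step.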
2.2 Clothes Color Representation
For each area of interest (torso and lower body), 60 feature values related to the color of the respective clothing are extracted. In particular, 30 features stem from the three histograms of the color information (RGB), since 10 bins per color channel are used in the histogram computation. Similarly, 30 features stem from the histograms of the edges of each color coordinate. Towards this end, the Sobel image operator is applied on each color coordinate. At each frame where a human is detected, a feature vector of the area of interest is calculated as described above, along with a respective confidence measure related to that detection. This process forms a feature matrix $X$ of size $M \times D$, whose rows correspond to the respective feature vectors. The confidence measure is extracted according to the following weighted heuristic:
to the following weighted heuristic:
H(O, R, D) = w
1
·cos
2
(O)+ w
2
·R+ w
3
·
1
2
πσ
e
D
2
2σ
2
(1)
The first factor is based on the user's orientation in the room (in degrees): frontal orientation, either looking towards the Kinect ($O = 0$) or away from it ($O = 180$), gives the highest confidence, while profile orientations (e.g., $O = 90$) give the lowest. The second factor depends on $R$, which is the ratio of the current to the previous user pixel count (i.e., the number of pixels
SIGMAP2014-InternationalConferenceonSignalProcessingandMultimediaApplications
86
that belong to the human, as estimated by the mid-
dleware). The last factor takes into account the user's distance $D$ from the sensor, where $\sigma$ is set to reflect the expected distance. The respective weights $w_i$ of the factors were determined based on the efficiency of the heuristic in our various experiments, with the only restriction that $w_1 + w_2 + w_3 = 1$. In our experiments we use $w_1 = w_2 = 0.4$ and $w_3 = 0.2$.
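A minimal Python transcription of the heuristic of Eq. (1) is given below; the weights are those reported above, while the value of $\sigma$ (and whether $D$ is measured relative to an expected operating distance) is an assumption, since the text leaves it open.

```python
# Sketch of the confidence heuristic of Eq. (1); sigma is an assumed value.
import math

W1, W2, W3 = 0.4, 0.4, 0.2   # weights used in our experiments (sum to 1)

def confidence(orientation_deg, pixel_ratio, distance, sigma=1.0):
    """Per-frame detection confidence.

    orientation_deg: user orientation O in degrees (0 = facing the sensor).
    pixel_ratio:     R, current over previous user pixel count.
    distance:        D, user distance term (assumed to be taken relative to
                     the expected operating distance from the sensor).
    """
    o = math.radians(orientation_deg)
    gauss = math.exp(-distance ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return W1 * math.cos(o) ** 2 + W2 * pixel_ratio + W3 * gauss
```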
Finally, each recording session is represented as a single feature vector, computed as a weighted average of the individual feature vectors:

$$F_n = \frac{\sum_{i=1}^{M} X_{i,n} \cdot C_i}{\sum_{i=1}^{M} C_i},$$

where $D$ is the number of feature dimensions (60), $M$ is the number of samples (feature vectors) of the recording session, $X_{i,n}$ is the $n$-th feature of the $i$-th sample, $n = 1, \ldots, D$, and $C_i$ is the confidence value of the $i$-th sample of the recording.
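The following sketch, assuming numpy and OpenCV, illustrates both the 60-dimensional frame representation and the confidence-weighted session average; the histogram normalization details are our own assumptions.

```python
# Sketch of the 60-dim color/edge representation and the confidence-weighted
# session vector (normalization details are assumptions).
import cv2
import numpy as np

def frame_features(region):
    """10-bin color histogram + 10-bin Sobel edge histogram per channel:
    2 x 3 x 10 = 60 features for one area of interest."""
    feats = []
    for c in range(3):
        chan = region[:, :, c]
        hist, _ = np.histogram(chan, bins=10, range=(0, 256))
        feats.append(hist / max(hist.sum(), 1))          # color histogram
        edges = np.abs(cv2.Sobel(chan, cv2.CV_64F, 1, 0)) + \
                np.abs(cv2.Sobel(chan, cv2.CV_64F, 0, 1))
        ehist, _ = np.histogram(edges, bins=10)
        feats.append(ehist / max(ehist.sum(), 1))        # edge histogram
    return np.concatenate(feats)                         # shape (60,)

def session_vector(X, C):
    """Confidence-weighted average F of the M x D feature matrix X,
    with C holding the M per-frame confidences."""
    X, C = np.asarray(X, float), np.asarray(C, float)
    return (C[:, None] * X).sum(axis=0) / C.sum()
```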
2.3 Color Constancy
The proposed system should function in a real home environment, therefore it needs to be robust to varying illumination conditions. Towards this end, we include in our methodology a color constancy step, aiming at color features that retain constant statistics under different illumination conditions (Funt et al., 1996). In particular, we have experimented with the following static color constancy methods: (a) the Grey-World algorithm, which assumes that the average color in a scene is achromatic and therefore normalizes each color channel by its respective gray (average) value; (b) the White-Patch method, which normalizes each color coordinate by the respective maximum channel value, scaling towards a hypothetical white reference area; and (c) a simple modification of White-Patch, which uses the average value of a range of high-valued pixels (instead of the global maximum) in order to increase robustness to noise.
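For reference, the three variants could be sketched in Python/numpy as follows; the percentile that defines the "high-valued pixels" of the modified White-Patch is an assumed parameter.

```python
# Sketches of the three static color constancy methods (assumed details:
# float arithmetic, per-channel scaling, 99th percentile for the variant).
import numpy as np

def grey_world(img):
    """Scale each channel so that the scene average becomes achromatic."""
    img = img.astype(float)
    means = np.maximum(img.reshape(-1, 3).mean(axis=0), 1e-6)
    return np.clip(img * (means.mean() / means), 0, 255).astype(np.uint8)

def white_patch(img):
    """Scale each channel so that its maximum maps to a white reference."""
    img = img.astype(float)
    maxima = np.maximum(img.reshape(-1, 3).max(axis=0), 1e-6)
    return np.clip(img * (255.0 / maxima), 0, 255).astype(np.uint8)

def modified_white_patch(img, pct=99):
    """Like White-Patch, but normalize by the mean of the high-valued
    pixels per channel instead of the global maximum (noise robustness)."""
    img = img.astype(float)
    out = np.empty_like(img)
    for c in range(3):
        chan = img[:, :, c]
        ref = chan[chan >= np.percentile(chan, pct)].mean()
        out[:, :, c] = chan * (255.0 / max(ref, 1e-6))
    return np.clip(out, 0, 255).astype(np.uint8)
```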
3 EXPERIMENTS
3.1 Data Used
In order to evaluate the clothes change detection ability of the proposed approach, a dataset of real recordings has been compiled and manually annotated. In total, four humans have participated in the recordings, under two different lighting conditions, namely natural and artificial lighting. For each case, a number of different upper and lower garments have been used. Each recorded session is stored in a separate .oni file using the OpenNI library. The names of these files indicate the IDs of the corresponding garments.
3.2 Evaluation Method
In this section we describe the methodology adopted for evaluating the discrimination ability of the adopted color representation. We describe the process for the upper clothes; it is exactly the same for the lower clothes case. Given:
- a set of upper clothes feature vectors $FU_i$, $i = 1, \ldots, N$, of 60 elements each, where $N$ is the total number of video sessions;
- a vector of upper labels $LU_i$, $i = 1, \ldots, N$, where each different value represents a distinct piece of clothing; this is used as ground truth in the evaluation process.
We start by creating the confusion matrix $CM$ and initializing it with zeros. Then, for each possible pair $FU_i$ and $FU_j$, with $i = 1, \ldots, N$, $j = 1, \ldots, N$ and $j \neq i$, we compute their Euclidean distance $DU_{i,j}$ and compare it to a user-defined threshold $T$. If $DU_{i,j}$ is greater than $T$, the pair of clothes is perceived as different, otherwise as the same. This yields four separate cases:
- $CM_{1,1}$: number of times that two feature vectors have the same estimated label ($DU_{i,j} \leq T$) and the same ground truth label ($LU_i = LU_j$) - true negative;
- $CM_{1,2}$: number of times that two feature vectors have different estimated labels ($DU_{i,j} > T$) but the same ground truth label ($LU_i = LU_j$) - false positive;
- $CM_{2,1}$: number of times that two feature vectors have the same estimated label ($DU_{i,j} \leq T$) but different ground truth labels ($LU_i \neq LU_j$) - false negative;
- $CM_{2,2}$: number of times that two feature vectors have different estimated labels ($DU_{i,j} > T$) and different ground truth labels ($LU_i \neq LU_j$) - true positive.
After computing the overall confusion matrix, as described above, it is normalized so that the two events are considered equiprobable, and finally the performance measures Precision, Recall and F1 measure are calculated:

$$Pr = \frac{CM_{2,2}}{\sum_{i=1}^{2} CM_{i,2}}, \quad Re = \frac{CM_{2,2}}{\sum_{i=1}^{2} CM_{2,i}}, \quad F1 = \frac{2 \cdot Pr \cdot Re}{Pr + Re}.$$

The exact same evaluation process is repeated for the lower clothing.
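A compact Python sketch of this pairwise evaluation, under the conventions above (row = ground truth, column = estimate), might look as follows; it is illustrative rather than the released implementation.

```python
# Sketch of the pairwise change-detection evaluation of Section 3.2.
import numpy as np

def evaluate(F, labels, T):
    """F: N x 60 session feature vectors; labels: N garment IDs; T: threshold.
    Returns (Precision, Recall, F1) for the "clothes changed" event."""
    F, labels = np.asarray(F, float), np.asarray(labels)
    CM = np.zeros((2, 2))
    N = len(F)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            est_diff = np.linalg.norm(F[i] - F[j]) > T    # estimated change
            true_diff = labels[i] != labels[j]            # ground-truth change
            CM[int(true_diff), int(est_diff)] += 1
    CM /= CM.sum(axis=1, keepdims=True)   # make the two events equiprobable
    pr = CM[1, 1] / CM[:, 1].sum()
    re = CM[1, 1] / CM[1, :].sum()
    return pr, re, 2 * pr * re / (pr + re)
```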
3.3 Evaluation Results
Table 1: F1 evaluation results (%) for different lighting conditions and all feature calibration methods.

                           Artificial   Natural   Mixed
  No color constancy           77          82       72
  Gray world                   78          83       71
  White Patch                  84          82       77
  Modified White Patch         85          85       80

As described above, the recordings have been conducted under two general illumination categories (natural and artificial lighting). The evaluation has been based on these two categories, as well as on their “mixed” combination: the latter is the general (and harder) case of detecting changes under all possible illumination conditions. The results of this process
are shown in Table 1. Due to space limitations, we do not present separate performance results for the upper and lower clothes, but only their averages. However, we would like to report that, on average, the problem of detecting changes in the lower clothes is at least 10% harder in terms of F1 measure. This is probably because the lower body is usually not entirely visible in a real home environment, since pieces of furniture and other objects often intervene between the sensor and the human.
4 CONCLUSIONS
We have presented a Kinect-based approach to detect-
ing changes in users’ clothes in a smart home envi-
ronment in the context of measuring the functional
status of the elderly. The whole system has been im-
plemented in the Processing programming language,
using the OpenNI SDK and achieves real-time detec-
tion. In order to evaluate the proposed approach, a
dataset of recordings under various illumination con-
ditions has been compiled, which is also publicly
available. Experimental results have indicated that
the overall change detection method achieves up to
80% performance for mixed lighting conditions and
85 for single conditions, that is 8% compared to the
performance when the initial feature representation is
adopted. In addition, the adopted color constancy ap-
proach abridges the gap at the performance between
different illumination conditions. In the context of the
carried out ongoing work we focus on the following
directions: (a) implementation of more advanced im-
age features (e.g. HOGs) (b) evaluation of more so-
phisticated color constancy techniques and (c) exten-
sion of the benchmark with more users and clothes
combinations.
ACKNOWLEDGEMENTS
The research leading to these results has re-
ceived funding from the European Union’s Seventh
Framework Programme (FP7/2007-2013) under grant
agreement no 288532. For more details, please see
http://www.usefil.eu.
REFERENCES
Kin (2011). Microsoft Kinect sensor. Online available: http://www.microsoft.com/en-us/kinectforwindows/. Accessed April 1, 2013.
Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack,
T., and Gool, L. V. (2013). Apparel classification with
style. In Computer Vision–ACCV 2012, pages 321–
335. Springer.
Chen, H., Gallagher, A., and Girod, B. (2012). Describing
clothing by semantic attributes. In Computer Vision–
ECCV 2012, pages 609–623. Springer.
Collin, C. and Wade, D. (1988). The Barthel ADL index: a standard measure of physical disability? Disability & Rehabilitation, 10(2):64–67.
Collin, C., Wade, D., Davies, S., and Horne, V. (1988). The Barthel ADL index: a reliability study. Disability & Rehabilitation, 10(2):61–63.
Fleury, A., Vacher, M., and Noury, N. (2010). SVM-based multimodal classification of activities of daily living in health smart homes: sensors, algorithms, and first experimental results. Information Technology in Biomedicine, IEEE Transactions on, 14(2):274–283.
Funt, B., Cardei, V., and Barnard, K. (1996). Learning color
constancy. In IS&T/SID Fourth Color Imaging Con-
ference, pages 58–60.
Kalantidis, Y., Kennedy, L., and Li, L.-J. (2013). Getting
the look: clothing recognition and segmentation for
automatic product suggestions in everyday photos. In
Proceedings of the 3rd conference on International
conference on multimedia retrieval, pages 105–112.
ACM.
Liu, S., Feng, J., Song, Z., Zhang, T., Lu, H., Xu, C., and
Yan, S. (2012a). Hi, magic closet, tell me what to
wear! In Proceedings of the 20th international con-
ference on Multimedia, pages 619–628. ACM.
Liu, S., Song, Z., Liu, G., Xu, C., Lu, H., and Yan, S.
(2012b). Street-to-shop: Cross-scenario clothing re-
trieval via parts alignment and auxiliary set. In Com-
puter Vision and Pattern Recognition (CVPR), 2012
IEEE Conference on, pages 3330–3337. IEEE.
Maitin-Shepard, J., Cusumano-Towner, M., Lei, J., and
Abbeel, P. (2010). Cloth grasp point detection based
on multiple-view geometric cues with application to
robotic towel folding. In Robotics and Automa-
tion (ICRA), 2010 IEEE International Conference on,
pages 2308–2315. IEEE.
Ramisa, A., Alenya, G., Moreno-Noguer, F., and Torras, C.
(2012). Using depth and appearance features for in-
formed robot grasping of highly wrinkled clothes. In
SIGMAP2014-InternationalConferenceonSignalProcessingandMultimediaApplications
88
Robotics and Automation, International Conference
on, pages 1703–1708. IEEE.
Lawton, M. P. and Brody, E. M. (1969). Assessment of older people: self-maintaining and instrumental activities of daily living. The Gerontologist, 9(3):179–186.
Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finoc-
chio, M., Blake, A., Cook, M., and Moore, R. (2013).
Real-time human pose recognition in parts from sin-
gle depth images. Communications of the ACM,
56(1):116–124.
Stikic, M., Huynh, T., Laerhoven, K. V., and Schiele, B. (2008). ADL recognition based on the combination of RFID and accelerometer sensing. In Pervasive Computing Technologies for Healthcare, 2008, pages 258–263. IEEE.
Willimon, B., Birchfield, S., and Walker, I. (2011). Clas-
sification of clothing using interactive perception. In
Robotics and Automation (ICRA), 2011 IEEE Interna-
tional Conference on, pages 1862–1868. IEEE.
Willimon, B., Walker, I., and Birchfield, S. (2013). A
new approach to clothing classification using mid-
level layers. In Proceedings of the International Con-
ference on Robotics and Automation (ICRA).
Xia, L., Chen, C.-C., and Aggarwal, J. (2011). Human detection using depth information by Kinect. In
Computer Vision and Pattern Recognition Workshops
(CVPRW), 2011 IEEE Computer Society Conference
on, pages 15–22. IEEE.
Zhang, Z. (2012). Microsoft Kinect sensor and its effect. Multimedia, IEEE, 19(2):4–10.
ClothesChangeDetectionUsingtheKinectSensor
89