Section IV concludes the paper and suggests
possible directions for further work.
2 RELATED WORK
Our work is motivated by and builds on recent
results both in data mining and in understanding
human behavior, habits and movement patterns.
In particular, we take much of our
motivation from the work presented in (Gonzalez et
al., 2008) and (Wang et al., 2009). Both of these papers
combine geospatial
information with mathematical models in order to
extract significant patterns of human motion.
In (Gonzalez et al., 2008) the authors address the
challenging problem of mathematically modeling
human mobility. Their study is based on two mobile-
phone-use derived datasets. The first was collected
by tracking 100 000 anonymized mobile phone
users, selected out of a sample of over 6 million
users. Their position was recorded any time they
initiated a call or sent an SMS over a six-month
period. The second dataset captured the location of
206 users whose position was recorded every two
hours, for an entire week.
Analyzing user displacements between
consecutive positions, they show mathematically that
the distribution of displacements is well approximated
by a truncated power law. The authors continue the
analysis to show that this type of distribution captures a
convolution of individual Lévy flight trajectories
(Righton and Pirchford, 2007) and population-based
heterogeneity.
Defining the radius of gyration ($r_g$) of a single
user to be the typical distance travelled by the user up
to time t, they show that rescaling the
distribution of displacements with this value causes
it to collapse into a single distribution, suggesting
that a single relative jump size distribution
characterizes all users, independent of their $r_g$.
Finally, ranking the locations visited by users
reveals that people devote most of their time to a few
locations, while spending their remaining time in 5
to 50 places.
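To make the radius of gyration concrete, the following minimal sketch assumes the common definition of this quantity as the root-mean-square distance of a user's recorded positions from their centre of mass. It also assumes that positions have already been projected to planar coordinates in kilometres. The function name and the input layout are hypothetical and are not taken from the cited work.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centre of mass.

    `points` is a list of (x, y) positions in km (assumed planar projection).
    """
    if not points:
        raise ValueError("at least one position is required")
    n = len(points)
    # Centre of mass of all recorded positions.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance from the centre of mass.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)
```

Dividing each of a user's displacements by this value would give the rescaled jump sizes that the collapse argument refers to.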
In (Wang et al., 2009) the authors used the same
mobile phone data to study fundamental spreading
patterns that characterize a mobile virus (Bluetooth
and MMS) outbreak. While geo-referenced images
from Flickr are not suitable for modeling the MMS
virus outbreak, they can be used to analyze the
spread of biological viruses which are passed in a
fashion analogous to that of the Bluetooth viruses.
Thus, the results derived in this paper have potential
application in the domain of virus outbreak analysis
and prevention.
3 METHODS AND RESULTS
3.1 Human Mobility Patterns
In our research, we used a dataset of 1 million
metadata records associated with Flickr images
pertinent to the San Francisco/San Diego area. The
content was downloaded automatically using a
tool developed in our lab, which relies on the
public Flickr API and the cURL library.
We used two datasets to explore the mobility
patterns of individuals. The first (S1) consists of all
the geo-referenced images in the downloaded dataset.
The second (S2) is the subset of S1 that
contains only data uploaded by users who
contributed images over a period of time longer than
a week. This was done in an attempt to eliminate the
contribution of tourists from S2, as we assumed that
users with just a few images over a short period of
time fall in this category.
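The selection of S2 can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(user_id, timestamp, lat, lon)`, the function name, and the use of a strict "longer than seven days" comparison on the span between a user's first and last record are hypothetical choices, not the schema or code of the actual download tool.

```python
from datetime import datetime, timedelta


def select_long_term_users(records, min_span=timedelta(days=7)):
    """Keep records of users whose contributions span more than `min_span`.

    `records` is an iterable of (user_id, timestamp, lat, lon) tuples
    (assumed layout).
    """
    records = list(records)
    span = {}  # user_id -> (earliest timestamp, latest timestamp)
    for user, ts, _lat, _lon in records:
        lo, hi = span.get(user, (ts, ts))
        span[user] = (min(lo, ts), max(hi, ts))
    # Users active for longer than the threshold are assumed to be residents.
    keep = {u for u, (lo, hi) in span.items() if hi - lo > min_span}
    return [r for r in records if r[0] in keep]
```

For example, a user with images dated 1 January and 20 January would be kept, while a user with images only on 1 and 3 January would be dropped as a presumed tourist.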
To explore the statistical properties of Flickr
users’ mobility patterns, we first look at the
displacements between each user’s successive positions.
We find that the distribution of displacements can be
described well using a truncated power law (1):
$$P(\Delta r) = (\Delta r + \Delta r_0)^{-\beta} \exp(-\Delta r / k) \qquad (1)$$
with exponent values $\beta = 1.65 \pm 0.15$ (for S1) and
$\beta = 1.70 \pm 0.18$ (for S2) (mean ± standard deviation),
$\Delta r_0 = 1$ km and cut-off value $k = 50$ km (see Figure 1).
Note that the observed scaling exponents lie between
β=1.75±0.15, observed in (Gonzalez et al., 2008) for
mobile-phone-use data, and β=1.59, observed in
(Edwards et al., 2007) for bank-note tracking data.
This suggests that all three distributions capture a
similar fundamental mechanism driving human
mobility patterns. The observed $\Delta r_0$ and cut-off value k
are also close to those obtained in (Gonzalez et
al., 2008) ($\Delta r_0 = 1.5$ km, k = 80 km). The difference in
the value of $\Delta r_0$ may be due to the fact that the data
used in this study is more precise in terms
of the user’s position, as the mobile phone data had to
be approximated to the center of the network cell.
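The displacement computation and the functional form of Equation (1) can be sketched as follows. This is a minimal illustration, not the implementation used in this study: it assumes that displacements are great-circle (haversine) distances between time-ordered positions, it uses a mean Earth radius of 6371 km, and all function names and the `(timestamp, lat, lon)` track layout are hypothetical. The default parameters are the values reported above for S1, and the function is left unnormalised.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def displacements(track):
    """Distances between successive positions of one user.

    `track` is a list of (timestamp, lat, lon) tuples (assumed layout).
    """
    ordered = sorted(track, key=lambda p: p[0])
    return [haversine_km(a[1], a[2], b[1], b[2])
            for a, b in zip(ordered, ordered[1:])]


def truncated_power_law(dr, beta=1.65, dr0=1.0, k=50.0):
    """Unnormalised (dr + dr0)^(-beta) * exp(-dr / k), with dr in km."""
    return (dr + dr0) ** (-beta) * math.exp(-dr / k)
```

The exponential factor is what truncates the power law: for displacements well below the cut-off k the power-law term dominates, while beyond k the probability falls off much faster.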
A plot of the probability density function (PDF)
of the displacements is shown in Figure 1. As the
figure indicates, S2 fits the power law better, but the
general trend is present in both datasets.
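One way to estimate such a PDF from raw displacements is a histogram with logarithmically spaced bins, which is a common choice for heavy-tailed data. The sketch below is an assumption about how a plot of this kind could be produced; the binning scheme, the function name and the `bins_per_decade` parameter are hypothetical and are not taken from the procedure used for Figure 1.

```python
import math


def log_binned_pdf(values, bins_per_decade=5):
    """Estimate a PDF on logarithmically spaced bins.

    Returns a list of (bin_centre, density) pairs. Values that are zero
    or negative are skipped because they have no logarithm.
    """
    positive = [v for v in values if v > 0]
    if not positive:
        return []
    # Integer bin indices on a log10 scale.
    lo = math.floor(math.log10(min(positive)) * bins_per_decade)
    hi = math.floor(math.log10(max(positive)) * bins_per_decade) + 1
    counts = [0] * (hi - lo)
    for v in positive:
        counts[math.floor(math.log10(v) * bins_per_decade) - lo] += 1
    total = len(positive)
    pdf = []
    for i, c in enumerate(counts):
        left = 10 ** ((lo + i) / bins_per_decade)
        right = 10 ** ((lo + i + 1) / bins_per_decade)
        centre = math.sqrt(left * right)  # geometric midpoint of the bin
        # Divide by the bin width so that the estimate integrates to one.
        pdf.append((centre, c / (total * (right - left))))
    return pdf
```

Plotting the resulting pairs on log-log axes would make a power-law regime appear as a straight line, with the exponential cut-off visible as a downward bend at large displacements.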
ICPRAM 2013 - International Conference on Pattern Recognition Applications and Methods