Table 1: The scale used to identify affecting factors.
Factor category                           Correlation coefficient r_c
Very Strong Positive Affecting Factor     r_c ≥ 0.8
Strong Positive Affecting Factor          0.5 ≤ r_c < 0.8
Moderate Positive Affecting Factor        0.3 ≤ r_c < 0.5
Weak Positive Affecting Factor            0.2 ≤ r_c < 0.3
No-Effect Factor                          −0.2 < r_c < 0.2
Weak Negative Affecting Factor            −0.3 < r_c ≤ −0.2
Moderate Negative Affecting Factor        −0.5 < r_c ≤ −0.3
Strong Negative Affecting Factor          −0.8 < r_c ≤ −0.5
Very Strong Negative Affecting Factor     r_c ≤ −0.8
people to launch their self-tracking/monitoring experiments. Second, some people also intend to investigate the relationship between potential affecting factors and the target health metric, and to identify the critical affecting factors. Doing so gives them deeper self-knowledge of how their physiological condition is affected by various factors, which makes it possible for them to improve their health outcomes through life-style adjustment. Based on the assumption that users have collected enough data on the target health metric and its potential affecting factors, the R script analyzes the data in the following five stages.
1) Stage 1: Data Preparation and Cleaning
Missing data (P. E. McKnight and Figueredo, 2007) is likely to occur in any data collection process because of device failure, data entry errors, lost data, and human causes (S. R. Wisniewski and Trivedi, 2006; J. E. Broderick, 2003). A data spreadsheet containing missing values is not suitable for further analysis, because arithmetic expressions and functions applied to missing values yield unreliable results. In R, missing values are represented by the symbol NA (not available). Missing data can be handled either with basic data cleaning techniques that simply remove the observations containing missing values, or with advanced techniques that replace the missing values with reasonable alternative values (G. L. Schlomer and Card, 2010; Allison, 2001).
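The two handling strategies described above can be sketched as follows. This is a minimal illustration in Python rather than the R script itself. The record layout and the column names (`sleep_hours`, `steps`) are hypothetical, `None` stands in for R's NA, and mean imputation is used only as the simplest example of a replacement technique; the source does not say which technique the script applies.

```python
from statistics import mean

# Hypothetical self-tracking records; None plays the role of R's NA.
records = [
    {"sleep_hours": 7.5, "steps": 8000},
    {"sleep_hours": None, "steps": 10000},
    {"sleep_hours": 6.0, "steps": None},
    {"sleep_hours": 8.0, "steps": 12000},
]


def drop_incomplete(rows):
    """Basic cleaning: keep only observations with no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_mean(rows):
    """Replacement: fill each missing value with the mean of its column."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


complete = drop_incomplete(records)  # two fully observed rows remain
filled = impute_mean(records)        # all four rows kept, gaps filled
```

Removing rows is simpler but discards data, while replacement keeps every observation at the cost of introducing estimated values.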
2) Stage 2: Baseline Establishment
This stage calculates basic statistics (mean and standard deviation) of the target health metric. These statistics help users establish a baseline, which is essential for identifying the gap between the current and desired status of the target health metric. The baseline can also serve as a reference for comparison after improvement actions are taken.
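As a rough illustration of the baseline computation, the Python sketch below derives the mean and the sample standard deviation of a target metric and the gap to a desired value. The readings, the function names `baseline` and `gap_to_goal`, and the use of the sample (n − 1) standard deviation are assumptions made for this example.

```python
from statistics import mean, stdev

# Hypothetical daily readings of the target health metric (hours of sleep).
target = [7.0, 6.5, 8.0, 7.5, 6.0]


def baseline(values):
    """Return the mean and sample standard deviation of the target metric."""
    return {"mean": mean(values), "sd": stdev(values)}


def gap_to_goal(base, desired):
    """Distance between the desired value and the current baseline mean."""
    return desired - base["mean"]


base = baseline(target)
gap = gap_to_goal(base, desired=8.0)
```

Recomputing `baseline` on data collected after a life-style change gives the comparison reference mentioned above.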
3) Stage 3: Correlation Analysis
Affecting factors of the target health metric are identified through correlation analysis. The R script calculates the correlation between the target health metric and each of the potential factors. The effect of a factor is taken to be proportional to its correlation with the target health metric. The scale used to identify affecting factors is summarized in Table 1, where r_c represents the correlation coefficient between a potential affecting factor and the target health metric.
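The following Python sketch shows one way to compute a correlation coefficient and map it onto the scale of Table 1. It rests on several assumptions: the source says only "correlation", so the use of Pearson's coefficient is a guess; the band edges assume that the negative ranges mirror the positive ones and that the top band includes 0.8; and the data and function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def classify(r):
    """Map a coefficient r_c to a label on the (assumed) Table 1 scale."""
    if r >= 0.8:
        return "Very Strong Positive Affecting Factor"
    if r >= 0.5:
        return "Strong Positive Affecting Factor"
    if r >= 0.3:
        return "Moderate Positive Affecting Factor"
    if r >= 0.2:
        return "Weak Positive Affecting Factor"
    if r > -0.2:
        return "No-Effect Factor"
    if r > -0.3:
        return "Weak Negative Affecting Factor"
    if r > -0.5:
        return "Moderate Negative Affecting Factor"
    if r > -0.8:
        return "Strong Negative Affecting Factor"
    return "Very Strong Negative Affecting Factor"


# Hypothetical factor and target series of equal length.
steps = [1, 2, 3, 4, 5]
sleep = [2, 4, 5, 4, 5]
r_c = pearson(steps, sleep)
label = classify(r_c)
```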
4) Stage 4: Summary and Interpretation
After the correlation between the target health metric and each potential affecting factor has been analyzed, all potential affecting factors are classified into three categories: positive affecting factors, negative affecting factors, and no-effect factors. Because potential users of this web application may not have a background in statistics, the R script translates the analysis results into plain words to help users understand them. It is worth noting that automatically extracting meaning from numbers and delivering the findings in natural language is a big challenge, so the interpretation may not sound very natural and is not sufficiently adaptive in some cases. However, the current interpretation scheme is sufficient to deliver the basic findings of the analysis.
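A template-based version of this stage might look like the Python sketch below. The ±0.2 cut-off follows the No-Effect row of Table 1, while the coefficients, the factor names, and the sentence templates are all invented for illustration; the wording actually produced by the R script is not given in the source.

```python
# Hypothetical correlation coefficients, as Stage 3 might produce them.
correlations = {"steps": 0.62, "caffeine_mg": -0.41, "screen_time": 0.05}


def category(r):
    """Collapse a coefficient into one of the three summary categories."""
    if r >= 0.2:
        return "positive"
    if r <= -0.2:
        return "negative"
    return "no-effect"


def summarize(coefficients):
    """Group factor names by category, preserving input order."""
    groups = {"positive": [], "negative": [], "no-effect": []}
    for factor, r in coefficients.items():
        groups[category(r)].append(factor)
    return groups


def interpret(factor, r, target):
    """Render one finding as a plain sentence from a fixed template."""
    kind = category(r)
    if kind == "positive":
        return f"Higher {factor} tends to go with higher {target} (r = {r:.2f})."
    if kind == "negative":
        return f"Higher {factor} tends to go with lower {target} (r = {r:.2f})."
    return f"{factor} shows no clear relationship with {target} (r = {r:.2f})."


summary = summarize(correlations)
sentences = [interpret(f, r, "sleep_hours") for f, r in correlations.items()]
```

Fixed templates of this kind are easy to generate but explain why the resulting text can sound unnatural and adapt poorly to unusual cases.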
5) Stage 5: Report Generation
Finally, the R script generates a report on the analysis results in PDF format for the user to download. The report covers the following information.
• The basic statistics on the target health metric;
• A list of the correlation coefficients between the
target health metric and the potential affecting fac-
tors;
• A summary on the effect of all potential affecting
factors;
• The interpretation of the analysis results.
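The four report sections listed above could be assembled as in the Python sketch below. It produces plain text only; rendering to PDF is left out, and the function name `build_report`, the layout, and the sample values are assumptions rather than the format of the actual report.

```python
def build_report(target, stats, correlations, summary, interpretation):
    """Assemble the four report sections into one plain-text document."""
    lines = [f"Analysis report for {target}", ""]
    lines.append("1. Basic statistics")
    lines.append(f"   mean = {stats['mean']:.2f}, sd = {stats['sd']:.2f}")
    lines.append("2. Correlation coefficients")
    for factor, r in correlations.items():
        lines.append(f"   {factor}: {r:+.2f}")
    lines.append("3. Summary of effects")
    for kind, factors in summary.items():
        lines.append(f"   {kind}: {', '.join(factors) or 'none'}")
    lines.append("4. Interpretation")
    lines.extend(f"   {sentence}" for sentence in interpretation)
    return "\n".join(lines)


# Hypothetical inputs standing in for the outputs of the earlier stages.
report = build_report(
    target="sleep_hours",
    stats={"mean": 7.0, "sd": 0.79},
    correlations={"steps": 0.62, "caffeine_mg": -0.41},
    summary={"positive": ["steps"], "negative": ["caffeine_mg"], "no-effect": []},
    interpretation=["Higher steps tends to go with higher sleep_hours."],
)
```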
5 EXPECTED OUTCOME
The final outcome of this project is a web-based, user-side, statistics-free data analysis software tool named DataMakeSense, which is accessible to all internet users regardless of their physical location. The
A Web Application for Automatic Analysis on Life-style Factors Affecting Personal Health from Self-tracking Data - Towards User-side Statistics Free Personal Data Analysis