Visual Analysis of Time-dependent Multivariate Data from Dairy

Farming Industry

Lorenzo Di Silvestro

, Michael Burch

, Margherita Caccamo

, Daniel Weiskopf

, Fabian Beck

and

Giovanni Gallo

Department of Mathematics and Computer Science (DMI), University of Catania, Catania, Italy

Visualization Research Center (VISUS), University of Stuttgart, Stuttgart, Germany

Consorzio Ricerca Filiera Lattiero-Casearia (CoRFiLaC), Ragusa, Italy

Keywords:

Time-Dependent Multivariate Data, Multiple Timelines, Visual Analytics, Statistical Graphics, Dairy Science.

Abstract:

This paper addresses the problem of analyzing data collected by the dairy industry with the aim of optimizing

the cattle-breeding management and maximizing proﬁt in the production of milk. The amount of multivariate

data from daily records constantly increases due to the employment of modern systems in farm management,

requiring a method to show trends and insights in data for a rapid analysis. We have designed a visual analytics

system to analyze time-varying data. Well-known visualization techniques for multivariate data are used next

to novel methods that show the intrinsic multiple timeline nature of these data as well as the linear and cyclic

time behavior. Seasonal and monthly effects on production of milk are displayed by aggregating data values

on a cow-relative timeline. Basic statistics on data values are dynamically calculated and a density plot is used

to quantify the reliability of a dataset. A qualitative expert user study conducted with animal researchers shows

that the system is an important means to identify anomalies in data collected and to understand dominant data

patterns, such as clusters of samples and outliers. The evaluation is complemented by a case study with two

datasets from the ﬁeld of dairy science.

1 INTRODUCTION

To increase the competitiveness of the dairy sector

in the national economy, the dairy industry focuses

on improving farm management. The information

on farm productivity to support management on dairy

farms is often collected by Dairy Herd Improvements

agencies. Dairy farmers are usually visited once per

month, during a day called test-day. Information on

the herd such as breeding events, gender, and weight

of new-born calves is collected. In addition, the milk

production of each cow is measured and a milk sam-

ple is gathered to determine the fat and protein con-

tent, as well as somatic cell count.

Test-day records represent a valuable resource for

animal researchers, but as data sources become larger,

the analysis and exploration of patterns in data be-

comes more complex, representing a critical bottle-

neck in analytic reasoning. To address these prob-

lems, a visual analytics approach can be used to let

users explore their data and interact with them with

the aim to ﬁnd interesting insights and eventually data

anomalies and formulate hypotheses. Techniques that

support the production and dissemination of analysis

results may help researchers communicate to a variety

of audiences.

In this paper, we present an interactive system to

analyze time-dependent multivariate data. A suite

of visual analytics tools is designed for animal re-

searchers, allowing them to perform an in-depth study

of data in order to develop and explore their hypothe-

ses. Since test-day records are multivariate data with

an intrinsic time-varying nature, we devise techniques

that are capable of representing linear and cyclic time

behavior simultaneously, for example, by showing

seasonal and monthly effects. Furthermore, a density

plot is included to let the user evaluate the reliability

of each dataset. Our system design is characterized by

its conceptual simplicity that intends to be an easy-to-

use tool for researchers with no background in visual

analytics.

The utility of our system and its visualization

methods are evaluated by a qualitative user study with

domain experts and demonstrated in a case study.

Two datasets from the dairy industry in Sicily (Italy)

are used. In Sicily, approximately 125,000 dairy cat-

Di Silvestro L., Burch M., Caccamo M., Weiskopf D., Beck F. and Gallo G..

Visual Analysis of Time-Dependent Multivariate Data from Dairy Farming Industry.

DOI: 10.5220/0004652600990106

In Proceedings of the 5th International Conference on Information Visualization Theory and Applications (IVAPP-2014), pages 99-106

ISBN: 978-989-758-005-5

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: System overview. Raw data are shown in tabular form. In the bottom part, the view shows a scatter plot of

selected data. The production of milk is shown for each day (absolute timeline). In this example, the parity value (number

of lactations) is used as category. During the ﬁrst parity, milk production is lower than in the next and the data value range is

narrower. Missing data points follow a regular pattern (recurring blank vertical stripes) that reﬂects the lack of data samples

collected during summer vacation (August).

tle are raised. Compared to other regions such as

those in Northern Italy, smaller traditional and less

modernized farms are present that have to deal with

hot climate, higher costs of feed and energy. Around

62% of Sicilian milk is produced in Ragusa province,

representing therefore the most important production

center for the dairy sector in Sicily. In 1996, the dairy

research center CorFiLaC was established to support

the development of the dairy sector in Ragusa (Cac-

camo, 2012). The datasets provided by CorFiLaC

and used for our case study cover the typical range

of (varying) sample rate and time span in which data

are collected.

2 RELATED WORK

As our system was designed to analyze different as-

pects of the same dataset, it merges visualization

methods from several application domains. In this

section, relevant topics from several areas will be cov-

ered, according to the different aspect of data analysis

on which the user wants to focus.

2.1 Animal Research and Dairy

Industry Domain

One of the main topics investigated by animal re-

searchers is the study of movement of individual or-

ganisms or groups of animals. There is an increas-

ing interest in movement ecology. Accordingly, there

are a few examples of visualization and visual an-

alytics research geared toward data from movement

ecology. Grundy et al. (2009) exploit data obtained

through tri-axial accelerometers to trace movement of

wild animals. They make use of interactive spherical

scatterplots and spherical histograms instead of the

2D time-series plots commonly used to study accel-

eration data.

Visual analytics is also used to study and model

epidemic spread in herds. Afzal et al. (2011) present

a suite of predictive visual analytics tools to model

the spread of Rift Valley Fever through a simulated

mosquito and cattle population in Texas.

However, we see only very little previous visual-

ization work on animal research related to the dairy

industry. In fact, there is almost no previous pa-

per in which visualization tools and visual analytics

methods would be used on milk production records

or other data from dairy industry. The only excep-

tion known to us is the work of Galligan, who cre-

ated Cowpad (Galligan, 2007), a suite of web tools

to present analytical problems commonly occurring

on dairy herds. Simple interfaces let the user evaluate

the effect of changing variables in the process of dairy

farm management. However, these tools do not use

any advanced visualization or interaction technique to

represent data involved in the simulation, only curve

diagrams and gauge metaphors are proposed.

With this paper, we want to ﬁll this gap in the liter-

ature, adopting visual analytics methods for this par-

ticular application domain.

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

100

2.2 Time-Varying Data Visualization

For animal researchers, it is important to investigate

how various factors on dairy production change over

time. Data used for this contribution have a temporal

dimension that makes it possible to highlight monthly,

seasonal, and annual trends.

For time-dependent data, there is a large collection

of different visualization techniques for different pur-

poses of data exploration and depending on the char-

acteristics of the temporal structure. Following the

taxonomy by Aigner et al. (2011), dairy production

datasets exhibit a quite speciﬁc structure: they show a

dual structure of time, both linear and cyclic.

Van Wijk and Van Selow (1999) present a system

to identify patterns and trends on multiple time scales

simultaneously. It allows the visualization of univari-

ate time-series on different levels of aggregation by

clustering similar daily data patterns. We use different

levels of aggregation to visualize two different time-

lines at once. Spiral approaches are a way to visualize

large datasets and to support the identiﬁcation of peri-

odic structures in data. Previous solutions (Carlis and

Konstan, 1998; Weber et al., 2001), however, are not

suitable in cases in which the cyclic trends of data are

not perfectly periodic, or the period is not known but

subject to speciﬁc seasonal effects.

To the best of our knowledge, no previous work

addresses the problem of visualizing linear and cyclic

time data simultaneously.

3 APPLICATION BACKGROUND

Data on farm productivity is often collected by Dairy

Herd Improvements (DHI) agencies. Milk production

of each cow is measured and a milk sample is ana-

lyzed to determine fat and protein content, and so-

matic cell count. These records are then processed

and analyzed centrally (i.e., on DHI computers) by

taking into account information generated during pre-

vious test-days at both herd and individual cow level.

A few days after the test-day, the farmer receives a re-

port containing information collected during the test-

day. This report may also contain management infor-

mation on individual cows and the herd as a whole.

The information from the report can then be used by

the farmer (or milk and cheese producer) for making

decisions focusing on the improvement of farm man-

agement practices (Caccamo, 2012).

Although DHI data and information can con-

tribute to improve management practices, the ben-

eﬁts only come about if the farm managers and/or

the animal researchers that work as advisors at DHI

spend considerable effort for analyzing the incoming

information. This process can be time-consuming and

complex due to the large amount of data.

3.1 305-Days Lactation Yield and

Test-Day Models

Milk recording is recognized as a valuable means for

breeding and herd management worldwide. Further-

more, these records are also used to perform a genetic

evaluation for dairy production traits. The genetic

evaluation of dairy sires and cows has been based on

the analysis of the 305-day lactation yield for many

years. It uses the total amount of milk produced dur-

ing a lactation (or milking) period. Usually, a dairy

cow lactates for about 10 months (∼305 days) each

year before it dries up prior to giving birth to its next

calf. The ideal method to estimate the 305-days lac-

tation yield is to measure the amount of milk and

milk components of each cow on a daily basis for

10 months after calving. However, one sample of

milk is usually taken monthly for each cow. As stated

by Schaeffer and Burnside (1976), a common method

of predicting 305-days yield is to compute the average

production between two tests, multiply by the number

of days between tests, and accumulate this quantity

after each report. If the interval between two tests is

much longer than 30 days, erroneous predictions of

this value could result. Several methods have been

developed to improve the prediction of 305-day lacta-

tion yield. The term lactation curve is used to refer to

the curve representing the rate of milk secretion with

advance in lactation.

After two decades, animal researchers started to

use test-day yields for genetic evaluation rather than

305-days yield. By deﬁnition, a test-day (TD) model

is a method of evaluating daily production of milk,

fat, protein and somatic cell count considering effects

for each test-day instead of one set of ﬁxed effects

over the lactation. The TD model estimates lactation

curves and their changes.

In the early 1990’s, Ptak and Schaeffer (1993)

identiﬁed some drawback in the new approach, such

as the need to store all of the individual test-day

yields on every cow, or the higher computational

time needed to calculate a genetic evaluation by us-

ing more values per cow instead of only one.

Within less than a decade, test-day models have

been adopted in several countries. Test-day models

are often used to perform national genetic evaluations

for dairy cattle. Estimation of genetic, environmen-

tal, and herd effects can be used to predict future pro-

ductions on individual cows. Deviation between pre-

dicted and actual production could be used to detect

VisualAnalysisofTime-DependentMultivariateDatafromDairyFarmingIndustry

101

a disease at an early stage, i.e., before the cow shows

clinical signs. Therefore, animal researchers need a

system to identify abnormalities in lactation curves.

Nowadays, the storage of big amounts of data or

the computational resources needed to handle these

data do not represent a problem anymore. It is pos-

sible to use the complete set of test-day records col-

lected over the years. Researchers can work directly

on those data with no use of statistical or mathemati-

cal models that could lead to loss of precision or erro-

neous predictions.

3.2 Multiple Timelines

The challenge for dairy producers is to interpret and

utilize data from dairy production properly to improve

decision making. Milk production data have an intrin-

sic dual timeline nature. Usually, lactation is repre-

sented according to a “cow-related” timeline: a curve

shows the quantity of produced milk changing over

time, from the day of calving to the moment in which

the cow is dried off, and milking ceases. About sixty

days later, one year after the birth of her previous calf,

a cow will calve again, starting a new parity (period

of lactation).

Since the “cow-relative” behavior is most im-

portant for temporal analysis, that relative time—

represented as day-in-milk—is of highest relevance

for analysis. Parities represent cyclic time characteris-

tics in the data because the parities progress similarly.

If one analyzes a single parity, there is linear time be-

havior within that parity. However, there is another

temporal aspect that needs to be taken into account:

the absolute time of the yearly calendar. Possible sea-

sonal effects on production of milk and milk compo-

nents ratio are due to yearly cyclic effects. However,

within the year, we consider calendar dates as linear

time because there are no weekly or daily effects on

milk production data.

In this paper, a method to show and analyze the

dual nature of time, with respect to the linear and

cyclic time behavior is presented.

4 SYSTEM OVERVIEW

In our system, several visualization methods are used.

Well-known approaches are used besides some new

techniques for visualizing two coexisting timelines si-

multaneously. The aim of the tool is to assist ani-

mal researchers in all the phases of their analysis pro-

cess. A preliminary analysis could be performed for

a data cleaning phase; then a deeper exploration lets

the user study different aspects of the data, compar-

ing different views. To address multivariate and time-

dependent data visualization, we designed our system

to support several, complementary views. The user

can choose from a list of four different views (see be-

low for details).

It is possible to load and handle any kind of multi-

variate dataset with the constraint that a ﬁeld contain-

ing a date (or time) must be included. Data are loaded

and shown in a tabular form similar to spreadsheets

and application programs that animal researchers are

used to work with.

4.1 Scatter Plots

One of the visualization methods in the system is

based on scatter plots. In this way, multivariate data

can be shown, such as the attributes milk production,

fat contents, protein values, and their time depen-

dency. Any data ﬁeld can be used as x-axis and y-axis.

A third data ﬁeld can be used as a category to accord-

ingly color each data point in the scatter plot. In this

way, the colored scatter plot can show three data di-

mensions. If the category ﬁeld chosen contains labels

that assign classes (categorical data), the user is asked

to select a color for each label. Otherwise data points

are colored automatically using a color scheme se-

lected by the user from a suggested list. Furthermore,

it is possible to ﬁlter data by category values. When

a burst of aggregated data points overlaps making the

scatter plot difﬁcult to understand, transparency can

be used to reduce visual clutter. Moving the mouse

pointer over the plot, details on data points are dis-

played next to the view. Figure 1 shows the scatter

plot, and the side panel. This panel is used for choos-

ing data to plot, and environment settings.

4.2 Statistic Metrics and Density Plots

A line chart of mean values as well as standard devia-

tion can be superimposed on the scatter plot diagram,

showing aggregated statistical quantities. Moreover,

the density of data samples in the dataset for each

y-axis value is calculated and shown as a histogram.

The histogram allows the user to assess the quality of

the original data indirectly; less density is related to

higher variability of the data samples in that region.

Therefore, the combination of density diagram with

other diagrams allows us to include a measure of un-

certainty in the overall visualization.

A low-pass ﬁlter can be used to smooth the line

plots and density plots. The user can increase or de-

crease width of the low-pass ﬁlter interactively. In

Figure 2, a dataset is displayed in which the density

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

102

Figure 2: Statistics on time-varying multivariate data. The

value of produced milk for each day-in-milk is shown. Scat-

ter plot points are colored according to the value of fat con-

tained in the sample of milk. The curve of density of sam-

ples in the dataset is represented in gray. The red line indi-

cates the mean value of quantity of milk and the blue lines

the standard deviation.

Figure 3: Relation between the quantity of milk (x-axis)

and the percentage of protein (y-axis). Data points are col-

ored according to the percentage of fat in milk samples

(lighter color for higher values). The density plot shows

that the number of samples for milk quantity follows a nor-

mal distribution.

of data samples varies over time following a regular

pattern: more samples are collected in certain days

of the month. As shown in Figure 3, the density plot

can be used to recognize that samples are normally

distributed for a certain random variable (milk in this

example).

4.3 Multiple Timelines View

We present two methods for showing multiple time-

lines simultaneously. As discussed in Section 3.2, the

different temporal data characteristics make it difﬁ-

cult to analyze the temporal effects. In particular, de-

pending on the focal point of analysis, cyclic and lin-

ear as well as relative and absolute time play a role.

For analysts, it is useful to highlight cyclic data pat-

terns as well as to point out differences between dif-

ferent periods of time. The ﬁrst method aggregates

data points for different periods of time over the rel-

ative timeline. Multiple scatter plots share the same

x-axis that represents the relative timeline. A plot is

shown for each period of the absolute timeline. The

user can choose the time period to use from a list to

aggregate data and build up a stacked view of several

plots. It is possible to create a plot for each season,

Figure 4: The value of produced milk for each day-in-milk

(relative timeline). The absolute timeline (date) is shown

by aggregating monthly data. Data points are colored from

blue to yellow according to the parity. In ﬁrst parities milk

production is lower. Stacked scatter plots highlight differ-

ences in milk production and in data sampling: a (shifting)

gap in data points for a certain period in day-in-milk is vis-

ible.

month, or year. In Figure 4, stacked scatter plots are

shown for different months. This method helps iden-

tify frequent patterns in data points or highlight cyclic

behavior, but it is difﬁcult to draw a comparison be-

tween data values at the same relative time point in

different plots.

The second method uses a single plot: different

line charts are drawn, representing the mean value

curve for aggregated data on different time intervals.

In this view, each plot shares the same coordinate sys-

tem, so that the line charts are aligned. Thus, it is sim-

pler to identify differences in the curve of values for

different time periods.

5 CASE STUDY

We demonstrate the usefulness of our approach for

two datasets with different features from the same ap-

plication area. Data used for our study was provided

by CoRFiLaC dairy research center.

The analysis was conducted with an expert that

was familiar with the datasets. During our study, hy-

potheses made by the domain expert were conﬁrmed

with the visual analytics approach, and the implemen-

VisualAnalysisofTime-DependentMultivariateDatafromDairyFarmingIndustry

103

tation of new features was suggested in the sense of a

formative process.

Data used for our study were collected and used

by animal researchers to assist farmers in the man-

agement of dairy herds. The ﬁrst dataset (Dataset 1)

contains 175,689 records on 6,468 cows from over 40

herds. Samples were taken monthly at the test-day,

for over 10 years. Each record provides the follow-

ing information: identiﬁcation number of the herd and

the cow, breed of the cow, number of lactation (also

called parity), date of the test-day, days-in-milk (i.e.,

the number of days between the calving of the current

parity and the test-day), yield of milk, fat and protein

contents, detected pathology associated with the test-

day.

The second dataset (Dataset 2) used contains data

collected from one farm only, i.e., one herd. At

this farm, production data are collected in autonomy,

without the support of a DHI agency. It is a modern

farm in which data are collected almost daily. Data

records regard a single herd of 90 high production

cows that continue to produce a sufﬁcient quantity of

milk for 11 months after calving (5 to 395 days-in-

milk).

5.1 Analysis Process

The analysis started by showing an overview of data

contained in Dataset 1 (see Figure 1). By using the

absolute timeline (date of test-day), a recurring gap is

evident in data sampling. This gap can be explained

by the period of vacation that was observed in the

farm during years prior to 2010 in Sicily.

Colors were used to mark data points according

to the detected pathology associated with the test-day.

Different views allowed us to spot signiﬁcant changes

in the overall shape of curves, from which we could

derive some insights. A slight loss in milk produc-

tion was recognized for cows suffering from masti-

tis, when using the relative timeline (days-in-milk).

When using the absolute timeline, we noticed a lack

of registered sick cows before 2005. The density plot

revealed evident changes in number of data samples

over time. In the dataset, an increase in data collected

in the last years could be recognized, which we at-

tributed to the employment of modern systems in farm

management. By showing data according to the days-

in-milk (see Figure 2), a regular pattern in the density

plot could be highlighted. Usually, the same group of

agents from the DHI agency visits all the farms in the

same area, scheduling one visit per month for each

one. Thus, each peak in the density plot corresponds

to the recording day (test-day) in the biggest farm of

the area.

(a)

(b)

Figure 5: Multiple line plots highlight differences in lac-

tation curve (a) and fat contents (b) for ﬁve parities. The

quantity of fat decreases during the peak of milk produc-

tion. The shape of the fat curve is equal for different pari-

ties, whereas the shape of the lactation curve changes over

parities.

During the ﬁrst analysis phase, the system turned

out to be a useful and fast means for data cleaning—

on top of direct visual analysis.

The multiple timeline views were used to ﬁnd dif-

ferences in milk production for different seasons. In

winter, cows generally produce more milk than in

spring, but no evident differences could be identiﬁed

for different months. By plotting stacked data points

for each month, some data points in the August plot

are shown, which is an anomaly in data recording that

could be conﬁrmed after the session.

During the analysis process, some new features

were suggested by the domain expert. The possibil-

ity to aggregate data and to visualize different plots

according to the breed and the parity number were

added in response to the expert’s comments. By

showing different curves for breeds, it is possible to

ﬁnd out which cow variety is more productive and

how milk composition between different stocks dif-

fers. Different lactation curves for each parity conﬁrm

well-known theories. Figure 5 shows different curves

for the ﬁrst 5 parities in the data from Dataset 2. The

domain expert also asked to re-align data according

to the beginning of the lactation period, so that it is

possible to ﬁnd differences in the lactation curve de-

pending on the moment of calving.

6 EXPERT EVALUATION

The analysis of the feedback by potential target users

describes preferences concerning views and features

of the tool, and helps us improve the software to meet

users’ needs. We were mostly interested in follow-

ing the natural process of hypothesis building and

problem-solving adopted by the animal researchers.

To achieve this goal, it is fundamental to let the do-

main experts operate in their workplace as explained

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

104

(a) (b) (c) (d) (e) (f)

(g) (h) (i) (j) (k) (l)

Figure 6: Scatter plots show the relation between quantity of milk (x-axis) and percentage of fat (y-axis). Data points are

colored according to the parity. These image series shows different uses of the interactive ﬁlter. Figures (a)–(f) show data

for different parities (1 to 6). In Figures (g)–(l), the ﬁrst endpoint is constantly set to parity 1 and the width of the interval is

sequentially increased, to add data points from the next parity to the plot. Milk-fat correlation is different between the ﬁrst

two parities and the next ones. From the third parity, the percentage of fat decreases, while the production of milk increases.

The number of data points falls off in last parities because most of the cows became less productive, removed from the dairy

herd, and marketed for beef, around the age of four.

by Dunbar and Dunbar (1999) proposing in-vivo stud-

ies. For our study, we adopted the pair analytics

approach. This method generates verbal data about

thought processes in a naturalistic human-to-human

interaction with visual analytic tools. It requires two

participants, a Subject Matter Expert and a Visual An-

alytics Expert (Arias-Hernandez et al., 2011).

Every study session was performed by using a

VoIP software for communication and each audio ses-

sion was recorded. An application for remote desktop

control was used to operate on a computer in the IT

laboratory at CorFiLaC. The three volunteers worked

directly on the computer used for the user study.

During a preliminary group session, the partici-

pants were informed about the procedure and the vol-

untary nature of the experiment. Afterwards, a private

study session was performed with each of the partici-

pants left alone in the IT laboratory. For each session,

the remote desktop was video recorded. Each session

was performed in conjunction with a Visual Analytics

Expert that was able to control the remote computer

by using mouse and keyboard. He was disheartened to

control remotely the system and let the Subject Mat-

ter Expert use it. The Visual Analytics Expert was

instead encouraged to talk with the animal researcher

and ask for comments and the motivation behind ev-

ery choice made to perform the requested tasks.

At the beginning of the private session, the vol-

unteer was asked to inspect the overview of a subset

of records from Dataset 1 (in Figure 1). The analysis

performed by the domain expert during the case study

was used as ground truth to evaluate the correctness of

the volunteers’ answers. The ﬁrst task was to explain

the missing value pattern in the scatter plot. The sec-

(a)

(b)

Figure 7: (a) Fat content for day-in-milk during fall. (b) Fat

content for day-in-milk for parity started in fall. Data points

are colored from black to blue for the ﬁrst three parities.

Missing samples visible in (a) that can affect the analysis

task are avoided by aligning data points on the season of

calving, as shown in (b).

ond task was to freely comment the data points cloud

according to the color coding used to produce the plot.

Then, the participant was asked to freely use the

visual analytics approach on Dataset 2.

Each study session took about 90 minutes for each

participant. During the session, the experimenter an-

notated the user’s behavior, preferred view, and in-

teraction methods to complement audio and video

recording that were analyzed afterwards for the qual-

itative analysis of the system usage sessions.

For the qualitative user study, three domain ex-

perts from CorFiLaC (Sicily), i.e., potential target

users, were asked to use our system.

The simple tasks assigned were used to evaluate

the effort required to understand the plots. Two of the

participants readily explained the lack of data points

and the regular pattern followed.

The second task involved the comprehension of

the color coding used to mark data points accord-

ing to the parity. None of the participants had any

VisualAnalysisofTime-DependentMultivariateDatafromDairyFarmingIndustry

105

difﬁculties working with this task. Their comments

revealed that they took advantage of using color to

highlight different data behaviors. Initial difﬁculties

to correctly interpret the scatter plot were overcome

with the help of the Visual Analytics Expert.

User opinions on ease of use and utility of our sys-

tem were collected by the questionnaires sent to the

domain expert after the user study sessions. The an-

swers show that the overall user experience with the

system was positive. The participants identiﬁed the

multiple timelines view as the most useful when line

charts are used. They also found the possibility to

use the same view to highlight differences for differ-

ent parity or breed interesting, referring to the features

suggested by the domain expert involved in the case

study. In particular, they indicated as an important

means for analysis the possibility to re-align data on

the day of calving. This method helps obtain informa-

tion on the shape of complete curves, not considering

gaps in data sampling as shown in Figure 7.

Each of them preferred to use line plots to com-

pare the lactation curve shape, but they used scatter

plots to identify anomalies in the data, such as gaps

in data sampling of artiﬁcial data values. They recog-

nized the system as a means for simple graph creation,

in order to show analysis results to farmers.

In the free comments section of the question-

naires, each of them asked to ﬁlter data interactively,

without recurring to the editing interface of the sys-

tem. An interactive ﬁlter was later implemented and

the usage is shown in Figure 6.

7 CONCLUSION AND FUTURE

WORK

We used a visual analytics approach to support animal

researchers in analyzing multivariate time-varying

data. The system is designed to address the needs of

the domain experts. By using real-world data from

the dairy industry, we could prove the utility of the

system as a means of identifying anomalies for a data

cleaning phase, and as a tool for hypothesis building.

Besides, an expert user study showed that researchers

without background knowledge of visual analytics

methods are able to adopt the system quickly.

For proper interpretation of lactation curves, milk

production information has to be related to man-

agement practices and environmental conditions that

might affect lactation curves. The individual cow

level data have to be analyzed besides herd level data.

The possibility to handle this kind of data could be

added to our system in future work. Also, differ-

ent case studies could be conducted on datasets from

other domains. The multiple timeline nature of the

data from dairy science might be present in other data

as well, and the system might be applicable there, too.

Finally, the expert user study could be extended by in-

cluding a larger group of domain experts.

REFERENCES

Afzal, S., Maciejewski, R., and Ebert, D. (2011). Visual an-

alytics decision support environment for epidemic mod-

eling and response evaluation. In Proceedings of the

IEEE Conference on Visual Analytics Science and Tech-

nology (VAST), pages 191–200.

Aigner, W., Miksch, S., Schumann, H., and Tominski, C.

(2011). Visualization of Time-Oriented Data. Human-

Computer Interaction Series. Springer.

Arias-Hernandez, R., Kaastra, L. T., Green, T. M., and

Fisher, B. (2011). Pair analytics: Capturing reasoning

processes in collaborative visual analytics. In Proceed-

ings of the 44th Hawaii International Conference on Sys-

tem Sciences (HICSS), pages 1–10.

Caccamo, M. (2012). Management Parameters from the

Random Regressions Testday Model to Advice Farmers

on Cow Nutrition. publisher.

Carlis, J. V. and Konstan, J. A. (1998). Interactive visualiza-

tion of serial periodic data. In Proceedings of the ACM

Symposium on User Interface Software and Technology

(UIST), pages 29–38.

Dunbar, K. and Dunbar, K. (1999). Scientists build mod-

els invivo science as a window on the science mind. In

Model-Based Reasoning in Scientiﬁc Discovery, pages

85–99. Kluwer Academic/Plenum Publishers.

Galligan, D. (2007). Cowpad – Visual Analyt-

ics. http://cahpwww.vet.upenn.edu/node/89. Ac-

cessed: 22/02/2013.

Grundy, E., Jones, M. W., Laramee, R. S., Wilson, R. P.,

and Shepard, E. L. C. (2009). Visualisation of sensor

data from animal movement. Computer Graphics Forum,

28(3):815–822.

Ptak, E. and Schaeffer, L. (1993). Use of test day yields

for genetic evaluation of dairy sires and cows. Livestock

Production Science, 34(1):23–34.

Schaeffer, L. R. and Burnside, E. B. (1976). Estimating the

shape of the lactation curve. Canadian Journal of Animal

Science, 56(2):157–170.

Van Wijk, J. J. and Van Selow, E. R. (1999). Cluster and

calendar based visualization of time series data. In Pro-

ceedings of the 1999 IEEE Symposium on Information

Visualization, pages 4–9.

Weber, M., Alexa, M., and M

uller, W. (2001). Visualizing

time-series on spirals. In Proceedings of the IEEE Sym-

posium on Information Visualization, pages 7–13.

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

106