4 EXPERIMENTAL EVALUATION
In this section, we describe experiments designed to
answer the following questions: a) Can our algorithm
be used to estimate visualization diversity, a weaker
quality metric sufficient for many of our target appli-
cations? (Yes; Table 3) b) Can our algorithm effec-
tively extract correct sub-figures, a stronger quality
metric? (Yes; Table 3) c) Could a simpler method
work just as well as our algorithm? (No; Table 3) d)
Is step 3 of the algorithm (selection) necessary and ef-
fective? (Yes; Figure 11) e) Where does the algorithm
make mistakes? (Figure 14)
The corpus we used for our experiments was col-
lected from the PubMed database. We selected a ran-
dom subset of the PubMed database by collecting all
tar.gz files from 188 folders (from //pub/pmc/ee/00 to
//pub/pmc/ee/bb); these files contain the PDF files of
the papers as well as the source images of the figures,
so figure extraction was straightforward. In order to
filter non-figure images such as logos, banners, etc.,
we only used images of size greater than 8KB. We
manually identified the composite figures (we have a
classifier that can recognize multi-chart figures, but
for this experiment we wanted to avoid the additional
dependency) and divided them into a testing set and a
training set. We trained the classifier and performed
cross-validation on the training set, reserving the
test set for a final experimental evaluation. The test-
ing set S for the experiments contains 261 compos-
ite figures related to biology, biomedicine, or bio-
chemistry. Each figure contains at least two different
types of visualizations; e.g., a line plot and a scatter
plot, a photograph and a bar chart, etc. For ease of
evaluation, we excluded multi-chart figures composed
of sub-figures of a single type from this experiment,
for reasons explained under the first question below.
We evaluated performance in two ways: (1) type-based evaluation, a
simpler metric in which we attempt to count the num-
ber of distinct types of visualizations within a single
figure, and (2) chart-based evaluation, a stronger met-
ric in which we attempt to perfectly recover all sub-
figures within a composite figure.
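The size-based filtering step described above can be sketched as follows. The 8 KB threshold comes from the text; the function name, the directory-walking approach, and the set of image extensions are illustrative assumptions:

```python
import os

# Per the text: images at or below 8 KB are assumed to be
# logos, banners, and other non-figure images.
MIN_SIZE_BYTES = 8 * 1024

def collect_candidate_figures(extracted_dir):
    """Walk an extracted tar.gz directory tree and keep
    image files large enough to plausibly be figures."""
    kept = []
    for root, _dirs, files in os.walk(extracted_dir):
        for name in files:
            # Illustrative extension list; the corpus may differ.
            if not name.lower().endswith(
                (".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff")
            ):
                continue
            path = os.path.join(root, name)
            if os.path.getsize(path) > MIN_SIZE_BYTES:
                kept.append(path)
    return kept
```

Filtering by file size is a coarse heuristic, but it cheaply removes most decorative images before the manual identification of composite figures.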
Can Our Algorithm Be Used to Estimate Visual-
ization Diversity? The motivation for type-based
evaluation is that some of our target applications in
bibliometrics and search services need only know the
presence or absence of particular types of visualiza-
tions in each figure to afford improved search or to
collect aggregate statistics — it is not always required
to precisely extract a perfect sub-figure, as long as we
can tell what type of figure it is. For example, the
presence or absence of an electrophoresis gel image
appears to be a strong predictor of whether the paper
is in the area of experimental molecular biology; we
need not differentiate between a sub-figure with one
gel and a sub-figure with several gels. Moreover, it is
not always obvious what the correct answer should be
when decomposing collections of sub-figures of ho-
mogeneous type: Part of Figure 14(e) contains a num-
ber of repeated small multiples of the same type — it
is not clear that the correct answer is to subdivide all
of these individually. Intuitively, we are assessing the
algorithms’ ability to eliminate ambiguity about what
types of visualizations are being employed by a given
figure, since this task is a primitive in many of our
target applications.
To perform type-based evaluations we label a test
set by manually counting the number of distinct visu-
alization types in each composite figure. For exam-
ple, Figure 3 has two types of visualizations, a line
chart and a bar chart; Figure 6 also has two types of
visualizations, a line chart and an area chart; Figure
10(a) also has two types of visualizations, bar charts
and electrophoresis gels. We then run the decompo-
sition algorithm and manually distinguish correct ex-
tractions from incorrect extractions. Only homoge-
neous sub-images — those containing only one type
of visualization — are considered correct. For exam-
ple, the top block in Figure 10(a) is considered cor-
rect, because both sub-figures are the same type of vi-
sualization: an electrophoresis gel image. The bottom
two blocks of Figure 10(a) are considered incorrect,
since each contains both a bar chart and a gel.
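Under this criterion, a block is judged correct exactly when all of its sub-figures share a single visualization type. A minimal sketch of the check, with illustrative type labels:

```python
def is_homogeneous(block_types):
    """A block counts as a correct extraction only if every
    sub-figure it contains has the same visualization type."""
    return len(set(block_types)) == 1

# e.g., the top block of Figure 10(a): two gel images -> correct
top_block_ok = is_homogeneous(["gel", "gel"])
# the bottom blocks: a bar chart plus a gel each -> incorrect
bottom_block_ok = is_homogeneous(["bar_chart", "gel"])
```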
Using only the homogeneous sub-images (the het-
erogeneous sub-images are considered incorrect), we
manually count the number of distinct visualization
types found for each figure. We compare this number
with the number of distinct visualization types found
by manual inspection of the original figure. For exam-
ple, in Figure 10(a), the algorithm produced one ho-
mogeneous sub-image (the top portion), so only one
visualization type was discovered. However, the orig-
inal image has two distinct visualization types. So our
result for this figure would be 50%.
To determine the overall accuracy we define
a function diversity : Figure → Int as
diversity(f) = |{type(s) | s ∈ decompose(f)}|,
where decompose returns the set of subfigures and
type classifies each subfigure as a scatterplot, line
plot, etc. The final return value is the number of
distinct types that appear in the figure. We then
sum the diversity scores for all figures in the cor-
pus. We compute this value twice: once using our
automatic version of the decompose function and
once using a manual process. Finally, we divide the
total diversity computed automatically by the total
ICPRAM 2015 - International Conference on Pattern Recognition Applications and Methods
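The type-based accuracy computation described above can be sketched as follows. It assumes each figure's homogeneous sub-images have already been labeled with a type; the function names and the example labels are illustrative, not part of the original system:

```python
def diversity(subfigure_types):
    """diversity(f) = |{type(s) | s in decompose(f)}|: the number
    of distinct visualization types among the (homogeneous)
    sub-figures recovered from one composite figure."""
    return len(set(subfigure_types))

def type_based_accuracy(auto_types, manual_types):
    """Ratio of the summed automatic diversity scores to the
    summed manual (ground-truth) diversity scores over the corpus.
    Each argument is a list (one entry per figure) of type-label lists."""
    auto_total = sum(diversity(t) for t in auto_types)
    manual_total = sum(diversity(t) for t in manual_types)
    return auto_total / manual_total

# Figure 10(a) as described in the text: the algorithm yields one
# homogeneous type ("gel"), while manual inspection finds two
# ("gel" and "bar_chart"), giving 50% for that figure.
```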