Table 1: Hierarchical Clustering Results.
Books (separated by semicolons) | Tags
The Time Traveler's Wife; Flowers for Algernon | adult, sad, simple, sex
Watership Down; The Princess Bride | adult, adventure, classic, entertaining, exciting, humor
The Dark Tower; The Road | battle, compelling, dark, epic, reality, sad, simple
Journey to the Center of the Earth; 20,000 Leagues Under the Sea | adventure, classic, deep, entertaining, exciting, modern, science, technology
Outlander; Kushiel's Dart | adult, adventure, compelling, complex, entertaining, epic, exciting, fantasy, hero, sex, religion, intriguing, political
The Complete Chronicles of Conan; Watchmen | adventure, battle, compelling, complex, dark, deep, entertaining, evil, fantasy, hero, modern, reality, simple, political
Small Gods; The Book of the New Sun | epic, fantasy, humor, reality, religion, simple, small, technology, sex
Doomsday Book; Cryptonomicon; Snow Crash; The Diamond Age | adventure, compelling, complex, entertaining, sex, exciting, humor, intriguing, modern, reality, religion, science, social, technology
The Mists of Avalon; American Gods; The Last Unicorn; The Once and Future King; The Way of Kings; Gardens of the Moon; Dragonflight; …… | adult, adventure, battle, compelling, complex, dark, epic, evil, exciting, fantasy, hero, humor, intriguing, magic, sad, small
Homeland; Something Wicked This Way Comes; Wicked; A Clockwork Orange; Animal Farm; The Stand | adult, battle, dark, deep, evil, simple, social, political
I Am Legend; 1984; The Handmaid's Tale; Brave New World; World War Z; Frankenstein; …… | classic, modern, reality, religion, sad, science, social, political, sex
Do Androids Dream of Electric Sheep?; Contact; A Canticle for Leibowitz; Cat's Cradle; Ender's Game; Heir to the Empire; …… | alien, battle, classic, compelling, complex, deep, entertaining, exciting, intriguing, reality, religion, science, small, social, space, technology, political
Slaughterhouse Five, or the Children's Crusade; Hitchhiker's Guide to the Galaxy; Going Postal; The Eyre Affair | classic, entertaining, humor, reality
science fiction). Given the unique classifications of
these two books, we felt it was appropriate that they
remained as a 2-book cluster until one of the last
stages of clustering, where they were eventually
combined with other books like Brave New World,
1984, and Fahrenheit 451: books with similarly
faint elements of the science fiction and fantasy
genres.
After examining our clustering results at several
threshold levels, we chose the results at a threshold
value of t=0.75. At this level, there were 13 book
clusters in total, which are shown in Table 1.
Although seven of these are two-book clusters, we
believed the similarity threshold had kept most of
these books separate from the larger clusters for a
reason, as in the first cluster, where both books
deviate substantially from the fairly standard
formula of the science fiction genre.
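To make the role of the threshold concrete, the following is a minimal sketch of threshold-based agglomerative clustering. It is an illustration under stated assumptions, not the implementation used here: the cosine similarity measure, the average-linkage merge rule, the function names, and the toy tag vectors are all hypothetical. Only the cut-off value t=0.75 comes from the text.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two tag-weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_similarity(c1, c2, vectors):
    """Average-linkage similarity between two clusters (assumed linkage)."""
    total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster(vectors, t):
    """Merge the most similar pair of clusters until no pair reaches t."""
    clusters = [[name] for name in vectors]
    while len(clusters) > 1:
        best_sim, (i, j) = max(
            (average_similarity(a, b, vectors), (i, j))
            for (i, a), (j, b) in combinations(enumerate(clusters), 2)
        )
        if best_sim < t:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical weights over three tags: (adventure, humor, dark).
toy_vectors = {
    "book_a": [1.0, 0.9, 0.0],
    "book_b": [0.9, 1.0, 0.1],
    "book_c": [0.0, 0.1, 1.0],
    "book_d": [0.1, 0.0, 0.9],
}
result = cluster(toy_vectors, t=0.75)
```

With these toy vectors, the two pairs of near-identical books merge early, while the two resulting clusters stay apart because their average similarity is far below the threshold. This mirrors how small clusters can survive a relatively strict cut.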
Earlier, during data collection, we gathered every
review that had been written about the books on the
NPR 100 list. All reviews in our data set were
grouped by the user who wrote them, which allowed
us to mine each user in the same way we mined
books, looking for weights of the same feature tags
used for book clustering. There were 162 qualified
users, each with 20 or more reviews of books on the
NPR 100 list. Mining user reviews with the same set
of features was a natural extension of our work in
clustering books. We believed that by mining the
text of a user's reviews for those same features, we
could make reasonable predictions about the type of
book a particular user tends to read. By performing
the same feature identification for a user, and
looking for a correlation between the books they
have read and the books our method identifies as
related to books they rated highly, we would be able
to evaluate the performance of our clustering methods.
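The user-mining step described above can be sketched roughly as follows. This is a hedged illustration only: the weighting scheme (relative frequency of tag keywords across a user's reviews), the three-tag list, the function names, and the sample reviews are assumptions made for the example. Only the 20-review qualification rule comes from the text.

```python
import re
from collections import Counter, defaultdict

# Hypothetical subset of the feature tags; the full set has 30 keywords.
TAGS = ["adventure", "dark", "humor"]
MIN_REVIEWS = 20  # qualification rule stated in the text


def tag_weights(texts, tags=TAGS):
    """Relative frequency of each tag keyword over a list of review texts.

    Assumption: a simple keyword count normalized to sum to 1.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w in tags)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in tags}


def user_profiles(reviews, min_reviews=MIN_REVIEWS):
    """Group (user, text) pairs by user and profile each qualified user."""
    by_user = defaultdict(list)
    for user, text in reviews:
        by_user[user].append(text)
    return {
        user: tag_weights(texts)
        for user, texts in by_user.items()
        if len(texts) >= min_reviews
    }


# Made-up data: one qualified user and one with too few reviews.
sample_reviews = [("user_1", "A dark, dark adventure.")] * 20
sample_reviews += [("user_2", "Full of humor.")] * 3
profiles = user_profiles(sample_reviews)
```

Because the resulting user profiles live in the same tag space as the book vectors, they can in principle be compared or clustered with the same similarity measure used for books.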
5 VISUALIZATION METHODS
The visualization of the book review data serves two
purposes: (1) we want to visualize the distributions
of the books and readers over the set of tags to see if
they exhibit natural clustering behaviour; and (2) we
want to see how the books and readers interact and
correlate through their tag coordinates and clusters.
Two visualization techniques are developed: parallel
coordinate views and correlative cluster views.
5.1 Parallel Coordinate Views
The parallel coordinate approach aligns all variables
(dimensions) along the X-axis, and plots the
coordinates of each data element in the Y-direction,
connected by line segments. The variables in this
case are the 30 keyword tags. Each book or reader
can now be plotted as one piecewise-linear curve,
as shown in Figure 1. Colors can also be used
to depict the different clusters produced by the
automatic clustering algorithm. One problem with
parallel coordinates is that when there are a large
IVAPP 2014 - International Conference on Information Visualization Theory and Applications