Authors:
Rohan Seth; Michele Covell; Deepak Ravichandran; D. Sivakumar and Shumeet Baluja
Affiliation:
Google Inc., United States
Keyword(s):
Data mining, Spatial data mining, Log analysis, Large scale similarity measurement, Search engine queries, Query logs, Census data.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Intelligence Applications; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems
Abstract:
Understanding the backgrounds and interests of the people who consume a piece of content, such as a news story, video, or music, is vital for the content producer as well as for the advertisers who rely on that content as an advertising channel. We extend traditional search-engine query log analysis, which has primarily concentrated on single queries or users or small groups of them, to the complete query stream of very large groups of users: the inhabitants of 13,377 cities across the United States. Query logs can be a good representation of the interests of a city’s inhabitants and a useful characterization of the city itself. Further, we demonstrate how query logs can be effectively used to gather city-level statistics sufficient for providing insights into the similarities and differences between cities. Cities found to be similar through query analysis correspond well to the similar cities identified by other large-scale, time-consuming direct measurement studies, such as those undertaken by the Census Bureau.