Authors:
Komal Dadhich; Siri Chandana Daggubati and Jaya Sreevalsan-Nair
Affiliation:
Graphics-Visualization-Computing Lab, and E-health Research Center, International Institute of Information Technology Bangalore, Bangalore, India
Keyword(s):
Chart Classification, Chart Segmentation, Chart Image Analysis, Optical Character Recognition, Data Extraction, Text Recognition, Text Summarization, Convolutional Neural Network, Bar Charts, Stacked Bar Charts, Grouped Bar Charts, Histograms.
Abstract:
Charts or scientific plots are widely used visualizations for efficient knowledge dissemination from datasets. However, these charts are predominantly available in image format, and they are often interpreted in the absence of the datasets originally used to generate them. This leads to a pertinent need for data extraction from an available chart image. We narrow our scope to bar charts and propose a semi-automated workflow, BarChartAnalyzer, for data extraction from chart images. Our workflow integrates the following tasks in sequence: chart type classification, image annotation, object detection, text detection and recognition, data table extraction, text summarization, and, optionally, chart redesign. Our data extraction uses second-order tensor fields computed by tensor voting, a technique from computer vision. Our results show that our workflow can effectively and accurately extract data from images of different resolutions and of different subtypes of bar charts. We also discuss specific test cases where BarChartAnalyzer fails. We conclude that our work is an effective, specialized image processing application for interpreting charts.