Figure 1: How the HTML tables are used. [Pie chart "Use of the HTML Tables" with two slices, Data Tables and Layout tables; values 80.11% and 19.89%.]
3 HTML DATA TABLES
The two-dimensional nature of a table conveys a lot of
information. To obtain all of it, however, the reader
must know the row header and the column header of
each cell. Assistive tools present the information
contained in a table as a linear list of elements. In
this list, the headers usually appear only at the
beginning and are very difficult to remember. Our
system exposes the relationship between the headers
and the content of each cell. We use standard HTML
elements to indicate these relationships.
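
As an illustration of how standard HTML can carry these relationships, the following minimal sketch generates a data table in which every cell names its row and column header. It assumes the `th`, `id`, `headers` and `scope` markup defined in HTML 4.01; the text above does not say which elements the system actually emits, and the function name `annotate_table` and the id prefix `t1` are hypothetical.

```python
from html import escape


def annotate_table(column_headers, row_headers, cells, prefix="t1"):
    """Return HTML for a data table whose cells reference their headers.

    column_headers: list of column header texts
    row_headers:    list of row header texts, one per data row
    cells:          list of rows, each a list of cell texts
    prefix:         hypothetical id prefix that keeps ids unique in the page
    """
    # First row: an empty corner cell followed by the column headers.
    lines = ["<table>", "  <tr>", "    <td></td>"]
    for c, text in enumerate(column_headers):
        lines.append(f'    <th id="{prefix}-c{c}" scope="col">{escape(text)}</th>')
    lines.append("  </tr>")
    # Data rows: a row header, then cells pointing at both of their headers.
    for r, (row_header, row) in enumerate(zip(row_headers, cells)):
        lines.append("  <tr>")
        lines.append(f'    <th id="{prefix}-r{r}" scope="row">{escape(row_header)}</th>')
        for c, text in enumerate(row):
            lines.append(
                f'    <td headers="{prefix}-r{r} {prefix}-c{c}">{escape(text)}</td>'
            )
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Illustrative data only.
markup = annotate_table(["2006", "2007"], ["Sales", "Costs"], [["10", "12"], ["7", "8"]])
print(markup)
```

With this markup, a tool that reads the cell "7" can announce "Costs" and "2006" along with it, instead of relying on the user to remember headers read at the start of the list.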
4 PREVIOUSLY PROPOSED SOLUTIONS
We can classify the previously proposed solutions
into three basic groups: Web browsers adapted to
offer correct navigation in tables, new languages,
and proposals that try to modify the content of the
document to mark these relationships.
4.1 Adapted Browsers
This kind of solution has a limited field of use,
because such a browser solves only one problem. In
addition, the user has to learn how to use it
alongside other Web browsers. A very good example
is the table browser EVITA (Yesilada et al., 2004).
Our proposal is totally independent of the Web
browser, and the user does not need to learn to use
new software.
4.2 New Languages
All the solutions of this kind share the same
problem: they are not standards, and the user needs
specific software to obtain the information offered.
Our proposal, in contrast, is based on the W3C
standard. Examples of this kind of solution are the
proposals of Pontelli and Filepp.
Enrico Pontelli and Tran Cao Son (2002)
propose the use of a Domain Specific Language to
express the content of a table. This content is
extracted using the semantics of the information
inside the table.
On the other hand, there are other XML-based
languages that improve the interaction between the
screen reader and the Web site. TTPML (Filepp et
al., 2002) is a language of this kind that offers all the
information to the screen reader in an easy way.
4.3 Header Detection by Means of Visualization
The Web site developer conveys the relationships
within the table’s content visually. We can use these
visual differences between cells to identify the
headers of a table and to relate the different cells.
The first approach is visual recognition after the
Web page has been displayed by the Web browser
(Krüpl and Herzog, 2006). This system has the
drawback that it is strongly dependent on the Web
browser.
The second approach, the one we follow, works
with the source of the Web page. The visual
presentation of the Web document is specified with
HTML and CSS code, and we can access it
independently of the browser. K. Kottapally et al.
(2003) presented a system that implements this
approach. The application implements a logic system
and a Hidden Markov Model system. The proposal
reports very good results, but on a very limited test
set, so rule-based systems like this one can end up
memorizing the test cases. In contrast, our approach
is not based on rules, in order to avoid this
situation. It is based on a Bayes classifier and is
explained in the next section.
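
To make the idea of a Bayes classifier concrete before the details are given, the following is a generic naive Bayes sketch that labels a cell as header or data from boolean visual features. It is not the classifier of this paper: the feature names (`bold`, `background`, `first_row`), the training cells and the Laplace smoothing are all illustrative assumptions.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Bernoulli naive Bayes over boolean features with Laplace smoothing."""

    def fit(self, samples, labels):
        self.labels = sorted(set(labels))
        self.features = sorted({name for s in samples for name in s})
        self.class_count = defaultdict(int)
        self.true_count = defaultdict(int)  # (label, feature) -> times True
        for sample, label in zip(samples, labels):
            self.class_count[label] += 1
            for name in self.features:
                if sample.get(name, False):
                    self.true_count[(label, name)] += 1
        self.total = len(labels)
        return self

    def log_posterior(self, sample, label):
        # log P(label) + sum of log P(feature value | label), unnormalized.
        n = self.class_count[label]
        score = math.log(n / self.total)
        for name in self.features:
            p_true = (self.true_count[(label, name)] + 1) / (n + 2)
            score += math.log(p_true if sample.get(name, False) else 1 - p_true)
        return score

    def predict(self, sample):
        return max(self.labels, key=lambda label: self.log_posterior(sample, label))


# Hypothetical training cells: visual features read from the HTML/CSS source.
train = [
    ({"bold": True, "background": True, "first_row": True}, "header"),
    ({"bold": True, "background": False, "first_row": True}, "header"),
    ({"bold": False, "background": True, "first_row": False}, "header"),
    ({"bold": False, "background": False, "first_row": False}, "data"),
    ({"bold": False, "background": False, "first_row": False}, "data"),
    ({"bold": True, "background": False, "first_row": False}, "data"),
]
samples, labels = zip(*train)
model = NaiveBayes().fit(samples, labels)

print(model.predict({"bold": True, "background": True, "first_row": False}))
print(model.predict({"bold": False, "background": False, "first_row": False}))
```

Unlike a fixed rule such as "bold cells are headers", the classifier weighs every feature by how often it co-occurred with each label in the training cells, so no single visual cue decides the outcome on its own.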
5 HEADER DETECTION
As noted above, it is possible to use the visual
presentation of the different elements of a table to
establish the existing relationships. To convey this
visual information, HTML has a group of tags and
attributes that are specific to visual layout.
We carried out a study of how the different elements
of a table are used. This study was made over a set
of 107 random data tables, and the first point to
observe is the difference in the quantity of
ICEIS 2008 - International Conference on Enterprise Information Systems