IMPROVING HTML DATA TABLES NAVIGATION

A Method to obtain Information for Visually Impaired People

Juan Manuel Fernández

, Vicenç Soler

1,2

Dept. Microelectrònica i Sistemes Electrònics, Universitat Autònoma de Barcelona, Cerdanyola del Vallés, Spain

Ciber-BBN: Bioengineering, Biomaterials and Nanomedicine. Campus UAB, Bellaterra, Spain

Jordi Roig

Dept. Microelectrònica i Sistemes Electrònics, Universitat Autònoma de Barccelona, Cerdanyola del Vallés, Spain

Keywords: Web Accessibility, Table navigation, e-integration, Repairing Tools, Disability, WAI, HTML.

Abstract: Nowadays the broad use of the new technologies based on the Web gives facilities to people all over the

world, but for impaired people. This leads us to the field of Web Accessibility and one of the biggest

problems in it is the use of data tables on HTML documents. For disabled users, elements such as these and

their natural bi-dimensional structure make more difficult to navigate than for the rest of the users. In this

paper we present a solution to avoid those difficulties that disabled users find while navigating. The system

we propose is based on the way a non-disabled person visualizes the table contents, but avoiding the

processing of the images which is the natural procedure.

1 INTRODUCTION

One of the more important problems that exist in the

field of Web accessibility is the navigation on tables.

HTML tables have a complex nature that impedes to

visual impaired people to obtain the information

contained in these elements. Moreover, the use of

standards, like World Content Accessibility

Guidelines (WCAG), is very low and therefore the

navigation on tables gets worse.

The correct design of a Web site is an unresolved

matter for the current Web content developers. A

correct use of the Word Wide Web Consortium

standards makes the navigation become easier and

friendly. The access to the information stored in the

Web can be done in a more efficiently way with the

W3C standards. We will talk about the observance

of the specific normative of the World Content

Accessibility Guidelines (WCAG) which was made

by the World Accessibility Initiative. The lack of use

of the WAI’s proposals and guidelines makes the

Web to lose all its capacity to improve the labour

and social integration of all kind of people.

Thus, we propose a solution to the correct data

table’s navigation, taking into account the stardards

of the Web. Thanks to it, we avoid the use of new

languages or specific software to read the

information of a data table as the most of the

solutions. So, the solution consists of a detection of

header’s rows and columns, that allows us offer

information regarding the relationships between the

headers and the cells of the table.

2 USE OF HTML TABLES

The lack of standard’s uses is a very important

problem. The HTML grammar is the base of the

Web, but it is ignored by the Web site’s developers.

One of the most important errors is the use of

HTML to offer visual information rather than CSS.

The use of both technologies allows us to offer, if

necessary, a different visualization of the

information without changing the structure of the

HTML document. A good example of that is the use

of tables to distribute the contents of the Web site.

The layout use of tables is larger than the

expected one when we started the research. Figure 1

shows a comparison over a randomly group of 487

Web pages. The search of these Web documents has

been done in an automatic way. The only condition

was that the Web site had to use the table element.

397

Manuel Fernández J., Soler V. and Roig J. (2008).

IMPROVING HTML DATA TABLES NAVIGATION - A Method to obtain Information for Visually Impaired People.

In Proceedings of the Tenth International Conference on Enterprise Information Systems - HCI, pages 397-400

DOI: 10.5220/0001714703970400

 SciTePress

Use of the HTML Tables

80,11%

19,89%

Data Tables

Layout tables

Figure 1: How the HTML tables are used.

3 HTML DATA TABLES

The bi-dimensional nature of a table offers a lot of

information. But this nature makes necessary to

know the header of the row and the column of a cell

to obtain the maximum information. The helping

tools offer the information contained on a table in a

linear list of elements. In this list, the headers

usually appear just at the beginning and they are

very difficult to remember. Our system offers the

relationship between the headers and the content of

the cell. We use HTML standard elements to

indicate these relationships.

4 PREVIOUS PROPOSED

SOLUTIONS

We can classify the current proposed solutions in

three basic groups: the group where the Web

browsers adapted to offer correct navigation in

tables, new languages and the proposals which try

to modify the content of the document to mark these

relationships.

4.1 Adapted Browsers

This kind of solution has a limited field of use,

because this sort of browsers can only solve one

problem. And also, the user has to learn how to use

it as well as other Web browsers. A very good

example of that is the table browser called EVITA

(Yesilada et al., 2004). Our proposal is totally

independent of the Web browser and the user does

not need to learn to use new software.

4.2 New Languages

All the solutions of this kind have the same problem:

they are not standards and the user needs specific

software to obtain the information offered. Our

proposal is based on the W3C standard. In this way

we can see, for instance, the proposals of Pontelli

and Filepp.

Enrico Pontelli and Tran Cao Son (2002)

propose the use of Domain Specific Language to

express the content of a table. This content is

extracted thanks to the semantics of the information

inside the table.

On the other hand, there exist other languages

XML based that improve the interaction between the

screen reader and the Web site. TTPML (Filepp et

al., 2002) is a language of this kind that offers all the

information to the screen reader in an easy way.

4.3 Header Detection by Means of

Visualization

The Web site developer offers information about the

relationship of the table’s content in a visual way.

We can use this difference between cells to obtain

the header of a table and to relate the different cells.

The first approximation is by means of a visual

recognition after the Web page has been displayed

by the Web browser (Krüpl and Herzog, 2006). This

system has the inconvenient that it is strongly

dependent of the Web browser

The second approximation, where we are, works

with the source of the Web page. The visualization

of the Web document is marked with HTML and

CSS code and we can access to it independently

from the browser. K. Kottapally et al. (2003)

presented a system that implements this proposal.

The application implements a logic system and a

Hidden Markov Model system. The proposal has

very good results but with a very poor test set which

produces that the systems based on rules, like this

one, can fell on a situation of memorization. On the

contrary, our approximation is not based on rules to

avoid this situation. It is based on a Bayes classifier

and it will be explained in the next section.

5 HEADER DETECTION

As we have commented, it is possible to use the

visualization of the different elements of a table to

establish the existing relationships. To offer this

visual information HTML has a group of tags and

attributes that are specific to offer the visual layout.

We have made a study to obtain the use of the

different elements of a table. This study was made

over a set of 107 random data tables and the first

point to observe is the difference of the quantity of

ICEIS 2008 - International Conference on Enterprise Information Systems

398

-0,05

0,00

0,05

0,10

0,15

0,20

align bgcolor char charoff class dir id lang style title valign

Use of the TR attributes

Figure 2: Distribution of the TR attributes.

tables belonging to both classes. The tables without

header are the most of the part, and they are twice as

the tables with headers. This fact makes, that the

dataset was imbalanced and it adds more difficulty

to the learning process.

Figure 3 shows the use of the attributes of the TR

tag. The values are standardized, and a value of ‘1’

is assigned if the value of the first row is completely

different to the rest of values of the table. A value of

‘-1’ means that the value of the first row is the same

at the entire table. The values offered in the Figure 2

are the result of the difference between the average

of the tables with and without headers.

The same process applied to the rows can be

used with the cells. We compare the horizontally

adjacent cells. This result can be compared with the

rest of the rows and we can know if the row is

visually marked with every cell (see Figure 3).

We rejected the idea of comparing all the content

of a cell to obtain difference between cells because

the same visual effect can be done with different

combinations of HTML elements. The lack of

improvement does not justify the growth of

complexity which would suppose the adding in the

system.

Once discussed the attributes we are going to

talk about the system of learning used. We use

Naive Bayes, a method of supervised leaning,

because it is very powerful but at the same time it

has a temporal cost and the complexity is very low.

Also, this method offers the advantage of not being

affected by the unknown values of the elements.

6 LEARNING PROCESS AND

RESULTS

The learning process and the tests have been done

using WEKA (Witten and Frank, 2005). This

application allows us to test our set of tables and

selection of attributes in an easy and quickly way.

Furthermore, it is developed in Java fact that allows

us to use the class, that implements the method

Naive Bayes, in the ACTAW platform (Fernández et

al., 2007). This repairing tool allows us to obtain all

the information contained in a HTML document and

modifies it in an easy way.

We saw the great number of noise that the set of

tables contain, caused by the tables belonging to the

subset of tables without headers. This fact made us

to decide to classify all the tables without

information, like tables without header. It also

complies with one of the premises of the WCAG.

We can see the results on Table 1.

Table 1: Results of the studied Subset.

Class Positive Negative % OK % Wrong

Positive 37 3 92.5% 7.5%

Negativ

1 10 90.91% 9.09%

Total 92.16% 7.84%

This classification was made with a set of

learning of 50 elements and a set of test composed

by 51 elements and without noise. With this studied

distribution, the number of tables classified as tables

with header when they really are not, i.e. false

positive, is very low. Only the 7.84% was false

positive, and the correct classification is 92.15%, a

very high result.

On the other hand, and to corroborate the good

results, the system has been tested by using Cross

Validation. We use 5 subsets for the validation, and

we can see the results in Table 2.

Table 2: Results of Cross Validation.

Class Positive Negative %Ok %Wrong

Positive 68 4 94.45% 5.55%

Negative 9 20 68.97% 31.03%

Total 87.13% 12.87%

The correct classification is 87.12%, it is really

high and the false positive is so low enough. In

Table 3 we can see a comparative of the results

IMPROVING HTML DATA TABLES NAVIGATION - A Method to obtain Information for Visually Impaired People

399

-0,05

0,00

0,05

0,10

0,15

Use of the TD attributes

Figure 3: Distribution of the TD attributes.

Offered by the cross validation and the first test In

both of them the values of g-means (Kubat and

Matwin, 1997), which is the most used measure to

evaluate results in imbalanced datasets, and the area

under the ROC curve are really good.

Table 3: Results of C. Validation and the studied Subset.

C.Validation Studied subset

G-means 0.8071 0.9170

Positive 0.869 0.941 ROC

Area

Negative 0.869 0.941

7 CONCLUSIONS

We have presented a system of learning that allows

us to detect the headers of a table. This tool is

independent of Web browsers and compliant with

the W3C standards. With the presented method, we

can offer the relationship between the header and the

cells under its scope. This is an important

improvement because it means that the content of

the table is not only a list of elements. The table

recovers the bi-dimensional nature and allows the

impaired user to obtain all the information inside the

table.

The proposed solution has been tested with a

heterogeneous set of real Web pages. The selection

of this set was completely random and with it we

can assure that the system does not offer good

results for only a concrete situation. The system

obtains excellent results and improves the results of

the system developed up to now.

REFERENCES

Kubat, M, Matwin, S. Addressing the Curse of Imbalanced

Training Sets: One-Sided Selection. In Proceedings of

the 14th International Conference on Machine

Learning, 1997.

Yeliz Yesilada, Robert Stevens, Carole Goble and Shazad

Hussein. Rendering Tables in Audio: The Interaction

of Structure and Reading Styles. In Proceeding

ASSETS’04, pages 16-23, Atlanta,Georgia, USA,

2004.

Juan Manuel Fernández, Vicenç Soler and Jordi Roig.

Automatic Conversion Tool for Accessible Web. In

Proceedings of the 3rd International Conference on

Web Information Systems and Technologies, pages

459-462. Barcelona, Spain, 2007.

Enrico Pontelli and Tran Cao Son. Planning, Reasoning,

and Agents for Non-visual Navigation of Tables and

Frames. In International ACM SIGCAPH Conference

on Assistive Technologies pages 73-80. Edinburgh,

UK, 2002.

Robet Filepp, James Challenger and Daniela Rosu.

Improving the accessibility of aurally rendered HTML

tables. In International ACM SIGCAPH Conference

on Assistive Technologies pages 9-16. Edinburgh, UK,

2002.

Bernhard Krüpl and Marcus Herzog. Visually Guided

Bottom-Up Table Detection and Segmentation in Web

Documents. In Proceeding of International World

Wide Web Conference, pages 933-934, Edinburgh,

UK, 2006.

K Kottapally, C. Ngo, R. Reddy, E. Pontelli, T.C.Son and

D.Gillan. Towards the Creation of Accessibility

Agents for Non-visual Navigation of the Web. In ACM

Conference Universal Usability, pages 134-141,

Vancouver, Canada, 2003.

Ian H. Witten and Eibe Frank. Data Mining: Practical

Machine Learnign Tools and Techniques 2

Edition.Elsevier, San Francisco, USA 2005. ISBN: 0-

12-088407-0

ICEIS 2008 - International Conference on Enterprise Information Systems

400