USING CONTENT SYNDICATION TECHNOLOGIES

IN DISTRIBUTING AND PUBLISHING INFORMATION

TO REACH ALL USERS

Serena Pastore

INAF - Astronomical Observatory of Padova, vicolo Osservatorio 5 – 35122 – Padova, Italy

Keywords: Content syndication, RSS specification, ATOM standards, XML technologies, web feeds, Ajax.

Abstract: Content syndication is a widely used method to distribute information as web feeds. It is an easy way of

reaching the greatest number of end users requiring immediate access. Content syndication is essentially

based on XML technology, is easily distributed and possesses a high level of interoperability across

platforms. Both the website providing information, which gains an up-to-date structure regardless of the
techniques used to store and manage content, and the website consuming information benefit from such
technology. Several different specifications and standards have been developed to support syndication, all
of them in widespread use. The paper describes how syndication technology has been used to distribute centrally
located INAF information to each organizational entity's local Web site. It describes the technological choices
made both for producing the feed and for distributing it to each local website, which presents it as a specific
section of the home page. Feeds are produced in different formats, and the technologies and standards used are
Web technologies collectively known as Web 2.0 applications.

1 INTRODUCTION

Content syndication technologies are essentially

XML-based technologies (Moller A., et al. 2006)

used to aggregate and distribute information, making

it more accessible to users. Syndication differs only

nominally from publishing methods using dynamic

content management systems (CMS). It includes

standardized protocols which permit the use of a

site’s data in other contexts such as other websites,

browser plug-ins or a separate desktop application. It

is an application of the kind collectively known as

Web 2.0 applications (Murugesan, S. 2007). Web

mashups are also Web 2.0 applications that combine

information and services from several sources.

Mashups can be used with syndication to create an
improved user interface for viewing data. Information
reaches the user by a pull method rather than a
push one, in a specific format called a feed, which
summarizes different content by giving only some items
and usually a link to where the topic is covered in
more detail. From the provider side, delivering feeds
usually means providing information wrapped in an
XML-based file. Such a file is then shared and
processed by many client applications, such as feed
readers and web aggregators, that is, specific software
that collects the feed and displays it inside a
specific application or in a single Web page.
Sometimes, however, it is useful not only to design an
application able to produce web content in a feed
format, but also to provide a method to process it
and display it in an appealing way by using
interactive web technologies. This paper describes

how syndication and rich internet technologies have

been used to distribute information, which, even if

already aggregated in a system (Boccato, C.,

Pastore, S. 2006), needs to be interactively delivered

to specific users. The Italian National Institute for

Astrophysics (INAF, http://www.inaf.it) is

composed of one headquarters and 19 satellite

organizations mainly in Italy. Its Web site is

basically designed to act as a reference source of

astrophysical information for end users. However, a

study has revealed that the majority of users prefer

their local Web sites and do not take advantage

either of the INAF Web site, which they visit only
a few times a week, or of other applications such as

feed aggregators. It is therefore necessary to find a

more effective information distribution method.

Syndication and Web 2.0 technologies have thus
been chosen as the way to collect, distribute and
process updated information so that it can be viewed
from each local Institute's Web site, integrated into the
home page.

Pastore S. (2008).
USING CONTENT SYNDICATION TECHNOLOGIES IN DISTRIBUTING AND PUBLISHING INFORMATION TO REACH ALL USERS.
In Proceedings of the Fourth International Conference on Web Information Systems and Technologies, pages 228-231
DOI: 10.5220/0001518702280231
SciTePress

The main issues involved in this solution

include: 1) the heterogeneous structure of

information that derives from different logical and

physical sources; 2) the existence of different
syndication specifications and standards, each
with its own strengths; and 3) the need to
provide the application as a package that can easily be
included in each Web site, independently of the
technologies used by each webmaster. Standard web
technologies (XHTML, CSS, JavaScript, XML) and
web programming languages, which are the basis of
the Web 2.0 technology paradigm, have been used to
develop a solution for creating, processing and
publishing information in the form of a feed.

2 PROBLEM DEFINITION

The initial problem is how best to distribute specific

information that, although available on the INAF

Web site, is not being accessed by potential users.

The distribution approach can be divided into
three phases: definition of the content to publish,
creation of the feed and its visualization.

2.1 Content Selection

The main data to be published (Figure 1) concern
different aspects of the Institute and of astrophysics in
general, and can in practice be divided according to
their structure.

Figure 1: The main problem is how to aggregate and

publish information coming from different sources and

display it in a single web feed.

Content may be organized in a relational
database and managed by a LAMP (Linux, Apache,
MySQL, PHP) platform (Davis, M.E., Philips, J.A.,
2007). Alternatively, content may be organized in an
object database and managed through a CMS such
as the Plone/Zope/Python environment (Boccato, C.,
et al. 2006), or it may simply be stored as
XHTML/HTML pages and thus organized
hierarchically by tags. The goal is to create a single
feed containing items coming from the various sources
and to distribute it to each institution for display on
its home page. Most publishing systems
make available a library of automatic
feed creation tools, and there are many scrapers
which extract web page content and create feeds. In
this way, each feed produced may be merged with
others to produce a solution, just as happens in many
on-line web site aggregators.

Unfortunately, such an approach involves
incompatible feed formats and therefore requires
further processing. The solution adopted is thus to collect all
the information to be published in a single database,
from which data can be extracted to create the
requested feed in the required format. This
approach implies that information is published
twice, but it provides more flexibility in
subsequent feed processing.
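
The single-database approach described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the INAF schema: it uses Python's built-in sqlite3 module in place of MySQL, and the table name feed_items, its columns, the helper functions and the example.org links are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per item, whatever its original source.
SCHEMA = """
CREATE TABLE feed_items (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,        -- e.g. 'lamp', 'cms', 'static'
    title       TEXT NOT NULL,
    link        TEXT NOT NULL UNIQUE, -- the same link is stored only once
    description TEXT NOT NULL,
    published   TEXT NOT NULL         -- ISO 8601 timestamp in UTC
)
"""


def open_store(path=":memory:"):
    """Create the item store (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def add_item(conn, source, title, link, description, published):
    """Insert one item; an item whose link is already stored is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO feed_items "
        "(source, title, link, description, published) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, title, link, description, published),
    )
    conn.commit()


def latest_items(conn, limit=10):
    """Return the newest items as plain dicts, newest first."""
    rows = conn.execute(
        "SELECT source, title, link, description, published "
        "FROM feed_items ORDER BY published DESC, id DESC LIMIT ?",
        (limit,),
    )
    return [dict(row) for row in rows]


if __name__ == "__main__":
    conn = open_store()
    add_item(conn, "lamp", "Seminar announced", "http://example.org/a",
             "A seminar.", "2008-01-10T09:00:00Z")
    add_item(conn, "cms", "Press release", "http://example.org/b",
             "A release.", "2008-01-12T09:00:00Z")
    for item in latest_items(conn):
        print(item["published"], item["source"], item["title"])
```

Because every item ends up in one table with the same columns, the feed writer no longer needs to know which system the content came from; the price is the duplicate publication noted above.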

2.2 Feed Creation: Syndication Specifications and Standards

Content syndication has various implementations
and there is no ruling body. It is therefore necessary
to choose the standard which best suits the need. The
implementations nevertheless share a common logical
structure that follows XML syntax. Content is
organized into a so-called "channel", the entity
that represents the information provider. Each channel
consists of single chunks of information (the
so-called items), each one possessing attributes (title,
description, a reference to the information, etc.). The
main technology used is RSS. This consists of
specifications developed by groups of
interested people (as in the case of RSS version 1.0,
http://web.resource.org/rss/1.0/) or by organizations
(as in the case of RSS version 2.0,
http://www.rssboard.org/). The Atom 1.0
syndication format
(http://www.ietf.org/html.charters/atompub-charter.html) grew out of
RSS and is a standard developed by the Internet
Engineering Task Force (IETF). Moreover,
Microsoft has introduced Simple Sharing Extensions
(SSE) (http://msdn2.microsoft.com/en-gb/xml/bb510102.aspx),
which extend the Atom 1.0 and
RSS 2.0 specifications, while JavaScript Object
Notation (JSON, http://www.json.org) is a
data-interchange format used by Google as another feed
format. A comparison of the different attributes used
by the main feed specifications is shown in Figure 2.
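
The common logical structure (a channel holding items, each with a title, a description and a reference) can be written down as a small data model. The sketch below is illustrative only: the class names, the format keys rss1, rss2 and atom1, and the helper element_name are hypothetical, while the element names in the table are those used by the respective formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One chunk of information inside a channel."""
    title: str
    link: str          # reference to the full information
    description: str


@dataclass
class Channel:
    """The information provider and its list of items."""
    title: str
    link: str
    description: str
    items: List[Item] = field(default_factory=list)


# Element names that carry the same logical parts in each format.
ELEMENT_NAMES = {
    "rss1": {"channel": "channel", "item": "item", "item_text": "description"},
    "rss2": {"channel": "channel", "item": "item", "item_text": "description"},
    "atom1": {"channel": "feed", "item": "entry", "item_text": "summary"},
}


def element_name(fmt: str, part: str) -> str:
    """Look up the element that carries a logical part in a given format."""
    return ELEMENT_NAMES[fmt][part]


if __name__ == "__main__":
    news = Channel("Sample channel", "http://example.org/", "Illustrative only")
    news.items.append(Item("First item", "http://example.org/1", "Sample text"))
    for fmt in ELEMENT_NAMES:
        print(fmt, element_name(fmt, "channel"), element_name(fmt, "item"))
```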


Figure 2: A comparison between RSS and Atom formats.

2.2.1 RSS 1.0, RSS 2.0 and Atom 1.0

Despite having the same acronym, RSS 1.0 and RSS
2.0 are distinct and incompatible formats. RSS 1.0
stands for RDF Site Summary and incorporates the
Resource Description Framework (RDF,
http://www.w3.org/RDF/) and its tags and attributes
to better describe resources. The basic structure of
RSS 1.0 wraps the entire feed in the
<rdf:RDF> element, which contains a
<channel> element (the source of information) with its
definition, attributes and list of items, followed by an
<item> element for each item describing its
attributes. The flexibility of the specification allows
metadata to be attached to the feed by
integrating other standards useful for semantic
processing (e.g. the Dublin Core,
http://dublincore.org/), even if the result is a bit
verbose. RSS 2.0,
which follows on from the various RSS 0.9x
specifications, was developed by Netscape and later
by UserLand. It stands for Really Simple Syndication
to emphasize its ease of use. In this
format, the feed is described inside the <rss> tag
and includes a <channel> element with a set of
attributes (which carry more information than in
the previous format), followed by the list of items and
their attributes (e.g. standard metadata such as link,
title and description, and other facilities such as
<enclosure>, which allows attachments to be
downloaded automatically, or the <guid> element,
which identifies the item uniquely). Finally Atom, as
defined by the IETF in its latest version 1.0, is a standard
which defines both a feed representation format (the
Atom Syndication Format, RFC 4287,
http://www.ietf.org/rfc/rfc4287.txt) and an
interaction protocol (the Atom Publishing Protocol,
an Internet draft,
http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-17.txt) with
enhanced interoperability. In the Atom format, the
feed is specified by the <feed> element, which
first describes the channel and its attributes (without
associating the channel with a specific tag) and
then specifies each item inside an <entry> tag.
Most client feed applications handle all of these formats.
Nevertheless, a web application which creates syntactically
correct and validated feeds in each of the different
formats helps to guarantee that information is
delivered as widely as possible.
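
To make the structural difference between the formats concrete, the following sketch serializes the same items as RSS 2.0 and as Atom 1.0 using Python's standard xml.etree.ElementTree module. It is an illustration under stated assumptions rather than the tool used in this work: the function names, the dictionary keys (title, link, description, updated, author) and the choice of the item link as its identifier are hypothetical, and RSS 1.0 is omitted for brevity.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM_NS)


def _atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_rss2(channel, items):
    """RSS 2.0: <rss> wraps one <channel>, which holds the <item> list."""
    rss = ET.Element("rss", version="2.0")
    chan = ET.SubElement(rss, "channel")
    for tag in ("title", "link", "description"):
        ET.SubElement(chan, tag).text = channel[tag]
    for it in items:
        node = ET.SubElement(chan, "item")
        ET.SubElement(node, "title").text = it["title"]
        ET.SubElement(node, "link").text = it["link"]
        ET.SubElement(node, "description").text = it["description"]
        # Assumption: the item link doubles as its unique identifier.
        ET.SubElement(node, "guid").text = it["link"]
    return ET.tostring(rss, encoding="unicode")


def build_atom(channel, items):
    """Atom 1.0: <feed> carries the channel data itself, then <entry> items."""
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("title")).text = channel["title"]
    ET.SubElement(feed, _atom("link"), href=channel["link"])
    ET.SubElement(feed, _atom("id")).text = channel["link"]
    ET.SubElement(feed, _atom("updated")).text = channel["updated"]
    author = ET.SubElement(feed, _atom("author"))
    ET.SubElement(author, _atom("name")).text = channel["author"]
    for it in items:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("title")).text = it["title"]
        ET.SubElement(entry, _atom("link"), href=it["link"])
        ET.SubElement(entry, _atom("id")).text = it["link"]
        ET.SubElement(entry, _atom("updated")).text = it["updated"]
        ET.SubElement(entry, _atom("summary")).text = it["description"]
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    channel = {"title": "Sample news", "link": "http://example.org/",
               "description": "Illustrative channel", "author": "Web staff",
               "updated": "2008-01-12T09:00:00Z"}
    items = [{"title": "One", "link": "http://example.org/1",
              "description": "First item", "updated": "2008-01-12T09:00:00Z"}]
    print(build_rss2(channel, items))
    print(build_atom(channel, items))
```

Note that the Atom writer needs fields (an identifier, an update timestamp and an author) that RSS 2.0 treats as optional, which is one reason a shared content store has to hold the union of the attributes of all target formats.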

2.3 Feed Processing

Despite following different standards, feeds are
XML files and may be managed and processed by
many libraries and tools developed in different
programming languages (e.g. the PHP MagPie RSS,
http://magpierss.sourceforge.net/, the Java ROME,
https://rome.dev.java.net/, or the Python RSS.py,
http://www.mnot.net/python/RSS.py). Many are
distributed as on-line tools (for example, many
scraping tools, such as xpath2rss,
http://freshmeat.net/projects/xpath2rss/, are used as
web aggregators), although they do not provide a packaged
solution that can be delivered to each website. However,
the common underlying concept is the extraction of
information, its formatting according to XML syntax,
and its parsing and processing for visualization
inside a Web page or other application. Focusing on
visualization inside a Web page, the simplest
way is to include an external feed by pointing to an
RSS parser, which may be developed in any language,
that processes the feed and then presents the content in
a specific style through CSS technologies (Schmitt,
C., 2006). The choice of the language and the
platform is subjective.
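
The parse-then-present step can be sketched in a few lines. The example below is a minimal illustration, not one of the libraries cited above: it assumes Python's standard xml.etree.ElementTree module, handles only RSS 2.0 and Atom 1.0, and the function names parse_feed and render_list and the CSS class name feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text):
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return [(item.findtext("title", ""), item.findtext("link", ""))
                for item in root.iter("item")]
    if root.tag == ATOM + "feed":
        entries = []
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            href = link.get("href", "") if link is not None else ""
            entries.append((entry.findtext(ATOM + "title", ""), href))
        return entries
    raise ValueError("unsupported feed format: %s" % root.tag)


def render_list(entries, css_class="feed"):
    """Render the entries as an HTML list that a stylesheet can target."""
    rows = ['<li><a href="%s">%s</a></li>' % (escape(link, quote=True),
                                               escape(title))
            for title, link in entries]
    return '<ul class="%s">\n%s\n</ul>' % (css_class, "\n".join(rows))


if __name__ == "__main__":
    sample = ('<rss version="2.0"><channel><title>Sample</title>'
              '<item><title>One</title><link>http://example.org/1</link></item>'
              '</channel></rss>')
    print(render_list(parse_feed(sample)))
```

Keeping parsing and rendering separate means the same list markup can be restyled per site through CSS alone. A production parser would also need to treat feeds from untrusted sources with care, which this sketch does not attempt.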

3 THE DEVELOPED SOLUTION

After the analysis of constraints and issues, the
developed solution has followed the rule of easy
implementation. It requires the design of the
content database, the development of a Web
application for producing the feeds and the
establishment of a simple visualization procedure by
means of a set of scripts and CSS templates. The
LAMP platform has been chosen for the first two
phases, while other more interactive technologies,
collectively known as the Ajax paradigm (Gross, C.,
2006), have been adopted for the visualization
phase. The MySQL schema design requires more
work, since it must include all the attributes related to
each feed specification that are needed for subsequent
validation.

WEBIST 2008 - International Conference on Web Information Systems and Technologies

The Web application for producing and publishing the
feed has been developed as a Web form user
interface (Figure 3) used by authors to insert content.
The application, taking advantage of the
FeedCreator.class.php tool
(http://www.bitfolge.de/rsscreator-en.html), then creates
the three different validated feeds, according to the
RSS 1.0, RSS 2.0 and Atom 1.0 formats, by extracting the
needed information from the database, and
publishes them in a specific directory of the Web
server. The feeds may then be distributed as they are
to several Web sites, to be integrated into their
publishing systems or into another application. The
feeds may also be directly integrated into Web pages
using the <link> tag (inside the <head> section of
the HTML page), with the format declared in
the type attribute (e.g.
type="application/atom+xml" href="file").
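
From the consumer side, a feed advertised through a <link> tag can be located by scanning the page head. The sketch below uses Python's standard html.parser module and is illustrative only: the names FeedLinkFinder and find_feeds and the sample paths are hypothetical, the set of accepted MIME types is an assumption, and the rel="alternate" check follows the usual autodiscovery convention, which is not stated in the text above.

```python
from html.parser import HTMLParser

# Assumption: MIME types treated as feeds in this sketch.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append((mime, href))


def find_feeds(html_text):
    """Return every feed advertised in the given HTML document."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.feeds


if __name__ == "__main__":
    page = ('<html><head><title>Home</title>'
            '<link rel="alternate" type="application/atom+xml" '
            'href="/feeds/atom.xml" title="News">'
            '</head><body></body></html>')
    for mime, href in find_feeds(page):
        print(mime, href)
```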

At this point, the methods for visualizing the feeds
inside each home page differ (Figure
3), because each makes use of specifically produced
stylesheets to tailor the presentation.

Figure 3: Interface of the Web application which creates

the feed and examples of feed visualization.

A first implementation is a static solution that displays a
simple list by including a PHP RSS parser inside the
web page. It needs only simple HTML/PHP code, although it
requires that the target web server supports the PHP
language. Other, interactive techniques use JavaScript
either synchronously (client-side) or
asynchronously (server-side), allowing scrolling of the
content (Figure 3). Ajax technology in particular
enhances interactivity and usability, since
information is represented, processed and
dynamically displayed according to the Document
Object Model (DOM, http://www.w3.org/DOM) and
JavaScript, with a fast message exchange with the
server. An example of this technique is Google's
Ajax Feed API (http://code.google.com/apis/ajaxfeeds/),
a library which manipulates Atom
or RSS feeds through an Ajax interface. Its
use requires the creation of a key usable
within all URLs of the site directory where content is
stored, and the API invocation is simply included
inside a script tag in the page:

<script type="text/javascript" src="http://www.google.com/jsapi?key=AAA"></script>
<script type="text/javascript">
google.load("feeds", "1");
</script>

However, the solution adopted gives more freedom
to each local webmaster, who can
choose the feed format and the visualization solution, or
even use his or her own publishing system.

4 CONCLUSIONS

Information needs to be published and
distributed in a simple, straightforward way to reach
as many end users as possible. This need has led
INAF developers to study methods to display
specific information as feeds in the local Web sites
of its organizational entities. Having examined the different
specifications currently used to describe
syndication, the developers have built a specific web
application which creates web feeds from several
content sources in several formats and
allows their dynamic visualization by adopting
interactive technologies, like Ajax, used in the Web
2.0 paradigm. Moreover, other Web 2.0 applications
such as web mashups could be integrated with feeds
to guarantee a better user experience.

REFERENCES

Moller, A., Schwartzbach, M., 2006. An Introduction to XML and Web Technologies. Pearson Education.

Murugesan, S., 2007. Understanding Web 2.0. IT Professional, Vol. 9, Issue 4, July-Aug. 07, pp. 34-41.

Boccato, C., Pastore, S., 2006. The Web Information System of the INAF: different actors contributing to disseminate information. In Current Research in Information Sciences and Tech. Multidisciplinary approaches to global information systems, Volume I, Open Institute of Knowledge, pp. 507-511.

Davis, M.E., Philips, J.A., 2007. Learning PHP & MySQL. O'Reilly.

Schmitt, C., 2006. CSS Cookbook. O'Reilly.

Gross, C., 2006. Ajax Patterns and Best Practices. Apress.
