INTEGRATING COMMUNICATION SERVICES INTO MOBILE BROWSERS

Joachim Zeiß, Marcin Davies, Goran Lazendic, Rene Gabner and Janusz Bartecki

FTW Telecommunications Research Center Vienna, Vienna, Austria

Kapsch CarrierCom, Vienna, Austria

Keywords:

Convergence, VoIP, Browser-APIs, SIP, IMS, HTML5, Websockets, Real-time Communication.

Abstract:

This paper introduces a novel approach to integrating communication services into Web applications running in the browser. The solution is based on two major design decisions: to remove the need for a business-to-business (B2B) relationship between Web provider and communication service provider, and to distribute the Model, View and Controller components of an application across different processes. Our approach helps to answer the question of how to efficiently integrate network operators' assets into applications from over-the-top (OTT) players. The separation between application control by the Web page and the actual command execution by the native capabilities of the user device opens new opportunities: global reachability of telco services, easy deployment and re-deployment of applications with zero configuration effort for users and developers, and privacy protection by keeping sensitive data within the user domain, e.g. on the user's communication device.

1 INTRODUCTION

More and more innovative applications created for the Web integrate typical telco services. Users, in turn, are getting accustomed to the business model of the Web and perceive telecommunication services offered outside the Web context as reliable and of high quality, but also as somewhat old-fashioned, too technical and detached from the social Web community. Application developers concentrate on globally marketable products with simple and unified interfaces. Telcos, even when operating globally, serve a smaller community than Google, Apple, Facebook or other over-the-top service providers. They struggle to unify their activities in order to participate in the application business and to avoid becoming mere bit pipes.

Therefore, the following question needs to be answered: How can telco assets be integrated into applications from over-the-top (OTT) players in an efficient and commercially feasible way? Or, to put it more provocatively: OTT players providing Web applications use the communication technologies of other OTT players to deliver their services. How can operators ensure that Web applications from OTT players and content providers preferably use the operators' communication services? To make this possible, the "Advanced Prosumer Service Integration Intelligence" project (APSINT) provides a software architecture that integrates seamlessly into the mobile operator's network infrastructure.

Telco operators do a good job regarding reliability, quality of service, network convergence and interoperability when it comes to connecting people by text, voice and video. On the other hand, operators fall short in offering their services globally and in a form that is easy for Web developers to use, and in delivering simple yet powerful human interfaces to end users. The APSINT architecture removes the need for B2B relationships between operators on the one side and application providers and developers on the other side before the operators' services can be used. This is done by introducing the user as a man-in-the-middle between telco service and Web page. While the user browses a Web site, pages rendered in the user's browser use communication facilities of the local device, which are programmed and controlled by Javascript code within the Web page. In this way the user's B2C business relationship with the telco acts on behalf of a B2B relationship between Web application and operator. This separation between control (by the Web page) and actual execution (on the user's device) has the advantages of (i) global reachability of telco services, (ii) easy deployment, (iii) zero configuration effort for user and developer, and (iv) privacy protection, as there is no need for the user to share authentication credentials with Web pages for 3rd party service usage.

Zeiß J., Davies M., Lazendic G., Gabner R. and Bartecki J.
INTEGRATING COMMUNICATION SERVICES INTO MOBILE BROWSERS.
DOI: 10.5220/0003934907530762
In Proceedings of the 8th International Conference on Web Information Systems and Technologies (WEBIST-2012), pages 753-762
ISBN: 978-989-8565-08-2
© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

The remainder of this paper is organized as follows: Section 2 provides an overview of related work in this area, Section 3 describes our solution and architecture, Section 4 provides implementation details, Section 5 discusses our results and the experience gained while evaluating the prototype, and finally Section 6 gives an outlook on further work.

2 RELATED WORK

This section aims to introduce and compare existing

solutions to enable mobile browser-based communi-

cation. There are two main approaches to realize

real-time communication via the Web browser. The

ﬁrst one (A) takes advantage of remote communica-

tion services offered by 3rd parties via the Web. In

this case, as mentioned in Section 1, a B2B relation-

ship is needed between the developer of the Web site

and the telco. Approach (B) utilizes communication

capabilities available at the client device (e.g. smart

phone).

As depicted in Figure 1, with method (A) the developer of a Web site uses a well-defined Javascript library to access server-side communication features (e.g. Tropo (Tropo.com, 2011)). Real-time communication is initiated by sending an HTTP request to the Web server, which establishes a network-initiated call between the two users. Another approach to enable media handling in the browser is to use Adobe's generic Flash plugin, which is more flexible as almost every browser is Flash enabled. This way it is possible to stream media directly to and from the browser. However, Adobe announced in a blog post (Winokur, 2011) that they will discontinue the development of their mobile Flash plugin because of the increasing popularity of HTML5.

Figure 1: Communication initiated remotely (method A). (Diagram labels: Web Browser, Smart Phone, 3rd party Web, Smart Phone; steps: 1) request, 2a) initiate, 2b) initiate, 3) established.)

Looking at method (B) as shown in Figure 2, a common solution to integrate local telephony features into the browser is via plugins for already installed applications like Skype. This way the user can access the locally installed application via the Web browser. The main drawbacks of plugin-based solutions are: (i) only browsers with the plugin installed are supported, (ii) media (e.g. voice) cannot be handled by the browser directly, (iii) the communication software has to be installed at the client, and (iv) most plugins are not available for mobile browsers.

Figure 2: Communication initiated locally (method B). (Diagram labels: Smart Phone with Web Browser, specific Plugin and Local App; Smart Phone; steps: 1) request, 2) initiate, 3) established.)

A hybrid solution of (A) and (B) is offered by Sipgate (Sipgate.com, 2011). Sipgate uses a browser plugin to interface with its locally installed softphone, but also offers the integration of SIP-based hardware phones. Thus, by using the Sipgate plugin, it is possible to trigger calls that originate either locally or remotely at the 3rd party infrastructure at Sipgate.

Integration with the existing communication provider, as in method (A), has the obvious advantage of an easy way to achieve terminal connectivity, quality of service and interworking with other services, providing users with a mature solution. The obstacle of this approach, as mentioned above, results from the necessity to enter into a B2B relationship with every provider who would like to enable browser-based real-time communication for its users. APSINT's goal was exactly to remove this obstacle. The proposed solution is generic enough to be applied to different types of telecommunication architectures, e.g. VoIP. Special attention was paid to integration with the IMS architecture. IMS is an advanced carrier-grade service delivery architecture that is becoming the standard among mobile network operators.

Recently, several groups have pushed the standardization of browser-based APIs to access local mobile device capabilities, including real-time communication. A team from the Mozilla Foundation started to work on WebAPI (Mozilla.org, 2011), which allows access to telephony and messaging APIs via JavaScript. In addition, WebAPI offers interfaces to battery status, contacts, camera, filesystem, accelerometer and geolocation. WebAPI can control local communication, but audio is not handled in the browser.

Furthermore, Nishimura et al. (2009) suggest a system that uses an architecture similar to the one presented in this paper. They also envisioned the possibility to deploy such software either locally on the client or remotely on a server. However, their Web-IMS cooperation is based on Flash plugins and transcoding of media. The solution presented in this paper does not need any modification of, or additional plugin in, the browser and uses HTML5 instead. Transcoding is not necessary in our solution either.

Another initiative is W3C WebRTC (Google,

2011), whose main purpose is to enable streamed real-

time communication from and to the browser without

interacting with local device capabilities.

3 OUR SOLUTION & ARCHITECTURE

This section gives an overview of the APSINT architecture. Its components and interfaces are depicted in Figure 3. An APSINT-enabled Web site is downloaded from Webserver A via a standard HTTP connection (a). A Web browser running on a smartphone renders the page and executes its Javascript (JS) as soon as it is downloaded to the client. The developer of the Web site can integrate real-time communication simply by using JS API calls. The APSINT.js library connects to the endpoint via local Websockets (b). This communication is transparent to the Web site developer. There is a 1:1 relationship between Web browser and endpoint: only one Web site can be connected to the endpoint at a time. Usually this is the last page that sent a communication request to the endpoint. In case of an existing session (e.g. an active call), the Web site used to initiate/terminate the call keeps full control over the endpoint. The phone's local features are accessed via the endpoint, which uses device-specific APIs (c), e.g. for SMS messaging or circuit-switched (CS) voice calls. An interface (d) connects the endpoint with the telco operator. In our case this is realized via SIP/IMS signaling.

3.1 APSINT Protocol

The software architecture of APSINT is shown in Figure 3. It contains the endpoint component, which links a Web browser with a signalling stack for real-time communication such as SIP. In general, the endpoint must implement asynchronous, event-driven communication between Javascript on the browser side and the SIP stack on the other side. Browser-side communication is interfaced by Websockets (Fette and Melnikov, 2011) as the transport protocol for bidirectional message delivery to and from the SIP stack. On the one hand, Websockets avoid a strong binding of the endpoint to the browser, as would be the case with browser-specific plugins; on the other hand, they minimize protocol-specific overhead, as would be incurred with long polling.

Figure 3: APSINT system architecture. (Diagram labels: Webserver with A.html, A.css / A.js; Smart Phone A with Web Browser, APSINT.js and Websocket Client; Endpoint with Websocket Server and Stack, e.g. SIP; Device with MSG and GPS; Telco Provider; interfaces (a), (b), (c), (d).)

The Websockets protocol as proposed in (Fette and Melnikov, 2011) enables Web browsers to establish a bidirectional channel to servers by upgrading an HTTP connection using an initial handshake. In APSINT the endpoint software component is required to implement a Websocket server, while the client side of Websockets is implemented by Web browsers. As Websockets belong to the HTML5 family of technologies, all major browsers offer a Websocket client. Hence the APSINT solution benefits from bidirectional connections and the low latency of the Websockets protocol while making browser-specific plugins unnecessary.

In the APSINT endpoint the Websocket server and

the SIP stack run independently in their own event

loops. Messages originating from Javascript are re-

ceived by the Websocket server stack and passed over

to the SIP stack in an asynchronous manner. In the

opposite direction SIP events are interpreted by the

SIP stack and passed as messages to the Websocket

server service.
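The asynchronous handoff between the two event loops can be pictured with a minimal sketch. The Mailbox class, the turn method and the message shapes below are illustrative assumptions; the paper does not specify how the prototype queues messages internally.

```javascript
// Hypothetical sketch: two services exchange messages through mailboxes
// instead of calling each other directly, so neither blocks the other.
class Mailbox {
  constructor() {
    this.queue = [];
  }
  post(message) {
    this.queue.push(message); // returns immediately, never blocks the sender
  }
  // Hand all queued messages to the handler; one call models one loop turn.
  turn(handler) {
    const pending = this.queue;
    this.queue = [];
    pending.forEach((message) => handler(message));
    return pending.length;
  }
}

const toSipStack = new Mailbox();  // filled by the Websocket server
const toWebsocket = new Mailbox(); // filled by the SIP stack

// Websocket server side: a message from Javascript arrives and is passed on.
function onBrowserMessage(message) {
  toSipStack.post(message);
}

// SIP stack side: process queued requests and report events back.
function sipLoopTurn() {
  return toSipStack.turn((message) => {
    if (message.type === "INITIATE") {
      toWebsocket.post({ type: "STATUS", state: "calling", target: message.target });
    }
  });
}

onBrowserMessage({ type: "INITIATE", target: "sip:bob@example.org" });
const handled = sipLoopTurn();
const delivered = [];
toWebsocket.turn((message) => delivered.push(message));
```

Each service only ever touches its own mailbox during its own turn, which is one simple way to keep the two loops independent.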

Messages originating from Javascript are

• RESERVE, Web page is registered to the endpoint,
• INITIATE, call is initiated by the Web page,
• MESSAGE, message is sent by the Web page,
• END, call is ended by the Web page.

Messages originating from the SIP stack consist of

• CALL EVENT, triggers various events in a call session,
• STATUS, forward states of endpoint and SIP stack to the registered Web page,
• MESSAGE, forward received message by SIP stack to Web page,
• END, ending call session by remote party.

Figure 4: APSINT Javascript call objects. (Diagram labels: web page, oco, ico.)
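The message names listed above can be turned into a small routing sketch. Only the message names come from the paper; the JSON envelope with type and payload fields, and the function names encode, decode and routeFromPage, are assumptions made for illustration, since the wire format is not described.

```javascript
// Message names are taken from the lists above; the JSON envelope
// ({ type, payload }) is an assumption made for this sketch.
const FROM_JAVASCRIPT = ["RESERVE", "INITIATE", "MESSAGE", "END"];
const FROM_SIP_STACK = ["CALL EVENT", "STATUS", "MESSAGE", "END"];

// Serialize a message for the Websocket channel.
function encode(type, payload = {}) {
  return JSON.stringify({ type, payload });
}

// Parse a frame and check that its type is valid for the given direction.
function decode(frame, allowedTypes) {
  const message = JSON.parse(frame);
  if (!allowedTypes.includes(message.type)) {
    throw new Error("Unexpected message type: " + message.type);
  }
  return message;
}

// Endpoint side: route a message received from the Web page.
function routeFromPage(frame, handlers) {
  const message = decode(frame, FROM_JAVASCRIPT);
  const handler = handlers[message.type];
  return handler ? handler(message.payload) : undefined;
}

const log = [];
const handlers = {
  RESERVE: () => log.push("page registered"),
  INITIATE: (p) => log.push("call to " + p.target),
  MESSAGE: (p) => log.push("message to " + p.target),
  END: () => log.push("call ended"),
};

routeFromPage(encode("RESERVE"), handlers);
routeFromPage(encode("INITIATE", { target: "sip:alice@example.org" }), handlers);
routeFromPage(encode("END"), handlers);
```

Checking the message type against the expected direction rejects, for example, a STATUS message arriving from the Web page, which only the SIP stack side is supposed to send.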

The APSINT architecture relies on SIP for signalling purposes. Although other signalling protocols exist, SIP has been adopted by most telephony providers in connection with IMS. Hence the APSINT endpoint needs to integrate a SIP stack to make use of the telco IMS infrastructure. The SIP stack is an integrated service of the APSINT endpoint that runs decoupled from the Websocket service in order to deliver events from the SIP layer asynchronously. Furthermore, the SIP stack service controls the media engine within the APSINT endpoint. The media engine is responsible for receiving and transmitting RTP media streams and for playing ring tones and ringback tones when triggered by the SIP stack.
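The control relationship between SIP stack and media engine might look roughly as follows. This is a sketch under assumptions: the event names and the MediaEngine methods are hypothetical, and only the division of labour (the SIP stack decides, the media engine executes) is taken from the text.

```javascript
// Hypothetical event names and engine methods; this is not the prototype's code.
class MediaEngine {
  constructor() {
    this.actions = []; // records what the engine was asked to do
  }
  playRingTone() { this.actions.push("ring tone on"); }
  playRingbackTone() { this.actions.push("ringback tone on"); }
  stopTones() { this.actions.push("tones off"); }
  startRtp() { this.actions.push("RTP streams started"); }
  stopRtp() { this.actions.push("RTP streams stopped"); }
}

// Called by the SIP stack whenever the state of a call session changes.
function driveMediaEngine(engine, sipEvent) {
  switch (sipEvent) {
    case "incoming-invite":
      engine.playRingTone();
      break;
    case "remote-ringing":
      engine.playRingbackTone();
      break;
    case "session-established":
      engine.stopTones();
      engine.startRtp();
      break;
    case "session-terminated":
      engine.stopTones();
      engine.stopRtp();
      break;
    default:
      throw new Error("Unknown SIP event: " + sipEvent);
  }
}

// An outgoing call: the remote side rings, answers, and later hangs up.
const engine = new MediaEngine();
const outgoingCallEvents = ["remote-ringing", "session-established", "session-terminated"];
outgoingCallEvents.forEach((event) => driveMediaEngine(engine, event));
```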

3.2 Javascript Library and API

The APSINT Javascript library and API (apsint.js) is composed of a set of object types and their relations, as shown in Figure 4. A Web page obtains a single endpoint object (ep) on successfully reserving the endpoint service. This ep object may be used to initiate new calls or to send messages. In addition, callbacks to be overridden by the Web page inform it about incoming calls and messages.

Initiating a new call via the ep object instantiates a new outgoing call object (oco), which is handed over to the Web page. One oco object exists per call session. As with the ep object, the Web page is responsible for overriding the callbacks of that object to get notified of important call events.

On receiving a new incoming call, the APSINT library invokes a dedicated callback on the ep object, passing along a newly created incoming call object (ico). For each incoming call session one ico object is instantiated. Similar to the oco, this object contains callback methods for notifying the Web page of certain events and provides utility methods to answer or end a call.

If a callback function in any of the ep, ico or oco objects is not overridden by the Web page, the APSINT library invokes a default implementation, which may lead to automatically accepting or declining a call or to displaying information in a default manner.
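The interplay of default and overridden callbacks can be illustrated with a small sketch. The names onIncomingCall, answerCall and endCall follow the paper; the factory functions, the deliverIncomingCall helper and the choice of declining as the default behavior are assumptions, since the actual defaults of apsint.js are not documented here.

```javascript
// Hypothetical stand-in for an incoming call object (ico).
function makeIncomingCall(callerId) {
  return {
    callerId,
    state: "ringing",
    answerCall() { this.state = "established"; },
    endCall() { this.state = "ended"; },
    onCallEnded() {},
  };
}

// Hypothetical stand-in for the endpoint object (ep).
function makeEndpoint() {
  return {
    // Default implementation used when the Web page does not override it
    // (assumed here to decline the call).
    onIncomingCall(ico) { ico.endCall(); },
    // Called by the library when the endpoint reports a new incoming call.
    deliverIncomingCall(callerId) {
      const ico = makeIncomingCall(callerId);
      this.onIncomingCall(ico);
      return ico;
    },
  };
}

// Page A keeps the default; page B overrides the callback.
const epA = makeEndpoint();
const declined = epA.deliverIncomingCall("sip:carol@example.org");

const epB = makeEndpoint();
epB.onIncomingCall = (ico) => ico.answerCall();
const answered = epB.deliverIncomingCall("sip:carol@example.org");
```

Assigning a new function to the callback property is all a Web page needs to do in this sketch; pages that assign nothing still get well-defined behavior.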

The mixer object (mx) is instantiated together with the ep object at the time the Web page reserves the endpoint. It is responsible for coordinating multiple call sessions, keeping track of active calls and other call handling tasks. By setting certain coordination methods to different (predefined) functions, the way multiple sessions are dealt with can be influenced. The architecture also allows for future enhancements of the library, such as toggling between calls or setting up three-way calls or conferences. Currently, options for declining and reporting new calls to the Web page, or for taking over new calls while quitting existing sessions, are implemented. The mixer object does its job by intercepting and manipulating the messages that the endpoint and call objects send to and receive from the endpoint.

All objects except the mx object contain their own Websocket (ws in Figure 4) for communication with the endpoint. The lifetime of the ep, oco and ico objects is tightly coupled to the lifetime of their Websocket: as long as the Websocket towards the endpoint is open, the related object exists.
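The coupling between object lifetime and Websocket lifetime can be sketched as follows. FakeSocket is a test double standing in for a browser WebSocket, and the ApsintObject class and registry are hypothetical; only the rule that a closed socket ends the object's life is taken from the text.

```javascript
// Test double for a browser WebSocket; only close/onclose are modeled.
class FakeSocket {
  constructor() {
    this.onclose = null;
    this.open = true;
  }
  close() {
    if (!this.open) return;
    this.open = false;
    if (this.onclose) this.onclose();
  }
}

// Hypothetical base for ep, ico and oco objects.
class ApsintObject {
  constructor(kind, socket, registry) {
    this.kind = kind;          // "ep", "ico" or "oco"
    this.socket = socket;
    this.onDestroy = () => {}; // notification callback for the Web page
    registry.add(this);
    // When the Websocket goes down, the object ceases to exist.
    socket.onclose = () => {
      registry.delete(this);
      this.onDestroy();
    };
  }
}

const liveObjects = new Set();
const events = [];
const oco = new ApsintObject("oco", new FakeSocket(), liveObjects);
oco.onDestroy = () => events.push("oco destroyed");

const countWhileOpen = liveObjects.size;
oco.socket.close();
const countAfterClose = liveObjects.size;
```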

4 IMPLEMENTATION

4.1 Endpoint

The APSINT endpoint is designed as a background

service. As shown in Figure 5 the endpoint imple-

ments three services:

• Websocket Server, for receiving and sending messages to the Web page and the SIP stack. The Websocket server is an asynchronous task in non-blocking mode.
• SIP Stack, for handling SIP protocol messages and controlling the media engine. The SIP stack has to be an asynchronous service like the Websocket server.
• Media Engine, for sending/receiving RTP media streams and for audio playback of ringtones. The media engine is completely controlled by the SIP stack.

Figure 5: Overview of the APSINT endpoint. (Diagram labels: ENDPOINT containing Websocket Server, SIP Stack and Media engine; Async messaging.)

A prototype of the APSINT endpoint was imple-

mented for x86 personal computers on Linux and for

Android 2.2 and above. Software and libraries used

for the Linux PC prototype are:

• Soﬁa-SIP (Pessi et al., 2011), a SIP stack to imple-

ment SIP functionality in the APSINT endpoint

software,

• libwebsockets (Green, 2011), a C library imple-

menting the Websockets protocol,

• GStreamer framework for the media engine im-

plementation.

A second prototype for the Android platform utilizes

the following software components:

• IMSDroid (Diop, 2011), an IMS compliant SIP

stack with integrated media engine for Android

phones and tablets.

• Java-Websocket (Rajlich, 2011), a pure Java li-

brary implementing the Websockets protocol both

for servers and clients.

On both platforms the endpoints can handle multiple SIP sessions and support simple SIP messaging. The endpoint on Android was extended so that it can access system services offered by the phone platform. As an example, messaging on Android includes sending and receiving SMS.

4.2 Javascript Library

Any Web page on the system, even across browsers, may reserve the endpoint. However, only one Web page at a time may use the endpoint services. The last page asking for the endpoint may use it, provided the endpoint is not busy serving another Web page. The rules for reserving the endpoint are:

• If no other Web page has reserved the endpoint, the requesting Web page can use the endpoint.
• If some other Web page has reserved the endpoint but is no longer present (i.e. the page forgot to free the endpoint or crashed before freeing it), the requesting Web page can use the endpoint. The endpoint, running in a separate process, detects that the Web page that reserved it is no longer present because the Websocket connection is lost.
• If some other Web page has reserved the endpoint but is not using it any more, i.e. has no open communication session, that Web page loses its reservation. It may regain the endpoint services by reserving the endpoint again some time later.
• If some other Web page still has a communication session running, the reservation request is denied.
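The four reservation rules above can be condensed into a single decision function. This is a minimal sketch; the holder record with its socketOpen and openSessions fields and the returned strings are hypothetical names chosen for illustration.

```javascript
// `holder` describes the page currently holding the reservation, or null.
// Field names and return values are assumptions made for this sketch.
function decideReservation(holder) {
  if (holder === null) {
    return "grant"; // rule 1: no other page has reserved the endpoint
  }
  if (!holder.socketOpen) {
    return "grant"; // rule 2: holder crashed or left without freeing
  }
  if (holder.openSessions === 0) {
    return "grant-and-revoke-holder"; // rule 3: idle holder loses its reservation
  }
  return "deny"; // rule 4: holder still has a running session
}

// Example: a page with an active call keeps the endpoint.
const busyHolder = { socketOpen: true, openSessions: 1 };
const decisionWhileBusy = decideReservation(busyHolder);
```

The order of the checks matters: a lost Websocket connection is tested before the session count, so a crashed page never blocks the endpoint even if its session bookkeeping is stale.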

The initial call to obtain endpoint services is the invocation of the reserve function, which is part of the ENDPOINT class. User ID and user credentials may be passed, depending on optional security mechanisms. In addition, any local or remote Websocket URL towards the endpoint can be configured. If none is provided, a default local address is used. The Web site must, however, provide an activation callback for the reservation. Once the connection with the endpoint is established, the APSINT library calls this function and passes it the actual endpoint object (ep as depicted in Figure 4).
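A reservation along these lines might be sketched as shown below. This is not the actual apsint.js signature: the options object, the onActivated name, the default URL and the injected socket factory are all assumptions introduced so that the example runs without a browser.

```javascript
// Hypothetical default address; the real default is not given in the paper.
const DEFAULT_ENDPOINT_URL = "ws://localhost:8080/apsint";

// Sketch of a reserve function. `createSocket` is injected for testability;
// in a browser it could be (url) => new WebSocket(url).
function reserve(options, createSocket) {
  if (typeof options.onActivated !== "function") {
    throw new Error("An activation callback is required");
  }
  const url = options.url || DEFAULT_ENDPOINT_URL;
  const socket = createSocket(url);
  socket.onopen = () => {
    // Register the page; credentials are only sent if the page supplied them.
    socket.send(JSON.stringify({
      type: "RESERVE",
      userId: options.userId || null,
      credentials: options.credentials || null,
    }));
    // Hand over a stand-in for the ep object to the activation callback.
    options.onActivated({ url, socket });
  };
  return socket;
}

// Usage with a stub socket that records what would be sent.
const sent = [];
let activatedEp = null;
const stub = reserve(
  { onActivated: (ep) => { activatedEp = ep; } },
  (url) => ({ url, send: (frame) => sent.push(frame), onopen: null })
);
stub.onopen(); // simulate the connection being established
```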

The ep object may in turn create ico and oco, in-

coming and outgoing call objects depending on who

is the call originator (ico for incoming calls, oco for

self initiated outgoing calls). The three objects have a

common basic structure, which is related to the need

of communicating independently with the APSINT

endpoint. This is guaranteed by each object instance

using its own Websocket. The following list shows

common and distinct functions of the APSINT com-

munication objects:

Common functions for ep, ico and oco objects are:

• parseEvent and dispatchEvent for message and event handling in interaction with the endpoint
• makeWebSocket used by the object creation factory to obtain and connect the instance to a Websocket
• sendMSG for sending a (SIP or SMS) message
• onDestroy called when the object is about to be released by the library or if the Websocket connection went down

Functions of the ep object only are:

• initiateCall to make a new call (session), providing an address string, the media (audio and/or video) and a notification callback which will deliver the new oco call object for the session once the endpoint has started to process the connection
• sendMessage to send a text message via SIP or as SMS (Android only)
• onIncomingCall invoked on being called via audio or video, providing the new ico object used to deal with the new connection
• onMessage called if a SIP MESSAGE or an SMS was received
• onStatusChange used to communicate status changes of the endpoint

Functions common to both ico and oco objects:

• endCall to take down the communication session regardless of its current state
• onCallEnded to inform about the termination of the session by the communication partner at any point in time
• onError to inform about call-related errors
• audio and video prepared for future releases to turn media on/off during the session

Functions for ico objects only:

• answerCall used to acknowledge the incoming call request, leading to immediate call establishment
• onRinging to inform the Web page that the call originator has been signaled a "free line"

Functions for oco objects only:

• onRingBack to inform that the called party has signalled a "free line"
• onCallAnswered to inform that the call has been answered by the terminator and that the session is now established

The mixer object (mx) for multi session coordina-

tion is a little different. It does not inherit from the

current base class of the APSINT objects and does not

use its own Websockets. However, the factory for cre-

ating ep, ico and oco objects ties in observation calls

so that the mx object is informed on each incoming

and outgoing message that any of the other objects is

sending towards or receiving from the endpoint. The

mx object can also modify, insert or remove these mes-

sages to perform coordination actions. The member

functions of the mixer object are:

• getCalls to obtain a list of all currently managed

call objects

• getActiveCall to obtain the call object which is

currently active, meaning the call currently trans-

mitting or receiving media

• setActiveCall to make some other connection

(i.e. call session) the active call

• declineAdditionalCall may be used to decline

all incoming calls during an ongoing session

• acceptAdditionalCall makes the new incom-

ing call the active one and terminates the former

active session

• onAdditionalCall may be set to one of the two

functions above or to some other function cus-

tomized by the Web page
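The bookkeeping side of the mixer can be sketched with the member functions listed above. The method names come from the paper; the internals, the handleIncoming entry point, the remove helper and the choice of declining as the default policy are assumptions, and message interception is left out entirely.

```javascript
// Hypothetical sketch; call objects are reduced to { id, ended, endCall() }.
class Mixer {
  constructor() {
    this.calls = [];
    this.active = null;
    // Assumed default policy; the Web page may assign another function.
    this.onAdditionalCall = this.declineAdditionalCall;
  }
  getCalls() { return this.calls.slice(); }
  getActiveCall() { return this.active; }
  setActiveCall(call) {
    if (!this.calls.includes(call)) throw new Error("Unknown call");
    this.active = call;
  }
  remove(call) {
    this.calls = this.calls.filter((c) => c !== call);
    if (this.active === call) this.active = null;
  }
  declineAdditionalCall(newCall) {
    newCall.endCall(); // the ongoing session is kept
  }
  acceptAdditionalCall(newCall) {
    const former = this.active;
    this.calls.push(newCall);
    this.active = newCall;
    if (former) {
      former.endCall(); // the former active session is terminated
      this.remove(former);
    }
  }
  // Entry point used when an incoming call arrives (helper for this sketch).
  handleIncoming(newCall) {
    if (this.active === null) {
      this.calls.push(newCall);
      this.active = newCall;
    } else {
      this.onAdditionalCall.call(this, newCall);
    }
  }
}

const makeCall = (id) => ({ id, ended: false, endCall() { this.ended = true; } });
const mx = new Mixer();
const first = makeCall("first");
const second = makeCall("second");
const third = makeCall("third");

mx.handleIncoming(first);  // becomes the active call
mx.handleIncoming(second); // declined by the default policy
mx.onAdditionalCall = mx.acceptAdditionalCall;
mx.handleIncoming(third);  // takes over and ends "first"
```

Switching the policy is a single assignment to onAdditionalCall, which mirrors the description that this callback may be set to one of the two predefined functions or to a custom one.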

All functions/methods whose names start with "on" are notification callbacks meant to be overridden by the Web page. In a minimum configuration, the onIncomingCall callback of the ep object needs to be provided to be able to interact with the endpoint. More callbacks may be customized by the Web designer as appropriate (cf. next section).

4.3 Web Application

We have implemented a web application to showcase

the possibilities of our solution. Sushify is a Twitter-

like microblogging platform written in Ruby on Rails.

The main entities of Sushify are:

• User.

• Micropost.

• Relationship.

These are also reﬂected as Rails models and stored

in a SQLite database. Sushify is fully based on a

REST architecture (Fielding, 2000) thus rendering all

the models as resources that can be manipulated via

standard HTTP methods. Similar to Twitter a user can

create microposts and users can follow each other thus

creating a relationship. Posts from followed users are

shown in an aggregated micropost feed.

Building upon this feature set, we have included the possibility to trigger calls and send messages using the Javascript API discussed above. Fig. 6 shows the Sushify UI with an active call window. Basically, three Javascript files are used and included in the file config/application.rb.

• apsint.js The Javascript library file outlined above
• apsint-vts.js Contains specific call and message handling code by overriding methods of apsint.js such as answercall().
• apsint-ui.js Methods of apsint-vts.js usually call UI-related functions in this file, e.g. opening a message box with the caller id. User input (like declining a call) is handled here as well, and the corresponding method in apsint-vts.js is called (e.g. declinecall()).

Figure 6: Sushify with active call window.

Users can conﬁgure a SIP address where they can

be reached. We have also implemented a feature that

allows an auto-reply message to be sent when a user

declines a call. In order to prevent unsolicited calls

a user can only call and message users (and see their

SIP/email address) that are his/her followers.

5 DISCUSSION

The advantages of our approach over other architectures are based on the following two design decisions:

1. Remove the need for a B2B relationship between Web provider and communication service provider.
2. Distribute the Model, View and Controller components of an application across different processes and even across the network.

The first point allows applications to use any communication service provider the user has subscribed to; hence no configuration or contract signing is needed to use the application. Furthermore, removing the B2B relationship works not only for delivering communication services via local device functionality, but also for triggering any service invocation from the user's device, instead of from the provider's server, towards the network provider or any other 3rd party service the user can authenticate with.

The second point enables mobile applications to always be up to date while running stably in close integration with mobile device capabilities. View and Controller are downloaded via HTTP and are presented and controlled via HTML, CSS and Javascript kept up to date on the server. At the same time, apps are executed locally with all the required functionality provided natively by the device (in the Model component running in a separate process). The View and the main parts of the Controller are downloaded from a Web server and executed in the browser, whereas low-level Controller functionality and the Model component dealing with the SIP stack run as a native background process communicating with the browser.

It is not necessary for the user to share secrets such as authentication credentials, address book data or calendar information with third parties. The required data may be added only at the time of local execution in the mobile browser, so the user keeps control over private data. For example, suppose the user wants to call a friend on a social Web platform. The name or nickname of the friend may be found on the Web page, but not the phone number, which is associated with the friend's nickname in the local address book of the user's device. During APSINT Javascript execution both are combined in the browser of the user's device. A call is made via the Web page without the Web page knowing the details.
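The address book example can be sketched as follows. This is an illustration under assumptions: the local address book is modeled as a plain lookup table, the call is placed through an injected function, and the nickname and number are made-up sample data. The paper does not specify how the library accesses the address book.

```javascript
// Resolve a nickname found on the Web page against local data and place the call.
// All interfaces here are hypothetical.
function callFriend(nickname, localAddressBook, placeCall) {
  const address = localAddressBook[nickname];
  if (!address) {
    return { ok: false, reason: "no local entry for " + nickname };
  }
  placeCall(address); // runs on the user's device only
  // The result reported back deliberately omits the resolved address.
  return { ok: true, calling: nickname };
}

// Made-up sample data that stays on the device.
const addressBook = { sushi_fan: "tel:+4312345678" };
const dialed = [];
const result = callFriend("sushi_fan", addressBook, (address) => dialed.push(address));
const unknown = callFriend("stranger", addressBook, (address) => dialed.push(address));
```

The design point is that only the nickname crosses the boundary towards the page logic, while the number is used solely by the component that places the call.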

Other solutions for real-time communication or native functionality execution, such as WebRTC, plugin-based solutions (Adeyeye et al., 2012) or the implementation of a Web app with Google's Native Client, are not capable of reacting to external stimuli: if no browser is running, it is not possible to take an incoming communication request. With the APSINT endpoint running as a service on the local device, it is always possible to start a browser and download a Web page to take an incoming call.

Also, while experimenting with our prototype, we introduced a simple method of rendering video directly in the browser by using the data URI scheme to put base64-encoded data right into the src attribute of a plain HTML img tag. With this functionality and the Websocket-based communication between browser and communication stack, it was possible to distribute functionality across devices. In this way one can communicate via a smartphone's SIP connection while displaying the related Web page GUI for communication and video display on the monitor of a nearby PC (in the same LAN). One would talk via the phone but control the communication and watch the video on the PC.
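The data URI technique can be sketched in a few lines. The sketch assumes that each video frame arrives over the Websocket as a base64-encoded JPEG string; the paper does not state the image format, and the function names are hypothetical.

```javascript
// Build a data URI from an already base64-encoded frame.
// The JPEG default is an assumption; the paper does not name the format.
function frameToDataUri(base64Frame, mimeType = "image/jpeg") {
  return "data:" + mimeType + ";base64," + base64Frame;
}

// `img` can be any object with a `src` property, e.g. an HTML img element.
function showFrame(img, base64Frame) {
  img.src = frameToDataUri(base64Frame);
}

// Stand-in for an img element; the payload is a placeholder, not a full image.
const fakeImg = { src: "" };
showFrame(fakeImg, "/9j/4AAQSkZJRg==");
```

In a page, showFrame would be called from the Websocket message handler for every received frame, so the img element is simply overwritten frame by frame.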

No plugin, and in particular no mobile Flash support, is required. Our architecture supports multiple communication sessions, yet limits the usage of communication resources to one Web page at a time. The endpoint provides its services to other Web pages if the Web app currently using it has no open sessions. It also detects crashes of a Web page or the browser automatically, as reserving the communication stack is tied to keeping the Websocket session open.

Unlike other approaches, the APSINT architecture offers the possibility to communicate with clients of other types and, via telco network features, across other kinds of protocols and networks. With an APSINT-enabled Web page one can communicate with any other SIP client reachable in the network or, via the breakout functionality of telco providers, with any other party in the mobile or fixed-line network. These solutions are standardized and available worldwide.

5.1 Security Issues

Security issues are of paramount importance for all parties involved in consuming and providing services built on top of the APSINT architecture. This is because of the specific setup, which allows a Web page to take control over audio and video sessions started from the user's device, such as a smartphone. Hence, on the usability level, security requirements revolve around achieving trust in this new functionality by providing a reliable solution in terms of authentication and authorization of communication sessions started by Web pages, as well as providing privacy and confidentiality. Security measures need to address the specific concerns of all parties involved, as discussed below.

Users may be accustomed to the dangers of the Internet and accept the risks because of the vital importance of this platform. However, it may be difficult to persuade them to grant control over their phones to an Internet application unless they can trust that their security requirements are met. The users' requirements cover different areas: preventing unsolicited communication sessions from being started, preventing their devices from being turned into tools for spying on their communication or from being hijacked for SPIT (spam over Internet telephony) attacks, securing the privacy of communication, and mitigating phishing, e.g. in the form of persuading users to call costly service numbers.

Similarly, network providers are interested in preventing SPIT or DoS attacks on their customers, which could occur if malicious Web pages obtained control over devices connected to the provider's network. They would also prefer to be able to unambiguously identify sessions originated by Web pages and associate them with the specific page. This may be especially important if devices capable of the APSINT architecture were branded by the operator. In this case users would surely expect the operator to take at least partial responsibility if allowing Web pages to control communication sessions inflicted substantial costs on the user, e.g. due to a phishing attack.

For the owners of Web pages capable of controlling communication sessions on APSINT devices, winning user trust is very important. Users need a guarantee that they do not give control to start audio or video sessions to a malicious Web page. Therefore, stealing the functionality for controlling APSINT terminals embedded in a specific Web page and reusing it in the context of a different Web page should be prevented. If the revenues from the voice and video traffic generated by the Web page need to be shared with the network operator, then unambiguous identification of such sessions is needed.

The traditional architecture for Web-IMS convergence is based on the Parlay X interface in the IMS application server and has a well-defined security framework. Critics of the efficiency of the Parlay X based architecture proposed an alternative based on a new functional entity on the IMS network border, named Web Session Controller, which shields the IMS terminal from direct interaction with the Web application.

The APSINT project introduces a new concept for Web and IMS convergence, which takes place mainly in the user's terminal. Direct interactions between the IMS terminal and Web applications require exploring ways of combining security solutions for Web applications with the IMS security standards applicable to this novel architecture. The approach taken by the APSINT team tries to flexibly adapt the security restrictions on consuming Web applications to the degree of trust that the user expresses in a Web page with an embedded APSINT application. Web applications consumed in the user's browser will be secured by known technologies like SSL/TLS or digital signatures; however, the user will be allowed to grant permission to specific Web pages for establishing audio and video sessions, either permanently or for the current session only, depending on the user's trust in the Web page. IMS security standards will be fully supported. Additionally, a new security measure is being studied for conveying to the IMS the identity of the Web application that was allowed to start an audio/video session from the particular IMS terminal.
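The permission model outlined above, in which the user grants a Web page the right to establish audio and video sessions either permanently or for the current session only, could be sketched as follows. This is an illustrative sketch under stated assumptions, not the APSINT implementation: the names `Grant` and `PermissionStore`, the use of the page origin as the key, and the interpretation that a session-only grant expires when the session ends are all hypothetical.

```python
from enum import Enum
from typing import Set


class Grant(Enum):
    """The two kinds of user decision described in the text (names assumed)."""
    PERMANENT = "permanent"
    SESSION_ONLY = "session_only"


class PermissionStore:
    """Remembers which Web page origins may start audio/video sessions."""

    def __init__(self) -> None:
        self._permanent: Set[str] = set()
        self._session_only: Set[str] = set()

    def grant(self, origin: str, kind: Grant) -> None:
        """Record the user's decision for one origin."""
        if kind is Grant.PERMANENT:
            self._permanent.add(origin)
            self._session_only.discard(origin)
        else:
            self._session_only.add(origin)

    def is_allowed(self, origin: str) -> bool:
        """A page may start a session only if the user granted permission."""
        return origin in self._permanent or origin in self._session_only

    def end_session(self, origin: str) -> None:
        """Session-only grants expire; permanent grants are kept."""
        self._session_only.discard(origin)
```

A page that was never granted permission is refused by default, which corresponds to the requirement that unsolicited communication sessions must not be started.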

6 CONCLUSIONS AND FUTURE WORK

As the APSINT solution integrates the endpoint on the local device, the browser communicates with the endpoint locally and all signalling is handled by the local endpoint. Another approach is to move the endpoint to a remote host, as shown in Figure 7, where a local browser communicates over Websockets with a remote endpoint. For this solution to be applicable, the local device still needs to implement a local media engine. In such an approach all signalling is handled by a remote server, whereas the media stream is handled by the respective end devices. Finally, the media engine could be integrated into the browser, so that the media resources, both hardware and software, are managed by the Web browser itself, as proposed by WebRTC (Google, 2011).

Figure 7: Architecture of a remote endpoint. (Diagram labels: local1 and local2, each with a Browser and a Media engine; Remote Endpoint; SIP/WS.)

Extensions to the endpoint could be added by utilizing platform services offered by the operating system on user devices, for example on Android phones. Candidates for such usage are location services, camera, phonebook, GSM calls and SMS. A similar approach is taken by Mozilla in the WebAPI project (Mozilla.org, 2011).

6.1 Porting to other Systems

Currently the endpoint runs on the Android platform as well as on Linux and Mac (OS X) desktop systems. We have also investigated the possibilities of porting the endpoint to other operating systems:

• iOS: The Doubango SIP stack has already been ported to this platform, thus porting the endpoint software should be relatively easy.
• Symbian: Should also be relatively straightforward, as the Sofia SIP stack (developed by Nokia and also used in the desktop versions of the endpoint) is fully supported on that platform.
• Windows Phone 7: To our knowledge it is not possible to use or compile external libraries for this platform due to security restrictions (mainly caused by the lack of a multiuser concept in the kernel). As a consequence, the only way to implement the endpoint would be from the ground up with the Silverlight IDE, which is clearly not a feasible approach.

Finally, we expect no major problems in supporting Windows desktops (since all the necessary libraries/compilers are available).

6.2 API Evaluation

We are planning to perform an evaluation of our Javascript library to assess its acceptance among the developer community. As a first step we will review and simplify our API, write documentation, and provide code examples and best practices. The evaluation will be carried out in two phases:

• Laboratory Test: A two-hour test with 8-10 developers who should solve 2-3 tasks. We are considering quantitative criteria such as task completion time, lines of code, and iteration steps needed (Clarke, 2004). In addition, think-aloud protocols and possibly video observation might reveal further hidden issues.
• Real World Test: Developers get the API and its documentation to use for tasks or projects of their own choice. They give feedback in the form of diaries and are supported by us throughout the study (4-6 weeks).

Combining a laboratory test with a longitudinal real-world study (Gerken et al., 2011) is a novel approach to evaluating an API, and we expect richer results from this two-phase approach.

ACKNOWLEDGEMENTS

The Competence Center FTW Forschungszentrum Telekommunikation Wien GmbH is funded within the program COMET - Competence Centers for Excellent Technologies by BMVIT, BMWA, and the City of Vienna. The COMET program is managed by the FFG.

We would like to thank the APSINT project team and especially our colleagues Vincenzo Scotto di Carlo and Hans-Heinrich Grusdt from Nokia Siemens Networks Germany and Marco Happenhofer from Vienna University of Technology for their contributions to this work.

INTEGRATINGCOMMUNICATIONSERVICESINTOMOBILEBROWSERS

761

REFERENCES

Adeyeye, M., Ventura, N., and Foschini, L. (2012). Converged multimedia services in emerging web 2.0 session mobility scenarios. Wireless Networks, 18:185–197. doi:10.1007/s11276-011-0394-z.

Clarke, S. (2004). Measuring API usability. Dr. Dobb's Journal, pages 6–9.

Diop, M. (2011). High Quality Video SIP/IMS client for Google Android. http://code.google.com/p/imsdroid/. Accessed: 15/11/2011.

Fette, I. and Melnikov, A. (2011). The WebSocket protocol. http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-17. Accessed: 15/11/2011.

Fielding, R. T. (2000). Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine.

Gerken, J., Jetter, H.-C., Zöllner, M., Mader, M., and Reiterer, H. (2011). The concept maps method as a tool to evaluate the usability of APIs. In Proceedings of the 2011 annual conference on Human factors in computing systems, CHI '11, pages 3373–3382, New York, NY, USA. ACM.

Google (2011). WebRTC is a free, open project that enables web browsers with Real-Time Communications (RTC) capabilities via simple Javascript APIs. http://www.webrtc.org/home. Accessed: 15/11/2011.

Green, A. (2011). C Websockets Server Library. http://git.warmcat.com/cgi-bin/cgit/libwebsockets. Accessed: 15/11/2011.

Mozilla.org (2011). WebAPI is an effort by Mozilla to bridge together the gap, and have consistent APIs that will work in all web browsers, no matter the operating system. https://wiki.mozilla.org/WebAPI. Accessed: 15/11/2011.

Nishimura, H., Ohnimushi, H., and Hirano, M. (2009). Architecture for Web-IMS Cooperative Services for Web Terminals. In 13th International Conference on Intelligence in Next Generation Networks, ICIN 2009, New York, NY, USA. IEEE.

Pessi, P. et al. (2011). Sofia-SIP - a RFC3261 compliant SIP User-Agent library. http://sofia-sip.sourceforge.net/. Accessed: 15/11/2011.

Rajlich, N. (2011). A barebones WebSocket client and server implementation written in 100% Java. https://github.com/TooTallNate/Java-WebSocket. Accessed: 15/11/2011.

Sipgate.com (2011). Move your phones to the cloud. http://sipgate.com. Accessed: 20/11/2011.

Tropo.com (2011). Tropo - cloud api for voice, sms, and instant messaging services. https://www.tropo.com. Accessed: 20/11/2011.

Winokur, D. (2011). Flash to Focus on PC Browsing and Mobile Apps; Adobe to More Aggressively Contribute to HTML5. http://blogs.adobe.com/flashplatform/2011/11/flash-to-focus-on-pc-browsing-and-mobile-apps-adobe-to-more-aggressively-contribute-to-html5.html. Accessed: 20/11/2011.

WEBIST2012-8thInternationalConferenceonWebInformationSystemsandTechnologies

762