Electronic Programming Guide Recommender for Viewing on a Portable Device

Matthew Y. Ma, Jinhong K. Guo, Jingbo Zhu, Guiran Chang

Panasonic Digital Networking Laboratory, Two Research Way, Princeton, NJ 08540, USA

Institute of Computer Software and Theory, Northeastern University, Shenyang, P.R. China

Abstract. With the emergence of DTV and the exponential growth of broadcasting networks, an overwhelming amount of information has become available in viewers' homes. It is therefore increasingly challenging for consumers to receive the right amount of information at the right time for their entertainment needs. We propose an electronic programming guide (EPG) recommender based on natural language processing techniques. In particular, the recommender has been implemented as a service on a home network that facilitates the browsing and recommendation of TV programs on a portable remote device, and such a system is found to be feasible. Preliminary experiments show a precision of 81%.

1 Introduction

As the number of channels available on the broadcasting network increases, it becomes more challenging to deal with the rapidly expanding amount of information provided by the electronic programming guide (EPG) and to deliver personalized information to the consumer. Consumers can access the EPG via a subscription-based cable network, the Internet, or services offered by device vendors such as TiVo. However, existing systems have three limitations. First, the existing method of multicasting the EPG feeds static content to all users on the same network and does not provide personalized content. Second, EPGs provided through cable operators are proprietary and do not interface with other data formats on the Internet or from other sources. Third, program suggestion on set-top boxes is generally primitive, as most systems employ simple category, title, and keyword matching on the EPG contents.

To address such problems, previous work such as Ehrmantraut et al. [14] and Gena [13] adopted both implicit and explicit feedback for a personalized program guide. Takagi et al. [8] proposed a conceptual matching scheme for TV program recommendation that fuses conceptual fuzzy sets and ontology. This work is limited to the drama category, and the approach is primarily based on program sub-categories of drama as the top layer of the ontological structure representing the user's taste. In more recent research, Isobe et al. [7] described a set-top box (STB) based scheme that associates the degree of interest of each program with the viewer's age, sex, and occupation, combined with favorite program categories, in sorting the EPG. Yu et al. [11] proposed an agent-based system for program personalization under the TV Anytime environment [2] using similarity measurement based on the vector space model (VSM). This work, however, assumes that the program information is available on a large storage medium and does not address the problems of data sparseness and the limited categories supported by most EPG providers. Pigeau et al. [10] presented a TV recommender system using a fuzzy linguistic summarization technique to cope with both implicit and explicit user profiles. This system depends largely on the quality of the metadata and relies solely on the DVB-SI standard [4].

Ma M. Y., Guo J. K., Zhu J. and Chang G. (2005). Electronic Programming Guide Recommender for Viewing on a Portable Device. In Proceedings of the 1st International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces, pages 79-88. DOI: 10.5220/0001422800790088. SciTePress.

Cotter and Smyth [15] describe an Internet-based personalized TV program guide using an explicit profile and a collaborative approach. Xu et al. [9] also presented a conceptual framework for a TV recommendation system based on Internet WAP/SOAP. For portable devices, however, this system inherits the limitation of SOAP/HTTP-based technologies, namely considerable network overhead on a portable device.

Our work attempts to address two important aspects of EPG recommender systems: 1) a home network based framework that supports the EPG recommender system for viewing on a portable device; and 2) a linguistics-based approach that extracts, from the available information sources, good feature vectors for the recommender classifier. Details are discussed in the following sections.

2 Overview

Figure 1 shows the architecture of the EPG recommender system. A portable device communicates with the EPG recommender system via various network protocols, such as infrared, Wi-Fi, WAP or SIP [6]. The EPG recommender consists of a program information acquisition module, a user profile module, an EPG recommendation module, and EPG management and query modules.

Fig. 1. EPG recommendation system architecture.

The EPG management module is responsible for packing and unpacking data bundles to and from the portable device. A data bundle is a package that includes an application type (such as a user request) and associated data (such as user-defined EPG categories for browsing).

The program information acquisition module collects program information from web sites, parses the text data, converts it into structured data, and stores the structured data in the EPG database. Meanwhile, the user profile acquisition module collects user profile data and stores it in the user profile database.

The EPG query module receives and parses the XML data in the bundle to obtain the content information specified by the user. The query result is packaged in XML format and delivered to the EPG management module in a data bundle. One copy of the query result is delivered to the user profile acquisition module for the acquisition of user profile data.
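The packing and unpacking of a data bundle can be illustrated with a minimal sketch. The paper does not specify the bundle schema, so the element names (`bundle`, `category`, `station`), the `type` attribute, and the functions `pack_bundle` and `unpack_bundle` are all hypothetical.

```python
import xml.etree.ElementTree as ET

def pack_bundle(app_type, fields):
    """Serialize an application type and its data into an XML bundle string."""
    # Hypothetical schema: <bundle type="..."><name>value</name>...</bundle>
    root = ET.Element("bundle", {"type": app_type})
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def unpack_bundle(xml_text):
    """Parse a bundle string back into (application type, data dict)."""
    root = ET.fromstring(xml_text)
    return root.get("type"), {child.tag: child.text for child in root}

# A user request for sports programs on one (made-up) station.
request = pack_bundle("query", {"category": "sports", "station": "CH1"})
app_type, data = unpack_bundle(request)
```

In this sketch the application type travels as an attribute and the associated data as child elements, so the query module only needs to inspect `app_type` to decide how to interpret `data`.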

The EPG recommendation and learning module dynamically adjusts the parameters of the recommendation algorithm according to the user profile. The EPG recommendation module recommends programs in the database based on users' preferences.

3 EPG Recommendation System

Fig. 2. EPG recommendation.

The EPG recommendation system uses the EPG data and the user profile to recommend programs. An automatic recommender combined with user preset filters is used to enhance the accuracy of the recommended programs and to narrow the search range. The overall architecture is shown in Fig. 2. Five filters (time, station, category, domain, and content) are implemented in the recommendation process. The user can predefine a filter setting, for example, a time period from 2004-10-6::0:00 to 2004-10-8::24:00. A default time setting can also be defined, such as the current week. Time filtering removes all programs that do not play within the specified time period. Station filtering then removes the programs that are not on the defined stations from the remaining candidate programs.
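The first two stages of this filter chain can be sketched as follows. The program records, their field names, and the station names are made up for illustration, and the sketch assumes that a program counts as playing within the period when its start time falls inside it.

```python
from datetime import datetime

# Hypothetical program records; the field names are illustrative only.
programs = [
    {"title": "Evening News", "station": "CH1", "start": datetime(2004, 10, 6, 18, 0)},
    {"title": "Football Live", "station": "CH2", "start": datetime(2004, 10, 7, 20, 0)},
    {"title": "Late Movie", "station": "CH1", "start": datetime(2004, 10, 9, 23, 0)},
]

def time_filter(candidates, begin, end):
    """Keep programs whose start time lies in [begin, end)."""
    return [p for p in candidates if begin <= p["start"] < end]

def station_filter(candidates, stations):
    """Keep programs broadcast on one of the selected stations."""
    return [p for p in candidates if p["station"] in stations]

# 2004-10-8::24:00 is midnight at the start of 2004-10-9.
in_window = time_filter(programs, datetime(2004, 10, 6), datetime(2004, 10, 9))
remaining = station_filter(in_window, {"CH1"})
```

Each filter takes the output of the previous one, so the candidate list shrinks step by step before the more expensive classifiers are applied.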

Category refers to the genre of the program. Domain information, on the other hand, refers to the user's area of interest. Examples of domains include sports and politics. A user who is interested in sports may be interested in all the categories that are related to sports, for example, sports news, movies about sports, and documentaries about sports.

In both the category and the domain filter settings, the user is provided with three choices: to bypass recommendation; to use automatic recommendation; or to manually select one or more categories/domains. Trained classifiers are used to recommend programs once the user selects automatic category or domain recommendation.

The content filter is designed to recommend programs based on the EPG contents. It is more comprehensive, as the contents comprise all the information in an EPG entry, such as station names, program titles, program descriptions, time intervals, and actors. Similarly, in the content filter setting, a user can choose to bypass or to use automatic content recommendation, which invokes a trained content recommendation classifier.

The recommendation classifiers are further explained and illustrated in Fig. 3. Three classifiers are built for the recommendation via a learning process. Program category data is extracted from the user profile database for a particular user by the category data extractor. The probability of each extracted category is computed as:

$$P(c_i) = \frac{N(c_i)}{\sum_{c_j \in C} N(c_j)},$$

where $C$ denotes the set of categories, $c_i$ denotes a category, and $N(c_i)$ denotes the frequency of $c_i$. The trained category classifier can therefore recommend programs using the category list sorted in the order of these probabilities.
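The relative-frequency estimate and the sorted category list can be illustrated with a short sketch. The function name `category_probabilities` and the sample viewing history are hypothetical.

```python
from collections import Counter

def category_probabilities(profile_categories):
    """Return (category, probability) pairs sorted by descending probability."""
    counts = Counter(profile_categories)          # N(c) for every category c
    total = sum(counts.values())                  # sum of N(c) over all categories
    probs = {c: n / total for c, n in counts.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)

# Hypothetical categories extracted from one user's profile.
history = ["sports", "news", "sports", "movie", "sports", "news"]
ranking = category_probabilities(history)
```

The same function applies unchanged to domains, since the domain probability has the same form.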

Fig. 3. EPG recommendation learning.

Similarly, at the program domain level, the probability of each extracted domain is computed as:

$$P(d_i) = \frac{N(d_i)}{\sum_{d_j \in D} N(d_j)},$$

where $D$ denotes the set of domains, $d_i$ denotes a domain, and $N(d_i)$ denotes the frequency of $d_i$.

At the program content level, a corpus is constructed that includes preferred and non-preferred programs. The content classifier is trained from the corpus using maximum entropy. The details of the maximum entropy classifier are described in the next section.

After the filtering process, the recommended program generator places the recommended programs into a structured, human-readable format, e.g. XML. The formatted program information is packaged in a data bundle and sent to the portable device for presentation according to the user's predefined style sheet.

4 Maximum Entropy Classifier

The maximum entropy classifier is employed in two processes. First, since domain information is not readily available from the EPG data, the maximum entropy technique is used for text classification: domain information is classified from the EPG data via a maximum entropy text classifier that is trained from a corpus. Second, a maximum entropy model is used to obtain the content classifier for recommendation, as shown in Fig. 3. This maximum entropy model is obtained from a trained EPG database with an integrated user profile.

A. Domain Information and Text Classification

We use the detailed program information (abstract or description) in the EPG to further extract characteristics of programs, particularly the domain information. Program information data can be obtained either directly from the service providers or from professional websites on the Internet, such as TV Guide [1] and TitanTV [3]. This information forms the basis of the EPG database and is in a semi-structured text format such as HTML and/or XML.

For text classification, a training corpus is collected by tagging a collection of programs with predefined domains. Fig. 4 shows the classification process. First, the vocabulary is constructed and program vectors are formed using the bag-of-words model. When constructing the vocabulary, stop words are removed from the word list of the training corpus. Because the resulting count matrix is high dimensional, feature selection is performed to reduce the dimensionality of the feature space.
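The vocabulary construction and bag-of-words step can be sketched as below. The stop-word list, the tokenization rule, and the sample descriptions are assumptions made for illustration; the paper does not state which ones were used.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_vocabulary(descriptions):
    """Collect the sorted set of tokens seen in the training descriptions."""
    return sorted({w for text in descriptions for w in tokenize(text)})

def to_vector(text, vocabulary):
    """Map a description to its term-frequency vector over the vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]

corpus = ["The final of the football cup", "Election news and the football results"]
vocab = build_vocabulary(corpus)
vector = to_vector(corpus[0], vocab)
```

Every program description becomes one row of the count matrix, with one column per vocabulary word, which is why the matrix grows quickly and calls for feature selection.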

Fig. 4. Classification from detailed program information.

The $\chi^2$ statistic measures the lack of independence between a word $t$ and a domain $c$. Using the two-way contingency table of a word $t$ and a domain $c$, where $A$ is the number of times $t$ and $c$ co-occur, $B$ is the number of times $t$ occurs without $c$, $C$ is the number of times $c$ occurs without $t$, $D$ is the number of times neither $c$ nor $t$ occurs, and $N$ is the total number of documents, the term "goodness measure" is defined to be:

$$\chi^2(t, c) = \frac{N \times (AD - CB)^2}{(A + C) \times (B + D) \times (A + B) \times (C + D)}$$

The $\chi^2$ statistic is zero if $t$ and $c$ are independent. For each domain, the $\chi^2$ statistic can be computed between each word in a training sample and that domain to extract the features.
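A minimal sketch of this feature selection step follows. It assumes that the four counts are document counts, so that N = A + B + C + D; the function names and the top-k selection rule are hypothetical, since the paper does not state its selection threshold.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a word/domain pair from its 2x2 contingency counts.

    a: documents containing the word and belonging to the domain
    b: documents containing the word, outside the domain
    c: documents in the domain, without the word
    d: documents with neither
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a degenerate table carries no evidence of dependence
    return n * (a * d - c * b) ** 2 / denominator

def select_features(scores, k):
    """Return the k words with the highest chi-square score (assumed rule)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, a word that appears in every document of a domain and nowhere else gets the maximum score N, while a word spread evenly over all domains scores zero and is dropped.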

Each program can then be represented as a vector of feature frequencies of the form $P = \langle tf_1, tf_2, \ldots, tf_i, \ldots, tf_n \rangle$, where $n$ denotes the size of the feature set and $tf_i$ is the frequency of the $i$-th feature.

The maximum entropy (ME) model is a general-purpose machine-learning framework that has been successfully applied to a wide range of text processing tasks [12][16]. Given a set of training samples $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i$ is a real-valued feature vector and $y_i$ is the target domain, the maximum entropy principle states that the data $T$ should be summarized with a model that is maximally noncommittal with respect to missing information. Among the distributions consistent with the constraints imposed by $T$, there exists a unique model with the highest entropy in the family of exponential models of the form:

$$P_\Lambda(y \mid x) = \frac{1}{Z_\Lambda(x)} \exp\left[\sum_{i=1}^{n} \lambda_i f_i(x, y)\right] \qquad (1)$$

where $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ are the parameters of the model, the $f_i(x, y)$ are arbitrary feature functions of the model, and $Z_\Lambda(x) = \sum_{y} \exp\left[\sum_{i=1}^{n} \lambda_i f_i(x, y)\right]$ is the normalization factor that ensures $P_\Lambda(y \mid x)$ is a probability distribution. Furthermore, it has been shown that the maximum entropy model is also the maximum likelihood solution on the training data that minimizes the Kullback-Leibler divergence between $P_\Lambda$ and the uniform model. Since the log-likelihood of $P_\Lambda(y \mid x)$ on the training data is concave in the model's parameter space $\Lambda$, a unique maximum entropy solution is guaranteed and can be found by maximizing the log-likelihood function:

$$L_\Lambda = \sum_{x, y} \tilde{p}(x, y) \log P_\Lambda(y \mid x)$$

where $\tilde{p}(x, y)$ is the empirical probability distribution. Our current implementation uses the limited-memory variable metric method, called L-BFGS, to find $\Lambda$. Applying L-BFGS requires evaluating the gradient of the objective function $L$ in each iteration, which can be computed as:

$$\frac{\partial L}{\partial \lambda_i} = E_{\tilde{p}} f_i - E_{P_\Lambda} f_i$$

where $E_{\tilde{p}} f_i$ and $E_{P_\Lambda} f_i$ denote the expectation of $f_i$ under the empirical distribution $\tilde{p}$ and the model $P_\Lambda$, respectively.
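The model and its gradient can be made concrete with a small pure-Python sketch. It is not the implementation used in this work: for brevity it replaces L-BFGS with plain gradient ascent, and the function names, learning rate, step count, and toy data are all assumptions.

```python
import math

def conditional_probs(lambdas, features, x, labels):
    """P(y | x) for every label under the exponential model of Equation (1)."""
    scores = {y: sum(l * f(x, y) for l, f in zip(lambdas, features)) for y in labels}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())      # normalization factor Z(x), up to the same shift
    return {y: e / z for y, e in exps.items()}

def gradient(lambdas, features, samples, labels):
    """Empirical minus model expectation of each feature function."""
    grad = [0.0] * len(features)
    for x, y in samples:
        probs = conditional_probs(lambdas, features, x, labels)
        for i, f in enumerate(features):
            expected = sum(probs[label] * f(x, label) for label in labels)
            grad[i] += (f(x, y) - expected) / len(samples)
    return grad

def train(features, samples, labels, rate=0.5, steps=200):
    """Fit the parameters by plain gradient ascent (a stand-in for L-BFGS)."""
    lambdas = [0.0] * len(features)
    for _ in range(steps):
        grad = gradient(lambdas, features, samples, labels)
        lambdas = [l + rate * g for l, g in zip(lambdas, grad)]
    return lambdas

# Toy data: x is a dict of word counts, y is a domain label.
labels = ["sports", "politics"]
features = [
    lambda x, y: x.get("goal", 0) if y == "sports" else 0,
    lambda x, y: x.get("vote", 0) if y == "politics" else 0,
]
samples = [({"goal": 2}, "sports"), ({"vote": 1}, "politics"), ({"goal": 1}, "sports")]
weights = train(features, samples, labels)
probs = conditional_probs(weights, features, {"goal": 1}, labels)
```

Because the log-likelihood is concave, any ascent method moves toward the same optimum; L-BFGS simply gets there in far fewer iterations on realistic feature sets.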

The feature function in our algorithm is defined as follows:

$$f_{w, c'}(d, c) = \begin{cases} 0, & c \neq c' \\ n(w, d), & c = c' \end{cases} \qquad (2)$$

where $n(w, d)$ denotes the frequency of the word $w$ in program $d$.
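One way to realize such feature functions in code is a small factory that closes over a (word, domain) pair. This is a sketch under the assumption that a program is stored as a dict of word counts; the names `make_feature` and `build_feature_set` are hypothetical.

```python
def make_feature(word, domain):
    """Build the feature function of Equation (2) for one (word, domain) pair."""
    def feature(program_counts, candidate_domain):
        # n(w, d) when the candidate domain matches, 0 otherwise
        if candidate_domain != domain:
            return 0
        return program_counts.get(word, 0)
    return feature

def build_feature_set(vocabulary, domains):
    """One feature function per (word, domain) combination."""
    return [make_feature(w, c) for c in domains for w in vocabulary]

f = make_feature("goal", "sports")
program = {"goal": 3, "match": 1}
```

With a vocabulary of V selected words and K domains, this yields V × K feature functions, each paired with one model parameter.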

The training programs are represented as follows:

$$TP: tp_1, tp_2, \ldots, tp_i, \ldots, tp_n \;\rightarrow\; T = (V, C): (v_1, c_1), (v_2, c_2), \ldots, (v_i, c_i), \ldots, (v_n, c_n)$$

where $TP$ denotes the set of training programs, $tp_i$ denotes training program $i$, $V$ denotes the vectors, and $C$ denotes the domains. The feature function set $F$ can be constructed using Equation (2), and the parameters $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ of the ME model are estimated using the feature function set $F$ and the training samples $(V, C)$. Using Equation (1), $P(c_1 \mid tp), P(c_2 \mid tp), \ldots, P(c_n \mid tp)$ can be computed for each domain. Finally, the domain

$$c = \arg\max_{c_i} P(c_i \mid tp)$$

is selected.

B. Content Classifier for Recommendation

Unlike some existing systems that prompt the user to provide keywords to establish a user profile, we use an explicit feedback system that allows mobile users to indicate their preference for each program whose information they view. The user preference is later integrated into the EPG database. The EPG recommendation process also uses the maximum entropy model and works in a similar way to that shown in Fig. 4.

In EPG content recommendation, once the user has indicated a preference for each program, several features are extracted from the raw EPG database. These features are divided into several groups:

1) Station-Name Feature: the value corresponding to the selected station is 1.

2) Time Feature: the time at which the program is played. We divide a day into 24 intervals.

3) Lexicon Feature: Title, Episode Title, and Program Information. We first construct a vocabulary from these three fields in the training data. The string of each token w that is included in the vocabulary is used as a feature.

4) Category Feature: this information is usually contained in the EPG data from content providers.

5) Actors Feature.
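The five feature groups can be sketched as a single extraction function. The record layout, the field names, and the `name=value` feature naming are assumptions made for illustration; only the grouping itself comes from the description above.

```python
def extract_features(program, vocabulary):
    """Turn one EPG record into a dict of named features (name -> value)."""
    features = {}
    # 1) station-name feature: indicator set to 1 for the program's station
    features["station=" + program["station"]] = 1
    # 2) time feature: one of 24 hourly intervals of the day
    features["hour=%d" % program["start_hour"]] = 1
    # 3) lexicon features: in-vocabulary tokens from the three text fields
    text = " ".join([program["title"], program["episode_title"], program["info"]])
    for token in text.lower().split():
        if token in vocabulary:
            name = "word=" + token
            features[name] = features.get(name, 0) + 1
    # 4) category feature, as supplied by the content provider
    features["category=" + program["category"]] = 1
    # 5) actor features: one indicator per listed actor
    for actor in program["actors"]:
        features["actor=" + actor] = 1
    return features

# A made-up EPG record.
record = {
    "station": "CH2", "start_hour": 20, "title": "Football Live",
    "episode_title": "Cup Final", "info": "Live football from the stadium",
    "category": "sports", "actors": [],
}
feats = extract_features(record, {"football", "cup", "final", "live"})
```

Storing the features sparsely by name keeps station, time, category, and actor indicators in the same representation as the word counts, so one classifier can consume them all.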

As shown in Fig. 4, feature functions are obtained from the feature vectors. The EPG recommendation and learning module dynamically adjusts the parameters of the recommendation algorithm according to the user profile by calculating the maximum entropy model $P_\Lambda$. The calculation of the $\Lambda$ parameters requires the feature vectors and the training corpus, which consists of the raw EPG database and the added user profile. In an extreme case, if the user is interested in only one domain, the recommendation classifier is a binary classifier that outputs only "like" or "dislike" for all input program content.

5 Prototype and Experiments

In our experiments, the EPG recommender was implemented on a small corpus of about one month's EPG for 30 channels, amounting to 1 Mbyte of EPG data. In addition, we built a prototype framework that enables downloading the EPG from the home network and viewing it on a portable device. The EPG collection and recommendation system is implemented on a home network, where the EPG algorithm runs on a home server that supports the OSGi [5] framework. The OSGi (Open Services Gateway Initiative) framework provides an open execution environment for applications to run on heterogeneous devices; in particular, it provides flexibility for content providers to upload updates to consumers' devices. The portable device is a mobile device that supports SIP [6], which allows simple text-based messages to be carried between the mobile device and the home network devices. Additionally, SIP provides streaming support for our future extension.

The prototype also provides a mobile client with three functions: EPG browsing (by date, channel, etc.), Program Details (for a specific program), and EPG recommendation. Fig. 5 shows the mobile user interface for (a) EPG program details and (b) a recommended program list. As shown at the bottom of Fig. 5(a), "like" and "dislike" buttons are provided so that the user can give relevance feedback to the recommendation module after reviewing the program details.

Fig. 5. (a) EPG program details and (b) recommended program list on a mobile device.

We have conducted a preliminary experiment using only program information in the training. Of the four weeks of EPG data, one week is used for generating the user's profile data as the training corpus, whereas the other three weeks are used for testing. The training corpus is collected when a user provides relevance feedback on the training EPG. In our experiment, the user is concerned only with the sports domain in both training and recommendation. The recommendations on the other three weeks of EPG data are judged by the same user. A precision of 81% was achieved. Because the recommendation for a specific domain is likely to be independent of that for another domain, we can expect similar performance once we expand the recommendation engine to multiple domains in the future.
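For reference, precision here is the fraction of recommended programs that the user judges as relevant. The sketch below shows the computation on made-up judgments; the program identifiers and the function name are hypothetical and unrelated to the experimental data.

```python
def precision(recommended, liked):
    """Fraction of recommended programs that the user judged as relevant."""
    if not recommended:
        return 0.0
    hits = sum(1 for program in recommended if program in liked)
    return hits / len(recommended)

# Hypothetical judgments, not the data behind the reported 81%.
recommended = ["p1", "p2", "p3", "p4"]
liked = {"p1", "p2", "p4", "p9"}
score = precision(recommended, liked)
```

Note that precision ignores liked programs that were never recommended (such as `p9` above); measuring those would require recall as well.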

6 Conclusion

Among home entertainment services, the electronic programming guide (EPG) is perhaps the most appealing application for television, and its services continue to grow with the emergence of the new digital TV market. Our proposed system features EPG collection from non-proprietary data sources (i.e. HTML on the Internet) and an EPG recommender based on text classification and the maximum entropy model. To the best of our knowledge, the proposed work is the first of its kind to use natural language processing techniques for a TV recommender, and the results are promising. Relevance feedback is also implemented to provide a dynamic, personalized EPG service. The prototype of the EPG recommender is implemented in an OSGi environment, and the viewing of the EPG on a portable device is enabled through a SIP network.

The presented work and prototype suggest a feasible architecture and technology for providing a personalized, home network based EPG service. Our next step is to systematically collect an EPG training corpus and to evaluate both the text classification and the EPG recommender. In addition, how relevance feedback can best be provided implicitly through the user's daily TV viewing experience, on the portable device or on a home server, remains a challenge. Finally, there is a future need to address browsing and sending graphics and streaming of EPG information via the home network.

References

1. TV Guide. http://www.tv-guide.com

2. TV Anytime Forum. http://www.tv-anytime.org

3. TitanTV Guide. http://www.titantv.com

4. Specification for Service Information (SI) in DVB Systems, DVB Document A038 Rev. 1,

May 2000.

5. OSGi: Open Services Gateway Initiative. http://www.osgi.org

6. SIP: Session Initiation Protocol. http://ietf.org/html.charters/sip-charter.html.

7. T. Isobe, M. Fujiwara, H. Kaneta, U. Noriyoshi and T. Morita, Development and features of a TV navigation system, IEEE Transactions on Consumer Electronics, Vol. 49, Issue 4, Nov. 2003, pp. 1035-1042.

8. T. Takagi, S. Kasuya, M. Mukaidono and T. Yamaguchi, Conceptual matching and its applications to selection of TV programs and BGMs, IEEE Int. Conf. on Systems, Man and Cybernetics, Vol. 3, Oct. 1999, pp. 269-273.

9. J. Xu, L. Zhang, H. Lu and Y. Li, The development and prospect of personalized TV program recommendation systems, Proceedings of the IEEE 4th Int. Symposium on Multimedia Software Engineering (MSE), 2002.

10. A. Pigeau, G. Raschia, M. Gelgon, N. Mouaddib and R. Saint-Paul, A fuzzy linguistic summarization technique for TV recommender systems, Proceedings of the IEEE Int. Conf. on Fuzzy Systems, 2003, pp. 743-748.

11. Z. Yu, X. Zhou, X. Shi, J. Gu and A. Morel, Design, implementation, and evaluation of an agent-based adaptive program personalization system, Proceedings of the IEEE 5th Int. Symposium on Multimedia Software Engineering (MSE), 2003.

12. A. Berger, S. Della Pietra and V. Della Pietra, A maximum entropy approach to natural language processing, Computational Linguistics, Vol. 22, No. 1, 1996, pp. 39-71.

13. C. Gena, Designing TV viewer stereotypes for an Electronic Program Guide, Proceedings of the 8th International Conference on User Modeling, 2001, 3:274-276.

14. M. Ehrmantraut, T. Herder, H. Wittig and R. Steinmetz, The personal Electronic Program Guide - towards the pre-selection of individual TV programs, in Proc. of CIKM'96, 1996, pp. 243-250.

15. P. Cotter and B. Smyth, PTV: Intelligent Personalised TV Guides, Proceedings of the 12th Innovative Applications of Artificial Intelligence (IAAI) Conference, 2000.

16. L. Zhang, J. Zhu and T. Yao, An Evaluation of Statistical Spam Filtering Techniques, ACM Transactions on Asian Language Information Processing (TALIP), Vol. 3, No. 4, December 2004, pp. 243-269.