
tried to understand the real difference between each
pair of cloned pages and observed that this
difference lay only in their SQL code.
Moreover, the references to the database tables were
often the only difference between the SQL queries. It is
worth noting that the file names of these pages are
also very similar, meaning that there are groups of
different pages that actually implement the same or
very similar functionality for the different events
(main conference and workshops). Indeed, these
files might be clustered into groups of clones, in a
way similar to the approach by Di Lucca et al.
(2002). In general, clustering is feasible when the
required similarity threshold is very high; otherwise,
our experience has shown that clustering does
not produce meaningful results.
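The effect of the threshold on clustering can be illustrated with a minimal sketch. It assumes single-link grouping (two pages fall into the same group if a chain of pairwise similarities at or above the threshold connects them), which is an illustrative choice and not necessarily the clustering used in the case study. The page names, the similarity scores, and the helper names `cluster_by_threshold` and `lookup` are hypothetical.

```python
from itertools import combinations


def cluster_by_threshold(pages, similarity, threshold):
    """Group pages whose pairwise similarity reaches the threshold.

    Single-link grouping via union-find: pages end up in the same
    group when a chain of sufficiently similar pairs connects them.
    Only groups with at least two pages are returned.
    """
    parent = {page: page for page in pages}

    def find(page):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for a, b in combinations(pages, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for page in pages:
        groups.setdefault(find(page), []).append(page)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical pages and similarity degrees, for illustration only.
PAGES = ["conf_submit.jsp", "ws1_submit.jsp", "ws2_submit.jsp", "conf_login.jsp"]
SCORES = {
    frozenset(("conf_submit.jsp", "ws1_submit.jsp")): 0.98,
    frozenset(("ws1_submit.jsp", "ws2_submit.jsp")): 0.97,
    frozenset(("conf_submit.jsp", "ws2_submit.jsp")): 0.96,
    frozenset(("conf_submit.jsp", "conf_login.jsp")): 0.55,
    frozenset(("ws1_submit.jsp", "conf_login.jsp")): 0.52,
}


def lookup(a, b):
    """Return the stored similarity of a pair, or 0.0 if none is recorded."""
    return SCORES.get(frozenset((a, b)), 0.0)


strict = cluster_by_threshold(PAGES, lookup, 0.95)
loose = cluster_by_threshold(PAGES, lookup, 0.50)
```

With the strict threshold only the three submission pages are grouped; with the loose threshold the unrelated login page is pulled into the same group, which mirrors the observation that low thresholds do not yield meaningful clusters.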
5 CONCLUSION
In this paper an approach to clone analysis for Web
applications has been proposed together with a
prototype implementation for JSP web pages. Our
approach analyzes the page structure, implemented
by specific sequences of HTML tags, and the
content displayed for both dynamic and static pages.
Moreover, for a pair of JSP pages we also consider
the similarity degree of their Java source. The
similarity degree can be adapted and tuned in a
simple way for different web applications.
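As an illustration of comparing page structure through sequences of HTML tags, the following minimal sketch extracts the tag sequence of each page and derives a similarity degree from the Levenshtein edit distance (Levenshtein, 1966). The normalization by the longer sequence, the regular expression, and the function names are assumptions made for this example; they are not taken from the prototype.

```python
import re

# Matches opening and closing HTML tags; JSP directives and
# scriptlets (<% ... %>) and comments are deliberately not matched.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)")


def tag_sequence(html):
    """Return the ordered tag names of a page, closing tags prefixed with '/'."""
    return [slash + name.lower() for slash, name in TAG_RE.findall(html)]


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def structural_similarity(html_a, html_b):
    """Similarity degree in [0, 1]; 1.0 means identical tag sequences."""
    seq_a, seq_b = tag_sequence(html_a), tag_sequence(html_b)
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(seq_a, seq_b) / longest


# Two hypothetical pages with the same structure but different content.
page_a = "<html><body><table><tr><td>SEKE</td></tr></table></body></html>"
page_b = "<html><body><table><tr><td>Workshop</td></tr></table></body></html>"
page_c = "<html><body><p>Login</p></body></html>"

same_structure = structural_similarity(page_a, page_b)
different_structure = structural_similarity(page_a, page_c)
```

Here `page_a` and `page_b` differ only in displayed content, so their structural similarity is 1.0, while `page_c` scores lower. A separate comparison of the content or of the Java source could follow the same pattern on a different token sequence.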
We have reported the results of applying our
approach and tool in a case study. The results have
confirmed that the lack of analysis and design of the
Web application has an effect on the duplication of the
pages. In particular, these results allowed us to
identify some common features of the SEKE
conference and the collocated workshops that could
be integrated by removing the duplications.
Moreover, the clone analysis of the JSP pages
enabled us to acquire information to improve the
general quality and the conceptual design of the
database of the web application. Indeed, we plan to
exploit the results of the clone analysis method to
support web application reengineering activities
(Antoniol et al., 2000).
REFERENCES
Antoniol, G., Canfora, G., Casazza, G., and De Lucia, A.,
2000. Web Site Reengineering using RMM. Proc. of
International Workshop on Web Site Evolution,
Zurich, Switzerland, pp. 9-16.
Aversano, L., Canfora, G., De Lucia, A., and Gallucci, P.,
2001. Web Site Reuse: Cloning and Adapting. Proc. of
3rd International Workshop on Web Site Evolution,
Florence, Italy, IEEE CS Press, pp. 107-111.
Baker, B. S., 1995. On finding duplication and
near-duplication in large software systems. Proc. of 2nd
Working Conference on Reverse Engineering,
Toronto, Canada, IEEE CS Press, pp. 86-95.
Balazinska, M., Merlo, E., Dagenais, M., Lague, B., and
Kontogiannis, K., 1999. Measuring Clone Based
Reengineering Opportunities. Proc. of 6th
International Symposium on Software Metrics, Boca
Raton, Florida, IEEE CS Press, pp. 292-303.
Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., and
Bier, L., 1998. Clone Detection Using Abstract Syntax
Trees. Proc. of International Conference on Software
Maintenance, IEEE CS Press, pp. 368-377.
Bieber, M. and Isakowitz, T., 1995. Special issue on
Designing Hypermedia Applications. Communications
of the ACM, 38(8).
Boldyreff, C., Munro, M., and Warren, P., 1999. The
evolution of websites. Proc. of 7th International
Workshop on Program Comprehension, Pittsburgh,
Pennsylvania, IEEE CS Press, pp. 178-185.
Conallen, J., 2000. Building Web Applications with UML.
Addison Wesley.
Di Lucca, G. A., Di Penta, M., and Fasolino, A. R., 2002.
An Approach to Identify Duplicated Web Pages. Proc.
of 26th Annual International Computer Software and
Applications Conference (COMPSAC’02), Oxford, UK,
IEEE CS Press, pp. 481-486.
Ginige, A. and Murugesan, S., (eds.) 2001. Special issue
on Web Engineering. IEEE Multimedia, 8(1-2).
Kamiya, T., Kusumoto, S., and Inoue, K., 2002.
CCFinder: A Multilinguistic Token-Based Code Clone
Detection System for Large Scale Source Code. IEEE
Transactions on Software Engineering, 28(7),
pp. 654-670.
Lanubile, F. and Mallardo, T., 2003. Finding Function
Clones in Web Applications. Proc. of 7th European
Conference on Software Maintenance and
Reengineering, Benevento, Italy, IEEE CS Press,
pp. 379-386.
Levenshtein, V. I., 1966. Binary codes capable of
correcting deletions, insertions, and reversals.
Cybernetics and Control Theory, 10, pp. 707-710.
Ricca, F. and Tonella, P., 2001. Understanding and
Restructuring Web Sites with ReWeb. IEEE
Multimedia, 8(2), pp. 40-51.
Ricca, F. and Tonella, P., 2003. Using Clustering to
Support the Migration from Static to Dynamic Web
Pages. Proc. of 11th International Workshop on
Program Comprehension, Portland, Oregon, IEEE CS
Press, pp. 207-216.
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION