
 
We experimentally measured how the average 
number of pages downloaded per second per thread 
changes as the number of crawling machines 
changes. For this, we used 250,000 randomly 
selected Korean sites as seeds. We increased the 
number of crawling machines from one to five in 
increments of one. We set the number of crawling 
threads on each machine to 10, 15, and 20, and each 
thread ran with 10 seeds simultaneously. 
Figure 9 shows how the average number of pages 
downloaded per second per thread changes as the 
number of crawling machines increases. The solid 
line, long dashes, and short dashes represent web 
crawling using 20 threads, 15 threads, and 10 
threads, respectively. The more scalable a system is, 
the closer its lines are to horizontal, because a flat 
line means that adding machines does not reduce the 
per-thread download rate. From the results, we 
believe that SCrawler scales almost linearly 
with the number of crawling machines. One might 
notice that the lines are not completely horizontal. 
This could be attributed to the limitations of our 
network resources: we ran this experiment on a 
campus network, where the network status is likely 
to vary over time. 
 
Figure 9: Average number of pages crawled per second 
per thread. 
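
The metric plotted in Figure 9 can be illustrated with a 
minimal Python sketch. It normalizes the download rate by 
the total number of crawling threads and reports how far a 
series of such rates deviates from a horizontal line. The 
function names and the sample measurements are 
hypothetical and are not taken from the experiment. 

```python
def pages_per_second_per_thread(total_pages, elapsed_seconds,
                                machines, threads_per_machine):
    """Average download rate normalized by the total number of threads."""
    total_threads = machines * threads_per_machine
    return total_pages / (elapsed_seconds * total_threads)


def relative_spread(rates):
    """(max - min) / mean of the rates; 0.0 is a perfectly horizontal line."""
    mean = sum(rates) / len(rates)
    return (max(rates) - min(rates)) / mean


# Hypothetical measurements (NOT from the paper):
# (machines, total pages downloaded, elapsed seconds), 20 threads per machine.
runs = [(1, 36_000, 600), (2, 70_800, 600), (3, 104_400, 600)]
rates = [pages_per_second_per_thread(pages, secs, m, 20)
         for m, pages, secs in runs]

for (m, _, _), rate in zip(runs, rates):
    print(f"{m} machine(s): {rate:.2f} pages/s/thread")
print(f"relative spread: {relative_spread(rates):.3f}")
```

A small relative spread corresponds to a nearly horizontal 
line, that is, to near-linear scaling of the total download 
rate with the number of machines. 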
4 CLOSING REMARKS 
The development of SCrawler is ongoing. 
Dynamically generated content is constantly being 
created on the Web, and the Web is growing 
tremendously. Our next extension of SCrawler 
will be to selectively crawl web pages that are 
relevant to a predefined set of topics. 
ACKNOWLEDGEMENTS 
This work was supported by the Seoul R&BD Program 
(10581cooperateOrg93112). 
ICE-B 2007 - International Conference on e-Business