Web crawler with URL signature - a performance study

Lay Ki Soon, Yee Ern Ku, Sang Ho Lee

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

Abstract

URL signatures have been proposed for web crawling as a way to avoid processing duplicate web pages that are queued for further crawling. In this paper, we present a performance study of WebSPHINX, an open-source web crawler, in which we have embedded URL signatures. The experimental results indicate that URL signatures significantly reduce the processing of duplicate web pages, at negligible cost compared with the same crawler without URL signatures.
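The abstract does not define how a URL signature is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: a URL is first normalized (lower-cased scheme and host, default port and fragment dropped) and then hashed, and the crawler skips any URL whose hash it has already seen. The names `normalize_url`, `url_signature`, and `Frontier`, the choice of SHA-256, and the specific normalization rules are all hypothetical and are not taken from the paper or from WebSPHINX.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports; a URL that names its scheme's default port
# is treated as equivalent to one that omits the port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` (illustrative rules only)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"  # an empty path is treated as the root
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def url_signature(url: str) -> str:
    """Hash the normalized URL; SHA-256 is an assumption, not the paper's choice."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


class Frontier:
    """Tracks signatures of URLs already accepted for crawling."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def should_visit(self, url: str) -> bool:
        sig = url_signature(url)
        if sig in self._seen:
            return False  # duplicate: skip further processing
        self._seen.add(sig)
        return True
```

In this sketch, `Frontier().should_visit(...)` accepts the first URL it is given and rejects any later URL that normalizes to the same string, which is the kind of duplicate avoidance the abstract describes. The per-URL cost is one hash computation and one set lookup.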

Original language: English
Title of host publication: Proceedings - 2012 4th Conference on Data Mining and Optimization, DMO 2012
Pages: 127-130
Number of pages: 4
Publication status: Published - 2012
Externally published: Yes
Event: Conference on Data Mining and Optimization 2012 - Langkawi, Malaysia
Duration: 2 Sep 2012 to 4 Sep 2012
Conference number: 4th
https://ieeexplore.ieee.org/xpl/conhome/6322848/proceeding (Proceedings)

Publication series

Name: Conference on Data Mining and Optimization
ISSN (Print): 2155-6938
ISSN (Electronic): 2155-6946

Conference

Conference: Conference on Data Mining and Optimization 2012
Abbreviated title: DMO 2012
Country: Malaysia
City: Langkawi
Period: 2/09/12 to 4/09/12

Keywords

  • URL normalization
  • URL signature
  • web crawling
