Enhancing URL normalization using metadata of web pages

Lay Ki Soon, Sang Ho Lee

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

3 Citations (Scopus)

Abstract

In this paper, we present our proposed method of incorporating metadata of Web pages to identify equivalent URLs in addition to the standard URL normalization methodology. The metadata considered are the page size and the body text of Web pages. These metadata can be obtained during HTML parsing in the process of crawling without incurring unnecessary cost. Our experiment shows an accuracy of up to 95.38% in identifying equivalent URLs by using the body text of Web pages.
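The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "standard URL normalization" means a few syntactic rules in the spirit of RFC 3986, and that two body texts count as matching when their whitespace-collapsed SHA-1 digests are equal. The function names `normalize_url`, `body_fingerprint`, and `likely_equivalent` are hypothetical and chosen for illustration.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes matter for a Web crawler.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Apply a few syntactic normalizations (illustrative subset only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: drops any userinfo
    port = parts.port
    # Drop the port when it is the default for the scheme.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Resolve "." and ".." segments; an empty path becomes "/".
    path = posixpath.normpath(parts.path) if parts.path else "/"
    if parts.path.endswith("/") and path != "/":
        path += "/"  # normpath strips a trailing slash, so restore it
    # The fragment is not sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def body_fingerprint(body_text: str) -> str:
    """Hash the body text after collapsing runs of whitespace (assumed rule)."""
    collapsed = " ".join(body_text.split())
    return hashlib.sha1(collapsed.encode("utf-8")).hexdigest()


def likely_equivalent(url_a: str, body_a: str, url_b: str, body_b: str) -> bool:
    """Treat URLs as equivalent if they normalize identically,
    or if the body texts extracted during parsing match."""
    if normalize_url(url_a) == normalize_url(url_b):
        return True
    return body_fingerprint(body_a) == body_fingerprint(body_b)
```

In this sketch, `http://example.com/index.html` and `http://example.com/` stay distinct after syntactic normalization but are still flagged as equivalent when the crawler extracted the same body text from both, which is the kind of case the metadata is meant to catch.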

Original language: English
Title of host publication: Proceedings of the 2008 International Conference on Computer and Electrical Engineering, ICCEE 2008
Pages: 331-335
Number of pages: 5
Publication status: Published - 2008
Externally published: Yes
Event: International Conference on Computer and Electrical Engineering 2008 - Phuket, Thailand
Duration: 20 Dec 2008 to 22 Dec 2008
https://ieeexplore.ieee.org/xpl/conhome/4740925/proceeding (Proceedings)

Conference

Conference: International Conference on Computer and Electrical Engineering 2008
Abbreviated title: ICCEE 2008
Country/Territory: Thailand
City: Phuket
Period: 20/12/08 to 22/12/08