Tracing similarity within strongly connected components for intelligent web crawling

Yong Jin Tee, Lay Ki Soon

Research output: Contribution to journal; Article; Research; peer-review

Abstract

Finding and obtaining information efficiently from the Web is one of the important elements in realizing a Smart Home environment. Users expect to find the most relevant information within the shortest possible time. In this paper, we investigate the similarity of Web pages within Strongly Connected Components (SCCs). SCCs are overlapping groups of Web pages that may imply a relationship between the Web pages of the same component. Therefore, we seek to trace the similarity of these groups of Web pages using Cosine Similarity. Our experiment, performed on Malaysian Web pages, indicates that Web pages within the same SCC carry a common topic or theme. This finding suggests that we may locate Web pages with a similar topic using the hyperlink structure, without performing expensive analysis on the contents of the Web pages.
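The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Kosaraju's algorithm for finding SCCs, plain term-frequency vectors for Cosine Similarity, and a hypothetical toy link graph with made-up page texts (`links`, `texts`). It uses the standard graph-theoretic definition, under which every page falls into exactly one component.

```python
import math
from collections import Counter
from itertools import combinations


def strongly_connected_components(graph):
    """Kosaraju's algorithm on a dict mapping page -> list of linked pages."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    # First pass: record pages in order of DFS completion (iterative DFS).
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All outgoing links explored: this page is finished.
                order.append(node)
                stack.pop()
    # Second pass: walk the reversed graph in reverse completion order.
    reverse = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].append(src)
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return components


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mean_similarity(component, texts):
    """Average pairwise similarity inside one SCC (None for a single page)."""
    pairs = list(combinations(component, 2))
    if not pairs:
        return None
    return sum(cosine_similarity(texts[p], texts[q]) for p, q in pairs) / len(pairs)


# Hypothetical toy data: pages a, b, c link in a cycle; d and e link to each other.
links = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
texts = {
    "a": "smart home sensors and devices",
    "b": "smart home devices for energy saving",
    "c": "home sensors and smart devices",
    "d": "football league results",
    "e": "football results and league table",
}

for component in strongly_connected_components(links):
    print(component, mean_similarity(component, texts))
```

In a real crawl the link graph and page texts would come from the crawler itself, and a weighting scheme such as TF-IDF could replace the raw term counts assumed here.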

Original language: English
Pages (from-to): 89-94
Number of pages: 6
Journal: International Journal of Smart Home
Volume: 6
Issue number: 2
Publication status: Published - 2012
Externally published: Yes
