An empirical study on harmonizing classification precision using IE patterns

Lay Ki Soon, Kyu Baek Hwang, Sang Ho Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

4 Citations (Scopus)

Abstract

Web pages are conventionally represented by the words found within their contents for classification purposes. However, word-based web page representation suffers from several limitations, such as synonymy and homonymy. Motivated by these limitations, we explore the potential of representing web pages using information extraction patterns in addition to the words identified within the web contents. In this paper, we report the results and findings of our experiments. Our empirical study, conducted on the WebKB dataset, indicates that adding information extraction patterns to the web page representation helps improve classification precision, especially in categories with highly diversified web content.
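As a rough illustration of the idea in the abstract, the sketch below builds a page representation from word counts and then adds counts of extraction-pattern matches as extra features. It is not the authors' method: the two regular expressions, the `PATTERN:` feature prefix, and the function names are hypothetical stand-ins, since the record does not describe the actual patterns or how they were learned.

```python
import re
from collections import Counter

# Hypothetical extraction patterns (assumptions for illustration only);
# the patterns used in the paper are not given in this record.
PATTERNS = {
    "PATTERN:professor_of": re.compile(r"\bprofessor of \w+", re.IGNORECASE),
    "PATTERN:taught_by": re.compile(r"\btaught by \w+", re.IGNORECASE),
}


def word_features(text):
    """Count lowercase alphabetic tokens (a plain bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def pattern_features(text):
    """Count how often each hypothetical extraction pattern matches."""
    counts = Counter()
    for name, regex in PATTERNS.items():
        hits = len(regex.findall(text))
        if hits:
            counts[name] = hits
    return counts


def represent(text):
    """Combine word counts and pattern counts into one feature vector."""
    features = word_features(text)
    # Pattern names carry an uppercase prefix, so they cannot collide
    # with the lowercase word tokens.
    features.update(pattern_features(text))
    return features


page = "She is a Professor of Biology. The course is taught by Smith."
features = represent(page)
```

The combined `features` mapping could then be passed to any standard text classifier. Keeping pattern features under a distinct prefix makes it easy to compare a word-only representation against the word-plus-pattern one.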

Original language: English
Title of host publication: 2nd International Conference on Software Engineering and Data Mining, SEDM 2010
Pages: 251-256
Number of pages: 6
Publication status: Published - 2010
Externally published: Yes
Event: International Conference on Software Engineering and Data Mining 2010 - Chengdu, China
Duration: 23 Jun 2010 to 25 Jun 2010
Conference number: 2nd

Conference

Conference: International Conference on Software Engineering and Data Mining 2010
Abbreviated title: SEDM 2010
Country/Territory: China
City: Chengdu
Period: 23/06/10 to 25/06/10

Keywords

  • Information extraction
  • Information retrieval
  • Web classification
  • Web mining
