TY - JOUR
T1 - Automated classification and localization of daily deal content from the Web
AU - Cuzzola, John
AU - Jovanović, Jelena
AU - Bagheri, Ebrahim
AU - Gašević, Dragan
PY - 2015
Y1 - 2015
N2 - Websites offering daily deals have received widespread attention from end users. The objective of such Websites is to provide time-limited discounts on goods and services in the hope of enticing more customers to purchase them. The success of daily deal Websites has given rise to meta-level daily deal aggregator services that collect daily deal information from across the Web. Due to some of the unique characteristics of daily deal Websites, such as high update frequency, time sensitivity, and lack of coherent information representation, many deal aggregators rely on human intervention to identify and extract deal information. In this paper, we propose an approach in which daily deal information is identified, classified, and properly segmented and localized. Our approach is based on a semi-supervised method that uses sentence-level features of daily deal information on a given Web page. Our work offers (i) a set of computationally inexpensive discriminative features that effectively distinguish Web pages that contain daily deal information; (ii) the construction and systematic evaluation of machine learning techniques based on these features to automatically classify daily deal Web pages; and (iii) the development of an accurate segmentation algorithm that localizes and extracts individual deals from within a complex Web page. We have extensively evaluated our approach from different perspectives, and the results show notable performance.
AB - Websites offering daily deals have received widespread attention from end users. The objective of such Websites is to provide time-limited discounts on goods and services in the hope of enticing more customers to purchase them. The success of daily deal Websites has given rise to meta-level daily deal aggregator services that collect daily deal information from across the Web. Due to some of the unique characteristics of daily deal Websites, such as high update frequency, time sensitivity, and lack of coherent information representation, many deal aggregators rely on human intervention to identify and extract deal information. In this paper, we propose an approach in which daily deal information is identified, classified, and properly segmented and localized. Our approach is based on a semi-supervised method that uses sentence-level features of daily deal information on a given Web page. Our work offers (i) a set of computationally inexpensive discriminative features that effectively distinguish Web pages that contain daily deal information; (ii) the construction and systematic evaluation of machine learning techniques based on these features to automatically classify daily deal Web pages; and (iii) the development of an accurate segmentation algorithm that localizes and extracts individual deals from within a complex Web page. We have extensively evaluated our approach from different perspectives, and the results show notable performance.
KW - Information extraction
KW - Segmentation
KW - Web classification
UR - http://www.scopus.com/inward/record.url?scp=84953888283&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2015.02.029
DO - 10.1016/j.asoc.2015.02.029
M3 - Article
AN - SCOPUS:84953888283
SN - 1568-4946
VL - 31
SP - 241
EP - 256
JO - Applied Soft Computing
JF - Applied Soft Computing
ER -