Efficient parallel lists intersection and index compression algorithms using graphics processing units

Naiyong Ao, Fan Zhang, Di Wu, Douglas Stones, Gang Wang, Xiaoguang Liu, Jing Liu, Sheng Lin

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

36 Citations (Scopus)

Abstract

Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines lists intersection and index compression. For lists intersection, we develop techniques for efficient implementation of the binary search algorithm for parallel computation. We inspect some representative real-world datasets and and that a succiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show signifcant improvements in the query processing throughput on several datasets.
Original languageEnglish
Title of host publicationProceedings of the VLDB Endowment
EditorsH V Jagadish, Jose Blakeley, Joseph M Hellerstein, Nick Koudas, Wolfgang Lehner, Sunita Sarawagi, Uwe Rohm
Place of PublicationUSA
PublisherAssociation for Computing Machinery (ACM)
Pages470 - 481
Number of pages12
Volume4
Publication statusPublished - 2011
EventInternational Conference on Very Large Databases 2011 - The Westin, Seattle, United States of America
Duration: 29 Aug 20113 Sep 2011
Conference number: 37th

Conference

ConferenceInternational Conference on Very Large Databases 2011
Abbreviated titleVLDB 2011
CountryUnited States of America
CitySeattle
Period29/08/113/09/11

Cite this

Ao, N., Zhang, F., Wu, D., Stones, D., Wang, G., Liu, X., ... Lin, S. (2011). Efficient parallel lists intersection and index compression algorithms using graphics processing units. In H. V. Jagadish, J. Blakeley, J. M. Hellerstein, N. Koudas, W. Lehner, S. Sarawagi, & U. Rohm (Eds.), Proceedings of the VLDB Endowment (Vol. 4, pp. 470 - 481). USA: Association for Computing Machinery (ACM).
Ao, Naiyong ; Zhang, Fan ; Wu, Di ; Stones, Douglas ; Wang, Gang ; Liu, Xiaoguang ; Liu, Jing ; Lin, Sheng. / Efficient parallel lists intersection and index compression algorithms using graphics processing units. Proceedings of the VLDB Endowment. editor / H V Jagadish ; Jose Blakeley ; Joseph M Hellerstein ; Nick Koudas ; Wolfgang Lehner ; Sunita Sarawagi ; Uwe Rohm. Vol. 4 USA : Association for Computing Machinery (ACM), 2011. pp. 470 - 481
@inproceedings{0c9ea22808b24ca99399fd035df09602,
title = "Efficient parallel lists intersection and index compression algorithms using graphics processing units",
abstract = "Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines lists intersection and index compression. For lists intersection, we develop techniques for efficient implementation of the binary search algorithm for parallel computation. We inspect some representative real-world datasets and and that a succiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show signifcant improvements in the query processing throughput on several datasets.",
author = "Naiyong Ao and Fan Zhang and Di Wu and Douglas Stones and Gang Wang and Xiaoguang Liu and Jing Liu and Sheng Lin",
year = "2011",
language = "English",
volume = "4",
pages = "470 -- 481",
editor = "Jagadish, {H V} and Jose Blakeley and Hellerstein, {Joseph M} and Nick Koudas and Wolfgang Lehner and Sunita Sarawagi and Uwe Rohm",
booktitle = "Proceedings of the VLDB Endowment",
publisher = "Association for Computing Machinery (ACM)",
address = "United States of America",

}

Ao, N, Zhang, F, Wu, D, Stones, D, Wang, G, Liu, X, Liu, J & Lin, S 2011, Efficient parallel lists intersection and index compression algorithms using graphics processing units. in HV Jagadish, J Blakeley, JM Hellerstein, N Koudas, W Lehner, S Sarawagi & U Rohm (eds), Proceedings of the VLDB Endowment. vol. 4, Association for Computing Machinery (ACM), USA, pp. 470 - 481, International Conference on Very Large Databases 2011, Seattle, United States of America, 29/08/11.

Efficient parallel lists intersection and index compression algorithms using graphics processing units. / Ao, Naiyong; Zhang, Fan; Wu, Di; Stones, Douglas; Wang, Gang; Liu, Xiaoguang; Liu, Jing; Lin, Sheng.

Proceedings of the VLDB Endowment. ed. / H V Jagadish; Jose Blakeley; Joseph M Hellerstein; Nick Koudas; Wolfgang Lehner; Sunita Sarawagi; Uwe Rohm. Vol. 4 USA : Association for Computing Machinery (ACM), 2011. p. 470 - 481.

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

TY - GEN

T1 - Efficient parallel lists intersection and index compression algorithms using graphics processing units

AU - Ao, Naiyong

AU - Zhang, Fan

AU - Wu, Di

AU - Stones, Douglas

AU - Wang, Gang

AU - Liu, Xiaoguang

AU - Liu, Jing

AU - Lin, Sheng

PY - 2011

Y1 - 2011

N2 - Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines lists intersection and index compression. For lists intersection, we develop techniques for efficient implementation of the binary search algorithm for parallel computation. We inspect some representative real-world datasets and and that a succiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show signifcant improvements in the query processing throughput on several datasets.

AB - Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines lists intersection and index compression. For lists intersection, we develop techniques for efficient implementation of the binary search algorithm for parallel computation. We inspect some representative real-world datasets and and that a succiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show signifcant improvements in the query processing throughput on several datasets.

UR - http://www.vldb.org/pvldb/vol4/p470-ao.pdf

M3 - Conference Paper

VL - 4

SP - 470

EP - 481

BT - Proceedings of the VLDB Endowment

A2 - Jagadish, H V

A2 - Blakeley, Jose

A2 - Hellerstein, Joseph M

A2 - Koudas, Nick

A2 - Lehner, Wolfgang

A2 - Sarawagi, Sunita

A2 - Rohm, Uwe

PB - Association for Computing Machinery (ACM)

CY - USA

ER -

Ao N, Zhang F, Wu D, Stones D, Wang G, Liu X et al. Efficient parallel lists intersection and index compression algorithms using graphics processing units. In Jagadish HV, Blakeley J, Hellerstein JM, Koudas N, Lehner W, Sarawagi S, Rohm U, editors, Proceedings of the VLDB Endowment. Vol. 4. USA: Association for Computing Machinery (ACM). 2011. p. 470 - 481