Document level assessment for pairwise system evaluation

Prabha Rajagopal, Sri Devi Ravana

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review


Retrieval system performance is commonly determined from topic scores, which are obtained by averaging. Averaging discards the individual document scores, which are themselves a measure of system effectiveness. This study uses document scores as the unit of measurement in pairwise system evaluation to overcome that loss. Precision scores are calculated for each document per topic and used in statistical significance tests to determine whether a system pair is significantly different. The p-values from 50 topics per system pair are aggregated to determine the significance of the pair. These are then compared with significance values obtained from topic-level average precision scores. Experiments show that significance tests using document scores identify more statistically significant (p <= 0.01) system pairs than tests using average precision scores. Document scores could therefore be an alternative to averaged topic scores in pairwise system comparison.
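
The procedure summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the significance test or the p-value aggregation method, so a paired sign-flip randomization test and Fisher's method are assumed here, and "precision scores of each document" is read as precision after each ranked document. All function names and the toy data layout (topic mapped to a ranked list of 0/1 relevance judgments) are hypothetical.

```python
import math
import random


def precision_at_ranks(relevance):
    """Precision after each ranked document; relevance is a list of 0/1 judgments."""
    hits, scores = 0, []
    for rank, rel in enumerate(relevance, start=1):
        hits += rel
        scores.append(hits / rank)
    return scores


def paired_randomization_p(a, b, trials=2000, seed=0):
    """Two-sided sign-flip randomization test on paired scores (assumed test)."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    extreme = 0
    for _ in range(trials):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed - 1e-12:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (trials + 1)


def fisher_combine(p_values):
    """Fisher's method for combining p-values (assumed aggregation).

    Treats the per-topic p-values as independent.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # chi-square statistic / 2
    # Closed-form chi-square survival function for 2k degrees of freedom.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def compare_systems(run_a, run_b):
    """Aggregate per-topic p-values for one system pair.

    run_a, run_b: dict mapping topic id -> ranked 0/1 relevance list.
    """
    p_values = []
    for topic in run_a:
        pa = precision_at_ranks(run_a[topic])
        pb = precision_at_ranks(run_b[topic])
        p_values.append(paired_randomization_p(pa, pb))
    return fisher_combine(p_values)
```

With real data, `run_a` and `run_b` would each hold 50 topics, and the pair would be counted as significantly different when the combined value is at most 0.01, the threshold stated in the abstract.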

Original language: English
Title of host publication: 2016 3rd International Conference on Information Retrieval and Knowledge Management, CAMP 2016 - Conference Proceedings
Editors: Fatimah Dato Ahmad, Nurazzah Abd Rahman, Alan F. Smeaton, Alistair Moffat, Muthukkaruppan Annamalai, Fakhrul Hazman, Shahrul Azman Mohd Noah, Zainab Abu Bakar
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 5
ISBN (Electronic): 9781509029549
Publication status: Published - 2016
Externally published: Yes
Event: International Conference on Information Retrieval and Knowledge Management 2016 - Malacca, Malaysia
Duration: 23 Aug 2016 to 24 Aug 2016
Conference number: 3rd (Proceedings)


Conference: International Conference on Information Retrieval and Knowledge Management 2016
Abbreviated title: CAMP 2016


  • pairwise system comparison
  • retrieval system
  • significance test
  • TREC
