Relevance judgments exclusive of human assessors in large scale information retrieval evaluation experimentation

Prabha Rajagopal, Sri Devi Ravana, Maizatul Akmar Ismail

Research output: Contribution to journal › Article › Research › peer-review

2 Citations (Scopus)


Inconsistent judgments by different human assessors compromise the reliability of the relevance judgments generated for large scale test collections. This study investigates an automated method that creates a comparable set of relevance judgments (pseudo relevance judgments), eliminating the human effort and the errors introduced when relevance judgments are created manually. Traditionally, the systems participating in TREC are measured using a chosen metric and ranked according to their performance scores. To generate these scores, the documents retrieved by each system for each topic are matched against a set of relevance judgments, usually assessed by humans. In this study, the number of occurrences of each document per topic across the various runs is used instead, under the assumption that the more often a document occurs, the more likely it is to be relevant. The study proposes a method using a pool depth of 100 and a cutoff percentage of >35% that could provide an alternative way of generating consistent relevance judgments without the involvement of human assessors.
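The occurrence-counting idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, data layout, and defaults (pool depth 100, >35% of runs) are assumptions for illustration; a document is pooled from the top ranks of each run and judged relevant when it appears in more than the cutoff fraction of runs for that topic.

```python
from collections import Counter

def pseudo_qrels(runs, pool_depth=100, cutoff=0.35):
    """Build pseudo relevance judgments from system runs.

    runs: dict mapping topic id -> list of ranked document-id lists,
          one list per participating system (hypothetical layout).
    Returns: dict mapping topic id -> set of documents deemed relevant.
    """
    qrels = {}
    for topic, system_rankings in runs.items():
        counts = Counter()
        for ranking in system_rankings:
            # Pool only the top `pool_depth` documents of each run.
            counts.update(ranking[:pool_depth])
        n_runs = len(system_rankings)
        # A document is judged relevant if it occurs in more than
        # `cutoff` (e.g. >35%) of the runs for this topic.
        qrels[topic] = {doc for doc, c in counts.items() if c / n_runs > cutoff}
    return qrels
```

For example, with three runs for one topic, a document retrieved by two of the three runs (about 67%) passes the >35% cutoff, while a document retrieved by only one run (about 33%) does not.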

Original language: English
Pages (from-to): 80-94
Number of pages: 15
Journal: Malaysian Journal of Computer Science
Issue number: 2
Publication status: Published - 2014
Externally published: Yes


  • Information retrieval
  • Large scale experimentation
  • Relevance judgments
  • Retrieval evaluation
