Systematic Assessment of Factual Knowledge in Large Language Models

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

8 Citations (Scopus)

Abstract

Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains, which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate state-of-the-art LLMs with KGs in generic and specific domains. The experiments show that ChatGPT is consistently the top performer across all domains. We also find that LLM performance depends on instruction finetuning, domain, and question complexity, and that LLMs are prone to adversarial context.
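The core idea of the framework described in the abstract, turning KG facts into question-answer pairs and scoring an LLM's responses, can be illustrated with a minimal sketch. The triple format, question templates, and exact-match scoring below are illustrative assumptions, not the paper's actual pipeline:

```python
# Hedged sketch: probing an LLM's factual knowledge with KG-derived QA pairs.
# Templates and the relation names ("has_capital", "authored_by") are assumptions
# made for illustration; the paper's generation method may differ.

def triple_to_qa(subj: str, rel: str, obj: str) -> tuple[str, str]:
    """Render a (subject, relation, object) fact as a (question, answer) pair."""
    templates = {
        "has_capital": "What is the capital of {subj}?",
        "authored_by": "Who wrote {subj}?",
    }
    return templates[rel].format(subj=subj), obj

def exact_match(model_answer: str, expected: str) -> bool:
    """Naive case-insensitive exact-match check of a model's answer."""
    return model_answer.strip().lower() == expected.strip().lower()

# Example: a single KG fact becomes one probe question.
question, answer = triple_to_qa("France", "has_capital", "Paris")
```

Accuracy over a set of such probes would then estimate how much of the KG's factual content the model has retained; more forgiving scoring (aliases, fuzzy matching) is typically needed in practice.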

Original language: English
Title of host publication: EMNLP 2023, The 2023 Conference on Empirical Methods in Natural Language Processing, Findings of the Association for Computational Linguistics: EMNLP 2023
Editors: Nadi Tomeh, Atsushi Fujita, Aixin Sun, Bin Wang, Rong Tong, Ryan Cotterell
Place of Publication: Stroudsburg PA USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 13272-13286
Number of pages: 15
ISBN (Electronic): 9798891760615
Publication status: Published - 2023
Event: Empirical Methods in Natural Language Processing 2023, Singapore
Duration: 6 Dec 2023 - 10 Dec 2023
https://2023.emnlp.org/
https://aclanthology.org/volumes/2023.findings-emnlp/ (Proceedings)
https://aclanthology.org/volumes/2023.emnlp-demo/ (Proceedings)

Conference

Conference: Empirical Methods in Natural Language Processing 2023
Abbreviated title: EMNLP 2023
Country/Territory: Singapore
Period: 6/12/23 - 10/12/23
