Extracting Cultural Commonsense Knowledge at Scale

Tuan Phong Nguyen, Simon Razniewski, Aparna Varde, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents Candle, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. Candle extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). Candle includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the Candle CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/.

Original languageEnglish
Title of host publicationACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023
PublisherAssociation for Computing Machinery, Inc
Pages1907-1917
Number of pages11
ISBN (Electronic)9781450394161
DOIs
StatePublished - 30 Apr 2023
Event2023 World Wide Web Conference, WWW 2023 - Austin, United States
Duration: 30 Apr 20234 May 2023

Publication series

NameACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023

Conference

Conference2023 World Wide Web Conference, WWW 2023
Country/TerritoryUnited States
CityAustin
Period30/04/234/05/23

Cite this