TY - GEN
T1 - Extracting Cultural Commonsense Knowledge at Scale
AU - Nguyen, Tuan Phong
AU - Razniewski, Simon
AU - Varde, Aparna
AU - Weikum, Gerhard
N1 - Publisher Copyright:
© 2023 Owner/Author.
PY - 2023/4/30
Y1 - 2023/4/30
N2 - Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents Candle, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. Candle extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). Candle includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the Candle CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/.
AB - Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents Candle, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. Candle extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). Candle includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the Candle CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/.
UR - http://www.scopus.com/inward/record.url?scp=85159356003&partnerID=8YFLogxK
U2 - 10.1145/3543507.3583535
DO - 10.1145/3543507.3583535
M3 - Conference contribution
AN - SCOPUS:85159356003
T3 - ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023
SP - 1907
EP - 1917
BT - ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023
PB - Association for Computing Machinery, Inc
T2 - 2023 World Wide Web Conference, WWW 2023
Y2 - 30 April 2023 through 4 May 2023
ER -