Our colleagues at the Idiap Research Institute recently participated in the CASE-2022@EMNLP challenge on Cause-Effect-Signal extraction with two papers. These address the tasks of (1) identifying whether a sentence from the news media contains a Cause-Effect-Signal triplet and (2) extracting all Cause-Effect-Signal triplets from such a sentence.
- IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach by Sergio Burdisso, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek (to be presented at the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text)
- IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model by Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz (Camera-ready for CASE@EMNLP)
The two papers resulting from the challenge will be included in the proceedings of the CASE 2022 Workshop at the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP'22).
Pre-prints of the papers are available via our publications portal and via the links below:
- IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach – access here
- IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model – access here
"Efficient Causal Relation Identification & Extracting Cause-Effect-Signal Triplets" - explanation by the team at Idiap
Automatically identifying what-caused-what relations between events in the news media is a challenging task, especially in low-resource scenarios. We addressed this issue in our submission to the CASE-2022@EMNLP challenge, which focused on Event Causality Identification and Extraction on the newly presented Causal News Corpus (Tan et al., 2022). In subtask 1, we built a system that identifies sentences containing causal events, whereas in subtask 2, we proposed a system that extracts all cause-effect-signal triplets (the cause, the effect, and the signal connecting them) from these sentences.
In subtask 1, we apply a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based approach to fine-tuning in which the identification task is treated as a masked language modeling (MLM) problem. This allows LMs pre-trained with an MLM objective to directly generate textual answers to identification-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was trained on only 256 instances per class, a small portion of the entire dataset, yet it obtained the second-best precision (0.82), the third-best accuracy (0.82), and an F1-score (0.85) very close to that reported by the team that ultimately won the challenge (0.86).
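To make the prompt-based idea more concrete, here is a minimal Python sketch of how a binary "is this sentence causal?" decision can be cast as filling a masked slot. It also shows how a balanced k-shot training subset, such as 256 instances per class, might be drawn. Everything named here is a hypothetical illustration and not the team's actual code: the template, the "Yes"/"No" answer words (the verbalizer), the `<mask>` placeholder, and the `score_fn` interface. `score_fn` stands in for a masked LM that returns one logit per candidate word at the masked position. A keyword-based toy scorer replaces the real model so the sketch runs on its own.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical prompt pieces; the templates used in the paper may differ.
MASK = "<mask>"  # placeholder; every real LM tokenizer defines its own mask token
TEMPLATE = "{sentence} Does this sentence describe a cause? {mask}."
VERBALIZER: Dict[int, str] = {1: "Yes", 0: "No"}  # label -> answer word

# Assumed interface: score_fn(prompt, words) -> one logit per word at the mask.
ScoreFn = Callable[[str, Sequence[str]], Sequence[float]]


def sample_few_shot(data: List[Tuple[str, int]], k: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Draw up to k examples per label to build a balanced k-shot training set."""
    rng = random.Random(seed)
    by_label: Dict[int, List[Tuple[str, int]]] = {}
    for example in data:
        by_label.setdefault(example[1], []).append(example)
    subset: List[Tuple[str, int]] = []
    for label in sorted(by_label):
        pool = by_label[label]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset


def build_prompt(sentence: str) -> str:
    """Wrap the input sentence in the cloze-style template."""
    return TEMPLATE.format(sentence=sentence.strip(), mask=MASK)


def label_log_probs(sentence: str, score_fn: ScoreFn) -> Dict[int, float]:
    """Log-softmax restricted to the verbalizer words at the masked position."""
    labels = sorted(VERBALIZER)
    logits = list(score_fn(build_prompt(sentence), [VERBALIZER[l] for l in labels]))
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return {l: x - log_z for l, x in zip(labels, logits)}


def predict(sentence: str, score_fn: ScoreFn) -> int:
    """Pick the label whose answer word is most likely to fill the mask."""
    lp = label_log_probs(sentence, score_fn)
    return max(lp, key=lp.get)


def prompt_loss(sentence: str, gold: int, score_fn: ScoreFn) -> float:
    """Cross-entropy of the gold answer word; this is what fine-tuning minimises."""
    return -label_log_probs(sentence, score_fn)[gold]


def toy_score_fn(prompt: str, words: Sequence[str]) -> List[float]:
    """Toy stand-in for a masked LM, used only so the sketch runs end to end."""
    causal = any(cue in prompt.lower() for cue in ("because", "due to", "led to"))
    return [2.0 if (w == "Yes") == causal else 0.0 for w in words]
```

With the toy scorer, `predict("Workers struck because wages were cut.", toy_score_fn)` returns `1` and `predict("The meeting is on Monday.", toy_score_fn)` returns `0`. In a real setup, `score_fn` would read the LM's output logits at the mask position. Gradients of `prompt_loss` would then update the LM itself, so the model is trained through its own pre-training head rather than a newly initialised classifier.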
In subtask 2, we detect cause-effect-signal spans in a sentence using T5, a pre-trained autoregressive language model. We iteratively identify all cause-effect-signal span triplets, always conditioning the prediction of the next triplet on the previously predicted ones. To predict a single triplet, we consider different orders in which its components are generated, such as cause→effect→signal. Each triplet component is generated by the language model conditioned on the sentence, the previously generated parts of the current triplet, and the previously predicted triplets. Despite training on an extremely small dataset of 160 samples, our approach achieved competitive performance, placing second in the competition. Furthermore, we show that assuming either the cause→effect or the effect→cause order achieves similar results.
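The iterative decoding loop described above can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the released system. `generate` is a hypothetical text-to-text callable standing in for a fine-tuned T5 model. The input layout (`sentence: … | previous: … | current: … | predict: <role>`) and the `<none>` stop marker are invented for the example. What the sketch does reflect from the description is the conditioning structure. Each component is generated from the sentence, the parts of the current triplet produced so far, and all earlier triplets, with a configurable component order. A toy generator that replays gold spans replaces the model so the loop can be run.

```python
from typing import Callable, Dict, List, Sequence

# Assumed interface: generate(input_text) -> output_text, e.g. a fine-tuned T5.
Generate = Callable[[str], str]
STOP = "<none>"  # hypothetical marker meaning "no further triplets"


def format_triplet(triplet: Dict[str, str], order: Sequence[str]) -> str:
    """Serialise one triplet in the chosen component order."""
    return " ; ".join(f"{role}: {triplet[role]}" for role in order)


def build_input(sentence: str, previous: List[Dict[str, str]],
                partial: Dict[str, str], order: Sequence[str], target: str) -> str:
    """Condition on the sentence, earlier triplets and the current partial triplet."""
    prev = " || ".join(format_triplet(t, order) for t in previous) or "none"
    done = " ; ".join(f"{r}: {partial[r]}" for r in order if r in partial) or "none"
    return f"sentence: {sentence} | previous: {prev} | current: {done} | predict: {target}"


def extract_triplets(sentence: str, generate: Generate,
                     order: Sequence[str] = ("cause", "effect", "signal"),
                     max_triplets: int = 4) -> List[Dict[str, str]]:
    """Generate triplets one component at a time until the model emits STOP."""
    triplets: List[Dict[str, str]] = []
    for _ in range(max_triplets):
        partial: Dict[str, str] = {}
        for role in order:
            span = generate(build_input(sentence, triplets, partial, order, role)).strip()
            if role == order[0] and span == STOP:
                return triplets  # the model signals that no triplets remain
            partial[role] = span
        triplets.append(partial)
    return triplets


def make_toy_generator(gold: List[Dict[str, str]]) -> Generate:
    """Stand-in for a trained model: replays gold spans so the loop can be run."""
    def generate(text: str) -> str:
        prev = text.split("| previous: ")[1].split(" | current:")[0]
        n_prev = 0 if prev == "none" else prev.count(" || ") + 1
        role = text.rsplit("predict: ", 1)[1]
        return gold[n_prev][role] if n_prev < len(gold) else STOP
    return generate
```

Passing `order=("effect", "cause", "signal")` switches the generation order without touching the loop. This is the kind of variation whose effect the paper compares; with a real model, the different orders would of course produce different inputs and potentially different spans.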
Our code and model predictions will be released online.
Find out more about the work done by the Idiap Research Institute here.