CRiTERIA Publications

Welcome to the publications section of the CRiTERIA project website, featuring the project’s scientific papers and other publications!

Contact us via this form if you have any questions about the materials available for download.

We are on Zenodo!

View the curated CRiTERIA project publications in our Zenodo community.

Publication

Bridging Qualitative Data Silos: The Potential of Reusing Codings Through Machine Learning Based Cross-Study Code Linking

Authors: Sergej Wildemann, Claudia Niederée, and Erick Elejalde | L3S Research Center

For qualitative data analysis (QDA), researchers assign codes to text segments to arrange the information into topics or concepts. These annotations facilitate information retrieval and the identification of emerging patterns in unstructured data. However, this metadata is typically not published or reused after the research. Subsequent studies with similar research questions require a new definition of codes and do not benefit from other analysts’ experience. Machine learning (ML) based classification seeded with such data remains a challenging task due to the ambiguity of code definitions and the inherent subjectivity of the exercise. Previous attempts to support QDA using ML relied on linear models and examined only individual datasets that were either small or coded specifically for this purpose. However, we show that modern approaches effectively capture at least part of the codes’ semantics and may generalize to multiple studies. We analyze the performance of multiple classifiers across three large real-world datasets. Furthermore, we propose an ML-based approach to identify semantic relations of codes in different studies to show thematic faceting, enhance retrieval of related content, or bootstrap the coding process. These encouraging results suggest how analysts might benefit from prior interpretation efforts, potentially yielding new insights into qualitative data.

Social Science Computer Review | November 13, 2023
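
As an illustrative aside (not the paper’s actual pipeline), cross-study code linking of the kind described above can be approximated by embedding code labels from two studies with a pre-trained sentence encoder and matching them by cosine similarity. The model name and the toy codebooks below are placeholders:

```python
# Hypothetical sketch: linking qualitative codes across two studies by
# embedding similarity. Not the paper's implementation; the model name
# and toy codebooks are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

study_a = ["economic hardship", "fear of deportation", "family separation"]
study_b = ["financial difficulties", "anxiety about removal", "host-country integration"]

emb_a = model.encode(study_a, convert_to_tensor=True)
emb_b = model.encode(study_b, convert_to_tensor=True)

# Cosine similarity between every pair of codes from the two studies.
sim = util.cos_sim(emb_a, emb_b)

for i, code in enumerate(study_a):
    j = int(sim[i].argmax())
    print(f"{code!r}  <->  {study_b[j]!r}  (cos={float(sim[i][j]):.2f})")
```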

Publication

Migration Reframed? A multilingual analysis on the stance shift in Europe during the Ukrainian crisis

Authors: Sergej Wildemann, Claudia Niederée, and Erick Elejalde | L3S Research Center

The war in Ukraine seems to have positively changed the attitude toward the critical societal topic of migration in Europe — at least towards refugees from Ukraine. We investigate whether this impression is substantiated by how the topic is reflected in online news and social media, thus linking the representation of the issue on the Web to its perception in society. For this purpose, we combine and adapt leading-edge automatic text processing for a novel multilingual stance detection approach. Starting from 5.5M Twitter posts published by 565 European news outlets in one year, beginning September 2021, plus replies, we perform a multilingual analysis of migration-related media coverage and associated social media interaction for Europe and selected European countries.

The results of our analysis show that there is actually a reframing of the discussion illustrated by the terminology change, e.g., from “migrant” to “refugee”, often even accentuated with phrases such as “real refugees”. However, concerning a stance shift in public perception, the picture is more diverse than expected. All analyzed cases show a noticeable temporal stance shift around the start of the war in Ukraine. Still, there are apparent national differences in the size and stability of this shift.

This paper is published in The Web Conference 2023 | April 30 – May 4, 2023
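
To make the terminology-change measurement concrete, here is a toy sketch that counts “migrant” versus “refugee” mentions per month; the counting scheme and sample posts are invented for illustration and are not the paper’s code:

```python
# Hypothetical sketch: tracking the "migrant" -> "refugee" terminology shift
# over time by counting term frequencies per month. Sample data is invented.
from collections import Counter
import re

posts = [
    ("2021-11", "EU states argue over migrant quotas"),
    ("2022-03", "Cities welcome refugees fleeing the war in Ukraine"),
    ("2022-03", "Volunteers organise housing for real refugees"),
]

monthly = {}
for month, text in posts:
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = monthly.setdefault(month, Counter())
    counts["migrant"] += tokens.count("migrant") + tokens.count("migrants")
    counts["refugee"] += tokens.count("refugee") + tokens.count("refugees")

for month in sorted(monthly):
    print(month, dict(monthly[month]))
```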

Publication

Stance Inference in Twitter through Graph Convolutional Collaborative Filtering Networks with Minimal Supervision

Authors: Zhiwei Zhou and Erick Elejalde | L3S Research Center

Social Media (SM) has become a stage for people to share thoughts, emotions, opinions, and almost every other aspect of their daily lives. This abundance of human interaction makes SM particularly attractive for social sensing. Especially during polarizing events such as political elections or referendums, users post information and encourage others to support their side, using symbols such as hashtags to represent their attitudes. However, many users choose not to attach hashtags to their messages, use a different language, or show their position only indirectly. Thus, automatically identifying their opinions becomes a more challenging task. To uncover these implicit perspectives, we propose a collaborative filtering model based on Graph Convolutional Networks that exploits the textual content in messages and the rich connections between users and topics. Moreover, our approach only requires a small annotation effort compared to state-of-the-art solutions. Nevertheless, the proposed model achieves competitive performance in predicting individuals’ stances. We analyze users’ attitudes ahead of two constitutional referendums in Chile in 2020 and 2022. Using two large Twitter datasets, our model achieves improvements of 3.4% in recall and 3.6% in accuracy over the baselines.

This paper is published in The Web Conference 2023 | April 30 – May 4, 2023
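
A minimal sketch of the underlying idea, assuming a tiny user-hashtag bipartite graph: one GCN-style propagation step with the standard symmetric normalization spreads stance signals from a few labeled hashtags to unlabeled users. This is a generic illustration, not the paper’s model:

```python
# Hypothetical sketch: one GCN-style propagation step over a user-hashtag
# bipartite graph, spreading stance signals from labeled hashtags to
# unlabeled users. Standard symmetric normalization; not the paper's code.
import numpy as np

# Adjacency over 4 nodes: users u0, u1 and hashtags h0 (pro), h1 (anti).
A = np.array([
    [0, 0, 1, 0],   # u0 used h0
    [0, 0, 0, 1],   # u1 used h1
    [1, 0, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

A_hat = A + np.eye(4)                     # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization

# One-hot stance "features": only the hashtags are labeled.
X = np.array([[0, 0], [0, 0], [1, 0], [0, 1]], dtype=float)

H = A_norm @ X                            # one propagation step
print("user stance scores:\n", H[:2])     # u0 leans pro, u1 leans anti
```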

Publication

Learning Faithful Attention for Interpretable Classification of Crisis-Related Microblogs under Constrained Human Budget

Authors: Thi Huyen Nguyen and Koustav Rudra

The recent widespread use of social media platforms has created convenient ways to obtain and spread up-to-date information during crisis events such as disasters. Time-critical analysis of crisis data can help human organizations gain actionable information and plan for aid responses. Many existing studies have proposed methods to identify informative messages and categorize them into different humanitarian classes. Advanced neural network architectures tend to achieve state-of-the-art performance, but the model decisions are opaque. While attention heatmaps show insights into the model’s prediction, some studies found that standard attention does not provide meaningful explanations. Alternatively, recent works proposed interpretable approaches for the classification of crisis events that rely on human rationales to train and extract short snippets as explanations. However, the rationale annotations are not always available, especially in real-time situations for new tasks and events. In this paper, we propose a two-stage approach to learn the rationales under minimal human supervision and derive faithful machine attention. Extensive experiments over four crisis events show that our model is able to obtain better or comparable classification performance (~86% Macro-F1) to baselines and faithful attention heatmaps using only 40-50% human-level supervision. Further, we employ a zero-shot learning setup to detect actionable tweets along with actionable word snippets as rationales.

This paper is published in The Web Conference 2023 | April 30 – May 4, 2023

Publication

Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

Authors: Martin Fajcik, Petr Motlicek, and Pavel Smrz

We present Claim-Dissector: a novel latent variable model for fact-checking and fact-analysis which, given a claim and a set of retrieved provenances, allows learning jointly: (i) which provenances are relevant to this claim and (ii) what the veracity of this claim is. We propose to disentangle the per-provenance relevance probability and its contribution to the final veracity probability in an interpretable way: the final veracity probability is proportional to a linear ensemble of per-provenance relevance probabilities. This way, it can be clearly identified which sources contribute, and to what extent, to the final probability. We show that our system achieves state-of-the-art results on the FEVER dataset comparable to two-stage systems typically used in traditional fact-checking pipelines, while often using significantly fewer parameters and less computation.

July 2022
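
The interpretable aggregation described above can be illustrated with toy arithmetic (all numbers are invented): the final veracity probability is a relevance-weighted linear combination of per-provenance signals, so each source’s share of the verdict is directly readable:

```python
# Hypothetical sketch of the linear-ensemble idea: final veracity is
# proportional to a relevance-weighted sum of per-provenance signals.
# All numbers are invented for illustration.
import numpy as np

relevance = np.array([0.9, 0.6, 0.1])        # P(provenance is relevant)
supports  = np.array([0.95, 0.2, 0.5])       # per-provenance "claim is true" signal

contrib = relevance * supports               # each source's contribution
veracity = contrib.sum() / relevance.sum()   # normalized linear ensemble

for i, c in enumerate(contrib):
    print(f"source {i}: contributes {c / contrib.sum():.0%} of the verdict")
print(f"final veracity probability: {veracity:.2f}")
```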

ABSTRACT

Impact of COVID-19 on Chile’s Internal Migration

Authors: Erick Elejalde, Victor Navarro, Loreto Bravo, and Leo Ferres

The “Impact of COVID-19 on Chile’s Internal Migration” paper was submitted to the NetSci-X 2023 conference. The full paper will be available soon.

ABSTRACT

Migration Reframed? Multilingual analysis on the stance shift in Europe during the Ukrainian crisis

Authors: Sergej Wildemann and Erick Elejalde | L3S Research Center

The “Migration Reframed? Multilingual analysis on the stance shift in Europe during the Ukrainian crisis” paper was presented at the NetSci-X 2023 conference.

Publication

Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022

Authors: Damianos Galanopoulos and Vasileios Mezaris | CERTH-ITI

Matching images to articles is challenging and can be considered a special version of the cross-media retrieval problem. This working note paper presents our solution for the MediaEval NewsImages benchmarking task. We investigated the performance of two cross-modal networks, a pre-trained network and a trainable one, the latter originally developed for text-video retrieval tasks and adapted to the NewsImages task. Moreover, we utilize a method for revising the similarities produced by either one of the cross-modal networks, i.e., a dual softmax operation, to improve our solutions’ performance. We report the official results for our submitted runs and additional experiments we conducted to evaluate our runs internally.

Multimedia Evaluation Workshop (MediaEval’22), January 12-13, 2023, Bergen, Norway.
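
A dual softmax over a query-item similarity matrix can be sketched in a few lines. This is the generic operation, not necessarily the exact variant used in the paper: the matrix is softmax-normalized along both axes and the two results are fused, which penalizes candidates that are generically similar to many queries:

```python
# Hypothetical sketch of a dual softmax operation on a similarity matrix.
# Rows are text queries, columns are candidate images; values are invented.
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

sim = np.array([[0.8, 0.3, 0.4],
                [0.2, 0.7, 0.6]])

# Normalize over images (per query) and over queries (per image), then fuse.
revised = softmax(sim, axis=1) * softmax(sim, axis=0)
print(revised.argmax(axis=1))  # best image per query after revision
```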

SLIDES

Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022

Authors: Damianos Galanopoulos and Vasileios Mezaris | CERTH-ITI

The following slides were part of the paper presentation at the Multimedia Evaluation Workshop (MediaEval’22), January 12-13, 2023, Bergen, Norway.

Publication

VERGE in VBS 2023

Authors: Nick Pantelidis, Stelios Andreadis, Maria Pegia, Anastasia Moumtzidou, Damianos Galanopoulos, Konstantinos Apostolidis, Despoina Touska, Konstantinos Gkountakos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, and Ioannis Kompatsiaris | CERTH-ITI

This paper describes VERGE, an interactive video retrieval system for browsing a collection of images from videos and searching for specific content. The system utilizes many retrieval techniques as well as fusion and reranking capabilities. A Web Application is also part of VERGE, where a user can create queries, view the top results and submit the appropriate data, all in a user-friendly way.

International Conference on Multimedia Modeling (MMM2023), January 9-12, 2023, Bergen, Norway.

Publication

Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism

Authors: Nikolaos Gkalelis, Dimitrios Daskalakis, and Vasileios Mezaris | CERTH-ITI

In this paper, Gated-ViGAT, an efficient approach for video event recognition that utilizes bottom-up (object) information, a new frame sampling policy, and a gating mechanism, is proposed. Specifically, the frame sampling policy uses weighted in-degrees (WiDs), derived from the adjacency matrices of graph attention networks (GATs), and a dissimilarity measure to select the most salient and, at the same time, diverse frames representing the event in the video. Additionally, the proposed gating mechanism fetches the selected frames sequentially and exits early when an adequately confident decision is achieved. In this way, only a few frames are processed by the computationally expensive branch of our network that is responsible for the bottom-up information extraction. The experimental evaluation on two large, publicly available video datasets (MiniKinetics, ActivityNet) demonstrates that Gated-ViGAT provides a large computational complexity reduction in comparison to our previous approach (ViGAT), while maintaining the excellent event recognition and explainability performance.

IEEE International Symposium on Multimedia 2022, December 2022, in Naples, Italy.
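
Our reading of the frame-selection policy can be mocked up as a small greedy loop (the attention matrix, features, and threshold below are invented): frames with a high weighted in-degree are preferred, and a candidate too similar to the already selected frames is skipped:

```python
# Hypothetical sketch of WiD-based salient-but-diverse frame selection.
# The attention matrix, features, and threshold are invented stand-ins.
import numpy as np

rng = np.random.default_rng(0)
att = rng.random((6, 6))                 # frame-to-frame attention (GAT-like)
feats = rng.random((6, 8))               # per-frame feature vectors

wids = att.sum(axis=0)                   # weighted in-degree per frame

def dissimilar(i, chosen, thr=0.15):
    return all(np.linalg.norm(feats[i] - feats[j]) > thr for j in chosen)

selected = []
for i in np.argsort(-wids):              # most salient frames first
    if dissimilar(i, selected):
        selected.append(int(i))
    if len(selected) == 3:
        break
print("selected frames:", selected)
```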

SLIDES

Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism

Authors: Nikolaos Gkalelis, Dimitrios Daskalakis, and Vasileios Mezaris | CERTH-ITI

The following slides were part of the paper presentation at the IEEE International Symposium on Multimedia 2022, December 2022, in Naples, Italy.

Publication

TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks

Authors: Mariano Ntrougkas, Nikolaos Gkalelis, and Vasileios Mezaris | CERTH-ITI

The apparent “black box” nature of neural networks is a barrier to adoption in applications where explainability is essential. This paper presents TAME (Trainable Attention Mechanism for Explanations), a method for generating explanation maps with a multi-branch hierarchical attention mechanism. TAME combines a target model’s feature maps from multiple layers using an attention mechanism, transforming them into an explanation map. TAME can easily be applied to any convolutional neural network (CNN) by streamlining the optimization of the attention mechanism’s training method and the selection of target model’s feature maps. After training, explanation maps can be computed in a single forward pass. We apply TAME to two widely used models, i.e. VGG-16 and ResNet-50, trained on ImageNet and show improvements over previous top-performing methods. We also provide a comprehensive ablation study comparing the performance of different variations of TAME’s architecture.

IEEE International Symposium on Multimedia 2022, December 2022, in Naples, Italy.

SLIDES

TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks

Authors: Mariano Ntrougkas, Nikolaos Gkalelis, and Vasileios Mezaris | CERTH-ITI

The following slides were part of the “TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks” paper presentation delivered at the IEEE International Symposium on Multimedia 2022, December 2022, in Naples, Italy.

Publication

ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2022

Authors: Konstantinos Gkountakos, Damianos Galanopoulos, Despoina Touska, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, and Ioannis Kompatsiaris | CERTH-ITI

This report presents an overview of the runs related to the Ad-hoc Video Search (AVS) and Activities in Extended Video (ActEV) tasks on behalf of the ITI-CERTH team. Our participation in the AVS task is based on a cross-modal deep network architecture utilizing several textual and visual features. As part of the retrieval stage, a dual-softmax approach is utilized to revise the calculated text-video similarities. For the ActEV task, we adapt our framework to fit the new dataset and overcome the challenges of detecting and recognizing activities in a multi-label manner while experimenting with two separate activity classifiers.

Proc. TRECVID 2022 Workshop, December 2022

SLIDES

Explaining the Decisions of Image/Video Classifiers

Author: Vasileios Mezaris | CERTH-ITI

The following slides were presented at the 1st Nice Workshop on Interpretability, November 17-18, 2022, Université Côte d’Azur, Nice, France.

Publication

L3S at TREC 2022 CrisisFACTS track

Authors: Thi Huyen Nguyen and Koustav Rudra

This paper describes our proposed approach for the multi-stream summarization of crisis-related events in the TREC 2022 CrisisFACTS track. We apply a retrieval- and ranking-based two-step summarization approach. First, we employ a sparse retrieval framework in which content texts from multiple online streams are treated as a document corpus, and a term-matching-based retrieval strategy is used to retrieve contents relevant to the set of queries for a given event day, the so-called facts. Next, we use several pre-trained models to measure the semantic similarity between query-fact and fact-fact pairs, and to score and rank the facts for the extraction of daily event summaries.

TREC 2022: 31st Text REtrieval Conference (TREC) | November 14-18, 2022
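
The two-step scheme can be sketched roughly as follows; the corpus, query, scoring functions, and model name are placeholders rather than the track submission’s actual components:

```python
# Hypothetical sketch of the retrieve-then-rerank summarization scheme:
# step 1 term-matching retrieval, step 2 semantic reranking. The corpus,
# query, and model name are placeholders.
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Flood closes the bridge on Route 9",
    "Concert tickets go on sale this weekend",
    "Shelter opened at the high school for flood evacuees",
]
query = "flood shelters and road closures"

# Step 1: cheap term-overlap retrieval of candidate "facts".
q_terms = set(query.lower().split())
candidates = [d for d in corpus if q_terms & set(d.lower().split())]

# Step 2: semantic reranking with a pre-trained encoder.
model = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(model.encode(query, convert_to_tensor=True),
                      model.encode(candidates, convert_to_tensor=True))[0]
for doc, s in sorted(zip(candidates, scores.tolist()), key=lambda p: -p[1]):
    print(f"{s:.2f}  {doc}")
```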

Publication

CrisICSum: Interpretable Classification and Summarization Platform for Crisis Events from Microblogs

Authors: Thi Huyen Nguyen, Miroslav Shaltev, and Koustav Rudra

Microblogging platforms such as Twitter receive massive volumes of messages during crisis events. Real-time insights are crucial for emergency response; hence, there is a need to develop faithful tools for efficiently digesting this information. In this paper, we present CrisICSum, a platform for the classification and summarization of crisis events. The objective of CrisICSum is to classify user posts during disaster events into different humanitarian classes (e.g., damage, affected people) and generate summaries of class-level messages. Unlike existing systems, CrisICSum employs an interpretable-by-design backend classifier that can generate explanations for its output decisions. In addition, the platform allows user feedback on both the classification and summarization phases. CrisICSum is designed and run as an easily integrated web application, and its backend models are interchangeable. The system can assist users and human organizations in improving response efforts during disaster situations. CrisICSum is available at https://crisicsum.l3s.uni-hannover.de

CIKM’22: Proc. of the 31st ACM International Conference on Information & Knowledge Management | October 17-21, 2022

Publication

Rationale Aware Contrastive Learning Based Approach to Classify and Summarize Crisis-Related Microblogs

Authors: Thi Huyen Nguyen and Koustav Rudra

The way information now propagates on Twitter makes the platform a crucial conduit for tactical data and emergency responses during disasters. However, real-time information about crises is immersed in a large volume of emotional and irrelevant posts. This makes it necessary to develop an automatic tool to identify disaster-related messages and summarize the information for data consumption and situation planning. Moreover, the explainability of such methods is crucial in determining their applicability to real-life scenarios. Recent studies also highlight the importance of learning a good latent representation of tweets for several downstream tasks. In this paper, we take advantage of state-of-the-art methods, such as transformers and contrastive learning, to build an interpretable classifier. Our proposed model classifies Twitter messages into different humanitarian categories and also extracts rationale snippets as supporting evidence for its output decisions. The contrastive learning framework helps to learn better representations of tweets by bringing related tweets closer together in the embedding space. Furthermore, we employ classification labels and rationales to efficiently generate summaries of crisis events. Extensive experiments over different crisis datasets show that (i) our classifier obtains the best performance-interpretability trade-off, and (ii) the proposed summarizer shows superior performance (1.4%-22% improvement) with significantly less computation cost than baseline models.

CIKM’22: Proc. of the 31st ACM International Conference on Information & Knowledge Management | October 17-21, 2022
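
As a rough illustration of the contrastive component, the following toy loss pulls tweets with the same humanitarian label together in the embedding space; the embeddings, labels, and loss variant are our assumptions, not the paper’s training code:

```python
# Hypothetical sketch of a supervised contrastive loss that pulls tweets
# with the same humanitarian label together in embedding space. Toy
# embeddings and labels; a generic variant, not the paper's training code.
import torch
import torch.nn.functional as F

def sup_con_loss(emb, labels, temp=0.1):
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.T / temp                      # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)
    mask.fill_diagonal_(False)                    # positives, excluding self
    logits = sim - torch.eye(len(emb)) * 1e9      # drop self from denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(log_prob * mask).sum() / mask.sum()  # average over positive pairs

emb = torch.randn(8, 16)                          # 8 tweets, 16-dim embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])   # humanitarian classes
print(sup_con_loss(emb, labels).item())
```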

SLIDES

Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval – Slides

Authors: Damianos Galanopoulos and Vasileios Mezaris | CERTH-ITI

The following slides were presented at the ECCV 2022 Workshop on AI for Creative Video Editing and Understanding (CVEU) in October 2022 to discuss the “Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval” paper.

Publication

Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval

Authors: Damianos Galanopoulos and Vasileios Mezaris | CERTH-ITI

In this paper we tackle the cross-modal video retrieval problem and, more specifically, we focus on text-to-video retrieval. We investigate how to optimally combine multiple diverse textual and visual features into feature pairs that lead to generating multiple joint feature spaces, which encode text-video pairs into comparable representations. To learn these representations our proposed network architecture is trained by following a multiple-space learning procedure. Moreover, at the retrieval stage, we introduce additional softmax operations for revising the inferred query-video similarities. Extensive experiments in several setups based on three large-scale datasets (IACC.3, V3C1, and MSR-VTT) lead to conclusions on how to best combine text-visual features and document the performance of the proposed network.

ECCV 2022 Workshop on AI for Creative Video Editing and Understanding (CVEU) | October 16, 2022

SLIDES

Learning Visual Explanations for DCNN-Based Image Classifiers Using an Attention Mechanism – Slides

Authors: Ioanna Gkartzonika, Nikolaos Gkalelis, Vasileios Mezaris | CERTH-ITI

The following slides were presented at the ECCV 2022 Workshop on Vision with Biased or Scarce Data (VBSD) in October 2022 to discuss the “Learning Visual Explanations for DCNN-Based Image Classifiers Using an Attention Mechanism” paper.

Publication

Learning Visual Explanations for DCNN-Based Image Classifiers Using an Attention Mechanism

Authors: Ioanna Gkartzonika, Nikolaos Gkalelis, and Vasileios Mezaris | CERTH-ITI

In this paper, two new learning-based eXplainable AI (XAI) methods for deep convolutional neural network (DCNN) image classifiers, called L-CAM-Fm and L-CAM-Img, are proposed. Both methods use an attention mechanism that is inserted into the original (frozen) DCNN and trained to derive class activation maps (CAMs) from the last convolutional layer’s feature maps. During training, CAMs are applied to the feature maps (L-CAM-Fm) or the input image (L-CAM-Img), forcing the attention mechanism to learn the image regions that explain the DCNN’s outcome. Experimental evaluation on ImageNet shows that the proposed methods achieve competitive results while requiring a single forward pass at the inference stage. Moreover, based on the derived explanations, a comprehensive qualitative analysis is performed, providing valuable insight for understanding the reasons behind classification errors, including possible dataset biases affecting the trained classifier.

ECCV 2022 Workshop on Vision with Biased or Scarce Data (VBSD) | October 24, 2022
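
The core operation, a class activation map gating the input image as in L-CAM-Img, can be mocked up in a few lines, with random arrays standing in for the frozen DCNN’s feature maps and the trained attention weights:

```python
# Hypothetical sketch of applying a class activation map (CAM) to an input
# image, as in the L-CAM-Img branch. Feature maps and attention weights are
# random stand-ins for a frozen DCNN and a trained attention mechanism.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.random((512, 7, 7))          # last-conv feature maps (C, H, W)
w = rng.random(512)                      # per-channel attention weights
image = rng.random((224, 224, 3))        # input image

cam = np.maximum((w[:, None, None] * feats).sum(axis=0), 0)  # ReLU(w . F)
cam = cam / cam.max()                                        # normalize to [0, 1]
cam_up = np.kron(cam, np.ones((32, 32)))                     # naive 7x7 -> 224x224 upsample
masked = image * cam_up[..., None]       # image regions weighted by the CAM
print(masked.shape, float(cam_up.mean()))
```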

Publication

IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

Authors: Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, and Pavel Smrz

In this paper, we describe our shared task submissions for Subtask 2 of CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in sentences from news media. We detect cause-effect-signal spans in a sentence using T5, a pre-trained autoregressive language model. We iteratively identify all cause-effect-signal span triplets, always conditioning the prediction of the next triplet on the previously predicted ones. To predict the triplet itself, we consider different causal relationships, such as cause→effect→signal. Each triplet component is generated via a language model conditioned on the sentence, the previous parts of the current triplet, and previously predicted triplets. Despite training on an extremely small dataset of 160 samples, our approach achieved competitive performance and was placed second in the competition. Furthermore, we show that assuming either a cause→effect or an effect→cause order achieves similar results.

CASE@EMNLP 2022: 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text | December 7-8, 2022

Publication

IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

Authors: Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, and Petr Motlicek

In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction approach for fine-tuning LMs, in which the CRI task is treated as a masked language modeling (MLM) problem. This approach allows LMs natively pre-trained on MLM problems to directly generate textual responses to CRI-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was fine-tuned with only 256 instances per class, 15.7% of all available data, and yet obtained the second-best precision (0.82), third-best accuracy (0.82), and an F1-score (0.85) very close to that reported by the winning team (0.86).

CASE@EMNLP 2022: 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text | December 7-8, 2022
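
The prompt-based formulation can be illustrated with a generic masked-LM pipeline; the template and model below are our assumptions, and the paper’s prompts, verbalizers, and LMs may differ:

```python
# Hypothetical sketch of prompt-based causal relation identification as a
# masked-LM problem. Template and model are placeholders, not the paper's.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

sentence = "The protests erupted because fuel prices doubled overnight."
prompt = f"{sentence} The sentence describes a <mask> relation."

# A verbalizer would map the predicted token(s) to the Causal / Not-causal
# labels; here we simply inspect the LM's top guesses for the mask.
for pred in fill(prompt, top_k=5):
    print(f'{pred["token_str"]:>12}  {pred["score"]:.3f}')
```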

Publication

ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network

Authors: Nikolaos Gkalelis, Dimitrios Daskalakis, and Vasileios Mezaris | CERTH-ITI

In this paper, a pure-attention bottom-up approach, called ViGAT, is proposed: it utilizes an object detector together with a Vision Transformer (ViT) backbone network to derive object and frame features, and a head network that processes these features for the task of event recognition and explanation in video. The ViGAT head consists of graph attention network (GAT) blocks factorized along the spatial and temporal dimensions in order to effectively capture both local and long-term dependencies between objects or frames. Moreover, using the weighted in-degrees (WiDs) derived from the adjacency matrices at the various GAT blocks, we show that the proposed architecture can identify the most salient objects and frames that explain the decision of the network. A comprehensive evaluation study is performed, demonstrating that the proposed approach provides state-of-the-art results on three large, publicly available video datasets (FCVID, MiniKinetics, ActivityNet). The source code is made publicly available at: https://github.com/bmezaris/ViGAT

IEEE Access | October 2022