EU borders constantly face a multiplicity of challenges, including human trafficking, smuggling, document fraud and illegal migration. Existing risk analysis and vulnerability assessment methodologies, such as CIRAM, have several limitations, in particular in terms of their scope, the data sources they exploit, and the methodology employed for their analysis. Concerning the data sources, risk assessment reports currently draw predominantly on official databases and reports rather than on open-source information, including traditional and social media. The CRiTERIA project therefore places special focus on the analysis of narratives, events and attitudes, and on the vulnerability of borders and humans, using multimodal and multilingual data. In the following, we outline an exemplary use of a set of methods developed to facilitate the prompt exploration of a multilingual, cross-media dataset. The methods combine the identification of a use-case-relevant set of events with document enrichment, geo-parsing, image captioning and narrative summarization at document level, aiming at the collection and validation of risk-related evidence and the support of decision processes in risk analysis.
The GDELT project is the largest comprehensive, high-resolution, open-access spatio-temporal dataset that indexes and interconnects events published across media outlets in 65 live-translated languages. For this use case, we collected a selection of 14,626 articles indexed by GDELT between 1 January 2023 and 14 March 2023, with events linked to Estonia. The article content was machine-translated into English and geo-parsed to identify instances of location names and their corresponding coordinates within Estonia and its neighbouring countries. The set of articles was then selected based on the presence of project-related keywords and of location names within a 500 km radius of the geographical centre of Estonia. A text classifier ranked the passages within each article according to their relevance to the project use cases. The resulting documents were further enriched with narrative summarization; additionally, image captions were generated for the main image of each article. The map below presents 250 placemarks for media reports on Estonia, divided into folders corresponding to the following event types: “arrest”, “asylum”, “drugs”, “migration”, “refugee”, “smuggling”, and “trafficking”. An example document, along with its core annotations, is displayed in the top right corner of the map.
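The keyword and distance-based selection described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the centre coordinates, the keyword list (borrowed here from the event types named above), the article structure, and the function names `haversine_km` and `is_relevant` are all assumptions made for the example.

```python
import math

# Assumed approximate geographical centre of Estonia (illustrative values,
# not taken from the article).
ESTONIA_CENTRE = (58.6, 25.0)  # (latitude, longitude) in degrees
RADIUS_KM = 500.0
EARTH_RADIUS_KM = 6371.0

# Hypothetical keyword list, borrowed from the event types named in the text;
# the project's actual keyword list is not given.
KEYWORDS = {"arrest", "asylum", "drugs", "migration",
            "refugee", "smuggling", "trafficking"}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_relevant(article):
    """Keep an article if it mentions a keyword and a location within the radius.

    `article` is a hypothetical dict with a translated "text" string and a
    "locations" list of geo-parsed (lat, lon) tuples.
    """
    text = article["text"].lower()
    has_keyword = any(keyword in text for keyword in KEYWORDS)
    is_near = any(haversine_km(ESTONIA_CENTRE, loc) <= RADIUS_KM
                  for loc in article["locations"])
    return has_keyword and is_near
```

Plain substring matching is used here only for brevity; it is a crude stand-in for whatever keyword matching and passage ranking the actual pipeline applies.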
While focused on the detection of CRiTERIA-relevant events and the creation of narrative summarizations from text, the presented use case also demonstrates a potentially efficient approach to the joint analysis of diverse media sources, across platforms and across multiple languages. The rapid filtering and visualisation of events based on, for example, their location, type, participants, or date range can be useful for assessing and interconnecting evidence linked to risk analysis. Comparing the narratives used across different media outlets or languages could also contribute to a better understanding of how topics are presented and discussed, to the identification of false or misleading narratives, to the analysis of perceptions and misperceptions, and to the detection of biases at the level of the media outlet or language. The presented harvesting and enrichment technologies are complemented by project partners’ big-data backend technologies, which store the media and make them accessible for rapid visualization and analysis.
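As a small illustration of the filtering by type and date range mentioned above, the sketch below selects events from an in-memory list. The event structure and the `filter_events` function are hypothetical and are not part of the project's backend, which is not described in detail here.

```python
from datetime import date


def filter_events(events, event_type=None, start=None, end=None):
    """Return events matching an optional type and an inclusive date range.

    `events` is a hypothetical list of dicts with "type" (str) and
    "date" (datetime.date) keys; a criterion left as None is not applied.
    """
    selected = []
    for event in events:
        if event_type is not None and event["type"] != event_type:
            continue
        if start is not None and event["date"] < start:
            continue
        if end is not None and event["date"] > end:
            continue
        selected.append(event)
    return selected


# Made-up sample records, for illustration only.
sample_events = [
    {"type": "smuggling", "date": date(2023, 1, 15)},
    {"type": "asylum", "date": date(2023, 2, 3)},
    {"type": "smuggling", "date": date(2023, 3, 10)},
]

february_onwards = filter_events(sample_events, start=date(2023, 2, 1))
smuggling_in_january = filter_events(
    sample_events, event_type="smuggling", end=date(2023, 1, 31)
)
```

The same pattern extends to further criteria, such as location or participants, by adding one optional parameter per field.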
Marcin Skowron
Marcin Skowron was an AI scientist at HENSOLDT Analytics. He holds a Ph.D. (Doctor of Engineering) from the Signal Processing and Language Information Science Laboratory, Hokkaido University, Japan, and a Master's degree in Economics from the University of Gdańsk, Poland. Before joining HENSOLDT Analytics, he worked as a Senior Research Scientist at the Austrian Research Institute for Artificial Intelligence, as a Postdoctoral Research Associate at the Department of Computational Perception, Johannes Kepler University, and as a lecturer at the Hokkaido Institute of Technology. His technical expertise includes Natural Language Processing, Information Extraction and Conversational Agents.