
Towards Factuality Assessment

The volume of news produced around the world is growing rapidly, and even faster with continuously improving AI content generators such as GPT. Such text generators (for example, Grammarly) – offered as APIs or as online services – facilitate writing and readability and, in certain scenarios, can produce a complete blog post.

As content creation evolves, automatic verification of websites (i.e., of their reliability) becomes crucial. In response, and with the objective of verifying the authenticity of content on the internet, several browser plugins have been developed, for example:

  • NewsGuard. This browser extension displays trust-score icons and a summary for rated websites. The assessments are performed by trained analysts with journalism experience. The paid service is available in the US, Canada and Europe.
  • TrustedNews. Based on AI algorithms, this tool assists in evaluating the quality of English-language content by providing an objectivity score. It works only on long-form articles.
  • The Factual news evaluator. Using four factors (diversity, opinion, expertise and historical reputation), this service instantly evaluates how informative the displayed content is (including social media posts).

Additionally, online fact-checking services, such as Google Fact Check, PolitiFact, or the SPJ Toolbox, evaluate the credibility of news content either semi-automatically or manually, where authenticated publishers or journalists compile and evaluate the evidence by hand.

Semi-automatic frameworks are usually based on identifying relevant information by searching for articles given a query claim. Retrieved articles are ranked by their similarity to the query claim. However, top-ranked results do not necessarily reflect the truthfulness of the original claim, but only its similarity to the retrieved articles.
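To illustrate this point, similarity-based retrieval can be sketched with a minimal bag-of-words cosine similarity (a toy stand-in for real retrieval models; the claim and articles below are invented for the example):

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def rank_articles(claim, articles):
    """Rank articles by lexical similarity to the query claim, highest first."""
    return sorted(articles, key=lambda art: cosine_similarity(claim, art),
                  reverse=True)

claim = "Refugees are returning to Syria"
articles = [
    "Weather forecast for the coming week",
    "Thousands of refugees are returning to Syria, officials say",
    "Syria peace talks resume in Geneva",
]
ranked = rank_articles(claim, articles)
```

Note that the top-ranked article is simply the most lexically similar one; a ranker like this says nothing about whether the claim is actually true.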

The technology track of the CRITERIA project aims to bridge the gap between (combined) evidence for events, trends, biases, risks, threats, etc., and reliable risk and threat analysis results in the context of migration. To address this objective, the project proposes to design and widely share a factual assessment system that mimics the competent duo “Dr. Watson” and “Sherlock Holmes”. On the one hand, the “Dr. Watson” component processes information given a claim and lists the most relevant and promising results. On the other hand, the “Sherlock Holmes” component verifies the retrieved results based on a strict analysis of the content and the sources, and finally provides a verdict (Support, Refute or Not enough information).

Following this idea, our colleagues Martin Fajčík, Petr Motlíček, and Pavel Smrž developed a model called “Claim-Dissector”. Given a claim and the content of the retrieved results, the Claim-Dissector delivers the verdict. The model is described in detail in the recently published article “Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction”. As shown in the figure below, the model is composed of two main modules:

  • The retriever, given a claim (e.g., “Safety and security in Syria are allowing Syrian refugees to return to their homes”), collects all possible sources of evidence from Wikipedia and presents them to the verifier in a pre-ranked fashion. The retriever already discards articles that are not relevant (e.g., a hyperlink redirecting to a page with completely different content) or that are highly repetitive (i.e., only very slight changes in content, yet published by different sources).
  • The verifier’s task is to question the relevance of each piece of information, as well as its source, and to assign a probability to the claim’s veracity. Relevance is evaluated at several levels, from paragraphs and sentences down to the word level (tokens). Eventually, documents are separated into three groups: those supporting the statement, those refuting it, and a third group of documents that are relevant but with no clear orientation. Each document is stored with its hyperlink, language and publication date. The verifier evaluates the ranked list by its relevance (RS: relevance score) and the validity of relevant terms, highlighted by importance from gray (non-relevant) to red (highly relevant); for Support statements, the highlights range from gray to green. Identification at the word level ensures that the top-ranked sources are indeed the most relevant to the questioned topic, and lets the final reader visualize how much they contribute to the verdict.
Visualization of the Claim-Dissector model. In color: decision-making terms. Relevance score (RS): the probability that the sentence is relevant. Prediction score (PS): how much the sentence’s relevance contributed to the final prediction.
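To make the verifier’s grouping step concrete, here is a minimal sketch in Python. It is our own illustration, not the Claim-Dissector’s actual interface: the `Evidence` class, its field names and the 0.5 relevance threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    sentence: str
    relevance: float  # RS: probability that the sentence is relevant
    stance: str       # "support", "refute", or "neutral"

def group_evidence(evidence, rs_threshold=0.5):
    """Drop low-relevance sentences, then separate the rest into the
    three groups described above (support / refute / relevant-but-neutral)."""
    groups = {"support": [], "refute": [], "neutral": []}
    for ev in evidence:
        if ev.relevance >= rs_threshold:
            groups[ev.stance].append(ev)
    return groups

evidence = [
    Evidence("Return rates to Syria remain low.", 0.9, "refute"),
    Evidence("Some families have returned to Aleppo.", 0.7, "support"),
    Evidence("Syria has a Mediterranean coastline.", 0.2, "neutral"),
]
groups = group_evidence(evidence)
```

With these invented scores, the coastline sentence is dropped as irrelevant, and the remaining two land in the refuting and supporting groups respectively.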

This first version of the Claim-Dissector is trained with evidence from Wikipedia and does not yet consider other factors such as the reliability of the source. The Claim-Dissector focuses on modeling re-ranking and veracity prediction jointly, in an interpretable way. A key capability of the model is its ability to contrast conflicting evidence, which can be analyzed further when the prediction score is low.

To enhance the verifier’s capabilities, we are currently exploring different methods for estimating source reliability scores. An a priori score of a website’s historical credibility would then be taken into account to compute a more robust verdict.
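As one possible illustration of this idea (the exact formula below is our sketch, not the project’s published method), each source’s a priori reliability can weight its veracity probability when aggregating a verdict:

```python
def weighted_verdict(evidence):
    """Aggregate a verdict from (veracity_prob, source_reliability) pairs,
    each value in [0, 1], weighting each source by its reliability."""
    total_weight = sum(reliability for _, reliability in evidence)
    if total_weight == 0:
        return 0.5  # no reliable evidence: stay undecided
    return sum(p * reliability for p, reliability in evidence) / total_weight

# A claim backed mainly by a highly reliable source leans towards Support,
# even if a low-reliability source disagrees (all numbers are invented).
score = weighted_verdict([(0.9, 0.95), (0.3, 0.2)])
```

Here the unreliable dissenting source barely moves the aggregate, which is exactly the behavior an a priori credibility score is meant to provide.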

More results and in-depth analysis can be found in the paper “Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction”, now available for download in the CRiTERIA publication hub.

Dairazalia Sanchez-Cortes, PhD

Dairazalia Sanchez-Cortes is currently working as a Postdoctoral Researcher at the Idiap Research Institute in Switzerland. She joined the Speech and Audio Processing Group in 2023. She holds a PhD in Sciences from the EPFL’s School of Engineering. Her research interests include machine learning, human activity modeling, nonverbal behavior and applied research.

Sergio Burdisso, PhD

Sergio Burdisso is currently a Postdoctoral Researcher at the Idiap Research Institute in Switzerland. He is actively collaborating with Dr. Petr Motlicek in the Speech and Audio Processing Group. Sergio holds a PhD in Computer Science, specializing in Natural Language Processing applied to early risk identification on social media. His main research interests include interpretable machine learning, few-shot learning, and representation learning for dialogue modeling.

Dr. Petr Motlicek

Petr Motlicek has been a research scientist in the Speech and Audio Processing Group at the Idiap Research Institute in Switzerland since 2005. His research activities focus on audio and speech processing technologies (voice coding, speech recognition and speaker recognition), conversation analysis and machine learning. Many of the applications he has designed are developed in collaboration with security/government (LEA) bodies in Switzerland or at the EU level. He has contributed significantly to Kaldi, open-source software developed for speech and speaker recognition tasks, with many new signal-processing libraries provided by Idiap.

Banner image by Agence Olloweb on Unsplash.