Explainable Semantic Text Relations: A Question-Answering Framework for Comparing Document Content

Given two texts, can one completely substitute for the other, do they only partly overlap, or does each contain unique information? Questions of this nature underlie many information and document management tasks, but they are not always easy to answer accurately.

This study proposes a new formal approach to understanding the semantic relations between texts, based on the answerable question sets (AQA) for each text. In this approach, semantic relations such as equivalence, inclusion, and partial overlap are defined by systematically comparing the questions that each text makes it possible to answer.
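
To make the set-based definition concrete, here is a minimal sketch that classifies the relation between two texts once their answerable questions are available as plain sets. The names `Relation` and `classify_relation` are hypothetical, the disjoint case is an added assumption for completeness, and how the question sets are actually obtained from text is outside the scope of this sketch.

```python
from enum import Enum


class Relation(Enum):
    """Illustrative relation labels; the names are assumptions, not the paper's."""
    EQUIVALENT = "equivalent"
    A_INCLUDES_B = "a_includes_b"
    B_INCLUDES_A = "b_includes_a"
    PARTIAL_OVERLAP = "partial_overlap"
    DISJOINT = "disjoint"  # assumed extra case: no shared answerable question


def classify_relation(questions_a: set[str], questions_b: set[str]) -> Relation:
    """Compare two answerable-question sets and name their relation."""
    if questions_a == questions_b:
        return Relation.EQUIVALENT
    if questions_a > questions_b:  # strict superset: A answers everything B does, and more
        return Relation.A_INCLUDES_B
    if questions_a < questions_b:  # strict subset: B answers everything A does, and more
        return Relation.B_INCLUDES_A
    if questions_a & questions_b:  # some shared questions, each side has unique ones
        return Relation.PARTIAL_OVERLAP
    return Relation.DISJOINT


# Hypothetical question identifiers, for illustration only.
text_a = {"who_signed", "when_signed", "where_signed"}
text_b = {"who_signed", "when_signed"}
print(classify_relation(text_a, text_b))
```

Because the result is derived from explicit set operations, the shared and unique questions (`questions_a & questions_b`, `questions_a - questions_b`, `questions_b - questions_a`) can also be reported directly as the explanation for the label.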

Based on this definition, a specialized synthetic database was built, featuring pairs of texts with various semantic relationships. The database was compiled using controlled paraphrasing and deliberate removal of information items, in a way that makes it possible to test which information items were retained, which were removed, and which were added. The database was used to evaluate various natural language processing models and to test whether they can identify real differences in meaning, rather than mere surface-level lexical similarity.
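
The bookkeeping behind such a pair can be sketched as follows, assuming each text is represented by a set of information-item identifiers. `PairRecord` and `make_variant` are hypothetical names, and the paraphrasing step is omitted here because it changes only the wording, not which items are present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairRecord:
    """Information items of a source text and of its derived variant (illustrative)."""
    source_items: frozenset[str]
    variant_items: frozenset[str]

    @property
    def retained(self) -> frozenset[str]:
        # Items present in both the source and the variant.
        return self.source_items & self.variant_items

    @property
    def removed(self) -> frozenset[str]:
        # Items in the source that the variant no longer contains.
        return self.source_items - self.variant_items

    @property
    def added(self) -> frozenset[str]:
        # Items that appear only in the variant.
        return self.variant_items - self.source_items


def make_variant(source_items, drop=(), add=()) -> PairRecord:
    """Derive a variant by dropping and adding items, keeping the ground truth."""
    source = frozenset(source_items)
    variant = (source - frozenset(drop)) | frozenset(add)
    return PairRecord(source, variant)


# Hypothetical item identifiers, for illustration only.
record = make_variant({"f1", "f2", "f3"}, drop={"f2"}, add={"f4"})
print(sorted(record.retained), sorted(record.removed), sorted(record.added))
```

Keeping the item sets alongside each pair is what lets a model's prediction be checked against known differences in content rather than against lexical similarity.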

The study’s findings indicate that representing meaning through questions makes it possible not only to compare texts, but also to understand precisely, explainably, and formally what they truly share and where they differ.
