Title: Consolidating and Exploring Open Textual Knowledge
How can we effectively capture the information expressed in multiple texts? How can we allow people, as well as computer applications, to easily explore it? The current semantic NLP pipeline typically ends at the single sentence or text level, putting the burden on applications to consolidate and present related information across multiple texts. Further, semantic representations, which may provide the basis for text consolidation, are often based on non-trivial schemata that require expert annotation, making it a huge effort to create large-scale corpora for training.
In this talk, I will outline a research program whose goals are to represent consolidated information conveyed in multiple texts and to communicate it effectively to users. This program builds upon three largely unexplored research lines. First, we aim to establish a “natural” semantic representation for individual texts, which is based solely on crowdsourceable natural language expressions rather than on pre-specified schemata. To that end, we follow and extend the recent Question-Answer Semantic Role Labeling (QA-SRL) approach, through which we decompose sentence information into question-answer pairs, each representing an individual statement. Second, we are developing approaches for consolidating the information structures of different texts, which requires a substantial extension of cross-text co-reference detection. The goal is to yield a consolidated structure that may be seen as an “open” analogue of traditional knowledge graphs, representing real-world elements and the statements relating them. Third, we are developing a framework for interactive exploration of multi-text information, while addressing the challenging task of systematic and replicable evaluation of such interactive methods. I will provide an overview of the framework and its three research lines and illustrate different types of the evolving research tasks.
Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural language based knowledge representation and acquisition, and text exploration. Dagan and colleagues established the textual entailment recognition paradigm. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors. He is currently heading the initiative of setting up the Bar-Ilan University Data Science Institute.
96 Euston Rd