Coreference in Long Documents using Hierarchical Entity Merging
Published in SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL), 2024
Recommended citation: Talika Gupta, Hans Ole Hatzel, Chris Biemann, “Coreference in Long Documents using Hierarchical Entity Merging”, SIGHUM Workshop (LaTeCH-CLfL), EACL 2024 https://www.inf.uni-hamburg.de/en/inst/ab/lt/publications/2024-gupta-et-al-sighum.pdf
Current top-performing coreference resolution approaches are limited with regard to the maximum length of text they can accept. We explore a recursive entity-merging technique that allows us to apply coreference models to texts of arbitrary length. In experiments on established datasets, we quantify the drop in resolution quality caused by this approach. Finally, we use an under-explored resource, a fully coreference-annotated novel, to illustrate our model's performance on long documents in practice. On this novel, we achieve state-of-the-art performance, outperforming previous systems capable of handling long documents.
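The abstract does not spell out the merging procedure, so the following is only a hypothetical sketch of the general idea, not the authors' implementation. It recursively halves a token sequence until each piece fits a model's length limit, resolves each piece with a stand-in resolver, and then merges entities bottom-up so that mentions keep their global token positions. The names `hierarchical_coref`, `merge_entities`, `toy_resolve` and `shares_surface_form`, the halving split, and the string-match merge test are all assumptions made for illustration. A real system would plug in a trained coreference model and a learned decision about which entities to merge.

```python
from typing import Callable, Dict, List, Set, Tuple

Mention = Tuple[int, str]   # (global token index, surface text)
Entity = Set[Mention]       # an entity is a cluster of mentions

# Hypothetical interfaces: a chunk-level resolver and a pairwise merge test.
Resolver = Callable[[List[str], int], List[Entity]]
MergeTest = Callable[[Entity, Entity], bool]


def merge_entities(left: List[Entity], right: List[Entity],
                   should_merge: MergeTest) -> List[Entity]:
    """Fold each right-hand entity into the first compatible left-hand one."""
    merged = [set(e) for e in left]
    for r in right:
        for m in merged:
            if should_merge(m, r):
                m |= r              # same entity: union the mention sets
                break
        else:
            merged.append(set(r))   # no match: keep it as a new entity
    return merged


def hierarchical_coref(tokens: List[str], max_len: int, resolve: Resolver,
                       should_merge: MergeTest, offset: int = 0) -> List[Entity]:
    """Split recursively until segments fit the model, then merge bottom-up."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(tokens) <= max_len:
        return resolve(tokens, offset)
    mid = len(tokens) // 2
    left = hierarchical_coref(tokens[:mid], max_len, resolve, should_merge, offset)
    right = hierarchical_coref(tokens[mid:], max_len, resolve, should_merge, offset + mid)
    return merge_entities(left, right, should_merge)


def toy_resolve(chunk: List[str], offset: int) -> List[Entity]:
    """Stand-in for a real model: clusters identical capitalised tokens in a chunk."""
    clusters: Dict[str, Entity] = {}
    for i, tok in enumerate(chunk):
        if tok[:1].isupper():
            clusters.setdefault(tok, set()).add((offset + i, tok))
    return list(clusters.values())


def shares_surface_form(a: Entity, b: Entity) -> bool:
    """Toy merge test: two entities match if any of their mention strings coincide."""
    return bool({t for _, t in a} & {t for _, t in b})


if __name__ == "__main__":
    tokens = "Alice met Bob . later Alice left and Bob stayed with Carol".split()
    entities = hierarchical_coref(tokens, 4, toy_resolve, shares_surface_form)
    print(sorted(sorted(e) for e in entities))
    # → [[(0, 'Alice'), (5, 'Alice')], [(2, 'Bob'), (8, 'Bob')], [(11, 'Carol')]]
```

With `max_len=4`, no single call to the resolver sees the whole text, yet the bottom-up merge recovers the same three entities a single pass over all twelve tokens would produce in this toy setting. With a real model, each merge step is where quality can degrade, which is the kind of drop the abstract describes measuring.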
