Replication data for: The beginning of a beautiful friendship: rule-based and statistical analysis of Middle Russian
https://doi.org/10.18710/T9NQ9L
Berdicevskis, Aleksandrs; Eckhoff, Hanne; Gavrilova, Tatjana
DataverseNO
2017-08-10
2023-09-28T20:25:26Z

We describe and compare two tools for processing Middle Russian texts.
Both tools provide lemmatization, part-of-speech and morphological annotation.
One (“RNC”) was developed for annotating texts in the Russian
National Corpus and is rule-based. The other one (“TOROT”) is being used
for annotating the eponymous corpus and is statistical. We apply the two
analyzers to the same Middle Russian text and then compare their outputs
with high-quality manual annotation. Since the analyzers use different annotation
schemes and spelling principles, we have to harmonize their outputs
before we can compare them. The comparison shows that TOROT
performs considerably better than RNC (lemmatization 69.8% vs. 47.3%,
part of speech 89.5% vs. 54.2%, morphology 81.5% vs. 16.7%). If, however,
we limit the evaluation set only to those tokens for which the analyzers provide
a guess and in addition consider the RNC response correct if one of the
multiple guesses it provides is correct, the numbers become comparable
(88.5% vs. 91.9%, 93.9% vs. 95.2%, 81.5% vs. 86.8%). We develop a simple
procedure which boosts TOROT lemmatization accuracy by 8.7% by using
RNC lemma guesses when TOROT fails to provide one and matching them
against the existing TOROT lemma database. We conclude that a statistical
analyzer (trained on a large amount of material) can deal with non-standardised
historical texts better than a rule-based one. Still, it is possible to make the
analyzers collaborate, boosting the performance of the superior one.

Arts and Humanities
Middle Russian; morphology; diachronic; inflection
English
Berdicevskis, Aleksandrs, Hanne Eckhoff and Tatjana Gavrilova. 2016. The beginning of a beautiful friendship: rule-based and statistical analysis of Middle Russian. Computational linguistics and intellectual technologies. Papers from the annual international conference "Dialogue", 15: 99–111, http://www.dialog-21.ru/media/3384/berdi%C4%8Devskisaetal.pdf
2016
Conzett, Philipp
2016-04-18
Russian Federation
CC0 1.0
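
The lemma fallback mentioned in the abstract (use RNC lemma guesses when TOROT provides none, and keep only guesses found in the existing TOROT lemma database) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `pick_lemma`, its parameters, and the sample lemmas are hypothetical, and the sketch assumes that the RNC guesses have already been harmonized to TOROT spelling conventions.

```python
from typing import Iterable, Optional, Set


def pick_lemma(torot_guess: Optional[str],
               rnc_guesses: Iterable[str],
               torot_lemmas: Set[str]) -> Optional[str]:
    """Hypothetical helper: choose a lemma for one token.

    Assumption: rnc_guesses are already harmonized to TOROT spelling.
    """
    # TOROT's own guess always takes priority when it exists.
    if torot_guess is not None:
        return torot_guess
    # Otherwise accept the first RNC guess that is a known TOROT lemma.
    for guess in rnc_guesses:
        if guess in torot_lemmas:
            return guess
    # Neither analyzer yields a usable lemma.
    return None


# Illustrative (made-up) lemma database and calls.
known_lemmas = {"byti", "grad"}
with_torot = pick_lemma("byti", ["other"], known_lemmas)
with_fallback = pick_lemma(None, ["gorod", "grad"], known_lemmas)
no_match = pick_lemma(None, ["gorod"], known_lemmas)
```

Taking the first matching guess is one possible design choice; how ties between several matching RNC guesses are resolved in the original procedure is not stated in the abstract.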