Supporting data for: LLM-Assisted Keymorph Analysis of Grammatical Case in RT's Israeli–Palestinian Conflict Coverage (doi:10.18710/YTSGDM)

Document Description

Citation

Title:

Supporting data for: LLM-Assisted Keymorph Analysis of Grammatical Case in RT's Israeli–Palestinian Conflict Coverage

Identification Number:

doi:10.18710/YTSGDM

Distributor:

DataverseNO

Date of Distribution:

2026-05-15

Version:

1

Bibliographic Citation:

Lu, Tingting, 2026, "Supporting data for: LLM-Assisted Keymorph Analysis of Grammatical Case in RT's Israeli–Palestinian Conflict Coverage", https://doi.org/10.18710/YTSGDM, DataverseNO, V1

Study Description

Citation

Title:

Supporting data for: LLM-Assisted Keymorph Analysis of Grammatical Case in RT's Israeli–Palestinian Conflict Coverage

Identification Number:

doi:10.18710/YTSGDM

Authoring Entity:

Lu, Tingting (https://ror.org/00jdr0662)

Producer:

Beijing Foreign Studies University

Distributor:

DataverseNO

Distributor:

The Tromsø Repository of Language and Linguistics (TROLLing)

Access Authority:

Lu, Tingting

Access Authority:

TROLLing curator

Depositor:

Lu, Tingting

Date of Deposit:

2026-04-21

Holdings Information:

https://doi.org/10.18710/YTSGDM

Study Scope

Keywords:

Arts and Humanities, Keymorph Analysis, grammatical case, cognitive linguistics, conflict discourse, Large Language Models (LLMs)

Abstract:

<b>Dataset description:</b> <p>This dataset supports a Keymorph Analysis of grammatical case in Russian-language news headlines about the 2023-2025 Israeli-Palestinian conflict, collected from RT's official news website.</p> <p>The dataset comprises four main components:</p> <ol> <li>Raw Headlines and Filtered Corpus: The initial collection of Russian-language headlines from RT (2023-10-07 to 2025-01-19) and the filtered corpus of 8,757 distinct headlines containing specified conflict-related keywords (e.g., 'Israel', 'Palestine', 'Gaza', 'Hamas').</li> <li>Reference Corpus: The reference corpus was constructed from the National Media Subcorpus of the Russian National Corpus (RNC).</li> <li>Annotated Corpus of Grammatical Cases: This core component contains the grammatical case annotations for the 11 target keywords across the corpus. The annotations were generated with an LLM (ChatGPT-5 mini API); a 20% sample was human-reviewed and corrected, and the corrections were integrated into the final dataset.</li> <li>Derived Analytical Data and Visualizations: Statistical summaries of keyword frequencies and grammatical case distributions, the standardized Pearson residuals and log-likelihood (LL) ratio values used for keymorph identification, and visualizations such as word frequency charts and residual heatmaps, all derived from the annotated corpus.</li> </ol>
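The standardized Pearson residuals mentioned in component 4 can be illustrated with a minimal sketch. It assumes the common adjusted form, (O - E) / sqrt(E * (1 - row/N) * (1 - col/N)), applied to a keyword-by-case count table; the deposited script (33_RT_chi_square_keyword_analysis.py) may compute them differently, and the function name and counts below are hypothetical.

```python
from math import sqrt

def standardized_residuals(table):
    """Adjusted (standardized) Pearson residuals for a 2-D table of counts.

    Assumes every row and column total is strictly between 0 and the
    grand total, so that no cell has zero variance.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / sqrt(variance))
        residuals.append(out_row)
    return residuals

# Hypothetical counts (not taken from the dataset):
# rows = two keywords, columns = two grammatical cases.
counts = [[30, 10], [20, 40]]
res = standardized_residuals(counts)
```

A positive residual marks a keyword-case pairing that occurs more often than independence would predict, a negative one a pairing that occurs less often.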

<b>Related article abstract:</b> <p>This study applies and extends Keymorph Analysis (KMA) with cognitive linguistic theory to investigate the representation of the Israeli–Palestinian conflict in Russia Today (RT)’s Russian-language headlines. Unlike traditional keyword analysis, which primarily focuses on lexical content, KMA reveals underlying narrative orientations by examining how systematic morphosyntactic choices contribute to the construal of participant roles. Our approach integrates three analytical layers: (1) a Quantitative Layer that identifies statistically significant keymorphs using a novel dual-reference framework (Standardized Residuals for internal distinctiveness and Log-likelihood tests against a broad reference corpus) via LLM-enhanced annotation (98.58% accuracy); (2) a Contextual Analysis Layer that maps these grammatical patterns to their specific lexical and semantic environments through corpus-assisted analysis; and (3) a Cognitive-Semantic Interpretation Layer grounded in the cognitive-semantic networks of the Russian case system. Through this integrated analysis, we identify a core-periphery hierarchy in case usage, revealing three contrastive cognitive schemas: military agents vs. humanitarian space, active entities vs. constrained subjects, and external dominance vs. regional passivity. Ultimately, this study provides a scalable, LLM-enhanced methodology for analyzing morphologically rich languages, advancing our understanding of how grammatical case assignment functions as a systematic mechanism for organizing participant positioning and constructing divergent narrative framings.</p>
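The log-likelihood comparison against a reference corpus can be sketched as follows. This is an illustration only: it assumes the two-term log-likelihood formulation widely used in corpus linguistics, whereas the deposited script (38_REF_Log-Likelihood.py) may use a different variant. The function name, argument names, and counts are hypothetical.

```python
from math import log

def log_likelihood(a, b, c, d):
    """Two-term log-likelihood (G2) for one form in two corpora.

    a: count of the form (e.g. a keyword in one case) in the target corpus
    b: count of the same form in the reference corpus
    c: total relevant tokens in the target corpus
    d: total relevant tokens in the reference corpus
    """
    total = c + d
    expected_target = c * (a + b) / total
    expected_reference = d * (a + b) / total
    ll = 0.0
    for observed, expected in ((a, expected_target), (b, expected_reference)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * log(observed / expected)
    return 2 * ll

# Hypothetical counts (not taken from the dataset).
ll_same = log_likelihood(50, 50, 1000, 1000)
ll_diff = log_likelihood(80, 20, 1000, 1000)
```

With one degree of freedom, values above 3.84 correspond to p < 0.05; the threshold applied in the study itself is not stated in this record.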

Time Period:

2023-10-07 to 2025-01-19

Date of Collection:

2023-10-07 to 2025-01-19

Country:

Russian Federation

Kind of Data:

corpus data

Methodology and Processing

Sources Statement

Data Sources:

RT (<a href="https://russian.rt.com" title="URL" target="_blank">russian.rt.com</a>). Insubstantial parts of this source are reused in this dataset under exceptions and limitations to intellectual property protection set out in the <a href="https://lovdata.no/dokument/NL/lov/2018-06-15-40" title="URL" target="_blank">Norwegian Copyright Act</a> and <a href="https://eur-lex.europa.eu/eli/dir/1996/9" title="URL" target="_blank">EU Database Directive</a>.

Media Subcorpus of the Russian National Corpus (<a href="https://ruscorpora.ru/" title="URL" target="_blank">ruscorpora.ru</a>). Insubstantial parts of this source are reused in this dataset under exceptions and limitations to intellectual property protection set out in the <a href="https://lovdata.no/dokument/NL/lov/2018-06-15-40" title="URL" target="_blank">Norwegian Copyright Act</a> and <a href="https://eur-lex.europa.eu/eli/dir/1996/9" title="URL" target="_blank">EU Database Directive</a>.

Data Access

Notes:

<a href="http://creativecommons.org/publicdomain/zero/1.0">CC0 1.0</a>

Other Study Description Materials

Related Publications

Citation

Title:

Lu, T. LLM-Assisted Keymorph Analysis of Grammatical Case in RT’s Israeli-Palestinian Conflict Coverage. <i>Russian Linguistics</i> (accepted)

Bibliographic Citation:

Lu, T. LLM-Assisted Keymorph Analysis of Grammatical Case in RT’s Israeli-Palestinian Conflict Coverage. <i>Russian Linguistics</i> (accepted)

Other Study-Related Materials

Label:

00_ReadMe.txt

Notes:

text/plain

Other Study-Related Materials

Label:

01_RT_data_headlines_raw.txt

Notes:

text/plain

Other Study-Related Materials

Label:

02_RT_top200.py

Notes:

text/x-python

Other Study-Related Materials

Label:

03_RT_top200_words.txt

Notes:

text/plain

Other Study-Related Materials

Label:

04_RT_top20_word_frequency_chart.py

Notes:

text/x-python

Other Study-Related Materials

Label:

05_RT_top20_word_frequency_chart.png

Notes:

image/png

Other Study-Related Materials

Label:

06_RT_data_with_keywords_raw.txt

Notes:

text/plain

Other Study-Related Materials

Label:

07_RT_annotation_AI.py

Notes:

text/x-python

Other Study-Related Materials

Label:

08_RT_data_annotated_AI.txt

Notes:

text/plain

Other Study-Related Materials

Label:

09_RT_annotation_statistics_AI.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

10_RT_recall_calculation.txt

Notes:

text/plain

Other Study-Related Materials

Label:

11_RT_sample_AI.txt

Notes:

text/plain

Other Study-Related Materials

Label:

12_RT_sample_human.txt

Notes:

text/plain

Other Study-Related Materials

Label:

13_RT_discrepancies_between_AI_and_human_annotations.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

14_RT_data_annotated_reviewed.txt

Notes:

text/plain

Other Study-Related Materials

Label:

15_RT_annotation_statistics_human.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

16_keyword_Izrail_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

17_keyword_CAXAL_ruscorpora_content_2618.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

18_keyword_Gaza_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

19_keyword_XAMAS_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

20_keyword_Palestina_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

21_keyword_Livan_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

22_keyword_Iran_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

23_keyword_Xezbolla_ruscorpora_content_20 .csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

24_keyword_USA_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

25_keyword_ООN_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

26_keyword_Rossija_ruscorpora_content_4000.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

27_REF_Prompt_AI_annotate.txt

Notes:

text/plain

Other Study-Related Materials

Label:

28_REF_AI_annotated.txt

Notes:

text/plain

Other Study-Related Materials

Label:

29_REF_annotation_statistics_AI.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

30_REF_samples_AI_annotated.txt

Notes:

text/plain

Other Study-Related Materials

Label:

31_REF_discrepancies_between_AI_and_human_annotations.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

32_REF_annotation_statistics_human.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

33_RT_chi_square_keyword_analysis.py

Notes:

text/x-python

Other Study-Related Materials

Label:

34_RT_keywords_analysis_report.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

35_RT_keywords_grammatical_case_standardized_residuals_precise.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

36_RT_grammatical_case_standardized_residuals_heatmap.py

Notes:

text/x-python

Other Study-Related Materials

Label:

37_RT_grammatical_case_standardized_residuals_heatmap.png

Notes:

image/png

Other Study-Related Materials

Label:

38_REF_Log-Likelihood.py

Notes:

text/x-python

Other Study-Related Materials

Label:

39_REF_Log-Likelihood.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

40_Identification_ KMA_LL_SR.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

41_Primary_Secondary_Keymorphs_Viz.py

Notes:

text/x-python

Other Study-Related Materials

Label:

42_Fig_Primary_Keymorphs_Viz.png

Notes:

image/png

Other Study-Related Materials

Label:

43_Fig_Secondary_Keymorphs_Viz.png

Notes:

image/png