| Field | Value |
|---|---|
| Persistent Identifier | doi:10.18710/473JEF |
| Publication Date | 2026-03-03 |
| Title | NorwegianNewsTopics (2023): Topic Modeling of Norwegian News |
| Author | Steen Steensen (OsloMet – Oslo Metropolitan University), ORCID 0000-0003-2675-1817 |
| Point of Contact | Steen Steensen (OsloMet – Oslo Metropolitan University) |
| Description | NorwegianNewsTopics is a tabular dataset containing metadata and topic distributions for online news articles published in 2023 by 22 Norwegian news outlets. Each row represents one article and includes its publication metadata and the topic model output derived from a 28-topic Latent Dirichlet Allocation (LDA) model. The dataset was constructed to analyse topic diversity in Norwegian journalism and to enable comparisons across editorial types and distribution platforms. (2026-02-23) |
| Subject | Computer and Information Science; Social Sciences |
| Keyword | Topic modeling; Journalism; News diversity; Platformisation; Social media |
| Language | Norwegian |
| Producer | OsloMet – Oslo Metropolitan University (OsloMet) https://www.oslomet.no/en; Høyskolen Kristiania https://www.kristiania.no/ |
| Production Date | 2025-08-07 |
| Production Location | Norway |
| Contributor | Data Collector: Guneshwar Singh Manhas |
| Funding Information | Medietilsynet (Norwegian Media Authority) |
| Distributor | OsloMet – Oslo Metropolitan University (OsloMet) https://dataverse.no/dataverse/oslomet |
| Distribution Date | 2026-02-23 |
| Depositor | Steensen, Steen |
| Deposit Date | 2026-02-23 |
| Time Period | Start Date: 2023-01-01; End Date: 2023-12-31 |
| Date of Collection | Start Date: 2024-05-01; End Date: 2024-08-01 |
| Data Type | Tabular metadata of Norwegian news articles, including per-article topic distributions from LDA topic modeling |
| Software | RStudio, Version: 2026.01.1+403 |
| Related Material | Sampling information and methodological procedures are described in the file 4_Methods_LDA_and_SoMe_matching. |
| Data Source | Most of the original data was retrieved from content databases owned by the media companies Amedia, Schibsted and Polaris. A smaller part of the data was scraped from the websites of TV2 Nyheter, Klassekampen, Morgenbladet, Vårt Land and Nationen. Data on social media distribution was collected through CrowdTangle (a public insights tool operated by Meta, discontinued in August 2024) for Facebook and Instagram, and through TikTok's Research API for TikTok. |
| Characteristic of Sources | The original dataset used for the LDA topic modeling included full-text articles (leads and body texts). The full texts have been removed from the deposited dataset because of copyright regulations and the terms of the data-sharing agreements with the sources. |
| Documentation and Access to Sources | Data from Amedia, Schibsted and Polaris was accessed after data-sharing agreements were signed with the three companies. The data was retrieved through API access (Amedia) or through transfer of file batches (Schibsted and Polaris). For TV2 Nyheter, Klassekampen, Morgenbladet, Vårt Land and Nationen, the data was accessed by scraping their online news websites after the media companies had given permission. The scraper was written in Python, using Selenium for browser automation and BeautifulSoup for HTML parsing. |