Datasets
Navigating the Open Measures datasets
The data found via Open Measures, whether through the API or Analysis Tools, is straight from the source. That means we share everything that is publicly available. Use this resource to map the fields found in each dataset.
Data is delivered as JSON when you make an API request.
To get to the actual data, navigate into hits.hits.
Each result sits under a numeric index (0, 1, 2, and so on) in that nested structure.
When we add a post to our database, a number of meta-fields are generated that sit one level above the actual data. If we are looking at result "0", the meta-fields continue until "_source", which marks the beginning of the actual data. These fields are generated by Elasticsearch and are not part of the data from the site. Learn more about these fields in Elasticsearch's guide.
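The navigation described above can be sketched in Python. This is a minimal illustration, not an official client: the sample response is hypothetical, and its `_index`, `_id`, and `_score` values and the `text` field are placeholders chosen to mirror a typical Elasticsearch hit. Only `hits.hits` and `_source` come from the description above.

```python
from typing import Any


def extract_sources(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the platform data (`_source`) from every hit in a response."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("_source", {}) for hit in hits]


# Hypothetical response, trimmed to the parts discussed above.
# Real responses carry more fields; these values are placeholders.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "example-index",
                "_id": "abc123",
                "_score": 1.0,
                "_source": {"text": "first example post"},
            },
            {
                "_index": "example-index",
                "_id": "def456",
                "_score": 0.5,
                "_source": {"text": "second example post"},
            },
        ]
    }
}

# The actual data for every result.
sources = extract_sources(sample_response)

# Result "0": everything except `_source` is an Elasticsearch meta-field.
first_hit = sample_response["hits"]["hits"][0]
meta_fields = {key: value for key, value in first_hit.items() if key != "_source"}
```

Here `sources[0]` holds the data collected from the site, while `meta_fields` holds only the fields Elasticsearch generated for that result.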
Each dataset will be described in detail in the following sections with more fields, but this chart can function as a quick-start guide for a few "Key Fields".
Platform: This is the plain-text, readable name of the platform, not the value used for the platform in the API.
Platform Endpoint: This is the name of that platform when you're making an API request.
Username: This is the handle (typically the user slug) of the account posting the message, which is not necessarily the author of the content it contains (for example, if the post is a forward). This is their @, not their full name.
Post content: This is the actual body of an individual post or message.
| Platform | API Site Parameter | Username Field | Content Field |
|---|---|---|---|
| | | | htmlparsedcom |
| | | | htmlparsedcom |
| | | | text |
| | | | content |
| | | | N/A but look at meta.description or meta.title |
| | | | content_cleaned |
| | | | content |
| | | | txt |
| | | | comment |
| | | | N/A but look at value.title and value.description |
| | | | content |
| | | | body |
| | | | content |
| | | | body |
| | | | content |
| | | | text |
| | | | N/A but look at full_description and metadata.name |
| | | | text |
| | | | N/A but look at value.title and value.description |
| | | | content |
| | | | message |
| | | | desc |
| | | | text |
| | | | content_cleaned |
| | | | text |
| | | | content |
4chan is an imageboard website where users can post anonymously. Users primarily participate in threaded discussions in response to an original post containing an image. Threads are categorized into “boards”: many threads belong to a single board, which functions like a forum room. One of the most popular boards is “/pol/” or “politically incorrect”, which was rebranded from the “/new/” board and is where many internet attacks and threats of real-world violence have been found. Content and users across the site suggest a free-speech maximalist ideology.
8kun, previously called 8chan, is an imageboard site where anonymous users respond in a threaded format to an original post. 8kun was created in 2013 as a free speech alternative to 4chan after 4chan began banning topics. Like 4chan, 8kun threads are categorized into various “boards”. Activity on the site has been linked to several mass shootings and terrorist events, including three in 2019 (Christchurch, New Zealand; Poway, CA; and El Paso, TX). The site has also been known as the home of Q, the user behind the notorious QAnon conspiracies. 8kun’s founder no longer controls the site and has since advocated for it to be shut down due to the real-world violence attributed to its usage.
BitChute is a British alt-tech video sharing platform and an alternative to YouTube. Founded in 2017 by Ray Vahey, BitChute hosts content involving QAnon conspiracies, hate speech, and neo-Nazi propaganda. Its users often turn to the platform after being kicked off other video sharing platforms like YouTube and even Rumble. As with other alt-tech sites that promote “free speech”, the site is full of videos and comments containing racist slurs, Nazi imagery, and calls for violence.
Bluesky is an American text-based, decentralized social network created by a group of former Twitter employees. Also known as “Bluesky Social”, it is a microblogging social network that uses the “AT Protocol”. It officially branched out from Twitter in 2021, but maintains a similar feel and user experience. Moderation, however, works very differently at Bluesky compared to Twitter or other legacy platforms. Called ‘Composable Moderation’, Bluesky’s moderation begins with a ‘basic default’ level of moderation followed by additional layers that are left to individual users to determine.
The Fediverse comprises a network of decentralized platforms that give users more control and autonomy. These networks are often backed by servers running open-source code and maintained by pseudonymous administrators. The open-source software running on these servers, like Mastodon or Lemmy, implements shared communication protocols that allow the servers to “federate” and share information with one another. One of these protocols is called “ActivityPub”, which provides server-to-server federation.
Gab is an American social media platform that was launched in 2016 as an uncensored alternative to mainstream social media platforms.
Gab has been a subject of controversy due to concerns about the presence of extremist and controversial content on the platform. Notably, the gunman responsible for killing 11 at a Pittsburgh synagogue in 2018 had previously posted antisemitic content on the platform.
Gettr is an American social media platform that was launched in July 2021. Gettr positions itself as a platform for free speech and an alternative to mainstream social media. It has found some traction among Brazil’s far right with Jair Bolsonaro owning an account. There have been reported connections tying the self-exiled businessman Guo Wengui to the platform as its source of funding.
LBRY is a blockchain-based, peer-to-peer file-sharing and payment network. It was founded in 2015 and served as the foundation for decentralized platforms like social networks and video sharing platforms. One of its founders described it as the most censorship-resistant system to ever exist. It was shut down in 2023 following a lawsuit brought by the SEC for selling unregistered securities.
Odysee, a subsidiary of LBRY, is a fringe decentralized alternative to YouTube and has emerged as the LBRY successor. White supremacists and other extremists have naturally found a home at Odysee due to its stance on moderation.
MeWe is an American alt-tech social networking platform launched in 2011, originally under the name Sgrouples. It is billed as the anti-Facebook as it does not moderate content on its platform, which has allowed for the proliferation of extremism and disinformation. MeWe gained a lot of popularity and many new users in early 2021 when Donald Trump and many of his supporters were banned or removed from platforms like YouTube, Facebook, and Twitter.
Minds is an American peer-to-peer blockchain-based social network. It was launched in 2015 as a free speech, minimally moderated alternative to Facebook where users can earn crypto rewards for platform engagement. Its founders say they allow extremist content as part of an effort to deradicalize users through discourse. Like many of the other datasets, Minds saw a large influx of users following the January 6th US Capitol attack and the removal of many thousands of users from Twitter and Facebook.
Odnoklassniki (OK), or “Classmates” in Russian, is a social media network founded in 2006. For the first few years of its existence, OK was the most popular website in Russia. In 2010, OK merged with VK and monopolized the Russian social media landscape. Like many other datasets, OK has little to no content moderation. The Texas man who killed 8 in a mass shooting in 2023 had previously posted neo-Nazi content to the platform.
Parler is an American alt-tech microblogging social network. Temporarily shuttered in April 2023 following an acquisition, Parler has reemerged as a place for maximal free speech and little content moderation. Parler is known as one of the primary social networking sites used to coordinate the January 6th storming of the US Capitol.
Poal is an American alt-tech threaded forum site modeled after the more mainstream Reddit. Poal insists they maintain a free speech approach with their community and have implemented very little content oversight. Like in other datasets, this lack of oversight leads to content containing harmful and harassing posts including online disinformation campaigns in addition to antisemitic and white nationalist propaganda.
Rumble is a Canadian video sharing platform and web host founded in 2013. Billed as a free speech alternative to YouTube, it seeks to “restore the internet to its roots by making it free and open once again”. It has made a series of acquisitions in past years to compete in a consolidating video-sharing market. Recently, Rumble entered into an agreement to serve as a cloud services provider for Truth Social. It is known to host videos containing propaganda and extremist ideologies.
RuTube is a Russian video platform and alternative to YouTube founded in 2006. Now owned by Gazprom Media, RuTube has been used to push Wagner and state-authored talking points, online disinformation, and propaganda. State-sponsored material via a library of licensed content includes movies, series, cartoons, shows, and live broadcasts. It also hosts blogs, podcasts, and video game streams.
GET https://api.smat-app.com/content
Get RUTUBE comment content from SMAT's public API.
Name | Type | Description |
---|---|---|
term* | String | ukraine |
site* | String | rutube_comment |
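The request above can be assembled in Python as a minimal sketch. It only builds the URL from the two parameters shown in the table (`term` and `site`) and does not call the API. The helper name `build_content_url` is hypothetical, and any further parameters the API may accept are not covered here.

```python
from urllib.parse import urlencode

# Endpoint taken from the request description above.
BASE_URL = "https://api.smat-app.com/content"


def build_content_url(term: str, site: str) -> str:
    """Build a GET URL for the content endpoint from the two required parameters."""
    query = urlencode({"term": term, "site": site})
    return f"{BASE_URL}?{query}"


# Example values from the parameter table: search RuTube comments for "ukraine".
url = build_content_url("ukraine", "rutube_comment")
print(url)
```

The resulting URL can be passed to any HTTP client. The JSON that comes back is then navigated through hits.hits as described earlier.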
Scored (formerly known as Communities.win, Win Communities, and The Donald) is a collection of alt-tech, thread-based conversation forums that operate very similarly to their more mainstream counterpart, Reddit. The sites first came into existence when Reddit banned the subreddit r/The_Donald in 2020. Users responded by creating their own site, thedonald.win. Scored claims to “unblur the lines between entertainment and politics”. The Scored community c/TheDonald remains a very popular channel for users to discuss January 6th, conspiracy theories, and extremist rhetoric.
Telegram Messenger, commonly known as Telegram, is an encrypted, cross-platform, cloud-based messaging application. Telegram was founded in 2013 by the founders of VK and hosts its operational center in Dubai. The Telegram data schema consists of channels, which users can join to post messages, images, videos, or other media. The Open Measures Telegram dataset includes activity from extremist and neo-Nazi groups in the United States and coordinated state-backed disinformation campaigns throughout Europe and Africa.
TikTok is a social media platform that allows users to create, share, and discover short-form videos. The app was developed by the Chinese tech company ByteDance and was launched in September 2016 under the name Douyin in the Chinese market. It was later released internationally as TikTok in September 2017.
Open Measures focuses its attention on TikTok content that ranges from harassment and online disinformation to white supremacy and dangerous conspiracies.
Truth Social is an American microblogging social network platform created by Trump Media & Technology Group. The site bills itself as a “Big Tent” social media platform and alternative to Twitter. It was listed publicly in October 2023 via a special purpose acquisition company. The platform has been home to a host of conspiracies ranging from widespread voter fraud to QAnon tributes.
Founded in 2006, Vkontakte (VK) is a Russian social networking site. VK is based out of Saint Petersburg and considered to be the Russian Facebook. It was originally founded by the founders of Telegram and is still one of the most popular websites in Russia. It has light content moderation and loose enforcement on policy-violating content. In 2021, VK’s parent company (VK Group) sold majority ownership to Gazprom, effectively making VK a state-run company.
Wimkin is an American alt-tech social network founded in 2017. It promotes itself as a free speech alternative to traditional social media, and the user experience is seen as a combination of Twitter and Facebook. Wimkin was pulled from major app stores in January 2021 following calls to violence relating to the storming of the Capitol. The platform has since returned and maintains its lax policies on content moderation, describing itself as “100% Uncensored Social Media”.