Public API

Programmatically interact with Open Measures' data using the Public API

Getting Started

Open Measures values transparency and openness, and we offer as much as we can for free to help push back on online and offline harm. One of the ways we do this is by offering a Public API.

Important to note: to mitigate the threat of bad actors, the Public API is rate-limited to 39 requests per day and date-limited to data that is at least six months old. Get in touch with us if you'd like to learn about a non-rate-limited version of the Public API.

Open Source API: https://gitlab.com/openmeasures/backends/openmeasures-api

There's a link to the API in the Open Source section of the Open Measures website, along with more details on the rate- and date-limiting of the Public API.

Endpoints

The API has access to the raw JSON behind all of our front-end tools and can be useful for developers and analysts who want to dive deeper into the data or make more fine-grained queries. The API has three endpoints and has an optional boolean logic query for the Content endpoint.

All API endpoints are hosted at:

Calendar and fixed intervals

When configuring a date histogram aggregation, the interval can be specified in two ways: calendar-aware time intervals, and fixed time intervals.

Calendar-aware intervals understand that daylight saving time changes the length of specific days, that months have different numbers of days, and that leap seconds can be tacked onto a particular year.

Fixed intervals are, by contrast, always multiples of SI units and do not change based on calendaring context.
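The difference is easiest to see on a day when the clocks change. The sketch below is only an illustration, not part of the API: it assumes the America/New_York time zone and 14 March 2021 (a daylight saving changeover) as an example, and it relies on Python's zoneinfo module, which needs a time zone database on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Example time zone and date, chosen only for illustration.
tz = ZoneInfo("America/New_York")
day_start = datetime(2021, 3, 14, tzinfo=tz)       # midnight before the clocks move forward
next_day_start = datetime(2021, 3, 15, tzinfo=tz)  # midnight of the following day

# Convert to UTC before subtracting so the result is elapsed time
# rather than a wall-clock difference.
calendar_day = next_day_start.astimezone(timezone.utc) - day_start.astimezone(timezone.utc)

# A fixed "day" is always 24 hours, whatever the calendar says.
fixed_day = timedelta(hours=24)

print("calendar-aware day:", calendar_day)
print("fixed day:", fixed_day)
```

On this date the calendar-aware day is one hour shorter than the fixed day, which is why the two kinds of interval can produce different bucket boundaries.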

Calendar intervals

Calendar-aware intervals are configured with the calendar_interval parameter. You can specify calendar intervals using the unit name, such as month, or as a single unit quantity, such as 1M. For example, day and 1d are equivalent. Multiple quantities, such as 2d, are not supported.

The accepted calendar intervals are:

minute, 1m

All minutes begin at 00 seconds. One minute is the interval between 00 seconds of the first minute and 00 seconds of the following minute in the specified time zone, compensating for any intervening leap seconds, so that the number of minutes and seconds past the hour is the same at the start and end.

hour, 1h

All hours begin at 00 minutes and 00 seconds. One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00 minutes of the following hour in the specified time zone, compensating for any intervening leap seconds, so that the number of minutes and seconds past the hour is the same at the start and end.

day, 1d

All days begin at the earliest possible time, which is usually 00:00:00 (midnight). One day (1d) is the interval between the start of the day and the start of the following day in the specified time zone, compensating for any intervening time changes.

week, 1w

One week is the interval between the start day_of_week:hour:minute:second and the same day of the week and time of the following week in the specified time zone.

month, 1M

One month is the interval between the start day of the month and time of day and the same day of the month and time of the following month in the specified time zone, so that the day of the month and time of day are the same at the start and end.

quarter, 1q

One quarter is the interval between the start day of the month and time of day and the same day of the month and time of day three months later, so that the day of the month and time of day are the same at the start and end.

year, 1y

One year is the interval between the start day of the month and time of day and the same day of the month and time of day the following year in the specified time zone, so that the date and time are the same at the start and end.
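As a rough client-side check, the accepted values above can be captured in a small lookup. This is a sketch only: the names CALENDAR_INTERVALS and is_calendar_interval are hypothetical helpers, not part of the API.

```python
# Unit names and their single-unit abbreviations, as listed above.
# Matching is case-sensitive: "1m" is a minute, "1M" is a month.
CALENDAR_INTERVALS = {
    "minute": "1m",
    "hour": "1h",
    "day": "1d",
    "week": "1w",
    "month": "1M",
    "quarter": "1q",
    "year": "1y",
}


def is_calendar_interval(value: str) -> bool:
    """Return True if value is a unit name or its single-unit abbreviation."""
    return value in CALENDAR_INTERVALS or value in CALENDAR_INTERVALS.values()


print(is_calendar_interval("day"))  # unit name
print(is_calendar_interval("1d"))   # single-unit abbreviation
print(is_calendar_interval("2d"))   # multiple quantities are not supported
```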

Fixed intervals

Fixed intervals are configured with the fixed_interval parameter.

In contrast to calendar-aware intervals, fixed intervals are a fixed number of SI units and never deviate, regardless of where they fall on the calendar. One second is always composed of 1000ms. This allows fixed intervals to be specified in any multiple of the supported units.

However, it means fixed intervals cannot express other units such as months, since the duration of a month is not a fixed quantity. Attempting to specify a calendar interval like month or quarter will throw an exception.

The accepted units for fixed intervals are:

milliseconds (ms)

A single millisecond. This is a very, very small interval.

seconds (s)

Defined as 1000 milliseconds each.

minutes (m)

Defined as 60 seconds each (60,000 milliseconds). All minutes begin at 00 seconds.

hours (h)

Defined as 60 minutes each (3,600,000 milliseconds). All hours begin at 00 minutes and 00 seconds.

days (d)

Defined as 24 hours (86,400,000 milliseconds). All days begin at the earliest possible time, which is usually 00:00:00 (midnight).
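Because every fixed unit is a whole number of milliseconds, any multiple can be converted with simple arithmetic. The helper below is a hypothetical sketch (fixed_interval_ms is not an API function) that uses the unit definitions above and rejects calendar-only units such as month.

```python
import re

# Milliseconds per supported fixed unit, taken from the definitions above.
MS_PER_UNIT = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

# A whole number followed by one supported unit, e.g. "90m" or "500ms".
_FIXED_INTERVAL = re.compile(r"(\d+)(ms|s|m|h|d)")


def fixed_interval_ms(value: str) -> int:
    """Convert a fixed interval string to milliseconds."""
    match = _FIXED_INTERVAL.fullmatch(value)
    if match is None:
        raise ValueError(f"not a fixed interval: {value!r}")
    count, unit = match.groups()
    return int(count) * MS_PER_UNIT[unit]


print(fixed_interval_ms("90m"))
print(fixed_interval_ms("2d"))
```

Calendar units such as month or 1M do not match the pattern and raise ValueError, mirroring the restriction described above.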

Boolean and Advanced

Each endpoint in the API can run more advanced queries. This works by leveraging an Elasticsearch query_string_query (see the Elasticsearch docs).

querytype

This optional parameter is an enum and tells the API how to interpret the value provided in the mandatory term parameter.

querytype accepts one of the following string values. Each entry gives the value, the outcome, and an example term.

content

Finds documents where the input value for term appears in the content field.

Example: qanon returns docs where "qanon" appears in the content field.

boolean_content

Finds documents where the boolean condition specified in the term parameter matches the content field.

Example: qanon AND wwg1wga returns docs where "qanon" and "wwg1wga" both appear in the content field.

query_string

Runs the request using the Elasticsearch query_string_query syntax. It searches across all fields unless the query names specific fields.

Example: (channelusername:rtnews) AND (message:russia) returns docs where the channelusername field is set to rtnews and the message field contains russia.
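A minimal sketch of how a request URL with querytype might be assembled. The helper build_content_url is hypothetical, and the site value "telegram" is an assumption (only "win" appears as a site value elsewhere on this page); the parameter names site, term, and querytype come from the descriptions above.

```python
from urllib.parse import urlencode

API_URL = "https://api.openmeasures.io/content"

# The three querytype values described above.
QUERYTYPES = {"content", "boolean_content", "query_string"}


def build_content_url(site: str, term: str, querytype: str) -> str:
    """Build a Content endpoint URL, rejecting unknown querytype values."""
    if querytype not in QUERYTYPES:
        raise ValueError(f"unknown querytype: {querytype!r}")
    query = urlencode({"site": site, "term": term, "querytype": querytype})
    return f"{API_URL}?{query}"


# "telegram" is an assumed site value used for illustration.
url = build_content_url(
    "telegram",
    "(channelusername:rtnews) AND (message:russia)",
    "query_string",
)
print(url)
```

The URL is only built here, not requested, so the sketch does not count against the rate limit.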

esquery

This parameter will eventually be deprecated. We recommend using querytype to avoid disruption.

The default for this parameter is False, which means that the API searches for a single term in the document's content field only.

NOTE: Setting esquery to True will yield the same results as setting querytype to query_string.

A full list of content fields can be found in our open-source code here:

When this is set to True the API will interpret the value of the term parameter as a raw Elasticsearch query.

Example:

import requests

API_URL = "https://api.openmeasures.io/content"
PARAMS = {
    "site": "win",
    "term": "qanon OR wwg1wga OR #qanon OR #wwg1wga",
    "esquery": True,
}

# With esquery set to True, the term is interpreted as an Elasticsearch
# query, so results match the terms in any field of the documents.
response = requests.get(API_URL, params=PARAMS)
response.raise_for_status()  # raise on an HTTP error status
hits = response.json()["hits"]["hits"]

Examples

Notebook

Here is a link to a Quick Start Code Guide in Colab or Jupyter notebook format for making requests to our API.

Command Line Tool

We also have a CLI, originally built by a community member, that we forked. Separately, there is the following GitHub repo (https://github.com/cabalcx/smatter/tree/main), also built by community members, that implements a similar wrapper.

We are ALWAYS looking for contributions like this to our technical stack that significantly expand the utility of our data for users.

Example Content Endpoint Query

First, click the Content endpoint and then click “Try it out”.

Choose Telegram as the Site, and then click Execute. It will give you a regular URL link that will have the raw JSON content of your query. The interface will look like this:

If you use curl or open the URL in a browser, you will get raw JSON that looks like the following (certain browsers, like Firefox, automatically “prettify” the JSON to make it easier to read):

Example of the same request using curl from the command line and jq to pretty print:

Everything is nested in “hits” and “_source” for all of our data.
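The nesting can be unpacked in a few lines of Python. The payload below is a hypothetical, heavily trimmed stand-in for a real response, and extract_sources is an illustrative helper rather than part of any Open Measures library; only the hits and _source nesting is taken from this page.

```python
import json

# Hypothetical, trimmed response; real documents carry many more fields.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"message": "first example post"}},
            {"_source": {"message": "second example post"}},
        ]
    }
}


def extract_sources(payload: dict) -> list:
    """Return the _source document of every hit, or [] if there are none."""
    return [hit["_source"] for hit in payload.get("hits", {}).get("hits", [])]


sources = extract_sources(sample_response)

# Pretty-print the documents.
print(json.dumps(sources, indent=2))
```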

API Workflow Example

The Open Measures API can be fine-tuned to your exact needs. To show this, we will spell out the steps necessary to pull up to 10k of Guo Wengui’s posts on Gettr, as a reference to our recent post outlining some of the happenings on Gettr. This post builds upon the information shown in our original API guidance blog post. The key element of this advanced usage is using the term parameter with Elasticsearch query string syntax while setting the esquery field to True.

After heading over to our interactive API docs, click the Content button:

Content button in Open Measures' interactive API docs tool

  1. Click “Try it out.”

  2. Next to “term” write any interesting word for now.

  3. On “site” select “Gettr.”

  4. Leave all the other settings default for now and click “Execute.”

  5. This will generate a “Request URL”. If you copy that link into a new browser window, you will be offered a JSON of the data you requested.

NOTE: JSON is a data format commonly used on the web. It contains nested “keys and values”. One way to think about it is as a workplace table: a few columns, called keys, such as “employee name” or “employee position”, each hold a value for every employee. Those entries can then be nested inside something larger, like the department or city the employees work in. For our data, the JSON has many different fields containing different aspects of the data, such as the username, the post itself, the time posted, and other details. We recommend using a browser like Firefox because it auto-formats the JSON for you. We present the JSON as close as possible to how it was represented on the native site the data was crawled from.

Now that you have some examples of the format of the data you want to explore, dig through it to find the field (or “key”) you want to search under. In our case, we are interested in a field under “uinf” called “username” because we are doing author search. The best way to find the intended field is to look through the JSON results from this /content query.

Prettified JSON blob highlighting the username field.

We are ready to search for the posts written by a specific user on Gettr, now that we know what field corresponds to the username in the JSON.

Back in the interactive API, we can now construct our input to the term field. We combine uinf.username with the specific username we are interested in, in this case “miles”, using the following syntax: “uinf.username:miles”.

NOTE: For those wishing to learn more about the query language behind these requests check out this documentation or the "Advanced Searching" section of our Kibana guide.

Then we can configure the remaining Open Measures API arguments:

  • We can raise the “limit” (the maximum number of posts returned) to the 10k cap we impose.

  • We can then adjust the “since” field to be farther back.

  • And finally, and critically for this kind of search, we set the “esquery” boolean item to “true”. This just means that in the term box, instead of accepting a regular search phrase it’s using “Boolean logic” to search through specific fields.
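The settings above can be sketched as a parameter set in Python. Several values are assumptions for illustration: the site value "gettr", the since date and its format, and the helper name field_term. The parameter names term, limit, since, and esquery come from the steps above.

```python
from urllib.parse import urlencode

API_URL = "https://api.openmeasures.io/content"


def field_term(field: str, value: str) -> str:
    """Combine a field name and a value using the field:value syntax."""
    return f"{field}:{value}"


params = {
    "site": "gettr",          # assumed spelling of the Gettr site value
    "term": field_term("uinf.username", "miles"),
    "limit": 10000,           # the 10k cap mentioned above
    "since": "2021-07-01",    # assumed date format and an arbitrary start date
    "esquery": "true",        # treat term as a field-aware query
}

request_url = f"{API_URL}?{urlencode(params)}"
print(request_url)
```

The printed URL plays the same role as the “Request URL” shown in the interactive docs; compare it with the one the docs generate before relying on it.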

Once your fields look like the following, click “Execute” and copy the URL again. It may take a second to load!

Open Measures' interactive point and click API interface

Once you have the JSON opened in a new tab (here’s a direct link to the query we've just demonstrated), you may have to click to expand some of the fields. Most of what you’re interested in here will be under: hits > a number > _source. Once there, you will see the contents of the message in the field named “txt”, as well as other information.

Prettified JSON blob highlighting the txt (or comment body) field
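Reading those fields programmatically amounts to following a path through nested dictionaries. The sketch below uses a hypothetical, trimmed hit and an illustrative helper named get_path; only the field names txt and uinf.username come from this walkthrough, and placing uinf inside _source is an assumption.

```python
def get_path(document: dict, dotted_path: str, default=None):
    """Follow a dotted path such as "uinf.username" through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical, trimmed hit; a real one contains many more fields.
sample_hit = {
    "_source": {
        "txt": "example post body",
        "uinf": {"username": "miles"},
    }
}

source = sample_hit["_source"]
print(get_path(source, "txt"))
print(get_path(source, "uinf.username"))
print(get_path(source, "uinf.location", default="(missing)"))
```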

Wrap up

Once you’ve got the hang of searching for all of an author’s posts, you can experiment with other advanced queries over any of the other fields in any of our data sources, such as language, location, links, etc. As always, get in touch with us via email at info@openmeasures.io!

Content Fields

When querytype is set to content or boolean_content the API will search through the default content field for each site. A list of the content field per site can be found in our open-source API code base which is embedded below.
