Data Integrity / Authenticity

How do you ensure data ingestion is robust and comprehensive?

Some of our crawlers iterate by spidering, some by walking through users, and others by stepping through predictable post IDs. These differences, along with other strategies, determine how comprehensive the data is. That said, there are occasional gaps in the data, and web hosts sometimes change the formats of their APIs, so we are constantly updating our methods. In some datasets (like Telegram) we crawl everything from select channels, which means we lean on our community to help us build out our library. Got a channel you’d like to see us crawl? Reach out at info@openmeasures.io with your request.

Our datasets have also been independently verified in multiple articles by research teams, including this one at New America. Open Measures has also been part of two scholarly conference papers exploring our data: the first introduced our early model, and the second published our first open-source Parler dataset.

How can I validate results?

After you make a request to our API or our Search tool, you can seek out the original post on the platform, either through the URL we store in each item we crawl or by searching other available collections. In some cases the platform may have deleted the post; in those instances, we encourage you to check for archives on archive.org or archive.today.
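The archive check described above can be sketched in Python. This is a minimal illustration rather than part of our tooling: it assumes the crawled item exposes its source link under a field named "url" (a hypothetical name used here for illustration), and it targets the Internet Archive's Wayback Machine availability endpoint. The response layout handled in closest_snapshot is an assumption based on that endpoint's JSON format. No network call is made; the sketch only builds the lookup URL and reads an already-decoded response.

```python
from typing import Optional
from urllib.parse import urlencode

# Internet Archive's Wayback Machine availability endpoint.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def wayback_lookup_url(post_url: str, timestamp: Optional[str] = None) -> str:
    """Build a lookup URL asking the Wayback Machine for a snapshot of post_url.

    timestamp is optional and uses the YYYYMMDD form, e.g. "20210106".
    """
    params = {"url": post_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_AVAILABILITY}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived snapshot URL from a decoded JSON response, if any.

    Assumes the layout {"archived_snapshots": {"closest": {...}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Hypothetical crawled item; the "url" field name is an assumption.
item = {"url": "https://example.com/post/123"}
lookup = wayback_lookup_url(item["url"])
```

Fetching lookup with any HTTP client and passing the decoded JSON to closest_snapshot gives either a snapshot link or None, in which case archive.today is worth checking by hand.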

How do you manage historical data?

When a crawler is first deployed, it crawls the full history of data available at that time. If the data was there when we began crawling, then we have it indexed.

Do you open-source your crawlers?

For the security and sustainability of our platform, we do not open-source our crawlers, but we do open-source as much else as possible. All open-source code is available via our GitLab project.

Data Access

Do you make any endpoints available publicly?

Yes! We have several public endpoints available for querying our data store, including the /content endpoint, which returns raw data. Note that our public endpoints are rate limited to 39 requests per day and only return data that is at least six months old. You can read more about our public endpoints in the API section of our documentation.
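If you script against the public endpoints, it can help to track the daily limit on the client side so you don't run out of requests midway through a job. The following is a minimal sketch, not an official client: the DailyBudget class is hypothetical, and it assumes the limit resets once per UTC calendar day, which may not match how the service actually counts requests.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class DailyBudget:
    """Client-side counter for a per-day request limit (hypothetical helper)."""

    limit: int = 39  # public endpoint limit stated above
    _day: Optional[date] = field(default=None, init=False)
    _used: int = field(default=0, init=False)

    def try_acquire(self, today: Optional[date] = None) -> bool:
        """Reserve one request; return False once the day's budget is spent."""
        # Assumption: the budget resets per UTC calendar day.
        today = today or datetime.now(timezone.utc).date()
        if today != self._day:
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True

    def remaining(self) -> int:
        """Requests left for the most recently seen day."""
        return self.limit - self._used


budget = DailyBudget()
if budget.try_acquire():
    pass  # send one request to a public endpoint here
```

Calling try_acquire before every request keeps a long-running script within the limit. Teams that need more than this budget allows are better served by the options described in the next question.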

I need higher frequency than the public endpoints or API tooling allows. What are my options?

We support multiple options for teams in need of high volumes of data. We offer packages that include our Pro and Enterprise platforms as well as direct access to our data stores. If you are interested in learning more about how your team can get more robust tooling, fill out the following form.


How long does it take to implement Open Measures?

The implementation timeline is up to you! Most partners take anywhere from a few days to a month to fully implement depending on the complexity of their needs and in-house setup. Our team is there for you every step of the way. We begin every partnership with onboarding sessions and offer ongoing support to ensure you continue to have the resources you need for success.

Who do you partner with?

We partner with a range of organizations and individuals that are trying to better understand, and respond to, emerging online threats of disinformation and extremism. Those organizations include leading non-profits in the threat monitoring space, campaigns, researchers, news and press desks, and private research teams.

What if I don't work for an organization that would be a good fit but I still want to contribute to the open source movement?

We're proud partners of Open Source Collective, our fiscal host. Contributions from the community make our dedicated work in the open source ecosystem possible. You can make a one-time or recurring contribution here.
