FAQ

Data Integrity / Authenticity

How do you ensure data ingestion is robust and comprehensive?

Coverage of our datasets comes partly from spidering and partly from strategies like enumerating predictable post IDs. These subtleties often determine how comprehensive the data is. That said, web hosts sometimes change the formats of their APIs, so we are constantly updating our methods. In some cases, like Telegram, we crawl everything from select channels, which means we lean on our community to help us build out our library. Got a channel, group, or profile you'd like to research? Reach out at info@openmeasures.io with your request.
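As a rough illustration of the predictable-post-ID strategy, the sketch below enumerates sequential IDs on a hypothetical forum. The URL scheme is made up for this example, and real crawlers also handle rate limits, retries, and the format changes mentioned above.

```python
import requests

# Illustrative sketch only: enumerate sequential post IDs on a hypothetical
# forum whose posts live at /post/<numeric id>.
BASE_URL = "https://example-forum.test/post/{post_id}"  # hypothetical host

def crawl_id_range(start_id: int, end_id: int) -> list[dict]:
    posts = []
    for post_id in range(start_id, end_id + 1):
        resp = requests.get(BASE_URL.format(post_id=post_id), timeout=10)
        if resp.status_code == 200:
            posts.append({"id": post_id, "html": resp.text})
        # gaps (404s) are expected: deleted posts or unused IDs
    return posts
```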

How can I validate results?

If you make a request to our API or our Search tool, you can then seek out the original post on the platform. You can do this either via the URL we store on each crawled item or by searching other available collections. In some cases, the platform may have deleted the post. In those instances, we encourage you to check for archives on archive.org or archive.today.
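As a rough sketch of that validation workflow, the example below takes a single crawled item, reads its stored URL, and builds archive lookup links as a fallback. The "url" field name and the item shape are assumptions for this illustration; inspect the raw JSON you receive to confirm the schema for the collection you queried.

```python
# Minimal validation sketch: given one item returned by the API or the
# Search tool, link to the original post via the URL stored on the item,
# with web-archive fallbacks in case the platform has removed it.
# The "url" field name is an assumption; check the raw response schema.

def validation_links(item: dict) -> list[str]:
    original = item.get("url")
    if not original:
        return []
    return [
        original,                                    # check on the platform itself
        f"https://web.archive.org/web/{original}",   # archive.org snapshot, if one exists
        f"https://archive.today/newest/{original}",  # archive.today snapshot, if one exists
    ]

# Example usage with a hypothetical crawled item:
item = {"url": "https://t.me/example_channel/12345"}
for link in validation_links(item):
    print(link)
```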

Our datasets have also been independently verified in multiple articles by research teams, including this one at New America. Open Measures has also featured in several scholarly conference papers exploring our data: the first introduced our early model, and the second published our first open-source Parler dataset.

How do you manage historical data?

We crawl the full available history of each dataset, beginning when its crawler is first deployed. If the data is still present on the platform when we begin crawling, it will be indexed.

How do you choose what to open-source?

For the security and sustainability of our platform, we do not open-source every product, but we do open-source as much as possible. All open-source code is available via our GitLab project.

Do you open-source your whole platform?

We lean on our community of researchers and developers to provide insight into which of our tools are most useful and then determine collectively which can most sustainably be offered at no cost to the user. Please reach out if you want to be a part of the conversation.

Data Access

Do you make any endpoints available publicly?

Yes! We have several public endpoints available for querying our data store, including the /content endpoint, which returns raw data. Note that our public endpoints are rate limited at 39 requests per day. You can read more about our public endpoints in the API section of our documentation.
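As a rough sketch, a public /content query might look like the following. The base URL and the parameter names (term, site, limit) are assumptions for this example; check the API section of our documentation for the exact endpoint signature.

```python
import requests

# Minimal sketch of a public /content query. Base URL and parameter names
# are assumptions; see the API documentation for the exact signature.
# Public endpoints are rate limited (39 requests per day), so cache
# responses rather than re-querying.
API = "https://api.openmeasures.io/content"  # assumed base URL

resp = requests.get(
    API,
    params={"term": "open source", "site": "telegram", "limit": 10},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # raw data; the structure varies by collection
```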

I need higher frequency than the public endpoints or API tooling allows. What are my options?

We support a few options for organizations that need high-volume access to recent data. If you are interested in learning more about how your team can get more robust tooling, fill out the following form.

Partnerships

How long does it take to implement Open Measures?

Most partners take anywhere from a few days to a month to fully implement Open Measures, depending on the complexity of their needs and their in-house setup. Our team is there every step of the way. We begin each partnership with onboarding sessions and offer ongoing support to ensure you have the resources you need for success.

Who do you partner with?

We partner with a range of organizations and individuals working to better understand and respond to emerging online threats such as disinformation and extremism. Those partners include leading civil society organizations, legal services, press desks, and private research teams.

What if I don't work for an organization that would be a good fit but I still want to contribute to the open source movement?

We're proud partners with Open Source Collective, our fiscal host. Contributions from the community make our dedicated work in the open-source ecosystem possible. You can make a one-time or recurring contribution here.
