Our Mission

FastOpenData

We make it easy to improve your analytics capabilities by leveraging open source data.

About FastOpenData

We are data scientists, data engineers, and machine learning engineers who have worked for a variety of companies ranging from small, scrappy startups to large consulting firms with famous clients. We’ve worked in industry, the non-profit sector, for political organizations, and in academia.

Throughout our professional lives, we’ve repeatedly found ourselves in the same conversation. Someone asks, "We have our customers’ addresses, so isn’t there some relevant data from the US Census Bureau (or some other government source) that could help us here?" That person is dispatched to investigate the data. But inevitably, they report back that the data lacks documentation, consistency, quality checks, and so on. The decision is then made that although the data could be valuable in principle, it’s too time-consuming and tricky to actually use, especially when there are higher-priority items on the roadmap.

The fact is that there is a wealth of diverse, valuable, high-quality open data out there, free for anyone to use. It comes from government sources as well as open data projects such as OpenStreetMap, Wikidata, and several academic efforts. The problem is that it’s typically very difficult to work with: documentation is sparse, the data itself is not clean, naming conventions are inconsistent, and the formatting is often arbitrary. On top of that, combining these data sets is difficult and time-consuming, and as anyone in data science will tell you, the value of data sets multiplies when they can be combined into a single, consistent source of truth.

With our experience in data science and adjacent fields, we understand the pain points associated with new data sources. Too often, one has to hunt down documentation, only to find that it doesn’t answer important questions. Engineering resources are necessary to write custom client software to ingest the data. Then it turns out that naming conventions and formatting are inconsistent, which forces engineers to write additional code. ETL is usually a nightmare, and it’s difficult to ingest and utilize the new data at the scale required by a typical data science project. The data sources don’t play nicely with the tools that data scientists are already using. And proprietary data is often shrouded in mystery when it comes to understanding where it came from and how it was derived.

We got sick of leaving valuable data unused. So we’ve done the hard work of collecting, understanding, combining, and cleaning many different data sets and making them easy to use through our API, Python client, and command-line utility. Lots of organizations have address information, so we’ve made it possible to retrieve all of this data by querying FastOpenData with any address in the United States. All the data is integrated into a single source, the same naming conventions are applied universally, and integration with common data science workflows is trivial.
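To make the query-by-address pattern concrete, here is a minimal sketch of what an address-keyed lookup can look like from Python. The endpoint URL, parameter names, and response shape below are illustrative assumptions for this example only, not FastOpenData's documented API:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names -- illustrative assumptions,
# not FastOpenData's documented API.
BASE_URL = "https://api.example.com/v1/lookup"

def build_lookup_url(address: str, api_key: str) -> str:
    """Build a query URL that asks for every joined open-data field
    keyed to a single US street address."""
    query = urlencode({"address": address, "api_key": api_key})
    return f"{BASE_URL}?{query}"

url = build_lookup_url("1600 Pennsylvania Ave NW, Washington, DC 20500", "demo-key")
# Fetching this URL would return one flat JSON record with consistent
# field names, ready to merge into a data science workflow (e.g. a
# pandas DataFrame join on the address key).
```

Because every address resolves to one consistently named record, joining the result onto an existing customer table is a single merge rather than a bespoke ETL project.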

Hence, our mission is to enable data scientists and other analytics professionals to gain new insights by making it easy to leverage the power of open source data sets. We aim to eliminate the headaches that usually come with new data sets, freeing these professionals to focus on interesting, high-value work rather than tedious, rote tasks such as ETL and data cleaning.