Behind the Scenes at Innography – Big Fun with Big Data
Big data, when unstructured, can be a big mess. Here the Innography team discusses the heavy lifting, parsing data, and telling the full story.
We have a saying here: “the value is in the insight.” When it comes to intellectual property, true insight requires a lot of data from a lot of different sources. Some of it is structured, but much of it is not.
We hate making you do the heavy lifting to find an answer, so we continually look at our applications with an eye toward fewer clicks, more automation, better algorithms, more data, and faster results for you… all the while leveraging this massive data set.
The truth of it is that big data can be a big mess; one such example is patent legal status.
The USPTO doesn’t make status data (housed in the public PAIR system) a subscription offering, so it’s not as if we could buy a structured database of the current status of every patent from every PTO. But… legal status is a very important predictor of market trends. So, we parse a ridiculous amount of unstructured data from public PAIR to fish out the little bits that tell us when a patent has been abandoned, when it has been assigned, when it’s being reexamined, and so forth. Our data team goes through this exercise every week to get the newest status changes.
Is it 100% accurate? Yes and no. When a patent says expired from abandonment in our platform, that’s 100% accurate. For other statuses, our best guess is 85 to 90% accuracy, a limit that comes from how the PTO manages the data. Do our competitors go to this effort? No. Is our 85 to 90% better than 0%? We absolutely think so.
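To give a feel for the kind of parsing described above, here is a minimal sketch in Python. It is not our production pipeline: the phrase patterns, the function names (`extract_status_events`, `latest_status`), and the sample lines are all hypothetical, and real PAIR text is far less uniform than this.

```python
import re

# Hypothetical phrase patterns for illustration only; real PAIR entries
# are worded differently and are much less consistent.
STATUS_PATTERNS = [
    ("abandoned", re.compile(r"\babandon(ed|ment)\b", re.IGNORECASE)),
    ("assigned", re.compile(r"\bassign(ed|ment)\b", re.IGNORECASE)),
    ("reexamination", re.compile(r"\bre-?exam(ination)?\b", re.IGNORECASE)),
]


def extract_status_events(lines):
    """Return (line_index, status) for each line matching a known pattern."""
    events = []
    for index, line in enumerate(lines):
        for status, pattern in STATUS_PATTERNS:
            if pattern.search(line):
                events.append((index, status))
                break  # first matching pattern wins for this line
    return events


def latest_status(lines):
    """Return the status of the last matching line, or None if nothing matched."""
    events = extract_status_events(lines)
    return events[-1][1] if events else None


# Made-up sample text, assumed to be in chronological order.
history = [
    "Application filed",
    "Assignment recorded",
    "Notice of Abandonment mailed",
]
print(latest_status(history))
```

Simple keyword matching like this will miss or misread unusual wording, which is one reason accuracy on messy source data can fall short of 100%.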
We mine this data because it’s important and valuable for you to have this information, and you can’t tell the full story about a technology, patent, market or trend without understanding the legal status of the applications and existing patents… and we’re all about telling the most complete story.
wayne kindsvogel, 02.19.2014
Does Innography have instantly updated data from USPTO servers encompassing all current patents and patent applications? How does this work? For example, if I use the same query on the PTO site and the Innography site (e.g., keywords), will I get the same results?
Denise Deverelle Crown, 02.21.2014
Great questions! The USPTO sends data out to its partners every Tuesday and Thursday. From there we have a rigorous process to cleanse, normalize and correlate that data with our other sources. We also incorporate reassignment data that is published daily, application data from the public PAIR, as well as litigation, company financials, trademark data, and more. We push all of the weekly correlated and cleansed updates from all 100+ data sources out on Sunday evenings.
As far as the search goes, Innography will have different results from the USPTO search. The USPTO site only has published documents, without any later updates. For example, it is missing PAIR prosecution data, reassignment information, legal status changes, and many other updates. We will generally return more results because we have highly enriched data and triangulated data from many PTOs. We sort them by relevance to your query, but you can sort based on a number of different factors. The results will be similar, presuming the query you build is compatible with both platforms. Innography also has a large number of unique search operators, which makes the search results differ further, but they are usually much more accurate and complete than the USPTO’s site.