Bulk API Data and Query Tools
API providers often make bulk data available. The folks at patentsview.org have this page
that lists what they offer. They've even provided a data dictionary
(spreadsheet) that describes the data in detail. Their work is covered by the Creative Commons
Attribution 4.0 License. Apparently all I have to do is credit the source, provide a link to the license and state whether I've made changes. Oh, and I'm not to suggest that the licensor endorses me or my use of their data.
My Android app and other pages on this site use an online version of location.tsv with two changes. First, I removed the last three columns as I had no need for them. Second, I removed the 793 rows that had non-ASCII (multi-byte UTF-8) characters in them, because requests to the location endpoint failed when those cities and states were used. See utf8_locations.htm. I've also begun working on a new and improved locations table as explained here.
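For anyone wanting to repeat those two changes, here is a minimal Python sketch. It assumes the file is tab-separated text and that "rows with UTF-8 characters" means rows containing any non-ASCII character. The function name and the sample rows are made up for illustration and are not the real location.tsv layout.

```python
import csv
import io


def clean_locations(tsv_text, drop_last=3):
    """Drop the last `drop_last` columns, then split rows into ASCII-only and non-ASCII."""
    # QUOTE_NONE treats quote characters as ordinary data, which suits plain TSV
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    kept, rejected = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        trimmed = row[:-drop_last] if drop_last else row
        # str.isascii() is False as soon as a field holds any non-ASCII character
        if all(field.isascii() for field in trimmed):
            kept.append(trimmed)
        else:
            rejected.append(trimmed)
    return kept, rejected


# Made-up sample: a header, one plain row, one row with a non-ASCII city name
sample = (
    "id\tcity\tstate\ta\tb\tc\n"
    "1\tBoston\tMA\tx\ty\tz\n"
    "2\tZ\u00fcrich\t\tx\ty\tz\n"
)
kept, rejected = clean_locations(sample)
print(len(kept), "rows kept,", len(rejected), "rows rejected")
```

Returning the rejected rows as well makes it easy to list them on a separate page rather than silently losing them.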
On my Government Organization page
I use an online version of government_organization.tsv with several changes. I moved the name column to the end (after level_three) and display it as an empty string if it matches
the level_one, level_two or level_three entry for that row. I also removed the single stray byte 0xa0 in id 25.
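Those changes could be sketched in Python roughly as follows. The column names (organization_id, name, level_one, level_two, level_three) are my assumption about the file header, the sample rows are invented, and decoding as Latin-1 assumes the file is otherwise plain ASCII.

```python
import csv
import io

LEVELS = ("level_one", "level_two", "level_three")


def tidy_government_orgs(raw_bytes):
    """Strip the 0xa0 byte, move name to the end, blank it when it repeats a level."""
    # Latin-1 maps every byte to a character, so a stray 0xa0 decodes instead of raising
    text = raw_bytes.decode("latin-1").replace("\xa0", "")
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for rec in reader:
        name = rec["name"]
        if name in [rec[level] for level in LEVELS]:
            name = ""  # the name adds nothing beyond what a level column already says
        rows.append([rec["organization_id"]] + [rec[level] for level in LEVELS] + [name])
    return rows


# Invented sample rows; id 25 carries the stray byte after its name
sample = (
    b"organization_id\tname\tlevel_one\tlevel_two\tlevel_three\n"
    b"25\tUnit B\xa0\tDept A\tUnit B\t\n"
    b"26\tLab C\tDept A\t\t\n"
)
for row in tidy_government_orgs(sample):
    print(row)
```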
On this page I am accessing an online version of botanic.tsv with two changes. The first was to drop the id column (first column in the spreadsheet). It's a string of 36 characters that I didn't see a need for. The second change was to delete the three rows of reissued data.
Which leaves us with 12,801 rows of data from PP15,460 issued 2005-01-04 through PP28,267 issued 2017-08-08. Once the data is loaded in a database table we can do fun things like figure out which Latin names are used most often.
|patentsview.org botanic data|
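Here is a rough sketch of that kind of query using Python's built-in sqlite3 module. It assumes each raw row is (id, patent number, Latin name, variety) and that reissue rows are the ones whose patent number does not start with "PP". The table name, column names and sample rows are mine, not patentsview's.

```python
import sqlite3


def plant_rows(raw_rows):
    """Drop the leading id column and keep only rows whose number starts with PP."""
    return [tuple(row[1:4]) for row in raw_rows if row[1].startswith("PP")]


def load_botanic(rows):
    """Load (patent_id, latin_name, variety) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE botanic (patent_id TEXT, latin_name TEXT, variety TEXT)")
    conn.executemany("INSERT INTO botanic VALUES (?, ?, ?)", rows)
    return conn


def top_latin_names(conn, limit=10):
    """Return (latin_name, count) pairs, most frequent first."""
    sql = (
        "SELECT latin_name, COUNT(*) AS n FROM botanic "
        "GROUP BY latin_name ORDER BY n DESC, latin_name LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


# Invented sample rows, including one reissue that should be filtered out
sample = [
    ("id-1", "PP00001", "Rosa hybrida", "Variety A"),
    ("id-2", "PP00002", "Rosa hybrida", "Variety B"),
    ("id-3", "PP00003", "Malus domestica", "Variety C"),
    ("id-4", "RE00001", "Rosa hybrida", "Variety D"),
]
conn = load_botanic(plant_rows(sample))
print(top_latin_names(conn))
```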
Or check for gaps in the data
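A gap check can be as simple as the sketch below. It assumes patent numbers look like "PP15460" (commas are tolerated), and the function name is just for illustration.

```python
def missing_numbers(patent_ids):
    """Return the plant patent numbers absent between the lowest and highest seen."""
    numbers = {
        int(pid[2:].replace(",", ""))
        for pid in patent_ids
        if pid.startswith("PP")
    }
    if not numbers:
        return []
    # Walk the full range once and report anything that never appeared
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]


# Made-up ids with two holes in the sequence
print(missing_numbers(["PP15460", "PP15461", "PP15463", "PP15,466"]))
```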
I checked and the missing numbers correspond to withdrawn plant patents. Interestingly, there was another withdrawn plant patent in this time frame, yet there is a row in the spreadsheet for it. PP20696 is in the patentsview database, so the USPTO and patentsview are not totally in sync.
has more information on the withdrawn patents patentsview returns.
Separately, from the USPTO, I have the plant patent issue dates and other fields. I could do some sort of mash-up to show, say, the most popular Latin names by issue year.
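Such a mash-up might look something like this sketch. It assumes the botanic data has been reduced to (patent number, Latin name) pairs and that the issue dates are available as a dict of ISO date strings keyed by patent number. Both shapes and all the sample values are hypothetical.

```python
from collections import Counter, defaultdict


def top_names_by_year(botanic_rows, issue_dates, n=1):
    """Count Latin names per issue year and return the top n for each year."""
    per_year = defaultdict(Counter)
    for patent_id, latin_name in botanic_rows:
        date = issue_dates.get(patent_id)
        if date is None:  # no issue date on file for this patent, so skip it
            continue
        per_year[date[:4]][latin_name] += 1  # first four characters are the year
    return {year: counts.most_common(n) for year, counts in sorted(per_year.items())}


# Invented sample data
botanic = [("PP1", "Rosa hybrida"), ("PP2", "Rosa hybrida"),
           ("PP3", "Malus domestica"), ("PP4", "Malus domestica")]
dates = {"PP1": "2005-01-04", "PP2": "2005-02-01",
         "PP3": "2005-03-01", "PP4": "2006-01-03"}
print(top_names_by_year(botanic, dates))
```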
So let me know if you think of something more interesting to do with the bulk data that is available.
Or you could download it yourself to see how much fun it is! The nice thing with patentsview is that they also offer a query tool to do a custom extract that will produce a CSV file you can download. Your query could get you plant patent data like the issue date and title, rather than downloading their 1 GB bulk patent file of 6.3 million patents when you just want data for the 12,804 plant patents covered in the botanic file. On the
query tool's advanced search screen set the select boxes as shown and click "+ Add to Search" for the three conditions (with an implied boolean AND between them).
Then click Submit Search (bottom of the page). Then select which columns you'd like returned,
click Preview Query (bottom of the page), fill in an email address and prove you are not a robot. If you do all of that they'll send you a link that will download your CSV file. How fun is that? One caveat, which doesn't apply here, is that your result set needs to be 1 GB or smaller, as that seems to be the extent of their helpfulness. Any larger and you'd have to resort to dealing with the bulk data files or asking for a database dump, as explained in the email you'll receive.
The pairbulk people offer a similar query tool and download, but they do not let you specify which columns you'd like. You get a lot of columns whether you want them or not. They also only offer download formats of JSON or XML. There is no CSV option. How unfun is that?
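If you end up with JSON and only want a few columns as CSV, a small script can do the trimming. The sketch below assumes the download has already been reduced to a flat JSON array of objects. The real pairbulk files may well be nested and need flattening first, and the field names here are invented.

```python
import csv
import io
import json


def json_to_csv(json_text, columns):
    """Convert a JSON array of flat objects to CSV text, keeping only `columns`."""
    records = json.loads(json_text)
    out = io.StringIO()
    # extrasaction="ignore" drops unwanted fields; missing ones become empty strings
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Hypothetical records with more fields than we care about
sample = '[{"number": "1", "title": "A", "extra": "x"}, {"number": "2"}]'
print(json_to_csv(sample, ["number", "title"]))
```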
If you have an interest in plant patents I have more information about them here
on my main site.