Bulk API Data and Query Tools

API providers often make bulk data available. The folks at patentsview.org have this page that lists what they've made available. They've even provided a data dictionary (spreadsheet) that explains exactly what each file contains. Their work is covered by the Creative Commons Attribution 4.0 License. Apparently all I have to do is credit the source, provide a link to the license and state whether I've made changes. Oh, and I'm not to suggest that the licensor endorses me or my use of their data.

My Android app and other pages on this site use an online version of location.tsv. I made two changes. The first was to remove the last three columns, as I had no need for them. The second was to remove the 793 rows that contained non-ASCII (multi-byte UTF-8) characters, because requests to the location endpoint failed when those cities and states were used. See utf8_locations.htm
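A minimal Python sketch of those two changes, not necessarily how the file on this site was prepared. It assumes location.tsv is tab-separated, that "UTF-8 characters" means any non-ASCII character, and that only the columns being kept need checking. The function name clean_locations is made up for illustration.

```python
def clean_locations(lines, drop_last=3):
    """Trim trailing columns from TSV lines and skip rows with non-ASCII text.

    Returns (kept_rows, skipped_count); kept_rows are tab-joined strings.
    """
    kept_rows = []
    skipped = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Drop the last `drop_last` columns (assumed to be the unneeded ones).
        fields = fields[:max(len(fields) - drop_last, 0)]
        if all(field.isascii() for field in fields):
            kept_rows.append("\t".join(fields))
        else:
            skipped += 1
    return kept_rows, skipped
```

Passing an open file handle (opened with encoding="utf-8") as lines would work, since the function only iterates over it. The skipped count is a quick way to compare against the 793 rows mentioned above.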

On my Government Organization page I use an online version of government_organization.tsv with several changes. I moved the name column to the end (after level_three) and display it as an empty string if it matches the level_one, level_two or level_three entry for that row. I also removed the single binary character a0 in id 25.
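The column move and the blanking rule might look something like this in Python. This is a sketch only: it assumes each row has already been read into a dict keyed by the column names mentioned above (for example with csv.DictReader), and tidy_org_row is a hypothetical helper name.

```python
def tidy_org_row(row, level_cols=("level_one", "level_two", "level_three")):
    """Return a copy of row with 'name' moved last and blanked if redundant."""
    # Keep every other column in its original order (dicts preserve order).
    out = {key: value for key, value in row.items() if key != "name"}
    name = row["name"]
    # Blank the name when it merely repeats one of the level columns.
    out["name"] = "" if name in {row[col] for col in level_cols} else name
    return out
```

Blanking the value in the data is a simplification. The same comparison could just as easily be made at display time, leaving the file untouched.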

On this page I am accessing an online version of botanic.tsv with two changes. The first was to drop the id column (first column in the spreadsheet). It's a string of 36 characters that I didn't see a need for. The second change was to delete the three rows of reissued data.

9qskd3l0758uuilhx3w55c7cf  RE46030  Rubus idaeus  Advabertwee
cmrlj47ihepfnqzpriuh7hd43  RE46041  Rubus idaeus  Advabereen
jscvs1slb2sjau3u8zl2ayj08  RE46031  Rubus idaeus  Advaberimar
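Both changes to botanic.tsv can be sketched in a few lines of Python. The assumptions here are that the file is tab-separated, that the id is the first column and the patent number the second, and that reissues can be recognized by the RE prefix seen in the three rows above. clean_botanic is an illustrative name.

```python
def clean_botanic(lines):
    """Drop the leading id column and skip reissue (RE...) rows."""
    rows = []
    for line in lines:
        # Split on tabs and discard the first (id) column.
        fields = line.rstrip("\r\n").split("\t")[1:]
        # After the drop, the patent number is assumed to be first.
        if fields and fields[0].startswith("RE"):
            continue
        rows.append(fields)
    return rows
```

A header row, if there is one, passes through with its id heading removed like any other row.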

That leaves us with 12,802 rows of data, from PP15,460 issued 2005-01-04 through PP28,267 issued 2017-08-08. Once the data is loaded into a database table we can do fun things like figure out which Latin names are used most often.

patentsview.org botanic data
count  latin name
  718  Rosa hybrida
  244  Calibrachoa sp.
  232  Prunus persica
  204  Impatiens hawkeri
  181  Osteospermum ecklonis
  176  Rosa hybrid
  156  Chrysanthemum morifolium
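A count like the one above takes a single GROUP BY once the rows are in a table. Here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name, the column names and the function name are assumptions chosen for the example, not the schema actually used on this site.

```python
import sqlite3

def top_latin_names(rows, limit=7):
    """rows: iterable of (patent_id, latin_name, variety) tuples.

    Returns a list of (count, latin_name), most frequent first.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE botanic (patent_id TEXT, latin_name TEXT, variety TEXT)"
    )
    con.executemany("INSERT INTO botanic VALUES (?, ?, ?)", rows)
    result = con.execute(
        "SELECT COUNT(*) AS n, latin_name FROM botanic "
        "GROUP BY latin_name ORDER BY n DESC, latin_name LIMIT ?",
        (limit,),
    ).fetchall()
    con.close()
    return result
```

Note that grouping on the raw text treats "Rosa hybrida" and "Rosa hybrid" as different names, which is why both show up in the table.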

Or check for gaps in the data

Data Gaps

I checked, and the missing numbers correspond to withdrawn plant patents. Interestingly, another plant patent withdrawn in this time frame, PP20696, does have a row in the spreadsheet. It is in the patentsview database too, so the USPTO and patentsview are not totally in sync. This page has more information on the withdrawn patents patentsview returns.

Odd Row
PP20696  WITHDRAWN  2010-02-02  Stenotaphrum secundatum  Polaris
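The gap check mentioned earlier amounts to listing every number between the lowest and highest plant patent that has no row. A rough Python sketch, assuming the patent ids look like PP20696 (the PP prefix followed by digits, with any thousands separators removed); find_gaps is a made-up name.

```python
def find_gaps(patent_ids, prefix="PP"):
    """Return the sorted patent numbers missing between the lowest and highest id."""
    numbers = set()
    for pid in patent_ids:
        if pid.startswith(prefix):
            # Strip the prefix and any thousands separator, e.g. "PP15,460".
            numbers.add(int(pid[len(prefix):].replace(",", "")))
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
```

The numbers it returns would still need to be looked up by hand to confirm that each one is a withdrawn patent.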

Separately, from the USPTO, I have the plant patent issue dates and other fields. I could do some sort of mashup to show, say, the most popular Latin names by year of issue.

year  count  latin name
2008    109  Rosa hybrida
2006     95  Rosa hybrida
2005     76  Rosa hybrida
2016     64  Rosa hybrida
2013     62  Rosa hybrida
2006     60  Chrysanthemum morifolium
2010     56  Rosa hybrida
2014     56  Rosa hybrida
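A mashup like the table above comes down to joining the botanic rows to the issue dates on the patent number and grouping by year and name. The following sqlite3 sketch shows one way to do it. The two table layouts, the ISO-style dates (YYYY-MM-DD) and the function name are all assumptions for the example.

```python
import sqlite3

def top_names_by_year(botanic_rows, issue_rows, limit=8):
    """botanic_rows: (patent_id, latin_name); issue_rows: (patent_id, issue_date).

    Returns a list of (year, count, latin_name), largest counts first.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE botanic (patent_id TEXT, latin_name TEXT)")
    con.execute("CREATE TABLE issued (patent_id TEXT, issue_date TEXT)")
    con.executemany("INSERT INTO botanic VALUES (?, ?)", botanic_rows)
    con.executemany("INSERT INTO issued VALUES (?, ?)", issue_rows)
    query = """
        SELECT substr(i.issue_date, 1, 4) AS year, COUNT(*) AS n, b.latin_name
        FROM botanic AS b
        JOIN issued AS i ON i.patent_id = b.patent_id
        GROUP BY year, b.latin_name
        ORDER BY n DESC, year, b.latin_name
        LIMIT ?
    """
    result = con.execute(query, (limit,)).fetchall()
    con.close()
    return result
```

Because this is an inner join, a botanic row with no matching issue date simply drops out of the counts.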

So let me know if you think of something more interesting to do with the bulk data that is available. Or you could download it yourself to see how much fun it is! The nice thing about patentsview is that they also offer a query tool for custom extracts, which produces a CSV file you can download. Your query could get you plant patent data like the issue date and title without downloading their 1 GB bulk patent file of 6.3 million patents when you just want data for the 12,805 plant patents covered in the botanic file. On the advanced search screen at http://www.patentsview.org/query, set the select boxes as shown and click "+ Add to Search" for each of the three conditions (with an implied Boolean AND between them).

Then click Submit Search (bottom of the page). Then select which columns you'd like returned,

click Preview Query (bottom of the page), fill in an email address and prove you are not a robot. Do all of that and they'll send you a link to download your CSV file. How fun is that? One caveat, which doesn't apply here, is that your result set needs to be 1 GB or smaller, as that seems to be the extent of their helpfulness. Any larger and you'd have to resort to the bulk data files or ask for a database dump, as explained in the email you'll receive. The pairbulk people offer a similar query tool and download, but they do not let you specify which columns you'd like. You get a lot of columns whether you want them or not. They also only offer JSON or XML as download formats. There is no CSV option. How unfun is that?

If you have an interest in plant patents, I have more information about them here on my main site.