PatentsView API Bugs

My expectation is that the PatentsView data would match the data in the USPTO's PatFT database, but I'm seeing the following major differences (in no particular order):
  1. Nearly 8,000 withdrawn patents are returned by the API. I raised this as an issue, but it was closed without being fixed. See #18 below.
  2. Plant patents do not have CPCs where appropriate. This was another issue that was closed without being fixed.
  3. See numbers 3, 15, 19, 28, 33 and 41 below.
  4. There are many problems with locations due to the underlying data. There needs to be a disambiguation effort like the one done for inventors. See below.
  5. Design and plant patents (and reissued patents of either type) still receive USPC classifications, yet the API does not always return them. It is true that utility patents stopped receiving USPC classifications after May 2015, but plant and design patents still receive them. This page shows the null USPC classifications the API returns for plant patents, along with links to the USPTO's site showing the USPC classifications the API should be returning. The USPTO pages also show CPC classifications for about 50% of the plant patents; as noted above, the API does not return those either. Here's a page showing design patent data returned by the API.
  6. There are 305 missing patents. They aren't in the bulk XML files provided by the USPTO. See 43 and 44 below.
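
Until the withdrawn patents and the missing USPC classifications are fixed at the source, a client can at least screen its own results. This is a minimal sketch, not part of any PatentsView tooling: it assumes each API record has been converted to a dict with the hypothetical keys `patent_number`, `patent_type`, and `uspc`, and that you supply the withdrawn patent numbers yourself (for example, compiled from the USPTO's published list of withdrawn patent numbers). It drops withdrawn patents and flags plant and design patents that came back with no USPC classification. Reissues are not checked, because the record shape assumed here does not say whether a reissue is of a plant or design patent.

```python
import re
from typing import Iterable, List, Tuple

# Splits an optional letter prefix (D, PP, RE, ...) from the digits so that
# leading zeros can be dropped: "D0800000" and "D800000" compare equal.
_NUMBER_RE = re.compile(r"^([A-Z]*)0*(\d+)$")


def normalize_patent_number(raw: str) -> str:
    """Uppercase, remove commas and spaces, and strip leading zeros after the prefix."""
    cleaned = raw.replace(",", "").replace(" ", "").upper()
    match = _NUMBER_RE.match(cleaned)
    if match is None:
        return cleaned  # leave unrecognized formats untouched
    prefix, digits = match.groups()
    return prefix + digits


def audit_records(
    records: Iterable[dict], withdrawn_numbers: Iterable[str]
) -> Tuple[List[dict], List[str], List[str]]:
    """Return (kept records, dropped withdrawn numbers, plant/design numbers lacking a USPC).

    The keys "patent_number", "patent_type" and "uspc" are hypothetical; adjust
    them to whatever shape your API responses are converted into.
    """
    withdrawn = {normalize_patent_number(n) for n in withdrawn_numbers}
    kept: List[dict] = []
    dropped: List[str] = []
    missing_uspc: List[str] = []
    for record in records:
        number = normalize_patent_number(record["patent_number"])
        if number in withdrawn:
            dropped.append(number)
            continue
        kept.append(record)
        # Plant and design patents still receive USPC classifications,
        # so an empty value is worth flagging.
        if record.get("patent_type") in {"plant", "design"} and not record.get("uspc"):
            missing_uspc.append(number)
    return kept, dropped, missing_uspc
```

Because the withdrawn list is passed in by the caller, the same function works whether the numbers come from a file or a hand-maintained list.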

Numbers 1 and 5 could be fixed by the PatentsView team. They should not load into their database the withdrawn patents they encounter in the grant XML files (patents that were withdrawn after being issued). I don't know of any other system that returns data for withdrawn patents as PatentsView does. There is also a bulk file of USPCs available, so there is no reason the API should not return them.

Numbers 2, 4, and 6 are problems with the underlying data the USPTO makes available to anyone, including the PatentsView team. The underlying files would need to be fixed by the USPTO before these problems could be fixed in the PatentsView database. The location problems (number 4) also exist in the PatFT database. More information about these problems is here. Specific examples of these problems follow.

Most of the location bugs are due to the underlying USPTO data. It is, for example, possible to search for an inventor's city that starts with 200 (ic/200$ at uspto.gov). You might think you are getting back all the patents for a particular location from PatentsView, but you won't be if the USPTO data contains errors. This can be seen on this page, where the searches are by latitude and longitude. You'll most likely see odd locations nearby that have patents associated with them, e.g. patents where the city is Los Angeles International Airport. The patents associated with such an odd location wouldn't be returned by the PatentsView location endpoint for a query where the city is Los Angeles. To me this is a big problem, and I could not find it mentioned anywhere. The location endpoint should at least carry a disclaimer explaining that you may not get back all the patents you expect.

To compensate for this data problem, I think there should be an endpoint, or parameters on the location endpoint, to search within a specific distance of a specific latitude and longitude. I'd then be able to retrieve patents within a mile of downtown Los Angeles regardless of the "city" (airport, county, or labeled rock on the roadside) that the USPTO has recorded. A possible alternative would be to consolidate locations, changing the odd locations into standard ones.
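
A client-side approximation of the proposed radius search is possible if you pull location records that carry coordinates. The sketch below is an illustration under assumptions, not the PatentsView API: the record keys `city`, `latitude`, and `longitude` are hypothetical, and distances use the haversine great-circle formula on a spherical Earth (mean radius 3958.8 miles), which is accurate enough at city scale. `within_radius` implements the distance search; `snap_to_canonical` is one way to do the consolidation alternative, relabeling an odd location (such as an airport) with the nearest city from a list you supply.

```python
import math
from typing import Dict, Iterable, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def within_radius(
    locations: Iterable[dict], center_lat: float, center_lon: float, radius_miles: float
) -> List[dict]:
    """Keep every location within radius_miles of the center, whatever its city label says."""
    return [
        loc
        for loc in locations
        if haversine_miles(center_lat, center_lon, loc["latitude"], loc["longitude"]) <= radius_miles
    ]


def snap_to_canonical(
    location: dict, canonical: Dict[str, Tuple[float, float]], max_miles: float
) -> str:
    """Return the nearest canonical city within max_miles, else the location's own label."""
    best_name, best_dist = location["city"], float("inf")
    for name, (lat, lon) in canonical.items():
        dist = haversine_miles(location["latitude"], location["longitude"], lat, lon)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_miles else location["city"]
```

For example, `within_radius(locations, 34.0522, -118.2437, 1.0)` keeps the records within about a mile of downtown Los Angeles (approximate coordinates), regardless of the city name attached to them. A linear scan like this is fine for a few thousand records; a server-side endpoint would presumably rely on a spatial index instead.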