Patentsview API Bugs

I expected the patentsview data to match the data in the uspto's ppubs database, but I'm seeing these major differences, listed in no particular order:
  1. Nearly 8,000 withdrawn patents are returned by the api. I raised this as an issue but it was closed without being fixed. See #18 below.
  2. Plant patents do not have cpcs where appropriate. This was another issue that was closed without being fixed. The problem is that the bulk Cooperative Patent Classification file produced by the uspto only contains assignments for utility patents. See numbers 3, 15, 19, 28, 33 and 41 below.
  3. There are a lot of problems with locations due to the underlying data. There needs to be a disambiguation effort like there was for inventors. See below.
  4. There are two problems with uspc classifications. Background: the uspto has been using its own classification system for at least a hundred years. In the 2010s most of the world's patent offices agreed to start using a new Cooperative Patent Classification system. As a result, the uspto stopped assigning uspcs to utility patents issued after May 2015 and now assigns them cpcs exclusively. The uspto does continue to assign uspcs to design patents, plant patents and reissued plant or design patents. This is an important distinction: the uspto still assigns uspcs to non-utility patents.
    1. The first major problem is that the patentsview team mistakenly thinks that all patent types stopped receiving uspc assignments in 2015.
    2. The second major problem is that the uspto bulk uspc file the patentsview api uses has not been updated in two years, and this appears to be intentional on the uspto's part (as if the uspto itself doesn't understand the important distinction). This page says the file will stop being produced in January 2020, and the file that is available is from 2018. The last plant patent in mcfpat.zip is PP29260 and the last design patent is D816289, each issued April 24, 2018.
    Combining these two problems with problem 2 means that plant and design patents issued after March 2015 will not be returned by the api's cpc_subsections or uspc_mainclasses endpoints. In other words, you can do queries that should return these non-utility patents but they will not be included in the returned results. I consider this to be a critical flaw that needs action by the uspto and the patentsview team to correct. I could work around the api returning withdrawn patents etc. but I cannot work around data that is not returned by the patentsview api.

    This page shows the null uspc classifications coming back from the api for the most recently issued plant patents. Also provided are links that go to the uspto's site showing the uspc classifications the api should be returning. Here's a page showing recently issued design patent data returned by the api, which do not have uspcs.
  5. There are 305 missing patents. They aren't in the bulk xml files provided by the uspto. See 43 and 44 below.
  6. You cannot query for the same field using an "and" as you can in just about any api. Ex. you won't get results if you search for patents with an inventor_last_name of smith and an inventor_last_name of jones. Spoiler alert: they exist according to ppubs! See samefield.htm and samefield2.htm.
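To make number 6 concrete, here is a rough sketch of the kind of same-field query I mean. It only builds the query, it does not call the api. The "_and" operator and the "q" and "f" parameter names reflect my understanding of the patentsview query language and should be treated as assumptions; same_field_and is just a helper name made up for this example.

```python
import json
from urllib.parse import urlencode


def same_field_and(field, values):
    """Build a query requiring `field` to match every value in `values`."""
    # Assumption: "_and" takes a list of {field: value} criteria.
    return {"_and": [{field: value} for value in values]}


# Patents that list both an inventor named smith and an inventor named jones.
query = same_field_and("inventor_last_name", ["smith", "jones"])

# Assumption: "q" carries the query and "f" the list of fields to return.
params = urlencode({"q": json.dumps(query), "f": json.dumps(["patent_number"])})

print(json.dumps(query))
print(params)
```

A query shaped like this is what comes back empty, even though ppubs finds patents that list both names.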
Numbers 1 and 4.1 could be fixed by the patentsview team. They should not load the withdrawn patents they encounter in the grant xml files (ones withdrawn after being issued) into their database. I don't know of any other system that returns data for withdrawn patents as they do. There is a bulk file of uspcs available, outdated as it is, so there is no reason the api should not return them.

Numbers 2, 3, and 5 are problems with the underlying data the uspto makes available to anyone, including the patentsview team. The underlying files would need to be fixed by the uspto before these problems could be fixed in the patentsview database. The location problems (number 3) also exist in the ppubs database. More information about these problems is here. Specific examples of these problems follow.

Most of the location bugs are due to the underlying uspto data. It is, for example, possible to search for an inventor's city that contains 200: "200$".INCI.c/200$ at ppubs.uspto.gov. You might think you are getting back all the patents for a particular location from patentsview, but you won't if the uspto data contains errors. This can be seen on this page where the searches are by latitude and longitude. You'll most likely see odd locations nearby that have patents associated with them. Ex: patents where the city is Los Angeles International Airport. The patents associated with the odd location wouldn't be returned by the patentsview location endpoint for a query where the city is Los Angeles. To me this is a big problem that is not mentioned anywhere I could find. The location endpoint should at least have a disclaimer explaining that you may not get back all the patents you expect.

To compensate for this data problem I think there should be an endpoint, or parameters on the location endpoint, to search within a specific distance of a specific latitude and longitude. I'd then be able to retrieve patents within a mile of downtown Los Angeles regardless of the "city" (airport, county or labeled rock on the roadside) that the uspto has as the city. A possible alternative would be to do some sort of consolidation of locations, changing the odd locations into standard ones.
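Until something like a distance search exists, the filtering can be approximated on the client side. Here is a minimal sketch, assuming you have already pulled location records that carry a latitude and longitude. The "city", "latitude" and "longitude" keys and the sample coordinates are illustrative assumptions, not actual api output. The distance is the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def locations_within(locations, center_lat, center_lon, radius_miles):
    """Keep the records whose coordinates fall inside the radius."""
    return [
        loc for loc in locations
        if haversine_miles(center_lat, center_lon,
                           loc["latitude"], loc["longitude"]) <= radius_miles
    ]


# Hypothetical records with approximate coordinates, for illustration only.
sample_locations = [
    {"city": "Los Angeles", "latitude": 34.0522, "longitude": -118.2437},
    {"city": "Los Angeles International Airport",
     "latitude": 33.9416, "longitude": -118.4085},
]

# Everything within 15 miles of downtown, whatever the "city" is called.
for loc in locations_within(sample_locations, 34.0522, -118.2437, 15.0):
    print(loc["city"])
```

The catch is that you have to fetch a much wider set of locations first and filter afterwards, which is why I think this belongs in the api rather than in every client.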