Loading Problems

This is a further explanation of the five major problems listed here.

The PatentsView database is built from nearly 2,000 XML files of granted patent data. Problem 1: nearly 8,000 patents found in the grant XML files were subsequently withdrawn, yet they are loaded into the PatentsView database along with the other 6 million or so patents found in the grant XML files. I don't know of another system that returns data for withdrawn patents. Problem 5: there are 305 non-withdrawn patents missing from the grant XML files, which means they are also missing from the PatentsView database. Interestingly, 182 of the missing patents are present in the USPTO's other API, the Patent Examination Data System.
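Both loading problems come down to comparing sets of patent numbers. The following is a minimal sketch of that comparison, not the method actually used for the counts above. The function name `audit_patent_numbers`, its parameters, and the patent numbers are all made up for illustration; in practice the three sets would come from the database, a withdrawn-patent list, and a list of issued patent numbers.

```python
def audit_patent_numbers(loaded, withdrawn, expected):
    """Compare sets of patent numbers (all inputs are hypothetical).

    loaded:    numbers present in the database
    withdrawn: numbers known to have been withdrawn
    expected:  numbers that were issued and should be present
    """
    loaded, withdrawn, expected = set(loaded), set(withdrawn), set(expected)
    return {
        # Problem 1: withdrawn patents that were loaded anyway
        "withdrawn_but_loaded": sorted(loaded & withdrawn),
        # Problem 5: non-withdrawn patents that never made it into the database
        "missing": sorted((expected - withdrawn) - loaded),
    }


# Made-up patent numbers, for illustration only
report = audit_patent_numbers(
    loaded={"1000001", "1000002", "1000003"},
    withdrawn={"1000003"},
    expected={"1000001", "1000002", "1000004"},
)
print(report)
```

Patent numbers are kept as strings here so that prefixed numbers (such as plant or reissue patents) could be compared the same way as utility patent numbers.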

Classification Problems

Classification data is present in the grant XML files, but classifications can change after a patent is issued. As a result, the USPTO makes bulk classification files available approximately quarterly, and the PatentsView team uses those bulk files when building its database. Two main classification systems are in use. One is the relatively new CPC (Cooperative Patent Classification) system; the other is the older USPC (United States Patent Classification) system. Problem 2: plant patents and reissued patents can be classified using CPCs, yet the bulk CPC file only contains classifications for utility patents. As a result, the PatentsView database only contains CPC assignments for utility patents.

There is also a problem when the grant XML files being loaded are more recent than the period covered by the bulk classification files. Example: a recent update loaded patent data through 2018-11-27, but the most recent bulk USPC file only has data through 2018-04-24. As a result, the last seven months of reissued and plant patents in the PatentsView database do not have USPC classifications (Problem 4). The most recent bulk CPC file has data through 2018-11-27, but for some reason CPCs are not coming back on the most recently issued patents in the PatentsView database. Perhaps the load was mistakenly done without first retrieving the most recent bulk CPC file. Or perhaps the grant XML files were processed before 2019-02-05, which is when the latest bulk CPC file was made available.
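The coverage gap described above can be checked with simple date arithmetic before a load is run. This is only a sketch under stated assumptions: the helper names `classification_gap_days` and `lacks_coverage` are hypothetical and are not part of any PatentsView or USPTO tooling. The dates are the ones quoted in the text.

```python
from datetime import date


def classification_gap_days(grants_through, bulk_file_through):
    """Days of grant data not covered by a bulk classification file (0 if fully covered)."""
    return max((grants_through - bulk_file_through).days, 0)


def lacks_coverage(issue_date, bulk_file_through):
    """True if a patent was issued after the last date the bulk file covers."""
    return issue_date > bulk_file_through


# Dates taken from the example in the text
grants_through = date(2018, 11, 27)  # grant data loaded through this date
uspc_through = date(2018, 4, 24)     # most recent bulk USPC file
cpc_through = date(2018, 11, 27)     # most recent bulk CPC file

uspc_gap = classification_gap_days(grants_through, uspc_through)
cpc_gap = classification_gap_days(grants_through, cpc_through)
print("USPC gap in days:", uspc_gap)
print("CPC gap in days:", cpc_gap)

# A patent issued on the last loaded date falls outside the USPC file's coverage
print(lacks_coverage(date(2018, 11, 27), uspc_through))
```

A nonzero gap for a classification system would be a signal either to wait for a newer bulk file or to note in the release that recently issued patents will lack those classifications. A zero gap, as with the CPC file here, suggests the cause lies elsewhere, for example in which bulk file was actually retrieved for the load.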