PatentsView Data Gaps
This is a further explanation of the 5 major problems listed
here.
The root cause is gaps in the bulk data that the USPTO makes available.
The PatentsView database is built from nearly
2,000 XML files of granted patent data¹.
The problem is that 305 patents are missing from the grant XML files, which means they are also missing from the PatentsView database. Interestingly, 182 of the
missing patents are present in the USPTO's other API, the
Patent Examination Data System.
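The comparison behind those counts can be sketched with plain set arithmetic. This is only an illustration under stated assumptions: the function name find_gaps and the toy patent numbers are hypothetical, and it assumes the patent numbers have already been extracted from the grant XML files and from the Patent Examination Data System by some other means.

```python
def find_gaps(expected, in_grant_xml, in_peds):
    """Return (missing, recoverable) as sets of patent numbers.

    missing:     expected numbers that never appear in the grant XML files
    recoverable: the subset of missing numbers that the other source has
    All three inputs are hypothetical collections of patent-number strings.
    """
    missing = set(expected) - set(in_grant_xml)
    recoverable = missing & set(in_peds)
    return missing, recoverable


# Toy data, not real patent numbers from the gap analysis.
expected = {"10000001", "10000002", "10000003", "10000004"}
in_grant_xml = {"10000001", "10000003"}
in_peds = {"10000002", "10000003"}

missing, recoverable = find_gaps(expected, in_grant_xml, in_peds)
print(sorted(missing))      # numbers absent from the grant XML
print(sorted(recoverable))  # absent numbers found in the other source
```

Using sets keeps the check independent of file order and makes the "missing but present elsewhere" subset a single intersection.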
Classification Problems
The grant XML files contain classification data, but classifications can change after a patent is issued. As a result, the USPTO makes bulk classification files available
approximately quarterly. The PatentsView team uses these bulk classification files when building its database.
There are two main classification systems in use. One is the relatively new CPC, or Cooperative Patent Classification, system. The problem: plant patents and reissued
patents can be classified using CPCs, yet the bulk CPC file only contains classifications for utility patents. As a result, the PatentsView database only contains
CPC assignments for utility patents.
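One way to see this kind of gap is to tally the patent numbers in a classification file by patent type. The sketch below is a rough illustration, not the PatentsView team's method: it assumes the patent numbers have already been read out of the file (the actual file layout is not covered here), and it relies on the usual number prefixes, PP for plant patents, RE for reissues and D for designs. The names patent_type and coverage_by_type and the sample numbers are hypothetical.

```python
from collections import Counter


def patent_type(number):
    """Guess the patent type from the number's prefix (assumed convention)."""
    n = number.strip().upper()
    if n.startswith("PP"):
        return "plant"
    if n.startswith("RE"):
        return "reissue"
    if n.startswith("D"):
        return "design"
    if n.isdigit():
        return "utility"
    return "other"


def coverage_by_type(patent_numbers):
    """Count how many of the given patent numbers fall under each type."""
    return Counter(patent_type(n) for n in patent_numbers)


# Toy sample, not taken from a real classification file.
sample = ["10000001", "PP12345", "RE45678", "9876543"]
counts = coverage_by_type(sample)
print(counts)
```

If a tally like this over the bulk CPC file shows only the utility type, plant and reissue patents have no CPC assignments in that file.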
¹Patent Grant Full Text Data (No Images) (JAN 1976 - PRESENT) files