PatentsView Data Gaps

This is a further explanation of the 5 major problems listed here. The root cause is gaps in the bulk data that the USPTO makes available.

The PatentsView database is built from nearly 2,000 XML files of granted-patent data¹. There are 305 patents missing from the grant XML files, which means they are also missing from the PatentsView database. Interestingly, 182 of the missing patents are present in the USPTO's other API, the Patent Examination Data System.
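The comparison described above can be sketched as two set operations. This is only an illustration, not the method used to arrive at the counts above: the function names and the sample numbers are hypothetical, and it assumes you already have a list of patent numbers you expect to exist, the numbers actually found in the grant XML files, and the numbers available from a second source.

```python
def find_missing(expected, found):
    """Return the patent numbers in `expected` that are absent from `found`."""
    return sorted(set(expected) - set(found))


def split_by_other_source(missing, other_source):
    """Split missing numbers into those a second source has and those it lacks."""
    other = set(other_source)
    recovered = [p for p in missing if p in other]
    still_missing = [p for p in missing if p not in other]
    return recovered, still_missing


# Made-up numbers, for illustration only.
expected = ["100", "101", "102", "103"]
in_grant_xml = ["100", "102"]
in_second_source = ["101"]

missing = find_missing(expected, in_grant_xml)
recovered, still_missing = split_by_other_source(missing, in_second_source)
print(missing, recovered, still_missing)
```

Building the `expected` list is the hard part in practice, since not every number in a range is necessarily an issued patent.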

Classification Problems

The grant XML files contain classification data, but classifications can change after a patent is issued. For that reason, the USPTO makes bulk classification files available approximately quarterly, and the PatentsView team uses those files when building its database. Two main classification systems are in use. One is the relatively new CPC, or Cooperative Patent Classification, system. Problem: plant patents and reissued patents can be assigned CPC classifications, yet the bulk CPC file only contains classifications for utility patents. As a result, the PatentsView database only contains CPC assignments for utility patents.
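One way to check which patent types a classification file covers is to tally patent numbers by their prefix. The sketch below assumes the numbers have already been extracted from the file into a list of strings; the function names are hypothetical, and the prefix rules (PP for plant, RE for reissue, D for design, digits only for utility) follow the usual USPTO numbering conventions rather than anything specific to the bulk CPC file format.

```python
from collections import Counter


def patent_type(number: str) -> str:
    """Guess the patent type from the patent number prefix."""
    n = number.strip().upper()
    if n.startswith("PP"):
        return "plant"
    if n.startswith("RE"):
        return "reissue"
    if n.startswith("D"):
        return "design"
    if n.isdigit():
        return "utility"
    return "other"


def tally_types(numbers):
    """Count how many patent numbers fall into each type."""
    return Counter(patent_type(n) for n in numbers)


# Made-up numbers, for illustration only.
sample = ["10000000", "PP12345", "RE45678", "D900000", "9999999"]
print(tally_types(sample))
```

If the tally for a classification file shows only the utility type, that matches the gap described above.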

¹Patent Grant Full Text Data (No Images) (JAN 1976 - PRESENT) files