Patentsview Data Flaws
There are two problems with the bulk data files that the uspto makes publicly available. The patentsview api team populates their
database from these bulk data files, so the underlying data files would need to be fixed
by the uspto before these problems could be fixed in the patentsview database.
- Plant and reissued patents in the patentsview database do not have the cpcs they should have, because the bulk cpc data file only contains cpcs for utility patents.
This was raised as an api issue but it was closed without being fixed.
This page shows that about half of all plant patents have one or more cpc assignments.
Reissued utility patents should also have one or more cpc assignments, but none are present in the patentsview database.
- There are 306 missing patents. They aren't in the bulk xml files provided by the uspto.
Further Details
The patentsview api team processes the approximately 1,200 bulk xml files of patent data and then makes the processed data available on their
data download page. I downloaded their patent.tsv.zip and wrote a perl
script to analyze gaps in the patent numbers in the file. I expected to find gaps corresponding to withdrawn patents. Unexpectedly, I found
nearly 8,000 withdrawn patents in patentsview's patent.tsv file. Apparently they load all the patents
found in the bulk xml files, including ones that were
subsequently withdrawn. I raised this as an api issue but it was closed without being fixed.
I also found 306 gaps in patent.tsv that do not correspond to withdrawn patents. These seem to be patents that aren't present in the
bulk patent xml files. Some of them are not available in the PEDS api and some behave oddly on uspto.gov as outlined
here. None can be found in the patentsview database.
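The gap analysis described above can be sketched in Python. This is not the perl script mentioned earlier, only a minimal illustration under stated assumptions: utility patent ids are plain digit strings, other patent types carry a letter prefix (for example PP or RE), and a separate list of withdrawn patent numbers is available. The function names are hypothetical, the column name passed to read_patent_ids has to be taken from the header row of the actual file, and the values in the usage lines are toy numbers, not real patents.

```python
import csv


def read_patent_ids(path, column):
    """Read one column of patent ids from an unzipped tab-separated file.

    `column` must match the header the file really uses for the patent
    number; no particular name is assumed here.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[column] for row in reader]


def utility_numbers(ids):
    """Keep ids made only of ASCII digits and convert them to integers.

    Assumption: prefixed ids (plant, reissue, design, ...) are numbered
    separately, so they are left out of the gap check.
    """
    return [int(s) for s in ids if s.isascii() and s.isdigit()]


def find_gaps(numbers):
    """Return every integer missing between the smallest and largest number."""
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def split_gaps(gaps, withdrawn):
    """Split gaps into those explained by withdrawal and those that are not."""
    withdrawn = set(withdrawn)
    explained = [n for n in gaps if n in withdrawn]
    unexplained = [n for n in gaps if n not in withdrawn]
    return explained, unexplained


def withdrawn_present(numbers, withdrawn):
    """Return withdrawn numbers that still appear in the file, sorted."""
    return sorted(set(numbers) & set(withdrawn))


# Toy data standing in for the id column of patent.tsv.
toy_ids = ["101", "102", "104", "107", "PP12", "RE34"]
toy_withdrawn = [102, 105]

numbers = utility_numbers(toy_ids)
gaps = find_gaps(numbers)
explained, unexplained = split_gaps(gaps, toy_withdrawn)
still_listed = withdrawn_present(numbers, toy_withdrawn)

print("gaps:", gaps)
print("gaps explained by withdrawal:", explained)
print("gaps not explained by withdrawal:", unexplained)
print("withdrawn but still in the file:", still_listed)
```

With real data, the ids would come from read_patent_ids on the unzipped patent.tsv. The unexplained list corresponds to the kind of gap counted above, and the last list corresponds to withdrawn patents that were loaded anyway.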
All of this means that
- Your patentsview search results can contain patents that were withdrawn.
- Plant patents and reissued patents in your patentsview search results will not contain any of the cpcs assigned to them.
- If you perform a cpc search through the patentsview api, the results will not contain the reissued patents or plant patents that the same search on uspto.gov would include.
- Because of the missing patents, patentsview search results can be missing patents that would be returned by a search on uspto.gov.