Monday, October 18, 2010

How Complete is the USPTO Patent Database?

There was an interesting discussion last week on the Intellogist blog about the number of allegedly missing patent documents in the USPTO's PatFT database. This is an important question for anyone who uses the database, but especially for anyone doing legal or business research. (PatFT is de facto the public patent database of record, although the USPTO does not make this claim.)

Determining the number of records that should be in the PatFT database is relatively easy. The USPTO assigns patent numbers in sequential order, as it has done since 1836. Let's take a closer look at utility patents issued from 1976 to the present. We know that the number of the first utility patent issued in 1976 is 3,930,271 and the highest patent number issued to date (as of Oct. 12, 2010) is 7,814,566. Subtract the former from the latter and add one (both endpoints count) and you get a total of 3,884,296. So the full-text collection in PatFT should contain at most 3,884,296 utility patent documents.

However, some of the numbers in the 3,930,271-7,814,566 range are unused because allowed applications (applications that are on the verge of being issued and have been assigned numbers) may be withdrawn from issue by the USPTO or the applicant. These numbers are withdrawn permanently and not reassigned to different applications. (The USPTO publishes lists of these withdrawn patent numbers each week in the Official Gazette.)

How many withdrawn patent numbers are there in our time frame? That's also easy to determine because the USPTO publishes an up-to-date list of withdrawn patent numbers. According to the list, there are 19,753 withdrawn patent numbers in the range 3,930,271-7,814,566. So we must subtract this number from the number above to get the total number of utility patents issued since the start of 1976 that should be in the PatFT database.

"Potentially assigned patent numbers" - "withdrawn patent numbers" = "total issued patents"

3,884,296 - 19,753 = 3,864,543
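
For anyone who wants to rerun the arithmetic with newer figures, here is a minimal Python sketch of the same calculation. The three numbers come straight from this post; the function and variable names are my own and purely illustrative.

```python
# Figures quoted in the post (as of Oct. 12, 2010).
FIRST_1976_UTILITY = 3_930_271   # first utility patent number issued in 1976
HIGHEST_ISSUED = 7_814_566       # highest utility patent number issued to date
WITHDRAWN_IN_RANGE = 19_753      # withdrawn numbers in that range, per the USPTO list


def expected_patent_count(first: int, last: int, withdrawn: int) -> int:
    """Numbers in the inclusive range first..last, minus withdrawn numbers."""
    potentially_assigned = last - first + 1  # +1 because both endpoints count
    return potentially_assigned - withdrawn


potentially_assigned = HIGHEST_ISSUED - FIRST_1976_UTILITY + 1
expected = expected_patent_count(
    FIRST_1976_UTILITY, HIGHEST_ISSUED, WITHDRAWN_IN_RANGE
)

print(f"Potentially assigned: {potentially_assigned:,}")  # 3,884,296
print(f"Expected in PatFT:    {expected:,}")              # 3,864,543
```

To repeat the check at a later date, swap in the current highest patent number and the current withdrawn count.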

We can check this number in PatFT by searching the "Application Type" field (APT) for patents coded "1" (utility patent applications).

apt/1 = 3,864,555

This search retrieves 3,864,555 hits, which is 12 *more* than the number we expected to see based on the calculation above. For a collection of almost 4 million documents, this is a very small discrepancy: about 0.0003%, or roughly 3 records per million. I would expect similar results for other types of patent documents in the database, e.g. plants, designs, etc.
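
The size of that discrepancy can be put in relative terms with a short Python sketch. Both counts are the ones quoted in this post; the variable names are illustrative only.

```python
# Counts quoted in the post.
expected = 3_864_543    # from the patent-number arithmetic
retrieved = 3_864_555   # hits returned by the apt/1 search in PatFT

surplus = retrieved - expected   # positive means PatFT returned more than expected
share = surplus / expected       # discrepancy as a fraction of the expected total

print(f"Surplus records: {surplus}")
print(f"Relative discrepancy: {share:.6%}")
print(f"Per million records: {share * 1_000_000:.1f}")
```

A positive surplus means the search returned more records than the arithmetic predicts, so this check by itself gives no sign of missing documents.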

The reasonable conclusion is that there are no significant gaps in the USPTO's PatFT database, at least for the period after 1975. Of course, no database is perfect and there could be a few missing records in PatFT, but they are probably extremely rare.