How the data in a database or search engine is structured is as important as the data itself, and sometimes more so.
I was reminded of this old rule of thumb a couple of weeks ago when I decided to compare coverage of PCT applications in four public patent databases: PatentScope, FreePatentsOnline, esp@cenet, and Patent Lens. The results were fairly consistent until I got to Patent Lens, where my benchmark searches retrieved far more documents than the other three databases. (See table below.)
Most surprising, my publication-date searches in Patent Lens retrieved many more documents for five of the six dates I had selected. If PatentScope, WIPO's official record, says that 3,280 PCT applications were published on March 19, 2009, why did Patent Lens tell me it found 4,364? This has serious implications for anyone using Patent Lens for competitive intelligence or market research, or simply to track the number of PCT applications filed by their organization.
Fortunately, the friendly folks at Patent Lens provided the explanation: Patent Lens indexes every version of a published PCT application, which inflates the number of retrieved documents. This includes subsequently published international search reports (A3 or A9), amended versions (A4) and corrected versions (A9). PatentScope and esp@cenet instead link these documents to the record for the initial publication (A1 or A2). FreePatentsOnline (FPO) apparently covers only the first published application (A1 or A2), although I haven't yet confirmed it. This explains why the search results were consistent across FPO, PatentScope and esp@cenet.
So what are the practical implications for non-IP professionals who use Patent Lens? Well, if a researcher or tenure-track professor searches for his or her own name or university, the resulting document count may be inflated.
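If you can export a Patent Lens result list, one way to get a figure closer to PatentScope's is to count only the initial publications (A1 or A2) and set aside the later versions. Below is a minimal Python sketch of that idea. It assumes each exported record can be reduced to a (publication number, kind code) pair. The record values are made-up placeholders, not real search results, and the function names are hypothetical.

```python
# Hypothetical records from one publication-date search, reduced to
# (publication number, kind code) pairs. Values are placeholders.
records = [
    ("WO2009000001", "A1"),  # initial publication with search report
    ("WO2009000002", "A2"),  # initial publication without search report
    ("WO2008000003", "A3"),  # later-published search report for an older filing
    ("WO2008000004", "A9"),  # corrected version of an older filing
]

# Kind codes treated as "first publication", following the post's description.
INITIAL_KINDS = {"A1", "A2"}


def count_all_versions(records):
    """Raw count: every indexed version counts, as in Patent Lens."""
    return len(records)


def count_first_publications(records):
    """Count distinct publication numbers that appear as an A1 or A2,
    ignoring later versions such as A3, A4 and A9."""
    return len({number for number, kind in records if kind in INITIAL_KINDS})


if __name__ == "__main__":
    print("all versions:", count_all_versions(records))
    print("first publications:", count_first_publications(records))
```

Using a set of publication numbers means that a duplicate A1 row in the export would still be counted once. Whether later versions in your own export share the original's publication number is worth checking against a few known records first.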