LSA is Great Theory
by Roji John on July 21, 2008
Well, it may be all right in practice, but it will never work in theory. --Warren Buffett
Since its inception in the 1980s (US4839853), Latent Semantic Analysis (LSA) has become an increasingly popular way for search applications to correlate concepts. We often find, however, that what sounds great in theory falls short in practice. In the realm of Intellectual Property (IP), LSA seems to be one of these situations.
LSA can be described as a technique that uses statistical analysis to find associations between terms. Without getting into the mathematics, the idea is that documents sharing similar yet uncommon terms are considered semantically close.
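For readers who do want a glimpse of the mathematics, here is a minimal sketch of the textbook LSA pipeline under stated assumptions: TF-IDF weighting, a hypothetical four-document toy corpus, and a rank-2 truncated SVD computed by power iteration on A^T A so that only the Python standard library is needed. Production systems would call an optimized SVD routine instead. The function names are hypothetical, and this is not how any particular product implements LSA.

```python
import math


def term_document_matrix(docs):
    """Return (vocab, A) where A[t][d] is the TF-IDF weight of term t in document d.

    TF-IDF is one common LSA weighting (an assumption here); it makes shared
    rare terms count for more than terms that appear everywhere.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for toks in tokenized for term in toks})
    n = len(docs)
    matrix = []
    for term in vocab:
        df = sum(1 for toks in tokenized if term in toks)
        idf = math.log(n / df)
        matrix.append([toks.count(term) * idf for toks in tokenized])
    return vocab, matrix


def top_eigenpairs(sym, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(sym)
    g = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        # Non-uniform start so we are unlikely to begin orthogonal to the target.
        v = [1.0 + 0.1 * i for i in range(n)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs, k):
    """Map each document to k latent dimensions (the rows of V_k * Sigma_k)."""
    _, a = term_document_matrix(docs)
    n = len(docs)
    # A^T A is document-by-document; its eigenvalues are the squared singular values of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy corpus: two turbine documents and two antibody documents.
docs = [
    "patent claim rotor blade turbine",
    "turbine blade cooling channel",
    "antibody protein binding assay",
    "protein assay buffer antibody",
]
vectors = lsa_document_vectors(docs, k=2)
print(f"turbine vs turbine:  {cosine(vectors[0], vectors[1]):.3f}")
print(f"turbine vs antibody: {cosine(vectors[0], vectors[2]):.3f}")
```

With this toy corpus the two turbine documents come out nearly parallel in the two-dimensional latent space, even though their raw TF-IDF cosine is only about 0.17: the rank-2 truncation collapses each topic onto a single axis, which is the "similar yet uncommon terms" effect in miniature.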
In theory, the methodology should be excellent for classifying text without actually reading or understanding it. In fact, for general text such as that found on the internet, LSA can prove useful in finding a handful of closely related results.
Our task in the IP arena is quite different. The highly specific terminology used in these technical documents can often confuse LSA systems. But we also have a key advantage within IP: the documents have been classified by an expert examiner at the patent office, and that examiner knows precisely the concepts described in the invention.
Many users in this community are already very familiar with the US and International Patent Classification (IPC) systems. Each system has its pros and cons. Making both classification systems available and approachable is a basic function that every patent search application should provide. In addition, being able to leverage the knowledge embedded in the classification systems is key. When a document is tagged with several classifications, the application must take all of them into account to accurately capture the invention.
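As a rough illustration of taking every classification into account, the sketch below expands each IPC symbol into its hierarchy (section, class, subclass, main group, full symbol) and compares two documents by the Jaccard overlap of all their expanded codes. This is a hypothetical scoring scheme chosen for clarity, not the method of any particular application; the function names and the example codes are illustrative, and it assumes the usual "subclass group/subgroup" layout of IPC symbols.

```python
def ipc_levels(code):
    """Expand an IPC symbol such as 'F01D 5/14' into every level of its hierarchy.

    Returns section 'F', class 'F01', subclass 'F01D', main group 'F01D 5'
    and the full symbol. Assumes the 'subclass group/subgroup' layout.
    """
    code = " ".join(code.split())  # normalize internal whitespace
    subclass, _, group = code.partition(" ")
    levels = {subclass[:1], subclass[:3], subclass}
    if group:
        main_group = group.split("/")[0]
        levels.add(f"{subclass} {main_group}")
        levels.add(code)
    return levels


def classification_similarity(codes_a, codes_b):
    """Jaccard overlap of the expanded hierarchies of *all* codes on two documents."""
    levels_a = set().union(*(ipc_levels(c) for c in codes_a))
    levels_b = set().union(*(ipc_levels(c) for c in codes_b))
    union = levels_a | levels_b
    return len(levels_a & levels_b) / len(union) if union else 0.0


# Illustrative codes only (hypothetical documents).
turbine_a = ["F01D 5/14", "F02C 7/18"]
turbine_b = ["F01D 5/18", "F02C 7/18"]
antibody = ["C07K 16/00"]
print(classification_similarity(turbine_a, turbine_b))  # 0.8
print(classification_similarity(turbine_a, antibody))   # 0.0
```

Comparing only the first code on each turbine document would give 4/6, about 0.67, and would miss that both also share F02C 7/18; scoring over all codes picks that up.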
Another problem with LSA is scalability. True LSA requires matrices that grow with the document collection and quickly become difficult to manage and scale for large document sets. Shortcuts can be taken, but queries that are not pre-calculated can take minutes, hours, or even days. With today's demand for instant information, users cannot be expected to endure such delays. Instant access to the data is a must.
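Some back-of-envelope arithmetic shows how fast these matrices grow. The collection size below is purely hypothetical, chosen only to show orders of magnitude, and the storage layouts are assumptions: float64 weights, int32 column indices, and int64 row pointers for a CSR-style sparse matrix. Sparse storage and a truncated (rank-k) decomposition are typical shortcuts, which is why they are included alongside the dense case.

```python
BYTES_PER_VALUE = 8    # float64 weights (assumption)
BYTES_PER_INDEX = 4    # int32 column indices (assumption)
BYTES_PER_POINTER = 8  # int64 row pointers (assumption)


def dense_bytes(n_terms, n_docs):
    """Memory for a full term-by-document matrix of float64 values."""
    return n_terms * n_docs * BYTES_PER_VALUE


def sparse_csr_bytes(n_docs, avg_terms_per_doc):
    """CSR-style storage with documents as rows: values + column indices + row pointers."""
    nnz = n_docs * avg_terms_per_doc
    return nnz * (BYTES_PER_VALUE + BYTES_PER_INDEX) + (n_docs + 1) * BYTES_PER_POINTER


def truncated_factor_bytes(n_terms, n_docs, k):
    """Dense U_k (terms x k) and V_k (docs x k) factors kept after a rank-k SVD."""
    return (n_terms + n_docs) * k * BYTES_PER_VALUE


# Hypothetical collection size, chosen only to show orders of magnitude.
TERMS, DOCS, AVG_TERMS, K = 500_000, 10_000_000, 1_000, 300
for label, size in [
    ("dense matrix", dense_bytes(TERMS, DOCS)),
    ("sparse matrix", sparse_csr_bytes(DOCS, AVG_TERMS)),
    ("rank-300 factors", truncated_factor_bytes(TERMS, DOCS, K)),
]:
    print(f"{label:>16}: {size / 1e9:,.1f} GB")
```

Under these assumptions the dense matrix needs about 40 TB, the sparse version about 120 GB, and the truncated factors alone about 25 GB, and every new batch of documents changes the decomposition those factors came from.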
Finally, LSA leaves little room for user control. Its algorithms stay static until the provider updates them.
I believe applications that learn from the user are better than ones that try to teach the user. Providing a responsive, accessible and repeatable method to teach the application keeps control in the hands of the expert—you!
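One classic way an application can learn from the user is Rocchio relevance feedback, swapped in here purely as an illustration; the post does not say which method it has in mind. The user marks results as relevant or not, and the query vector moves toward the relevant group and away from the other. This is a minimal sketch over sparse term-weight dictionaries; the function name is hypothetical and the alpha, beta and gamma weights are illustrative values, not tuned settings.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback over sparse term-weight dicts.

    New query = alpha * query + beta * centroid(relevant) - gamma * centroid(nonrelevant);
    terms whose weight ends up zero or negative are dropped.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        # Dividing by len(docs) averages each group into its centroid.
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    return {term: w for term, w in updated.items() if w > 0}


# Hypothetical feedback from one search session.
query = {"turbine": 1.0}
marked_relevant = [{"turbine": 1.0, "blade": 1.0}]
marked_irrelevant = [{"pump": 1.0}]
print(rocchio(query, marked_relevant, marked_irrelevant))  # {'turbine': 1.75, 'blade': 0.75}
```

Because each update is a simple, repeatable calculation on the user's own judgments, the expert can see and steer how the query changes rather than waiting for a provider to rebuild a model.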
These are all commitments we take seriously as we strive to provide the best application possible.
