Quality of information retrieval from the practical viewpoint.

Introduction

Developers of information retrieval systems currently need an efficient methodology for estimating quality. Such a methodology is very important for system design and tuning.
In this paper I compare the theoretical and practical approaches.

Theoretical approach

The theoretical approach originates from the classical works of G. Salton and others [1]. The two main characteristics of document search are recall and precision.
"Recall" is the proportion of relevant material actually retrieved in an answer;
"Precision" is the proportion of retrieved material that is actually relevant.
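The two definitions can be illustrated with a minimal sketch. It assumes that the set of relevant documents for a query is already known; the function name and the document identifiers are hypothetical and serve only as an illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant (assumed to be known)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    # An empty set yields 0.0 here by convention (an assumption of this sketch).
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)  # 0.75 0.5
```

Note that both values depend only on which documents are returned, not on the order in which they are presented.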
Other researchers have tried to introduce other metrics, but without great success. Van Rijsbergen wrote in his well-known book "Information Retrieval" [2]:

"The advantages of basing it on precision and recall are that they are:
(1) the most commonly used pair;
(2) fairly well understood quantities."
We can add another advantage of these characteristics: they can easily be used in a mathematical model.
Nowadays almost every theoretical information retrieval paper contains a classical precision/recall curve that shows the advantage of the new method over the previous one.
The disadvantages of the metrics are:
  1. They are based on the subjective notion of "relevance". Relevance can be objective when we use a controlled test collection and well-studied queries (e.g. TREC), but it becomes subjective when we study real document sets such as commercial databases or the Internet.
  2. The characteristics do not indicate the quality of search from the user's point of view. I try to show this in the next part of this article.

Practical research

The number of information retrieval studies has grown greatly in recent years owing to the increasing popularity of Internet search engines. For example, the site searchenginewatch.com regularly publishes reviews of search engine quality. These researchers usually apply a method that models the behavior of an ordinary user. For instance, they submit a number of currently popular queries and have experts judge a number of the top result documents for relevance. Such a study evidently gives a user more information about potential search quality than theoretical precision and recall.
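The evaluation procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the queries, document identifiers and expert judgments are invented, and the two function names are hypothetical. The score is the share of expert-approved documents among the top k results, averaged over the queries.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that the experts judged relevant.

    If fewer than k results are returned, the missing positions count
    as non-relevant (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    return sum(1 for doc in ranked_ids[:k] if doc in relevant) / k


def mean_precision_at_k(runs, judgments, k):
    """Average precision-at-k over all queries in `runs`."""
    scores = [
        precision_at_k(ranked, judgments.get(query, ()), k)
        for query, ranked in runs.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: two popular queries, the top five results of one
# engine, and the documents that the experts marked as relevant.
runs = {
    "weather": ["w1", "w2", "w3", "w4", "w5"],
    "cheap flights": ["f1", "f2", "f3", "f4", "f5"],
}
judgments = {
    "weather": {"w1", "w2", "w4"},
    "cheap flights": {"f2"},
}
score = mean_precision_at_k(runs, judgments, k=5)
print(score)
```

With these invented judgments the first query contributes 3/5 and the second 1/5, so the average is about 0.4.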
Some search engine developers study the statistics of user search queries. A number of these studies have been published [3,4,5,6]. Their results show that real user queries and search behavior differ from classical test queries (e.g. TREC) and from the theoretical user model:

  1. The queries are short. The average query length is no more than 4 words, so queries are much shorter than the documents being searched. Because of this, a theoretical model in which the query is treated as a sample document does not work in a real environment: short queries and relatively long documents have different term occurrence statistics. For this reason the popular theoretical vector model, for example, is not used in real systems.
  2. Users usually open only a small number of top-ranked documents. This means that, for a user, quality is first of all the quality of ranking. The classical recall/precision pair does not reflect ranking at all. Some theoretical models use a measure of relevance that can be applied to ranking. For example, van Rijsbergen's well-known probabilistic relevance model uses a probability value. But researchers have not analyzed the properties of the resulting ranking in these terms.
  3. Users often begin to surf other hyperlinked documents after examining a number of documents from the result set. Only a few academic works have analyzed hyperlinked document collections, although all modern systems support different kinds of hypertext. The use of hyperlinks can significantly increase search quality for a user [5]. A good example is Google, one of the most popular modern Internet search engines, which was the first to put hyperlink analysis into actual practice.
  4. Users rarely reformulate queries and do not employ technologies like "relevance feedback" to improve the search results. This means that the system works under a lack of information and must nevertheless produce the best possible results. Obviously, many theoretical works that study methods of getting additional information from a user are simply not applicable in practice. For instance, Salton's renowned "relevance feedback" theory has been studied well and, as a result, implemented in many systems, yet according to usability tests it is still not used.
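The second observation, that the classical pair does not reflect ranking, can be made concrete with a small sketch. The document identifiers are invented, and the top-k measure is only one simple rank-sensitive measure chosen for illustration; it is not proposed in the works cited here.

```python
def set_precision_recall(results, relevant):
    """Set-based precision and recall; the order of `results` is ignored.

    Assumes both `results` and `relevant` are non-empty.
    """
    hits = len(set(results) & set(relevant))
    return hits / len(results), hits / len(relevant)


def top_k_precision(results, relevant, k):
    """Share of relevant documents among the first k results."""
    return sum(1 for doc in results[:k] if doc in relevant) / k


relevant = {"a", "b", "c"}
good_ranking = ["a", "b", "c", "x", "y", "z"]  # relevant documents first
bad_ranking = ["x", "y", "z", "a", "b", "c"]   # relevant documents last

# The classical pair is identical for both rankings.
print(set_precision_recall(good_ranking, relevant))  # (0.5, 1.0)
print(set_precision_recall(bad_ranking, relevant))   # (0.5, 1.0)

# A user who opens only the top three documents sees a very different result.
print(top_k_precision(good_ranking, relevant, 3))  # 1.0
print(top_k_precision(bad_ranking, relevant, 3))   # 0.0
```

Both result lists contain the same documents, so precision and recall cannot distinguish them, while the view of the top three positions differs completely.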

Conclusion

The classical academic approach to quality estimation is not suitable for the practical engineering evaluation of modern search engines. Developers have to apply a number of empirical methods that have not been studied theoretically.
The creation of practice-oriented methods is one of the most important tasks of modern information retrieval theory.

References

  1. Salton G. Automatic Information Organization and Retrieval. McGraw-Hill, New York (1968)
  2. Van Rijsbergen. Information Retrieval
  3. Christoph Holscher, Gerhard Strube. Web Search Behavior of Internet Experts and Newbies. http://www9.org/w9cdrom/81/81.html
  4. Dietmar Wolfram. A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine. http://www.slis.ualberta.ca/cais2000/wolfram.htm
  5. Amit Singhal, Marcin Kaszkiel. A Case Study in Web Search using TREC Algorithms
  6. Amanda Spink, Jack L. Xu. Selected results from a large study of Web searching: the Excite study