Studying the Search Quality of a Word-Pair Approach in Information Retrieval

Improving search quality is an important task in modern information retrieval. One possible approach is to use information about word collocations. The simplest form of word co-occurrence is the word pair. With this approach, a system searches not only with the bag-of-words document model but also treats specific term pairs as if they were separate terms. The method has a number of advantages. First, it is very easy to implement: the modifications to a search engine are minimal. Second, when adjacent term pairs are used, it is possible to build an auxiliary next-word index.
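To make the idea of an auxiliary next-word index concrete, the following is a minimal sketch in Python, not the implementation evaluated in this paper. It assumes a naive whitespace tokenizer and an in-memory mapping from a first word to each word that immediately follows it; the names `tokenize`, `build_next_word_index`, and `docs_with_adjacent_pair`, as well as the two sample documents, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_next_word_index(docs):
    """Map first_word -> next_word -> list of (doc_id, position of first_word)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Record every pair of directly adjacent tokens.
        for pos in range(len(tokens) - 1):
            index[tokens[pos]][tokens[pos + 1]].append((doc_id, pos))
    return index


def docs_with_adjacent_pair(index, first, second):
    """Return sorted ids of documents where `second` directly follows `first`."""
    # .get avoids creating empty entries in the defaultdict on lookup.
    postings = index.get(first, {}).get(second, [])
    return sorted({doc_id for doc_id, _ in postings})


# Illustrative documents (made up for this sketch).
docs = {
    1: "federal law on information retrieval",
    2: "law on federal information systems",
}
index = build_next_word_index(docs)

print(docs_with_adjacent_pair(index, "federal", "law"))
print(docs_with_adjacent_pair(index, "federal", "information"))
```

In this sketch an adjacent pair is resolved with two dictionary lookups, so no positional lists need to be intersected. This shortcut only works because the second term is required to follow the first one directly.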

This paper studies the search quality of the method. Real user queries and a collection of Russian legislative documents were used. A special evaluation method based on real user behavior is proposed.

The experiments show that the word-pair approach using contact collocations decreases search quality. They demonstrate that requiring adjacent term pairs is too strict a restriction and therefore reduces recall. Tests show a quality improvement only when word pairs with a distance of 10-15 words between terms are used. At this distance, however, a special index structure cannot be used, which makes the word-pair approach less attractive.
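The effect of the distance restriction can be illustrated with a small Python sketch. It is an illustration under stated assumptions rather than the matching procedure used in the experiments: distance is taken to be the difference between token positions (so adjacent words have distance 1), the pair is treated as unordered, and the function names `min_pair_distance` and `pair_matches` and the sample sentence are hypothetical.

```python
def min_pair_distance(tokens, first, second):
    """Smallest position gap between occurrences of the two terms, or None."""
    best = None
    last_first = None
    last_second = None
    for pos, tok in enumerate(tokens):
        if tok == first:
            last_first = pos
        if tok == second:
            last_second = pos
        # Compare the most recent occurrences of both terms seen so far.
        if (last_first is not None and last_second is not None
                and last_first != last_second):
            gap = abs(last_first - last_second)
            if best is None or gap < best:
                best = gap
    return best


def pair_matches(tokens, first, second, max_distance):
    """True if both terms occur within `max_distance` positions of each other."""
    gap = min_pair_distance(tokens, first, second)
    return gap is not None and gap <= max_distance


# Illustrative sentence (made up for this sketch).
tokens = ("the law adopted by the state duma regulates "
          "the retrieval of information").split()

print(min_pair_distance(tokens, "law", "information"))
print(pair_matches(tokens, "law", "information", max_distance=1))
print(pair_matches(tokens, "law", "information", max_distance=15))
```

With the adjacency restriction (`max_distance=1`) this document is not matched by the pair, although it contains both terms; with a window of 15 it is matched. Checking such a window requires the positions of both terms, which is why a next-word structure that stores only directly adjacent words cannot serve these queries.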