Back to Index


AppString > AppStringDoc

Top-k

Overview

The module contains an implementation of the technique presented in [1].

Usage

For compiling instructions, please see CompileDoc.

The module uses C++ STL TR1 library provided by GNU GCC.

An example of how to use the module is available in src/topk/example.cc.

Interface

The module is divided in three components:

The Top-k Index (Topk::Index) class is defined in src/topk/topkindex.h. Its main methods are:

    Index();
    Index(const std::string &filename);

    template<class InputIterator> 
    void build(InputIterator begin, InputIterator end, const GramGen &gramGen);

    void load(const std::string &filename);

    void save(const std::string &filename)
      const;

The Top-k Query Index (Topk::IndexQuery) class contains a part of the index that is relevant for a particular query. This class is used by the search algorithms. It is defined in src/topk/topkindex.h and its main method is:

    IndexQuery(const Index &idx, const Query &query);

The Top-k Search Algorithms are defined in multiple files. All the algorithms define the getTopk method. Its prototype is:

    template<
      class RandomAccessIterator1, 
      class RandomAccessIterator2, 
      class OutputIterator>  
    void getTopk(
      const RandomAccessIterator1 data, 
      const RandomAccessIterator2 weights, 
      const Index &idx, 
      const Query &que, 
      IndexQuery &idxQue, 
      OutputIterator topk);

The most popular algorithm is the heap-based algorithm. It is defined in the src/topk/topkheap.h.

The main idea is that a Topk::Index object can be created to hold the index. The index can be build by specifying an iterator over a sequence of strings and a gram generator. Additionally, the index can be saved to disk and then loaded from disk. When a query comes, a Topk::IndexQuery object is created using the Topk::Index instance and the query. Finally, the search algorithm of choice can applied.

Contributors


[1] Rares Vernica, Chen Li: Efficient top-k algorithms for fuzzy search in string collections. KEYS 2009: 9-14. (Workshop on Keyword Search on Structured Data, collocated with SIGMOD 2009)