AppString > AppStringDoc
ListMerger
Introduction
This module provides list-merging algorithms for solving the T-occurrence problem in approximate string search.
Overview
The following list-merging algorithms are included:
- HeapMerger - pop list-heads to a heap, push from heap and count occurrences of each element
- ScanCount - uses one counter for every possible stringid (scan count array), traverses inverted lists and increments counts in the array
- MergeOpt - separate long lists from short lists, for short lists use heap merge, for long lists do binary search on candidates from short lists
- MergeSkip - like heapmerger, but uses the merging-threshold to skip elements on the lists
- DivideSkip - combines MergeOpt and MergeSkip, skipping is used for the short lists, binary search is done on the long lists
- OnDiskMergerSimple - reads inverted lists and runs DivideSkipMerger
- OnDiskMergerAdapt - balances costs of reading inverted lists (from disk) and post-processing candidates (from disk)
Contributors
- Alexander Behm (design, main author)
- Chen Li (design, implementation, project leader)
- Jiaheng Lu (design, main author)
References
- Chen Li, Jiaheng Lu, Yiming Lu: Efficient Merging and Filtering Algorithms for Approximate String Searches, ICDE'08, April 2008