AppString
> AppStringDoc
ListMerger
Introduction
This module provides list-merging algorithms for solving the
T-occurrence problem in approximate string search.
Overview
The following list-merging algorithms are included:
- HeapMerger - pop list-heads to a heap, push from heap and count
occurrences of each element
- ScanCount - uses one counter for every possible stringid (scan
count array), traverses inverted lists and increments counts in the
array
- MergeOpt - separate long lists from short lists, for short
lists use heap merge, for long lists do binary search on candidates
from short lists
- MergeSkip - like heapmerger, but uses the merging-threshold to
skip elements on the lists
- DivideSkip - combines MergeOpt and MergeSkip, skipping is used
for the short lists, binary search is done on the long lists
Contributors
- Alexander Behm (design, implementation)
- Chen Li (design, implementation, project leader)
- Jiaheng Lu (design, main author)
References
- Chen Li, Jiaheng Lu, Yiming Lu: Efficient Merging and Filtering
Algorithms for Approximate String Searches, ICDE'08, April
2008