AppString > AppStringDoc
The module contains the implementation of the technique presented in [1] and [2].
For compiling instructions, please see CompileDoc.
An example of how to use the module is available in src/sepia/example.cc.
The main class of the module is Sepia which is declared in src/sepia/sepia.h.
The main methods of Sepia are:
Sepia(const vector<string> &dataset, unsigned thresholdMin, unsigned thresholdMax); Sepia(const vector<string> &dataset, const string &filename); void build(); void saveData(const string &filename) const; float getEstimateSelectivity(const string &query, unsigned editdist) const;
The main idea is that the user can create a Sepia object by specifying a vector of strings (dataset) and a few extra parameters (for details see [1]) or load an existing one from a file. If the object was not loaded, then it needs to be built. Next, the user has the option of saving the object to a file. In order to estimate the selectivity of a given string, the user calls select.
The performance results are available in [1] and [2].
[1] Liang Jin and Chen Li: Selectivity Estimation for Fuzzy String Predicates in Large Data Sets. VLDB 2005: 397-408
[2] Liang Jin, Chen Li, Rares Vernica: SEPIA: estimating selectivities of approximate string predicates in large Databases. VLDB J. 17(5): 1213-1229 (2008)