MAT-tree: A tree-based structure for indexing string and numeric attributes. Using MAT-tree, we can perform range queries on both string and numeric attributes. [1]
The program can be compiled with either Visual C++ or GNU C++ (g++).
In Visual C++, build the project and run the resulting executable. With the GNU toolchain, write a makefile and compile the sources with g++.
Main files:
Useful parameters:
    const int MAXLEN = 100;        //maximum length of a string attribute
    const int PGSIZE = 256;        //page size
    const int TRIELEN = 1000;      //maximum size of a Trie (in string representation)
    const int K = 400;             //# of centers in MAT-tree
    const int STRDELTA = 3;        //threshold for string attribute
    const int NUMDELTA = 4;        //threshold for numeric attribute
    const int SIZES = 80000;       //size of the dataset
    const int ALPH_SIZE = 29;      //size of the alphabet
    #define DATAFILE "data.txt"    //input file for dataset
    #define QUERYFILE "query.txt"  //query file
    const int NUMQUERY = 10;       //# of queries to run
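To illustrate how the two thresholds might be applied, here is a minimal brute-force sketch. It is not the MAT-tree code. It assumes that the string threshold is an edit (Levenshtein) distance bound and the numeric threshold is a bound on absolute difference, and that a record matches only when both hold. The names Record, editDistance, matches and bruteForceQuery are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Values copied from the parameter list.
const int STRDELTA = 3;  // threshold for string attribute
const int NUMDELTA = 4;  // threshold for numeric attribute

// Hypothetical record type: one string attribute and one numeric attribute.
struct Record {
    std::string str;
    double num;
};

// Standard dynamic-programming edit distance, keeping only two rows.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Assumed semantics: both attributes must fall within their thresholds.
bool matches(const Record& r, const Record& q) {
    return editDistance(r.str, q.str) <= STRDELTA &&
           std::fabs(r.num - q.num) <= NUMDELTA;
}

// Linear scan over the whole dataset; it does not use any index.
std::vector<Record> bruteForceQuery(const std::vector<Record>& data,
                                    const Record& q) {
    std::vector<Record> out;
    for (const Record& r : data)
        if (matches(r, q)) out.push_back(r);
    return out;
}
```

A scan like this can serve as a reference answer when checking index results on a small dataset.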
Prepare DATAFILE and QUERYFILE. Each record occupies one line and consists of a string followed by a numeric value. If the string contains white spaces, replace them with special characters first.
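The record format can be sketched as follows. This is an illustration under stated assumptions, not the project's own loader: the placeholder character '_' is an arbitrary choice, the numeric value is read as a double, and the names Record, encodeSpaces, parseRecord and readRecords are hypothetical.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: one string attribute and one numeric attribute.
struct Record {
    std::string str;
    double num;  // assumption: the numeric type is not stated in this document
};

// Assumed placeholder; the text only says "special characters".
const char SPACE_PLACEHOLDER = '_';

// Replace every space so the string attribute stays a single token.
std::string encodeSpaces(std::string s) {
    std::replace(s.begin(), s.end(), ' ', SPACE_PLACEHOLDER);
    return s;
}

// Parse one line of the form "<string> <number>".
// Returns false if the line does not start with a token followed by a number.
bool parseRecord(const std::string& line, Record& out) {
    std::istringstream in(line);
    return static_cast<bool>(in >> out.str >> out.num);
}

// Read every well-formed record from a stream, skipping malformed lines.
std::vector<Record> readRecords(std::istream& in) {
    std::vector<Record> records;
    std::string line;
    Record r;
    while (std::getline(in, line))
        if (parseRecord(line, r)) records.push_back(r);
    return records;
}
```

With this sketch, opening DATAFILE as a std::ifstream and passing it to readRecords would load the dataset. A line such as "john smith 30" is rejected because the unescaped space splits the string, which is why the replacement step is needed.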
The performance results are available in [1].
[1] Liang Jin, Nick Koudas, Chen Li, Anthony K. H. Tung: Indexing Mixed Types for Approximate Retrieval. VLDB 2005: 793-804