AppString > AppStringDoc
This tutorial will guide through the basics steps needed to perform approximate string search on a collection of strings using this library. This guide focuses on how to use the FilterTree (FilterTreeDoc) module.
Most modules in this release were developed and tested on Ubuntu Linux using the GNU GCC/G++ compiler.
In order to compile and run most modules you will need the following:
On systems with the aptitude package manager (e.g. Ubuntu, Debian) you can install all required packages by typing the following as root user (or using sudo):
$ apt-get install gcc g++ libboost-dev
The module MatTreeDoc was developed in Visual C++. No makefile is provided for that module. We recommend using Windows and Visual C++ for that module.
For your convenience, we have added wrappers that contain all
necessary objects as described in section "Basic Usage". All you
need to do to build an index and execute queries, is to create an
instance of a wrapper. These wrappers initialize components with
default values and are the simplest and fastest way to use our
library - at the expense of being able to control tuning parameters
(which filters are used, fanout, etc.).
We recommend browsing through the code in
filtertree/wrappers/example.cc.
In this guide we will use a wrapper to show you how to perform approximate string search using the edit distance.
Let us say you have extracted the archive to the following
directory: /home/joe/flamingo-2.0.1
Then you need to edit
/home/joe/flamingo-2.0.1/src/makefile.inc and set
CODEBASEROOT to the root directory of the source files, i.e.
/home/joe/flamingo-2.0.1/src
After the modifications, your
/home/joe/flamingo-2.0.1/src/makefile.inc should
look like this:
CODEBASEROOT = /home/joe/flamingo-2.0.1/src APPSTRINGROOT = $(CODEBASEROOT) VPATH = $(APPSTRINGROOT) CC = g++ CPPFLAGS = -Wall -I$(APPSTRINGROOT) -O3 # CPPFLAGS = -Wall -I$(APPSTRINGROOT) -g ifndef CODEBASEROOT $(error Please edit makefile.ini and set the CODEBASEROOT variable to the absolute path of the source code directory. e.g., if you put the code in /home/user/flamingo-2.0.1/src do: CODEBASEROOT = /home/user/flamingo-2.0.1/src) endif
Now you can compile the wrapper library (and all other required libraries) by entering /home/joe/flamingo-2.0.1/src/filtertree/wrappers and running make, i.e.:
$ cd /home/joe/flamingo-2.0.1/src/filtertree/wrappers $ make
There should now be a file libfiltertreewrappers.a in /home/joe/flamingo-2.0.1/src/filtertree/wrappers, i.e. for an ls -l you should get an output similar to the following:
$ ls -l -rwxr-xr-x 1 joe joe 139644 2008-09-17 16:36 example -rw-r--r-- 1 joe joe 2913 2008-09-17 10:55 example.cc -rw-r--r-- 1 joe joe 84668 2008-09-17 16:35 example.o -rw-r--r-- 1 joe joe 1772 2008-09-17 16:35 libfiltertreewrappers.a -rw-r--r-- 1 joe joe 1609 2008-09-17 10:55 makefile -rw-r--r-- 1 joe joe 1822 2008-09-17 10:55 wrapperabs.h -rw-r--r-- 1 joe joe 307 2008-09-17 10:55 wrappers.cc -rw-r--r-- 1 joe joe 679 2008-09-17 10:55 wrappers.h -rw-r--r-- 1 joe joe 985 2008-09-17 16:35 wrappersimple.h -rw-r--r-- 1 joe joe 1640 2008-09-17 16:35 wrappers.o
(note that the exact file sizes may differ from yours)
Now that we have compiled the library, we are ready to include
it into an application.
Let us assume you wish to use the library in an application located
in /home/joe/searchapp that consists of one source
file /home/joe/searchapp/main.cc
You can copy and paste the following lines of source code into
/home/joe/searchapp/main.cc for us to compile:
#include "wrappers/wrappers.h" #include <iostream> #include <vector> using namespace std; int main() { StringContainerVector strContainer; strContainer.fillContainer("/home/joe/flamingo-2.0.1/src/filtertree/data/dummy.txt", 80); // read 80 lines from the file specified WrapperSimpleEd wrapper(&strContainer, 3); // use a simple wrapper that uses the edit distance and 3-grams wrapper.buildIndex(); float editDistance = 2.0f; string queryString = "xample"; vector<unsigned> resultStringIDs; // where to store the result string ids wrapper.search(queryString, editDistance, resultStringIDs); // print out the result strings cout << "SIMILAR STRINGS: " << endl; for(unsigned i = 0; i < resultStringIDs.size(); i++) { string tmp; strContainer.retrieveString(tmp, resultStringIDs.at(i)); cout << tmp << endl; } return 0; }
This application will use the first 80 lines of
/home/joe/flamingo-2.0.1/src/filtertree/data/dummy.txt
as the data strings.
It will build an index to support approximate string search and
answer a query that asks for all data strings that are within an
edit-distance of 2 to "xample".
Finally, the results will be displayed.
Since we decided to have every module produce it's own library
(.a file) it is necessary to link several .a files with the main.o
(produced by /home/joe/searchapp/main.cc).
The simplest way to achieve this is to create a makefile for the
application, i.e. create a file
/home/joe/searchapp/makefile with the following
contents:
include /home/joe/flamingo-2.0.1/src/makefile.inc LDFLAGS = -lrt all: main main: main.o \ /home/joe/flamingo-2.0.1/common/libcommon.a \ /home/joe/flamingo-2.0.1/listmerger/liblistmerger.a \ /home/joe/flamingo-2.0.1/filtertree/libfiltertree.a \ /home/joe/flamingo-2.0.1/util/libutil.a \ /home/joe/flamingo-2.0.1/filtertree/wrappers/libfiltertreewrappers.a
Now you should be able to compile the application using make, i.e.
$ make
If make was successful, you have compiled and linked the application! It is time to try and run it by typing
$ ./main
You should have the an output something similar to the following:
INPUTFILE: "/home/joe/flamingo-2.0.1/src/filtertree/data/dummy.txt" 100% FILLING CONTAINER: 80/80; 0'0"/0'0" 100% INSERTING INTO INDEX: 80/80; 0'0"/0'0" SIMILAR STRINGS: example1 example2 example3 example4 example5
Congratulations, you have successfully created your first application using this library!
Approximate string search can be performed in two basic steps: (1) building the index, and (2) answering queries using the index. We will now discuss the basic components for each of the steps (at a high-level).
Refer to filtertree/example.cc for some examples.
Apart from reading this guide, we recommend you browse through the code of some example files. We have provided these files to help you understand how to use the library as quickly as possible.