(Approximate String Matching)

Release 2.0.1 (November 7, 2008)

Department of Computer Science, UC Irvine

This version is outdated. Our most recent release is here.



Getting Started

Please refer to the Flamingo Getting Started Guide.


This release (in C++) includes the source code of several algorithms for approximate string matching developed at UC Irvine. It includes algorithms for approximate selection queries, selectivity estimation for approximate selection queries, approximate queries on mixed types, and others. Although an implementation for approximate joins is included, the focus of this release is on approximate selection queries.
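As a rough illustration of what an approximate selection query computes, the sketch below scans a collection and returns every string within a given edit distance of the query string. It is not code from this release: the names editDistance and approxSelect are hypothetical, and a linear scan is only the naive baseline that indexing and filtering techniques presumably aim to improve on.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein (edit) distance computed with a two-row dynamic program.
// Hypothetical helper for illustration; not part of the release.
inline unsigned editDistance(const std::string& a, const std::string& b) {
    std::vector<unsigned> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<unsigned>(j);  // distance from the empty prefix of a
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<unsigned>(i);  // distance to the empty prefix of b
        for (std::size_t j = 1; j <= b.size(); ++j) {
            unsigned subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0u : 1u);
            curr[j] = std::min({prev[j] + 1,      // delete a[i-1]
                                curr[j - 1] + 1,  // insert b[j-1]
                                subst});          // substitute or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Naive approximate selection: keep every string in the collection whose
// edit distance to the query is at most maxDist. Hypothetical name.
inline std::vector<std::string> approxSelect(
        const std::vector<std::string>& collection,
        const std::string& query,
        unsigned maxDist) {
    std::vector<std::string> result;
    for (const auto& s : collection)
        if (editDistance(s, query) <= maxDist)
            result.push_back(s);
    return result;
}
```

For example, calling approxSelect on a collection with the query "flamingo" and maxDist set to 1 would keep "flamingo", "flamingos", and "flaming", but not "pelican".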

Here is a brief explanation of the terms used above:

There are various string similarity functions, such as Levenshtein Distance (aka the Edit Distance), Jaccard Similarity, Cosine Similarity, and Dice Similarity.

The following is a description of the modules corresponding to the source directory structure:

In addition, we have provided some commonly used functions in the util directory.
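To make the set-based similarity functions named above more concrete, here is a minimal sketch that computes Jaccard, Dice, and Cosine similarity over the sets of distinct q-grams of two strings. It assumes unweighted q-gram sets without padding; the release may tokenize or weight strings differently. The names qgrams, overlap, SetSimilarity, and similarity are hypothetical and are not taken from the release's util directory.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <string>

// Distinct q-grams of s, without padding. Strings shorter than q yield
// an empty set. Hypothetical helper for illustration.
inline std::set<std::string> qgrams(const std::string& s, std::size_t q = 2) {
    std::set<std::string> grams;
    if (q == 0 || s.size() < q) return grams;
    for (std::size_t i = 0; i + q <= s.size(); ++i)
        grams.insert(s.substr(i, q));
    return grams;
}

// Number of q-grams the two sets have in common.
inline std::size_t overlap(const std::set<std::string>& a,
                           const std::set<std::string>& b) {
    std::size_t n = 0;
    for (const auto& g : a) n += b.count(g);
    return n;
}

struct SetSimilarity {
    double jaccard;  // |A ∩ B| / |A ∪ B|
    double dice;     // 2 |A ∩ B| / (|A| + |B|)
    double cosine;   // |A ∩ B| / sqrt(|A| |B|)
};

// All three similarities on the q-gram sets of x and y.
// Returns zeros if either string has no q-grams (an assumption of this sketch).
inline SetSimilarity similarity(const std::string& x, const std::string& y,
                                std::size_t q = 2) {
    const std::set<std::string> a = qgrams(x, q);
    const std::set<std::string> b = qgrams(y, q);
    if (a.empty() || b.empty()) return {0.0, 0.0, 0.0};
    const double c = static_cast<double>(overlap(a, b));
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    return {c / (na + nb - c), 2.0 * c / (na + nb), c / std::sqrt(na * nb)};
}
```

With q = 2, "night" and "nacht" each have four distinct q-grams and share only "ht", so this sketch gives a Jaccard similarity of 1/7 and Dice and Cosine similarities of 0.25.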

Changes in Version 2.0.1



Acknowledgements: This release is partially supported by NSF CAREER Award No. IIS-0238586, NSF Award No. IIS-0742960, the NSF-funded RESCUE project, a Google Research Award, a gift fund from Microsoft, and a fund from CalIt2.
Many thanks to Sattam Alsubaiee, Minh Doan, and Kensuke Ohta for their valuable testing and feedback on the code and documentation.

License Agreement: Permission to use, copy, modify, and distribute the implementations of MAT-Tree, SEPIA, StringMap, and FilterTree is granted under the terms of the BSD license. The implementation of the PartEnum algorithm, invented by Microsoft researchers, is limited to non-commercial use, which would be covered under the royalty-free covenant that Microsoft made public.

For any questions regarding this release, please send an email to flamingo AT ics.uci.edu.