Analysis of Web search query logs shows that roughly 10-15% of queries sent to search engines contain spelling errors. A query speller is crucial for improving Web search relevance, because it is hard for a search engine to retrieve relevant content from misspelled keywords. In this project we explore how Web-scale data can be leveraged to implement a highly accurate and efficient spell checker. To handle the diverse needs of Web users, we process massive amounts of data, such as query logs and Web n-grams, to build accurate language models.


University of California, Irvine
School of Information and Computer Sciences


Datasets
Name: JDB2011
Size: 11,134 queries
Last update: June 20, 2011
Description: We randomly sampled 11,134 queries from the publicly available AOL and 2009 Million Query Track query sets and asked 8 human assessors to provide spelling corrections for them. Each query was judged by at least one assessor. Queries for which an assessor provided at least one suggestion that differed from the original query were reviewed by at least one other assessor. To help the assessors provide the most plausible suggestions, we designed an interface that showed the Google and Bing search results for each query.
Format: The data set includes 6,000 queries that can be used for training and cross-validation. Final models should be evaluated on the three test splits (1,711 queries each), and the average performance across these three splits should be reported.

Each line has the following format:
query <tab> suggestion1 <tab> suggestion2 <tab> ...
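
As a minimal sketch of how the line format and the three-split reporting described above might be handled, the Python below parses one tab-separated line and averages a score over several splits. The function names, the `correct` callable (standing in for a speller), and the accuracy metric itself are assumptions made for illustration. The data set description does not prescribe a metric, and the sample strings are invented rather than taken from JDB2011.

```python
from statistics import mean


def parse_line(line):
    """Split one line of the form: query <tab> suggestion1 <tab> suggestion2 ..."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    query = fields[0]
    # Drop empty fields that trailing tabs would otherwise produce.
    suggestions = [field for field in fields[1:] if field]
    return query, suggestions


def accuracy(examples, correct):
    """Fraction of queries whose predicted correction matches an assessor suggestion.

    `correct` is a hypothetical callable mapping a query string to one
    corrected string. This metric is an assumption, not the official one.
    """
    hits = sum(1 for query, suggestions in examples if correct(query) in suggestions)
    return hits / len(examples)


def average_over_splits(splits, correct):
    """Average the per-split score, as the evaluation instructions request."""
    return mean(accuracy(split, correct) for split in splits)


if __name__ == "__main__":
    # Invented sample line, not taken from the data set.
    print(parse_line("exmaple qurey\texample query\n"))
```

Each test split would be loaded by applying `parse_line` to every line of its file, and the resulting three lists passed to `average_over_splits` together with the speller under evaluation.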
Citation Policy: If you use this data set for research purposes, please cite the following paper:

Y. Ganjisaffar et al., qSpell: Spelling Correction of Web Search Queries using Ranking Models and Iterative Correction, in Spelling Alteration for Web Search Workshop, Bellevue, WA, USA, July 2011.

Bibtex:

@inproceedings{Ganji:2011:Speller,
    author = {Yasser Ganjisaffar and Andrea Zilio and Sara Javanmardi and Inci Cetindil and Manik Sikka and Sandeep Paul Katumalla and Narges Khatib-Astaneh and Chen Li and Cristina Lopes},
    title = {{qSpell}: Spelling Correction of Web Search Queries using Ranking Models and Iterative Correction},
    booktitle = {Spelling Alteration for Web Search Workshop},
    month = {July},
    year = {2011},
    location = {Bellevue, WA, USA},
}
Acknowledgments
We would like to thank Amazon.com for a research grant that allowed us to use their MapReduce cluster. Our research has also been partially supported by NIH grant 1R21LM010143-01A1 and NSF grants OCI-074806 and IIS-1030002.
Contact: spellchecker AT ics.uci.edu