Search functionality is often considered a “nice to have” feature in website development, but it plays a crucial role in helping visitors locate relevant information. Search capabilities can be built into a website fairly easily with tools such as Lucene, Solr, Elasticsearch and Haystack, among others.
This post discusses Whoosh and how it can be integrated with the Tornado web server.
Whoosh is a fast search engine library developed by Matt Chaput. It supports field-based full-text indexing and search, with pluggable storage, text analysis, posting formats and scoring algorithms. It also offers features such as highlighted search results, fuzzy search, document-based search (“more like this”) and a spell checker (“did you mean”). The Whoosh API is Pythonic, and the library is written in pure Python. 🙂
Let’s take the example of a blog, using the code snippet below.
In the above code,
- class Search provides searching capabilities with Whoosh
- the __init__ method accepts indexdir (the directory where the search index is created) and searchstr (the string to search for)
- the searcher() method first defines a document schema (what a blog post looks like) and creates an index in indexdir based on that schema. It then creates a writer object that adds the blog posts to the index and commits them. Finally, the search() method returns the search results, limited to at most 50 hits
- When the user browses to http://localhost:8888/, the search form is rendered.
- When the search word is submitted, a GET request is sent to http://localhost:8888/search.
- class Srch handles this request and in turn calls the Search class, which implements the search functionality with Whoosh.
When the user searches for tornado, we get the output below, which shows that the word tornado was found in 1 document and that the search took about 0.0005 seconds (roughly 0.48 ms):
<1/1 Results for Term('content', u'tornado', boost=1.0) runtime=0.000482797622681>