Category Archives: Python
Tornado Internals
Well, what a piece of technology! The more you read about it, the better you understand it and the more you appreciate it. Yes, I’m talking about the Tornado Web Server. My ‘attempt’ here is to walk you through the workflow of Tornado’s internals…
As we all know, Tornado is a non-blocking web server. But how does it actually achieve this non-blocking behavior?
Let’s first understand that Tornado is an event-driven web server. But what does event-driven programming mean? It means your application runs a (single-threaded) event loop that keeps polling for events, identifies the ones it is interested in, and then handles them. Tornado works on the same principle.
The Tornado web server runs an ioloop, a single-threaded main event loop that is at the crux of Tornado’s asynchronous nature. ioloop.add_handler(fd, handler, events) registers a file descriptor together with the events to watch for on it and the handler to call when they occur, and the ioloop maintains the list of these registrations.
The ioloop is a user-space construct, so who actually listens for events on the fds? That is a job for the kernel: Tornado uses epoll (Linux) or kqueue (BSD), kernel facilities that provide event notification in a non-blocking way. epoll has three main functions:
- epoll_create – creates an epoll object
- epoll_ctl – controls the file descriptors and events to be watched
- epoll_wait – waits until a registered event occurs or the timeout expires
epoll thus watches file descriptors (sockets) and reports the events of interest (READ, WRITE and ERROR).
As described above, Tornado’s ioloop consumes these events for the file descriptors and runs the associated handlers.
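To make the add_handler/epoll flow concrete, here is a minimal sketch of a single-threaded event loop built on Python’s standard selectors module, which uses epoll on Linux and kqueue on BSD/macOS under the hood. MiniIOLoop, on_readable and the socket pair are illustrative assumptions, not Tornado’s actual IOLoop code; only the add_handler(fd, handler, events) shape mirrors the API described above.

```python
import selectors
import socket

READ = selectors.EVENT_READ


class MiniIOLoop:
    """Toy single-threaded event loop in the spirit of Tornado's IOLoop (not its code)."""

    def __init__(self):
        # DefaultSelector picks the best kernel facility: epoll on Linux, kqueue on BSD/macOS.
        self._selector = selectors.DefaultSelector()
        self._running = False

    def add_handler(self, fd, handler, events):
        # Register the fd, the events to watch, and the callback to run when they fire.
        self._selector.register(fd, events, data=handler)

    def remove_handler(self, fd):
        self._selector.unregister(fd)

    def start(self):
        self._running = True
        while self._running:
            # Blocks, like epoll_wait(), until at least one registered fd is ready.
            for key, ready in self._selector.select():
                key.data(key.fd, ready)  # run the handler registered for this fd

    def stop(self):
        self._running = False


# Usage: a connected socket pair stands in for a client connection.
reader, writer = socket.socketpair()
loop = MiniIOLoop()
received = []


def on_readable(fd, events):
    received.append(reader.recv(1024))
    loop.remove_handler(fd)
    loop.stop()


loop.add_handler(reader.fileno(), on_readable, READ)
writer.sendall(b"hello")
loop.start()
print(received)  # [b'hello']
reader.close()
writer.close()
```

The loop never blocks on any single socket: it only calls recv() once the kernel has said the socket is readable, which is the essence of the non-blocking design.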
tornado.iostream.IOStream works as an abstraction layer on top of sockets. It provides three key methods:
- read_until() – reads from the socket until it finds a given delimiter, e.g. the empty line (\r\n\r\n) that marks the end of the HTTP headers
- read_bytes() – reads N bytes from the socket
- write() – writes a buffer to the socket
In the Tornado versions this post describes, each of these methods accepts a callback that is run when its job is done; Tornado 6 replaced these callbacks with Futures that can be awaited.
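The callback style can be illustrated without a network. The sketch below is a hypothetical, in-memory ToyStream (not Tornado’s IOStream) that buffers incoming bytes and fires the pending callback once read_until() sees its delimiter or read_bytes() has enough data. In Tornado the incoming chunks would arrive through the ioloop handler rather than a manual feed() call.

```python
class ToyStream:
    """Hypothetical in-memory stand-in for IOStream's read buffering (not Tornado's code)."""

    def __init__(self):
        self._buffer = b""
        self._pending = None  # (kind, argument, callback) of the outstanding read

    def read_until(self, delimiter, callback):
        self._pending = ("until", delimiter, callback)
        self._try_complete()

    def read_bytes(self, num_bytes, callback):
        self._pending = ("bytes", num_bytes, callback)
        self._try_complete()

    def feed(self, chunk):
        # In Tornado, the ioloop's read handler would append socket data here.
        self._buffer += chunk
        self._try_complete()

    def _try_complete(self):
        if self._pending is None:
            return
        kind, arg, callback = self._pending
        if kind == "until":
            index = self._buffer.find(arg)
            if index == -1:
                return  # delimiter not seen yet: keep buffering
            end = index + len(arg)
        else:
            if len(self._buffer) < arg:
                return  # not enough bytes yet
            end = arg
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        self._pending = None  # clear first so the callback may start a new read
        callback(data)


# Usage: read the headers, then a 5-byte body, arriving in two chunks.
stream = ToyStream()
results = []


def on_body(body):
    results.append(body)


def on_headers(headers):
    results.append(headers)
    stream.read_bytes(5, on_body)


stream.read_until(b"\r\n\r\n", on_headers)
stream.feed(b"POST / HTTP/1.1\r\nContent-Length: 5\r\n")
stream.feed(b"\r\nhello")
print(results)
```

Chaining reads from inside callbacks (headers first, then exactly Content-Length bytes of body) is the pattern an HTTP connection follows on top of such a stream.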
tornado.httpserver is a non-blocking HTTP server that accepts client connections on a given port by registering the listening socket with the ioloop:
from tornado import httpserver, ioloop

# handle_request is your request callback
http_server = httpserver.HTTPServer(handle_request)
http_server.listen(8888)
ioloop.IOLoop.instance().start()
The handler registered with the ioloop for the listening socket is a callback that accepts the new connection, wraps it in an IOStream, and creates an HTTPConnection object (from the httpserver module), which is then responsible for handling all of that client’s requests.
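The accept-then-handle flow can be sketched with the standard library alone. In this hypothetical example, listen() plays the role of HTTPServer.listen(), on_accept() is the handler registered for the listening socket, and on_request() stands in for the IOStream/HTTPConnection pair. It assumes the whole request arrives in one recv(), which a real server cannot assume.

```python
import selectors
import socket

selector = selectors.DefaultSelector()


def on_request(conn, events):
    # Stand-in for HTTPConnection: read one request, write one response, close.
    data = conn.recv(4096)
    # Toy assumption: the whole request arrives in a single recv().
    if b"\r\n\r\n" in data:
        body = b"Hello, world"
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    selector.unregister(conn)
    conn.close()


def on_accept(listener, events):
    # The handler the loop runs when the listening socket is readable.
    conn, _addr = listener.accept()
    conn.setblocking(False)
    # Tornado would wrap conn in an IOStream and create an HTTPConnection here.
    selector.register(conn, selectors.EVENT_READ, on_request)


def listen(port=0):
    # Roughly what HTTPServer.listen() does: bind, listen, register with the loop.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))  # port 0 lets the OS pick a free port
    listener.listen(128)
    listener.setblocking(False)
    selector.register(listener, selectors.EVENT_READ, on_accept)
    return listener


def run_once(timeout=1.0):
    # One pass of the event loop: dispatch each ready socket to its handler.
    for key, events in selector.select(timeout):
        key.data(key.fileobj, events)


# Usage: drive the loop by hand from the same thread.
listener = listen()
client = socket.create_connection(listener.getsockname())
run_once()  # listening socket readable -> on_accept
client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
run_once()  # connection readable -> on_request
response = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break  # server closed the connection
    response += chunk
print(response)
client.close()
selector.unregister(listener)
listener.close()
```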
Selenium with Python bindings
After a lot of posts on the Tornado web server and on understanding BDD, let’s get to testing our website. What better tool to use than Selenium? Let’s go through the setup and create our first test.
Prerequisites
1. Python bindings for Selenium – go to the Selenium site and download the package
Install as:
- tar xvf selenium-2.25.0.tar.gz
- cd selenium-2.25.0
- sudo python setup.py install
2. Java Server – download the Selenium standalone server (selenium-server-standalone-2.25.0.jar)
Run as:
- java -jar selenium-server-standalone-2.25.0.jar
Here we discuss the use of the Selenium 2.0 WebDriver with and without the Selenium server, with an example of each below.
Just a bit of history first… WebDriver aims to improve on Selenium 1.0 Remote Control (RC). The distinguishing factors are:
- Object-oriented APIs
- More features
- WebDriver drives the browser through the automation APIs the browser itself exposes, while Selenium Remote Control injects JavaScript into the page to run the test
Web Driver without selenium server
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.yahoo.com") # Load page
assert "Yahoo!" in browser.title
browser.quit()  # Close the browser and end the session
Web Driver with selenium server – WebDriver Remote
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
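# Connect to the Selenium server started above (default hub URL)
driver = webdriver.Remote(
    command_executor='http://127.0.0.1:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.FIREFOX)
driver.get("http://www.python.org")
driver.quit()  # End the remote session; close() would only close the current window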
BDD in Python with lettuce
Behavior Driven Development, also known as BDD, is a concept developed by Dan North that builds on the popular and widely adopted Test Driven Development (TDD). In Dan’s words:
‘BDD is a second-generation, outside–in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology. It describes a cycle of interactions with well-defined outputs, resulting in the delivery of working, tested software that matters.’
BDD provides a framework in which QA, Business Analysts and other stakeholders communicate and collaborate on software development. While TDD emphasizes writing tests for individual units of code, BDD insists on writing tests for business scenarios, use cases, or the behavioral specification of the software being developed. According to Dan, BDD tests should be written as user stories, ‘As a [role] I want [feature] so that [benefit]’, and acceptance criteria should be defined as ‘Given [initial context], when [event occurs], then [ensure some outcomes].’
lettuce is commonly used to implement BDD in Python. This blog covers installing lettuce on Ubuntu and applying it to an example Fibonacci function.
Installation
buntu@ubuntu:~$ sudo pip install lettuce
[sudo] password for buntu:
Downloading/unpacking lettuce
Downloading lettuce-0.2.9.tar.gz (40Kb): 40Kb downloaded
Running setup.py egg_info for package lettuce
Downloading/unpacking sure (from lettuce)
Downloading sure-1.0.6.tar.gz
Running setup.py egg_info for package sure
Downloading/unpacking fuzzywuzzy (from lettuce)
Downloading fuzzywuzzy-0.1.tar.gz
Running setup.py egg_info for package fuzzywuzzy
Installing collected packages: fuzzywuzzy, lettuce, sure
Running setup.py install for lettuce
Installing lettuce script to /usr/local/bin
Running setup.py install for sure
Running setup.py install for fuzzywuzzy
Successfully installed lettuce
Setup
Let’s first create a directory structure that looks like this:
buntu@ubuntu:~$ tree lettucetests/
lettucetests/
|-- features
|   |-- fib.feature
|   |-- test.py
`-- test.feature
1 directory, 3 files
Define Features
Write Tests
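As a minimal sketch of how lettuce ties a feature to its tests, the toy below registers step functions against regular expressions, matches each Given/When/Then sentence of a scenario to a step, and runs it against a fib() function. Everything here (the scenario text, the step and world stand-ins, run_scenario) is hypothetical plain Python, not lettuce itself. In lettuce, the steps live in a Python file next to the .feature file, are imported as from lettuce import step, world, and each step function receives the step object as its first argument.

```python
import re

# Hypothetical scenario, in the Given/When/Then style of a fib.feature file.
SCENARIO = """
Given I have the number 10
When I compute its fibonacci
Then I see the number 55
"""

STEPS = []  # (compiled regex, step function) pairs
world = {}  # state shared between steps, like lettuce's world object


def step(pattern):
    """Toy stand-in for lettuce's @step decorator: map a regex to a function."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


def fib(n):
    """Function under test: iterative Fibonacci with fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@step(r"I have the number (\d+)")
def have_the_number(number):
    world["number"] = int(number)


@step(r"I compute its fibonacci")
def compute_fibonacci():
    world["result"] = fib(world["number"])


@step(r"I see the number (\d+)")
def check_number(expected):
    assert world["result"] == int(expected), (world["result"], expected)


def run_scenario(text):
    """Match each sentence (minus its Given/When/Then keyword) to a step and run it."""
    for line in text.strip().splitlines():
        _keyword, sentence = line.strip().split(" ", 1)
        for regex, func in STEPS:
            match = regex.fullmatch(sentence)
            if match:
                func(*match.groups())
                break
        else:
            raise LookupError("no step matches: %r" % line)
    return world["result"]


print(run_scenario(SCENARIO))  # 55
```

The scenario text stays readable by non-programmers, while the regex-to-function mapping is what turns each sentence into an executable check.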
Abstraction in Search
Another great event I attended this year: PyCon India 2012. It was better organized than ever before, with better talks, larger audiences, a job fair and more fun.
Not to forget the evening dinner for speakers!
I loved every bit of it… talking to experts, talking to Python enthusiasts, answering their questions and wondering why I wasn’t like these younger folks when I was younger!
Vishal and I delivered a talk on ‘Rapid development of website search in Python’. We spoke about:
- Why search is imperative for websites
- How the schema is defined and analyzers are chosen
- How indexing and searching work, with appropriate flowcharts
- How search can easily be integrated with your web application
- What the design and development considerations are for implementing it
We also shared our observations on the facets of a good search solution. It should be:
- Integral to the website development
- Decoupled from the web framework used for website development
- Adaptable (to the scale and requirements of the website)
- And most importantly, rapidly developed and deployed
This talk introduced a new design concept, Abstraction in Search (never tried before, as far as we know), and contributed it to the Python community at large…
Preface
We all understand that no single solution fits two different problems, and the same holds for search engines. One search engine may have fast indexing and committing but a slower search algorithm than an equally feature-rich engine. Hence a search engine may be the best solution for one website but utterly unfit for another…
Problem
Now, developing search with one particular algorithm or engine and plugging it into every website you develop is no less than digging your own grave! Why the h**l would you assume that the search solution you’ve developed for your large-scale website suits other small or medium-sized websites?
Solution
We propose developing customized search engines that adapt to small, medium and large websites. Once you have these search engine implementations, develop an abstraction layer over them. The abstraction layer ensures:
- Freedom to choose an engine based on its applicability and adaptability to the website
- Develop once, reuse many times
- The search engine to call can be decided at run time
The abstraction layer can be implemented with the well-known Facade pattern!
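A Facade for search can be sketched in a few lines. The two engines below are deliberately tiny, hypothetical stand-ins (a brute-force substring scan for a small site, an inverted index for a larger one), not Whoosh or PyLucene; the point is that callers only talk to SearchFacade, and the engine is picked by name at run time.

```python
from collections import defaultdict


class SubstringEngine:
    """Hypothetical engine for a small site: brute-force scan, no index."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, text):
        self.docs[doc_id] = text.lower()

    def search(self, query):
        q = query.lower()
        return sorted(doc_id for doc_id, text in self.docs.items() if q in text)


class InvertedIndexEngine:
    """Hypothetical engine for a larger site: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, query):
        words = query.lower().split()
        if not words:
            return []
        # Documents that contain every query word.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)


class SearchFacade:
    """Facade: one interface for callers, engine chosen by name at run time."""

    ENGINES = {"small": SubstringEngine, "large": InvertedIndexEngine}

    def __init__(self, engine="small"):
        self._engine = self.ENGINES[engine]()

    def add_document(self, doc_id, text):
        self._engine.index(doc_id, text)

    def find(self, query):
        return self._engine.search(query)


docs = {"a": "Tornado web server", "b": "Whoosh search in Python", "c": "Python web search"}
results = {}
for engine in ("small", "large"):
    search = SearchFacade(engine)
    for doc_id, text in docs.items():
        search.add_document(doc_id, text)
    results[engine] = search.find("web search")
print(results)  # {'small': ['c'], 'large': ['c']}
```

Swapping engines is a one-word change at construction time, which is the "develop once, reuse many times" property the abstraction layer is meant to provide.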
Design
We propose a simple, easy-to-understand SVC model (based on the MVC model). SVC stands for Search View Controller. In SVC, the Controller calls search.py with the appropriate search engine to find the results for the keywords the user entered. search.py is an abstraction built over search engine implementations that can adapt to small, mid-sized and large websites. Which search solution search.py calls is decided by the website developer (as s/he understands the website’s requirements and the search solution that suits them). The selected search engine then generates the search results for the query terms and passes them to the Controller via search.py. The Controller then applies the search results to the View (templates) and renders them to the user.
Prototype Implementation
We’ve developed a prototype of the idea discussed above, called fsMgr. fsMgr assumes that the web pages to be searched are already available (or scraped) in a tree structure.
fsMgr’s search.py abstracts the Whoosh and PyLucene search engines, demonstrating how either engine can be used for website search depending on the website’s requirements.
We use Python’s Tornado Web Server as the Controller. It provides request handling, so we can expose simple search as well as advanced search capabilities (such as highlighted search, a didyoumean spell-checker and a morelikethis document searcher) to users.
Tornado’s template capabilities are used as Views in this prototype.
Code
The source code of this prototype implementation is available at fsMgr.
SVC Architecture
Tornado – Whoosh – DidYouMean
Have you ever searched for a word on Google and got a ‘Did you mean’ response because you had misspelled it? Would you like to implement this feature in your own search engine?
Well, the Whoosh search engine can perform didyoumean on the queries users submit. Didyoumean presents suggestions for mistyped or misspelled queries, based on the key terms present in the index. Whoosh currently works more as a typo checker or corrector, as it doesn’t handle phonetics well enough…
For corrections, Whoosh looks up correct words in:
- The created index
- A file with a list of words
With Whoosh, developers can define which Schema fields are used by the spell-checker. For instance, to spell-check the content, define the Schema with the ‘content’ field set to ‘spelling=True’.
Here’s an example of Whoosh’s didyoumean capability with Tornado Web Server
Did You Mean input query form
Tornado Web Server handling spell-checker requests
In this example, if the user searches for the word ‘Torando’, he gets the suggestion ‘Tornado’, and if he tries ‘piethon’, he gets ‘Python’.
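The idea behind did-you-mean can be sketched in plain Python: compare the user’s word against the terms stored in the index and suggest those within a small edit distance. This is not Whoosh’s API or its exact algorithm; the term set and the max_dist=2 threshold are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def did_you_mean(word, indexed_terms, max_dist=2):
    """Return indexed terms within max_dist edits of word, closest first."""
    word = word.lower()
    if word in indexed_terms:
        return []  # spelled correctly: nothing to suggest
    scored = ((edit_distance(word, term), term) for term in indexed_terms)
    return [term for dist, term in sorted(scored) if dist <= max_dist]


# Hypothetical terms, as if collected from a field indexed with spelling=True.
TERMS = {"tornado", "python", "whoosh", "search", "index"}
print(did_you_mean("Torando", TERMS))  # ['tornado']
print(did_you_mean("piethon", TERMS))  # ['python']
```

Because suggestions come only from indexed terms, the checker can fix typos but, as noted above, it has no notion of how words sound.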
Tornado – Whoosh – MoreLike and MoreLikeThis
Like other search engines, Whoosh provides more_like() and more_like_this() methods to find similar documents in the index. morelikethis doesn’t need any special query from the user to find documents similar to the specified one; instead, it searches all the other documents in the index for content similar to that of the specified document. Here’s an example of Whoosh’s more_like() method integrated with Tornado.
The user enters a document path and submits it; the application then looks the document up in the index and presents documents similar to it.
In the code below:
- document_number(path=path) gets the document number of the specified document path in the index
- more_like(docnum, ‘content’) then finds documents *like* the specified document, based on its content
- more_like_this(“content”, top=1), called on a search hit, returns the single document most similar to that hit
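The key-terms idea behind more_like() can be sketched without Whoosh: pick the most frequent meaningful terms of the given document, then rank the other documents by how often they use those terms. This hypothetical version uses raw term frequency and a tiny stopword list, whereas Whoosh weighs key terms with a scoring model; the documents and paths are invented.

```python
from collections import Counter

STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with"}

# Hypothetical documents keyed by path, as if already indexed.
DOCS = {
    "/blog/tornado.txt": "Tornado web server and the Tornado ioloop",
    "/blog/async.txt": "Async web apps on the Tornado ioloop",
    "/blog/whoosh.txt": "Whoosh search index in Python",
    "/blog/cooking.txt": "Pasta with tomato sauce",
}


def term_counts(text):
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def key_terms(text, numterms=5):
    """The document's most frequent non-stopword terms (plain term frequency)."""
    return [term for term, _count in term_counts(text).most_common(numterms)]


def more_like(path, docs, top=10, numterms=5):
    """Rank the other documents by how often they use the key terms of docs[path]."""
    terms = key_terms(docs[path], numterms)
    scored = []
    for other, text in docs.items():
        if other == path:
            continue  # never suggest the document itself
        counts = term_counts(text)
        score = sum(counts[t] for t in terms)
        if score:
            scored.append((-score, other))  # negate so sorted() puts the best first
    return [other for _neg_score, other in sorted(scored)[:top]]


print(more_like("/blog/tornado.txt", DOCS, top=1))  # ['/blog/async.txt']
```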
Tornado – Whoosh
Search functionality is often considered a “good to have” feature in website development, but search plays a crucial role in helping website visitors locate relevant information. Search capabilities can easily be built into a website with modules such as Lucene, Solr, ElasticSearch and Haystack, among others.
This blog discusses Whoosh and how it can be integrated with the Tornado web server.
Whoosh is a fast search engine developed by Matt Chaput that supports field-based full-text indexing and search, storage, text analysis, posting formats and scoring algorithms. You can benefit from features like highlighted search, fuzzy search, document-based search (more like this) and a spell checker (did you mean). Whoosh’s APIs are Pythonic, and it is written in pure Python.
Let’s take the example of a blogger’s posts, with the code snippet below
In the above code,
- class Search provides searching capabilities with Whoosh
- __init__ method accepts indexdir (the directory where the search index is created) and searchstr (the string to be searched for)
- searcher() method first defines a document schema (what a blog post looks like) and creates an index in indexdir based on that schema. It then creates a writer object that is used to add blog posts and commit them. Finally, the search() method returns the search results, limited to a maximum of 50 hits
- When the user browses to http://localhost:8888/, search form is rendered.
- On submitting the search word, GET request is sent to http://localhost:8888/search.
- class Srch handles this request and in turn calls the Search class, which implements search functionality with Whoosh
When the user searches for tornado, we get the output below, which shows that the word tornado was found in 1 document and that the search took about 0.0005 seconds
<1/1 Results for Term('content', u'tornado', boost=1.0) runtime=0.000482797622681>
Tornado – File Uploads
Quite often we need to provide a file upload mechanism on our website. Be it log management or user profile management, support for file uploads is a must. This blog describes how uploads can be achieved with the Tornado web server.
Example code:
In this code snippet:
- When the user browses to http://localhost:8888/, he is presented with a file upload form (code below)
- On browsing and selecting the appropriate file, the user clicks on upload
- The file gets uploaded and the user gets a message with filename & the uploaded location
In the upload form, it’s important to note the use of the following for file uploads:
- enctype="multipart/form-data"
- input type="file"
As a side note, if you print the fileinfo variable, you will see a dictionary with the contents and metadata of the uploaded file:
fileinfo is {'body': 'This is a file upload test for Tornado!!\n', 'content_type': u'application/octet-stream', 'filename': u'fileuploadtest'}
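Tornado builds that dictionary (available in a handler as self.request.files[field_name]) by parsing the multipart/form-data request body that the enctype above produces. As a stdlib-only sketch, not Tornado’s own parser, the following uses Python’s email package to turn such a body into the same shape, {field_name: [fileinfo, ...]}. The field name filearg and the boundary string are made up for illustration.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_uploads(content_type, body):
    """Split a multipart/form-data body into {field: [fileinfo, ...]}.

    Sketch only: mirrors the shape of Tornado's request.files, not its parser.
    """
    # The email package understands MIME multipart; give it the Content-Type header.
    raw = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n" + body
    message = BytesParser(policy=HTTP).parsebytes(raw)
    files = {}
    for part in message.iter_parts():
        filename = part.get_filename()
        if filename is None:
            continue  # an ordinary form field, not a file
        field = part.get_param("name", header="content-disposition")
        files.setdefault(field, []).append({
            "body": part.get_payload(decode=True),
            "content_type": part.get_content_type(),
            "filename": filename,
        })
    return files


boundary = "----SketchBoundary"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="filearg"; filename="fileuploadtest"\r\n'
    "Content-Type: application/octet-stream\r\n"
    "\r\n"
    "This is a file upload test for Tornado!!\n"
    f"\r\n--{boundary}--\r\n"
).encode()

fileinfo = parse_uploads(f"multipart/form-data; boundary={boundary}", body)["filearg"][0]
print(fileinfo)
```

Without enctype="multipart/form-data" the browser sends only the file name, not this multipart body, which is why that attribute matters in the upload form.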


