Blog Archives
Node.js
This is an introductory post on Node.js.
Node.js is an event-driven, non-blocking (asynchronous) I/O platform used to build server-side applications. (If you’re a Python fan, it’s like programming in Twisted!) It is built on Google’s V8 JavaScript engine. Like other event-driven servers, Node.js runs an event loop and handles events asynchronously through callback invocations.
A typical ‘hello world’ web server implementation can be found below. Run it as: node helloworld.js
In this example,
1. We import the http module and create an HTTP server that listens on port 8888.
2. When the user makes a GET request to http://localhost:8888, the web server renders Hello World in the web browser. You can play around with the request and response variables used in the server code.
3. It’s interesting to note that if the response.end() statement is commented out, the request doesn’t complete and the browser keeps waiting. Only when you press Ctrl+C to stop the server does the request complete and ‘Hello World’ get rendered in the browser. So server developers, beware!
You may ask: what is this function(request, response), and why is it anonymous? Well, that’s a motivation for you to read my next post!
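For the Python fans mentioned at the start of this post, here is a rough sketch of the same idea (an event loop that invokes callbacks) using Python's standard asyncio module. It is only an analogy, not Node.js code: the on_event callback and the two timers standing in for I/O operations are made up for illustration.

```python
import asyncio

events = []

def on_event(name):
    # Invoked by the event loop when the timer fires, never called directly by us.
    events.append(name)

async def main():
    loop = asyncio.get_running_loop()
    # Register callbacks for two pretend I/O operations; neither call blocks.
    loop.call_later(0.10, on_event, "slow I/O finished")
    loop.call_later(0.05, on_event, "fast I/O finished")
    events.append("callbacks registered")
    # Hand control back to the event loop long enough for both timers to fire.
    await asyncio.sleep(0.3)

asyncio.run(main())
print(events)
```

The registration line runs first, and the callbacks run later in the order their events complete, not in the order they were registered.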
Selenium with Python bindings
After a lot of posts on the Tornado web server and on understanding BDD, let’s get to testing our website. What better way than to use Selenium? Let’s go through the setup and create our first test.
Prerequisites
1. Python bindings for Selenium – Go to the Selenium site and download the package
Install as:
- tar xvf selenium-2.25.0.tar.gz
- cd selenium-2.25.0
- sudo python setup.py install
2. Java Server – Download the server from here
Run as:
- java -jar selenium-server-standalone-2.25.0.jar
Here we discuss the usage of Selenium 2.0 Web Driver, with and without the Selenium server. Examples of each follow below.
Just a bit of history first… Web Driver aims to improve on Selenium 1.0 Remote Control. The distinguishing factors are:
- Object Oriented APIs
- More features
- Web Driver uses the APIs exported by the browser for automated testing, while Selenium Remote Control injects JavaScript to run the test
Web Driver without selenium server
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.yahoo.com") # Load page
assert "Yahoo!" in browser.title
Web Driver with selenium server – WebDriver Remote
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
driver = webdriver.Remote(
    command_executor='http://127.0.0.1:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.FIREFOX)
driver.get("http://www.python.org")
driver.close()
BDD in Python with lettuce
Behavior Driven Development, also known as BDD, is a concept developed by Dan North that builds on the popular and well-adopted Test Driven Development (TDD). In Dan’s words:
‘BDD is a second-generation, outside–in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology. It describes a cycle of interactions with well-defined outputs, resulting in the delivery of working, tested software that matters.’
BDD provides a framework in which QA, Business Analysts and other stakeholders communicate and collaborate on software development. While TDD emphasizes developing tests for a unit of code, BDD insists on developing tests for business scenarios, use cases or behavioral specifications of the software being developed. According to Dan, BDD tests should be written as user stories, ‘As a [role] I want [feature] so that [benefit]’, and acceptance criteria should be defined as ‘Given [initial context], when [event occurs], then [ensure some outcomes]’.
lettuce is typically used in Python to implement BDD. This post covers the installation of lettuce on Ubuntu and its application, with the example of a Fibonacci function.
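Before installing anything, the Given/When/Then shape can be sketched in plain Python. This is only an illustration of the structure and does not use lettuce: the fib function, the run_scenario helper and the example values are my own assumptions.

```python
def fib(n):
    """Return the n-th Fibonacci number, taking fib(0) == 0 and fib(1) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def run_scenario(number, expected):
    context = {}
    # Given I have the number <number>
    context["number"] = number
    # When I compute its Fibonacci value
    context["result"] = fib(context["number"])
    # Then I see the value <expected>
    assert context["result"] == expected, (number, context["result"], expected)
    return context["result"]

# Acceptance criteria expressed as (input, expected outcome) examples.
examples = [(0, 0), (1, 1), (2, 1), (7, 13), (10, 55)]
results = [run_scenario(n, want) for n, want in examples]
print(results)
```

A BDD tool moves the three commented steps into a plain-language feature file and maps each sentence to a step function, so non-programmers can read and review the scenarios.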
Installation
buntu@ubuntu:~$ sudo pip install lettuce
[sudo] password for buntu:
Downloading/unpacking lettuce
Downloading lettuce-0.2.9.tar.gz (40Kb): 40Kb downloaded
Running setup.py egg_info for package lettuce
Downloading/unpacking sure (from lettuce)
Downloading sure-1.0.6.tar.gz
Running setup.py egg_info for package sure
Downloading/unpacking fuzzywuzzy (from lettuce)
Downloading fuzzywuzzy-0.1.tar.gz
Running setup.py egg_info for package fuzzywuzzy
Installing collected packages: fuzzywuzzy, lettuce, sure
Running setup.py install for lettuce
Installing lettuce script to /usr/local/bin
Running setup.py install for sure
Running setup.py install for fuzzywuzzy
Successfully installed lettuce
Setup
Let’s first create a directory structure that looks like this:
buntu@ubuntu:~$ tree lettucetests/
lettucetests/
|-- features
|   |-- fib.feature
|   |-- test.py
`-- test.feature
1 directory, 3 files
Define Features
Write Tests
Abstraction in Search
Another great event I attended this year. PyCon India 2012 was better organized, with better talks, a bigger audience, a job fair and more fun than ever before.
Not to forget the evening dinner for speakers.
I loved every bit of it: talking to experts, talking to Python enthusiasts, answering their questions and wondering why I was not like the younger folks when I was younger.
Vishal and I delivered a talk on ‘Rapid development of website search in Python’. We spoke about:
- Why search is imperative in websites
- How the Schema is defined and Analyzers are chosen
- How indexing and searching work, with appropriate flowcharts
- How search can be easily integrated with your web application
- What the design and development considerations are for implementing it
We also shared our observations on the facets of a good search solution. It should be:
- Integral to the website development
- Decoupled from the web framework used for website development
- Adaptable (scale and requirements of website)
- And most importantly it should be rapidly developed and deployed
This talk gave rise to a new design concept, Abstraction in Search (never tried before, as far as we know), which we contributed to the Python community at large…
Preface
We all understand that no single solution fits two different problems. The same applies to search engines. One search engine may have high indexing and committing throughput but a slower searching algorithm than an equally feature-rich engine. Hence a search engine may be the best solution for one website but an utter misfit for another…
Problem
Now, developing search with one particular algorithm or engine and plugging it into every website you develop is no less than digging your own grave! Why the h**l would you assume that the one search solution you’ve developed for your large-scale website is suitable for other small or medium-sized websites?
Solution
We propose developing customized search engine implementations that are adaptable to small, medium and large websites. Once you have the search engine implementations, develop an Abstraction Layer over these engines. The Abstraction Layer would ensure:
- Freedom to choose an engine based on its applicability and adaptability to the website
- Develop once, reuse many times
- The search engine to call can be decided at run time
The abstraction layer could be implemented using the well-known facade pattern!
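To make the facade idea concrete, here is a minimal sketch in plain Python. It is not the fsMgr code: the two toy engines, the Search class and the sample pages are all hypothetical, and real engines would sit where the toy ones are.

```python
class SmallSiteEngine:
    """Hypothetical engine: linear scan, fine for a handful of pages."""
    def __init__(self, pages):
        self.pages = pages  # {url: text}

    def search(self, term):
        term = term.lower()
        return sorted(url for url, text in self.pages.items()
                      if term in text.lower().split())

class LargeSiteEngine:
    """Hypothetical engine: builds an inverted index up front for faster lookups."""
    def __init__(self, pages):
        self.index = {}
        for url, text in pages.items():
            for word in set(text.lower().split()):
                self.index.setdefault(word, set()).add(url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

class Search:
    """Facade: the web application talks only to this class."""
    ENGINES = {"small": SmallSiteEngine, "large": LargeSiteEngine}

    def __init__(self, pages, engine="small"):
        # The engine is picked by name, so the choice can be made at run time.
        self.engine = self.ENGINES[engine](pages)

    def query(self, term):
        return self.engine.search(term)

pages = {
    "/home": "Welcome to the Python search demo",
    "/about": "About this demo site",
    "/docs": "Python documentation index",
}
for name in ("small", "large"):
    print(name, Search(pages, engine=name).query("python"))
```

Both engines return the same results through the same query() call, so the website code does not change when the engine is swapped.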
Design
We propose a simple-to-understand SVC model (based on the MVC model). SVC stands for Search View Controller. In SVC, the Controller calls search.py with the appropriate search engine to find results for the user’s input keywords. search.py is an abstraction over the search engine implementations, which can adapt to small, medium and large websites. Which search solution to call through the search.py abstraction is decided by the website developer (as s/he understands the requirements of the website and the search solution suited to it). The selected search engine then generates the search results for the input query terms and passes them to the Controller via search.py. The Controller then applies the search results to the View (templates) and renders them to the user.
Prototype Implementation
We’ve developed a prototype of the idea discussed above (termed fsMgr). fsMgr assumes that the web pages that need to be searched are already available (or scraped) in a tree structure.
search.py of fsMgr abstracts the Whoosh and PyLucene search engines. By doing this, we demonstrate how either engine can be leveraged for website search, depending on the website’s requirements.
We use Python’s Tornado web server as the Controller. It provides the request handling capabilities through which we expose simple search and advanced search features (such as highlighted search, didyoumean spell-checker and morelikethis document searcher) to users.
Tornado’s template capabilities are used as the Views in this prototype.
Code
The source code of this prototype implementation is available at fsMgr.
SVC Architecture
Tornado – Redis
As defined on the Redis website, Redis is an open source, advanced key-value store. It is often referred to as a data structure server, since keys can contain strings, hashes and lists, among others. It is similar to, say, memcached in that it is an in-memory key/value store, but it also persists data to disk. Redis has gained importance as a NoSQL option for web development because of its speed (GET and SET operations in the range of 100,000 per second). This post is about using Redis with Tornado for web development.
Let’s start by installing redis-server:
ubuntu@ubuntu:~/tornado-2.2$ sudo apt-get install redis-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libtext-glob-perl libcompress-bzip2-perl libparams-util-perl libfile-chmod-perl libdata-compare-perl libfile-pushd-perl libfile-which-perl libcpan-inject-perl libfile-find-rule-perl libcpan-checksums-perl libnumber-compare-perl
Use 'apt-get autoremove' to remove them.
The following NEW packages will be installed:
  redis-server
0 upgraded, 1 newly installed, 0 to remove and 171 not upgraded.
Need to get 80.8kB of archives.
After this operation, 283kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ lucid/universe redis-server 2:1.2.0-1 [80.8kB]
Fetched 80.8kB in 2s (27.3kB/s)
Selecting previously deselected package redis-server.
(Reading database ... 138423 files and directories currently installed.)
Unpacking redis-server (from .../redis-server_2%3a1.2.0-1_i386.deb) ...
Processing triggers for man-db ...
Processing triggers for ureadahead ...
Setting up redis-server (2:1.2.0-1) ...
Starting redis-server: redis-server.
Confirmation
ubuntu@ubuntu:~/tornado-2.2$ ps aux | grep redis
redis 19104 0.0 0.1 2284 716 ? Ss 21:23 0:00 /usr/bin/redis-server /etc/redis/redis.conf
Python client library for redis
ubuntu@ubuntu:~/tornado-2.2$ sudo pip install redis
Downloading/unpacking redis
Downloading redis-2.6.2.tar.gz
Running setup.py egg_info for package redis
Installing collected packages: redis
Running setup.py install for redis
Successfully installed redis
With Redis installed, let’s go to an example where Redis meets Tornado. In the example below,
- When the web server is started, the Redis server is initialized with key-value pairs of username and password (the password is the MD5 hash of the username in hex format) for the users ‘bob’ and ‘clara’.
- On browsing to http://localhost:8888/login and POSTing the username and password to the Tornado web server, the details are authenticated against the Redis server.
- A relevant message for the successful or unsuccessful attempt is rendered in the user’s browser.
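The credential check in the steps above can be sketched without a running server. In this sketch a plain dictionary stands in for Redis, so it shows only the logic and not the Tornado or Redis calls; the md5_hex and authenticate helpers and the two messages are assumptions of mine.

```python
import hashlib

def md5_hex(text):
    # Matches the scheme in this post: the password is the hex MD5 digest of the username.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Stand-in for the Redis server: username -> password pairs set at startup.
store = {}
for user in ("bob", "clara"):
    store[user] = md5_hex(user)      # with a Redis client this would be a SET

def authenticate(username, password):
    expected = store.get(username)   # with a Redis client this would be a GET
    if expected is not None and expected == password:
        return "Login successful"
    return "Invalid username or password"

print(authenticate("bob", md5_hex("bob")))
print(authenticate("bob", "wrong-password"))
```

In the web application, authenticate() would be called from the POST handler for /login, with the username and password taken from the submitted form.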
Tornado – Autoreload
One of the most irritating things (I would say) about web development is restarting your web server whenever there’s a change in the code base or the template files, just to test whether the change has been picked up correctly… If you’re a web developer, you know exactly the agony I’m referring to… Well, Tornado has something to offer in this realm as well.
The tornado.autoreload module automatically detects development changes and restarts the server when a module is modified. Not only that, it can also restart when a monitored file has changed (the file change acts as a trigger). Moreover, the developer can hook into the restart and have a function called just before the server restarts. Let’s see all of these features with code snippets.
Example 1:
In this example, tornado.autoreload restarts the web server when the tornadoreload.py file changes. The sequence of events goes like this:
- When the user runs the tornadoreload.py app, the ioloop starts and no message appears on the command line.
- On browsing to http://localhost:8000/, the Main class handles the GET request and renders ‘Main’ on the web page.
- With the web server still running, open another terminal and edit tornadoreload.py to change, say, ‘Main’ to ‘MainO’.
- Back in the terminal where the server is running, you will notice the message ‘Hooked before reloading…’.
- This is because tornado.autoreload.start() watches for changes and restarts the web server when tornadoreload.py changes; before restarting, it calls the function registered with tornado.autoreload.add_reload_hook(fn), and fn() prints the ‘Hooked before reloading…’ message on the command line.
- If you refresh your browser, you will see ‘MainO’, reflecting the change in the tornadoreload.py app.
- So as a web developer, you’re free of manual restarts.
But a restart is destructive and cancels any pending requests.
Example 2:
In this example, we demonstrate how Tornado can restart based on a change in a watched (monitored) file. Here, the file ‘watch’ is monitored, and Tornado restarts when it changes. Also note that Tornado still restarts when the app or any module imported by the app changes.
Many applications built on Tornado don’t use these calls directly and simply pass debug=True to the tornado.web.Application constructor, which also detects changes in modules and static file content.
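The idea behind both examples (notice that a file changed, run the hooks, then restart) can be sketched with the standard library alone. This is a toy model and not Tornado's implementation: the ChangeWatcher class is hypothetical, its method names only mirror tornado.autoreload.watch() and add_reload_hook(), and the restart itself is left out.

```python
import os
import tempfile

class ChangeWatcher:
    """Toy model of autoreload: poll modification times, run hooks, then 'restart'."""
    def __init__(self):
        self.mtimes = {}   # path -> last seen modification time
        self.hooks = []    # callables run just before a restart

    def watch(self, path):
        self.mtimes[path] = os.stat(path).st_mtime

    def add_reload_hook(self, fn):
        self.hooks.append(fn)

    def check(self):
        """Return True (after running the hooks) if any watched file changed."""
        for path, seen in self.mtimes.items():
            if os.stat(path).st_mtime != seen:
                for hook in self.hooks:
                    hook()
                # A real reloader would re-execute the process at this point.
                return True
        return False

log = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "watch")
    with open(path, "w") as f:
        f.write("v1")
    watcher = ChangeWatcher()
    watcher.watch(path)
    watcher.add_reload_hook(lambda: log.append("Hooked before reloading..."))
    unchanged = watcher.check()
    # Simulate an edit by moving the modification time forward.
    st = os.stat(path)
    os.utime(path, (st.st_atime, st.st_mtime + 10))
    changed = watcher.check()

print(unchanged, changed, log)
```

A server would call check() periodically from its event loop, which is why nothing happens until a watched file is actually saved.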
Tornado – Escape – Json
The Tornado web server exports methods to escape and unescape HTML, XML and JSON, among others. This post discusses encoding and decoding JSON.
Tornado has the following methods:
- tornado.escape.json_encode(value) – JSON’ify the Python object passed to it as argument
- tornado.escape.json_decode(value) – Converts the JSON string into Python Object
Here’s a usage example:
In this example,
1. When the user browses to http://localhost:8888/blog, jsonform.html is rendered; it asks for ‘Title’ and ‘Author’.
2. On submitting this form, a POST request is sent to the /blog URL, where the posted arguments are encoded to a JSON string with tornado.escape.json_encode() and rendered in the user’s browser.
3. The Language class is the request handler for http://localhost:8888/lang. In this class, a Python dictionary is converted to a JSON string with tornado.escape.json_encode(), and this JSON string is sent in response to any client request.
4. When tornadojsonclient.py makes a GET request to the /lang URL, the JSON string is sent as the response. The client decodes this JSON string into a Python dictionary with the tornado.escape.json_decode() method.
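The encode/decode round trip in steps 3 and 4 can be tried with Python's standard json module, which, to my understanding, is what Tornado's two helpers build on. This sketch leaves out the web server and the client entirely, and the dictionary contents are made up.

```python
import json

# Server side (step 3): a Python dictionary becomes a JSON string.
languages = {"name": "Python", "typing": "dynamic", "versions": [2, 3]}
payload = json.dumps(languages)

# Client side (step 4): the JSON string becomes a Python dictionary again.
decoded = json.loads(payload)

print(type(payload).__name__, decoded == languages)
```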
Code for tornadojsonclient.py below
Tornado – Web Sockets
The WebSocket protocol (RFC 6455) provides bi-directional (full-duplex) communication between a web server and a browser. Communication happens over a single TCP connection and is used for interactions such as live streaming and real-time content delivery. It is crucial to note that a WebSocket is not a standard HTTP connection: although the handshake happens over HTTP, the communication that follows is message based.
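To give a feel for the HTTP handshake part, here is a small sketch of how a server derives the Sec-WebSocket-Accept response header from the client's Sec-WebSocket-Key, as described in RFC 6455. Tornado does this for you; the accept_key function name is my own, and the sample key is the one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key from RFC 6455.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once the client has verified this value, both sides stop speaking HTTP and exchange WebSocket messages over the same TCP connection.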
Please note: the WebSocket protocol is different from plain socket programming. Libraries such as ws4py are available in Python that can act as clients for WebSocket communication.
Tornado provides the tornado.websocket.WebSocketHandler class for creating a WebSocket handler. Methods like get() or post() won’t work here; instead, the following methods need to be overridden by the server developer:
open() / on_close() – handle opened or closed sockets
on_message() – handles incoming messages; write_message() is called to send messages
Here’s an example implementation along with a ws4py client:
In this example,
- You start a WebSocket handler by running tornadowebsocket.py. This handler implements the open(), on_message() and on_close() methods.
- When ws4pyclient.py is run in another terminal, the message ‘Hello Chetan’ is sent to the web server from the client’s opened() method.
- This opens the server-side WebSocket, and the message ‘Socket opened’ is printed from the open() method.
- The server’s on_message() sends a message back to the client with self.write_message() and then closes the socket; on_close() prints ‘Socket closed’.
- The message sent by the server is received by the client in its received_message() method, where the client prints ‘Received from server: Hello Chetan’.
- The client is then closed, and its closed() method prints ‘Closed down’.
Tornado – Asynchronous Requests
Tornado is a non-blocking I/O web server. In Tornado, when the request handler method returns, the request is finished automatically. The Python decorator @tornado.web.asynchronous can be used to override this default behavior (of automatically finishing the request) and keep the request open. It is then the duty of the server developer to finish the request.
Example
In the above example, when the user browses to http://127.0.0.1/, AsyncHandler handles the request, sends a GET request to http://google.co.in and receives the response. The response is processed by the _async_callback() method. Thus, when get() returns, the request has not yet been finished; only once the response is processed and self.finish() is called is the request completed.
If the developer fails to finish the request with self.finish(), the browser hangs, as in the picture below.
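The "request stays open until finish()" behavior can be modeled in a few lines of standard-library Python. This is a toy model, not Tornado code: ToyRequest, get() and on_response() are hypothetical stand-ins for the handler, and a timer plays the role of the upstream HTTP fetch.

```python
import asyncio

class ToyRequest:
    """Hypothetical stand-in for a request that stays open until finish()."""
    def __init__(self):
        self.body = []
        self.finished = False

    def write(self, chunk):
        self.body.append(chunk)

    def finish(self):
        self.finished = True

def get(request, loop):
    # Start the slow work and return immediately; the request is still open.
    loop.call_later(0.05, on_response, request, "response from upstream")

def on_response(request, data):
    request.write(data)
    request.finish()   # without this call the client would wait forever

async def main():
    request = ToyRequest()
    get(request, asyncio.get_running_loop())
    state_after_get = request.finished
    await asyncio.sleep(0.2)   # let the event loop run the callback
    return state_after_get, request.finished, request.body

after_get, after_callback, body = asyncio.run(main())
print(after_get, after_callback, body)
```

Right after get() returns the request is still unfinished; it completes only when the callback calls finish(), which is exactly the step a developer must not forget.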

