Sets in Python are often used for two purposes:
1. Removing the duplicate entries in a collection
2. For membership testing.
By membership, here we mean to find existence of element in a collection
The focus of this post is to evaluate performance of list, tuple and set data structures with respect to each other on the basis of membership testing.
How do I do it? Well, some code, some data collection and some analysis
from datetime import datetime listA = range(10000000) setA = set(listA) tupA = tuple(listA) def calc(data, type): start = datetime.now() if data in type: print "" end = datetime.now() print end-start calc(9999, listA) calc(9999, tupA) calc(9999, setA)
Below is the average of 10 iterations for the time taken by lists/tuples and sets for testing membership of 9999, 99999, 999999 in 10000000
|Search 9999 in 10000000|
|Search 99999 in 10000000|
|Search 999999 in 10000000|
1. for testing membership of elements at the beginning of the collection, lists or tuples perform more than 100% better than sets
2. As soon as you start testing membership of elements that are in the middle or end of the collection, sets perform 40% – 1800% better than lists or tuples
You now have a fair idea why you should think of using sets for large collections…
Hardware I used: windows 7, x64, 4GB RAM
Python version: 2.7.2