Performance for testing memberships: list vs tuples vs sets
Sets in Python are often used for two purposes:
1. Removing the duplicate entries in a collection
2. For membership testing.
By membership, here we mean to find existence of element in a collection
The focus of this post is to evaluate performance of list, tuple and set data structures with respect to each other on the basis of membership testing.
How do I do it? Well, some code, some data collection and some analysis
from datetime import datetime
listA = range(10000000)
setA = set(listA)
tupA = tuple(listA)
def calc(data, type):
start = datetime.now()
if data in type:
print ""
end = datetime.now()
print end-start
calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)
Below is the average of 10 iterations for the time taken by lists/tuples and sets for testing membership of 9999, 99999, 999999 in 10000000
| Search 9999 in 10000000 | ||
| list | tuple | set |
| 0.8 | 0.8 | 1.9 |
| Search 99999 in 10000000 | ||
| list | tuple | set |
| 2.6 | 2.8 | 1.7 |
| Search 999999 in 10000000 | ||
| list | tuple | set |
| 21.4 | 21.6 | 0.8 |
Conclusions
1. for testing membership of elements at the beginning of the collection, lists or tuples perform more than 100% better than sets
2. As soon as you start testing membership of elements that are in the middle or end of the collection, sets perform 40% – 1800% better than lists or tuples
You now have a fair idea why you should think of using sets for large collections…
Note
Hardware I used: windows 7, x64, 4GB RAM
Python version: 2.7.2
Posted on April 9, 2012, in Uncategorized and tagged Data structure, Development Tools, Languages, Programming, Python, software testing, Tuple. Bookmark the permalink. 1 Comment.
Interesting! could we have a mechanism or an intelligent API where given a list, the membership query function transparently chooses the best construct among set, tuples or list before querying on the same list?