Performance Tips
Python is not the fastest language, but well-written Python is fast enough for most tasks. This guide covers practical techniques for speeding up your Python code and, just as importantly, when not to optimize.
Generator Expressions vs List Comprehensions
List comprehensions create the entire list in memory at once. Generator expressions produce items one at a time, using almost no memory regardless of size.
# List comprehension: creates entire list in memory
squares = [x ** 2 for x in range(1_000_000)] # ~8 MB in memory
# Generator expression: produces one item at a time
squares = (x ** 2 for x in range(1_000_000)) # ~120 bytes!
Use generators when you only need to iterate through results and do not need to keep them all in memory, index into them, or check their length.
# Perfect use case: sum, any, all, min, max
total = sum(x ** 2 for x in range(1_000_000)) # No brackets needed!
# Check if any line contains an error
with open("log.txt") as f:
    has_error = any("ERROR" in line for line in f)  # Stops at the first match
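You can verify the memory difference yourself with sys.getsizeof. Note that it measures only the container object, not the items it references, so the numbers are indicative rather than exact:

```python
import sys

squares_list = [x ** 2 for x in range(1_000_000)]
squares_gen = (x ** 2 for x in range(1_000_000))

print(sys.getsizeof(squares_list))  # several MB: the list holds a million references
print(sys.getsizeof(squares_gen))   # a few hundred bytes: just the generator's frame
```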
Click "Run" to execute your code__slots__ for Memory-Efficient Classes
By default, Python objects store their attributes in a dictionary (__dict__), which has significant memory overhead. Using __slots__ replaces the dictionary with a fixed set of attribute slots.
class PointRegular:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y
This saves 40-60% memory per instance, which matters when you create millions of objects.
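To see the effect directly, compare instances of the two classes above (redefined here so the snippet is self-contained): the slotted version has no per-instance __dict__ and rejects attributes outside its slots.

```python
class PointRegular:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

regular = PointRegular(1, 2)
slotted = PointSlots(1, 2)

print(hasattr(regular, '__dict__'))  # True: attributes live in a dict
print(hasattr(slotted, '__dict__'))  # False: fixed slots only

try:
    slotted.z = 3  # not in __slots__
except AttributeError as e:
    print("Cannot add attribute:", e)
```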
__slots__ and Inheritance
When using __slots__ with inheritance, every class in the hierarchy should define its own __slots__. If a parent class does not use __slots__, the subclass will still have a __dict__ and the memory savings are lost.
class Base:
    __slots__ = ('x',)

class Child(Base):
    __slots__ = ('y',)  # Only add NEW attributes here
                        # 'x' is inherited from Base's __slots__

class BrokenChild(Base):
    # No __slots__ defined — gets a __dict__, losing the memory benefit
    pass
Tradeoffs of __slots__:
- Cannot add arbitrary attributes at runtime
- Cannot use multiple inheritance with two parents that both have non-empty __slots__ (unless carefully designed)
- Some libraries that inspect __dict__ may not work correctly
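A quick check confirms the inheritance pitfall described above: only the class that forgets __slots__ gives its instances a __dict__.

```python
class Base:
    __slots__ = ('x',)

class Child(Base):
    __slots__ = ('y',)  # adds only the new attribute

class BrokenChild(Base):
    pass  # no __slots__, so instances get a __dict__ again

print(hasattr(Child(), '__dict__'))        # False: memory savings kept
print(hasattr(BrokenChild(), '__dict__'))  # True: savings lost
```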
Click "Run" to execute your codeLocal Variable Lookups Are Faster Than Global
Python looks up local variables much faster than global or built-in variables. In tight loops, this can make a measurable difference.
# Slow: global lookup on every iteration
import math
def slow_sqrt_sum(numbers):
    total = 0
    for n in numbers:
        total += math.sqrt(n)  # Global lookup for math, then attribute lookup for sqrt
    return total

# Fast: local alias
def fast_sqrt_sum(numbers):
    sqrt = math.sqrt  # Bind to local variable once
    total = 0
    for n in numbers:
        total += sqrt(n)  # Local lookup is faster
    return total
This technique matters in hot loops. For most code, readability is more important.
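The claim is easy to measure with timeit (covered in more detail later in this guide). A sketch, with the two functions redefined so the snippet is self-contained; absolute timings vary by machine:

```python
import math
import timeit

def slow_sqrt_sum(numbers):
    total = 0
    for n in numbers:
        total += math.sqrt(n)  # global + attribute lookup each iteration
    return total

def fast_sqrt_sum(numbers):
    sqrt = math.sqrt  # bind once
    total = 0
    for n in numbers:
        total += sqrt(n)
    return total

nums = list(range(10_000))
t_slow = timeit.timeit(lambda: slow_sqrt_sum(nums), number=100)
t_fast = timeit.timeit(lambda: fast_sqrt_sum(nums), number=100)
print(f"global lookup: {t_slow:.3f}s, local alias: {t_fast:.3f}s")
```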
Click "Run" to execute your codeString Concatenation: join() vs +
Building strings with + in a loop creates a new string object on each iteration, because strings are immutable. Use str.join() instead.
# Slow: O(n^2) — creates a new string every iteration
result = ""
for word in words:
    result += word + " "
# Fast: O(n) — single allocation
result = " ".join(words)
Click "Run" to execute your codeThe collections Module
Python's collections module provides specialized container types that are faster or more convenient than the built-in types for specific use cases.
defaultdict: Auto-Initializing Dictionary
from collections import defaultdict
# Instead of checking if key exists
word_count = {}
for word in words:
    if word not in word_count:
        word_count[word] = 0
    word_count[word] += 1
# Use defaultdict
word_count = defaultdict(int)
for word in words:
    word_count[word] += 1  # Auto-initializes to 0
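defaultdict accepts any zero-argument factory, which makes grouping just as concise as counting. A small illustration, grouping words by first letter:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)  # missing keys auto-initialize to []

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```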
Counter: Counting Made Easy
from collections import Counter
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
counts = Counter(words)
print(counts.most_common(2)) # [('apple', 3), ('banana', 2)]
deque: Fast Appends and Pops from Both Ends
Lists are slow when inserting or removing from the front (O(n)). deque is O(1) for both ends.
from collections import deque
d = deque([1, 2, 3])
d.appendleft(0) # O(1) — list.insert(0, x) is O(n)!
d.popleft() # O(1) — list.pop(0) is O(n)!
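deque also supports a maxlen, which makes it a natural fit for fixed-size sliding windows: once full, appending a new item automatically discards the oldest.

```python
from collections import deque

window = deque(maxlen=3)  # keeps only the 3 most recent items
for reading in [10, 20, 30, 40, 50]:
    window.append(reading)  # oldest reading is dropped once full

print(list(window))  # [30, 40, 50]
```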
namedtuple: Lightweight Immutable Objects
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p.y) # Access by name, but memory-efficient like a tuple
Click "Run" to execute your codeUsing Sets for Membership Testing
Checking if an item is in a list is O(n). Checking if it is in a set is O(1). For repeated membership tests, convert to a set first.
# Slow: O(n) lookup for each check
valid_ids = [1, 2, 3, ..., 10000]
if user_id in valid_ids:  # Scans entire list
    ...
# Fast: O(1) lookup
valid_ids = {1, 2, 3, ..., 10000}
if user_id in valid_ids:  # Hash table lookup
    ...
Click "Run" to execute your codeAvoiding Unnecessary Copies
Python's assignment creates references, not copies. Understand when copies happen and when they are needed.
# Reference (no copy) — fast, shared mutations
a = [1, 2, 3]
b = a # b points to the same list
b.append(4) # a is also [1, 2, 3, 4]!
# Shallow copy — copies the container but not the items
b = a[:] # or a.copy() or list(a)
# Deep copy — copies everything recursively
import copy
b = copy.deepcopy(a)
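The difference between shallow and deep copies shows up with nested containers: a shallow copy duplicates the outer list but shares the inner lists.

```python
import copy

a = [[1, 2], [3, 4]]

shallow = a[:]           # new outer list, same inner lists
deep = copy.deepcopy(a)  # everything duplicated recursively

a[0].append(99)
print(shallow[0])  # [1, 2, 99] — inner list is shared with a
print(deep[0])     # [1, 2] — fully independent
```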
Avoid unnecessary copies in function calls:
# Wasteful: creates a copy every time
def process(items):
    sorted_items = sorted(items)  # Creates new list
    ...

# Better: sort in place if you own the data
def process(items):
    items.sort()  # Sorts in place, no copy
    ...
Click "Run" to execute your codeThe Cost of Polymorphism
Python's dynamic dispatch means every method call involves looking up the method in the object's class hierarchy at runtime. This is slower than a direct function call. For performance-critical code, understand the trade-offs.
Virtual Method Dispatch in Python
Every method call on an object (obj.method()) requires:
- Looking up method in obj.__class__.__dict__
- If not found, walking up the MRO (Method Resolution Order)
- Creating a bound method object
- Calling the bound method
This overhead is small for a single call but adds up in tight loops over millions of objects.
# Slower: polymorphic dispatch through class hierarchy
class Shape:
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3.14159 * self.r ** 2
shapes = [Circle(i) for i in range(100_000)]
total = sum(s.area() for s in shapes) # Method lookup on each iteration
# Faster: simple function
def circle_area(r):
    return 3.14159 * r ** 2
radii = list(range(100_000))
total = sum(circle_area(r) for r in radii) # No method dispatch overhead
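When you do need the objects, a middle ground is to hoist the method lookup out of the loop, the same idea as the local-alias trick earlier. A micro-optimization sketch; gains are typically small, so profile before relying on it:

```python
class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3.14159 * self.r ** 2

shapes = [Circle(i) for i in range(1000)]

# Attribute lookup and bound-method creation on every iteration:
total_slow = sum(s.area() for s in shapes)

# Look the method up once on the class and call it as a plain function:
area = Circle.area
total_fast = sum(area(s) for s in shapes)

print(total_slow == total_fast)  # True: same computation, fewer lookups
```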
When to Prefer Functions Over Class Hierarchies
- Hot loops processing large datasets: plain functions can be 20-50% faster
- Simple transformations that do not need state: a function is simpler and faster
- Numerical computations: consider using NumPy or other optimized libraries
However, do not sacrifice code organization for micro-optimization. Method dispatch overhead rarely matters in real applications. Profile first.
Click "Run" to execute your codeProfiling with cProfile and timeit
Never optimize without measuring first. Python provides excellent profiling tools.
timeit for Micro-Benchmarks
import timeit
# Time a single expression
timeit.timeit('sum(range(1000))', number=10000)
# Compare approaches
time_list = timeit.timeit('[x**2 for x in range(100)]', number=10000)
time_gen = timeit.timeit('list(x**2 for x in range(100))', number=10000)
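When the timed statement needs test data or imports, pass a setup string (run once, outside the timing loop). A sketch; the two statements here are not equivalent, they are just two things worth timing separately:

```python
import timeit

setup = "data = list(range(1000))"

t_list = timeit.timeit("[x ** 2 for x in data]", setup=setup, number=1000)
t_sum = timeit.timeit("sum(x ** 2 for x in data)", setup=setup, number=1000)
print(f"list comp: {t_list:.4f}s, generator sum: {t_sum:.4f}s")
```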
cProfile for Function-Level Profiling
import cProfile
def my_program():
    data = [i ** 2 for i in range(10000)]
    total = sum(data)
    filtered = [x for x in data if x > 1000]
    return len(filtered)
cProfile.run('my_program()')
The output shows how many times each function was called and how much time was spent in each.
Click "Run" to execute your codeWhen to Optimize
The most important performance tip is knowing when not to optimize. Follow these rules:
- Make it work first. Write clear, correct code.
- Measure. Use profiling to find actual bottlenecks. Most code is not performance-critical.
- Optimize the bottleneck. Focus on the 10% of code that accounts for 90% of execution time.
- Consider algorithmic changes first. Switching from O(n^2) to O(n log n) beats any micro-optimization.
- Only then micro-optimize. Local variable caching, __slots__, etc.
"Premature optimization is the root of all evil." — Donald Knuth
Common algorithmic improvements:
| Instead of... | Use... | Improvement |
|---|---|---|
| Nested loops for search | Dictionary/set lookup | O(n^2) to O(n) |
| Repeated string concatenation | str.join() | O(n^2) to O(n) |
| Sorting to find min/max | min()/max() | O(n log n) to O(n) |
| List for queue operations | collections.deque | O(n) pops to O(1) |
| Re-computing in loops | Cache/memoize results | Depends on problem |
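The last row of the table, caching repeated computation, is built into the standard library as functools.lru_cache. The classic demonstration is naive Fibonacci, which goes from exponential to linear time:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # remember the result for every distinct argument
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(35))  # 9227465, computed near-instantly; uncached recursion takes seconds
```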
Click "Run" to execute your codeSummary
| Technique | When to Use | Benefit |
|---|---|---|
| Generator expressions | Iterating without keeping all items | Memory savings |
| __slots__ | Many instances of the same class | 40-60% less memory per instance |
| Local variable binding | Tight loops with global lookups | Faster variable access |
| str.join() | Building strings in a loop | O(n) instead of O(n^2) |
| collections types | Counting, grouping, queues, records | Built-in optimized containers |
| Sets for membership | Repeated "in" checks | O(1) instead of O(n) |
| Functions over methods | Hot loops with simple operations | Avoid dispatch overhead |
| Profile first | Always | Optimize the right thing |
What's Next?
Learn how to keep your code clean and catch problems early with IDE Warnings & Linting.