Why proper data structures matter
In Python, choosing the right built-in structure — list, dict, or set — can significantly simplify your code and improve performance. These structures are highly optimized in C, so idiomatic use yields both readability and speed. Before writing an algorithm or function, ask: How will I access and modify the data? That question typically determines your choice.
Examples of access patterns:
- Index or slice elements by position → likely a
list. - Lookup by key / map from id → use
dict. - Membership checks and uniqueness → use
set.
Lists — when & how to use them
Lists are ordered, mutable sequences. They are the most versatile container and are great when ordering matters, or you need to repeatedly append items.
Core operations and complexity (average)
- Index access
a[i]— O(1) - Append
a.append(x)— amortized O(1) - Insert/remove at arbitrary index — O(n)
- Iteration — O(n)
Common idioms
# List comprehension (fast & readable)
squares = [x*x for x in range(10)]
# Flatten one-level nested lists
flat = [y for x in list_of_lists for y in x]
When to prefer lists
- Maintaining ordered collections (history, recent items)
- When you need to preserve duplicates
- When random access by index is required
Alternatives when lists are not enough
If you need frequent pops from the left, prefer collections.deque which provides O(1) pops from both ends. If you need sorted order with frequent inserts, consider bisect module or third-party balanced-tree libs.
from collections import deque
dq = deque()
dq.append(1)
dq.appendleft(0)
dq.popleft()
Dictionaries — mapping essentials
Dictionaries are Python’s hash maps: they provide average O(1) lookup, insert, and delete. Use them whenever you need to associate keys with values.
Practical patterns
# Counting frequencies
from collections import Counter
counts = Counter(words)
# Grouping values
from collections import defaultdict
groups = defaultdict(list)
for user, tag in records:
groups[user].append(tag)
Safe access
Prefer dict.get for optional values and setdefault or defaultdict for collected values. Avoid catching KeyError as normal control flow.
value = d.get('key', default_value)
Ordered dicts & determinism
Since Python 3.7, insertion order is preserved in dicts. This is useful for predictable iteration (e.g., generating deterministic exports), but do not rely on it for algorithmic properties.
Sets — membership & mathematical operations
Sets are unordered collections of unique elements and are optimized for membership testing and set algebra (union / intersection / difference).
Examples
a = {1,2,3}
b = {3,4,5}
a & b # intersection -> {3}
a | b # union -> {1,2,3,4,5}
a - b # difference -> {1,2}
When to use sets
- Remove duplicates
- Fast membership tests (
if x in s) - Compute intersections/unions of collections
Limitations
Sets require hashable elements. Use frozenset for hashable immutable sets if you need to use a set as a key in another mapping.
Practical recipes
Below are real-world snippets you’ll use frequently.
1. Top N frequent words
from collections import Counter
import re
def top_n_words(text, n=10):
words = re.findall(r'\w+', text.lower())
return Counter(words).most_common(n)
2. Remove duplicates while preserving order
def unique_preserve_order(seq):
seen = set()
out = []
for x in seq:
if x not in seen:
seen.add(x)
out.append(x)
return out
3. Group by key
from collections import defaultdict
groups = defaultdict(list)
for item in items:
key = key_func(item)
groups[key].append(item)
4. Invert a mapping (careful with duplicate values)
def invert_map(d):
inv = {}
for k, v in d.items():
inv.setdefault(v, []).append(k)
return inv
Performance considerations
Microbenchmarks can be useful, but measure in your environment. Some rules of thumb:
- Use dict for lookups — O(1) average vs O(n) linear search in lists.
- List append is cheap (amortized O(1)); inserting at the front is costly (O(n)).
- Set membership is fast due to hashing; choose it for membership-dominant workloads.
Example benchmark: membership
import timeit
setup = "s = set(range(10000)); l = list(range(10000)); x = 9999"
timeit.timeit("x in s", setup=setup, number=10000) # set membership
timeit.timeit("x in l", setup=setup, number=10000) # list membership
On large collections set membership is orders of magnitude faster.
Memory tradeoffs
Hash-based structures (dict, set) use more memory than lists. For very large datasets, consider database-backed solutions or specialized structures (e.g., arrays, numpy, tries) depending on the use case.
Immutables: tuples & frozenset
Tuples are immutable sequences useful for fixed-size records (like coordinates) and can be used as dict keys when needed. frozenset is the immutable counterpart to set and can be hashed.
point = (x, y)
cache = {}
cache[point] = "value" # using tuple as a key
Prefer tuples for fixed heterogeneous records; prefer dataclasses or namedtuples for clarity in complex code.
Readability & maintainability
Readable code often outperforms micro-optimized code in long-term value. Some recommendations:
- Name variables meaningfully (
users_by_id, notd). - Wrap complex mutations in functions with clear intent.
- Add docstrings and type hints where helpful.
Type hints for clarity
from typing import List, Dict
def summarize(values: List[int]) -> Dict[str, float]:
return {'min': min(values), 'max': max(values), 'avg': sum(values)/len(values)}
FAQ
Q: When should I use a list vs a dict for membership?
A: If you only need to test membership (is x present?), use a set for performance or a dict if you also need to associate data with keys.
Q: Are list comprehensions faster than loops?
A: Often yes. List comprehensions are implemented in C and are typically faster and more concise for simple transformations.
Q: Is Counter slower than manual counting?
A: Counter is implemented in Python but is optimized for readability and common patterns. Manual counting using dicts can be comparable — prefer Counter for clarity.
Key takeaways
- Match your data structure to access patterns: lists for ordered sequences, dicts for mapping, sets for membership and uniqueness.
- Prefer built-in idioms (comprehensions, Counter, defaultdict) for clarity and performance.
- Measure when performance matters; memory tradeoffs exist for hash-based structures.
- Use type hints and small, focused functions for maintainability.
Armed with these patterns you'll write clearer, faster Python that scales from scripts to production systems.