Python Iterators and Generators Explained
Iterator
What is an Iterator
An iterator is an object that implements the iterator protocol, which consists of two methods:
- `__iter__()`: Returns the iterator object itself
- `__next__()`: Returns the next element of the container, raising a `StopIteration` exception when there are no more elements
Iterator Example
```python
class MyIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

# Using the iterator
my_iter = MyIterator([1, 2, 3, 4, 5])
for item in my_iter:
    print(item)
```
Iterator Characteristics
- Lazy Evaluation: Only computes the next value when needed
- Memory Efficient: Doesn't need to store all data at once
- Single-pass: Can only traverse forward, cannot go back
- One-time Use: Iterator cannot be reused after traversal
```python
# One-time use characteristic of an iterator
my_list = [1, 2, 3]
my_iter = iter(my_list)
print(list(my_iter))  # [1, 2, 3]
print(list(my_iter))  # [] - iterator exhausted
```
Iterable
What is an Iterable
An iterable is an object that implements the __iter__() method, which returns an iterator. Common iterables include lists, tuples, strings, dictionaries, sets, etc.
Iterable Example
```python
# Built-in iterables
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)
my_string = "hello"
my_dict = {'a': 1, 'b': 2}
my_set = {1, 2, 3}

# Check whether an object is iterable
from collections.abc import Iterable

print(isinstance(my_list, Iterable))  # True
print(isinstance(123, Iterable))      # False
```
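Any class can also be made iterable by implementing `__iter__()` and returning an iterator. Here is a minimal sketch (the `NumberBag` class name is made up for illustration):

```python
class NumberBag:
    """A hypothetical iterable: __iter__ returns a fresh iterator each time."""
    def __init__(self, numbers):
        self.numbers = list(numbers)

    def __iter__(self):
        return iter(self.numbers)  # delegate to the list's own iterator

bag = NumberBag([1, 2, 3])
print(list(bag))  # [1, 2, 3]
print(list(bag))  # [1, 2, 3] - each traversal gets a new iterator
```

Because `__iter__()` hands back a fresh iterator each time, the object can be traversed repeatedly, unlike a bare iterator.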
Relationship between Iterable and Iterator
```python
# An iterable produces an iterator via the iter() function
my_list = [1, 2, 3]
my_iterator = iter(my_list)

print(next(my_iterator))  # 1
print(next(my_iterator))  # 2
print(next(my_iterator))  # 3
# print(next(my_iterator))  # StopIteration
```
Generator
What is a Generator
A generator is a special type of iterator created using functions and the yield statement. Generator functions pause during execution and save the current state, resuming from where they left off on the next call.
Generator Function Example
```python
def simple_generator():
    yield 1
    yield 2
    yield 3

# Using the generator
gen = simple_generator()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3
```
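To make the pause-and-resume behavior visible, here is an illustrative sketch (the print statements are added only for demonstration); each `next()` call runs the body up to the next `yield` and then suspends:

```python
def chatty_generator():
    print("start")            # runs on the first next() call
    yield 1
    print("resumed after 1")  # runs on the second next() call
    yield 2
    print("resumed after 2")  # would run on a third next() call

gen = chatty_generator()
print(next(gen))  # prints "start", then 1
print(next(gen))  # prints "resumed after 1", then 2
```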
Generator Expression
```python
# Generator expression (similar to a list comprehension)
gen_expr = (x * x for x in range(10))
print(list(gen_expr))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expressions save memory
# List comprehension
list_comp = [x * x for x in range(1000000)]  # Uses a lot of memory

# Generator expression
gen_expr = (x * x for x in range(1000000))   # Almost no memory usage
```
Advantages of Generators
- Memory Efficiency: Doesn't need to generate all values at once
- Lazy Computation: Only computes values when needed
- Infinite Sequences: Can represent infinitely long sequences
- Pipeline Processing: Can chain multiple generators
```python
# Infinite sequence generator
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# Using the infinite sequence
gen = infinite_sequence()
for i in range(10):
    print(next(gen))  # 0, 1, 2, ..., 9
```
Comparison of Iterator and Generator
Similarities
- Both implement the iterator protocol
- Both support lazy evaluation
- Both can be advanced with the `next()` function to get the next value
- Both can be used in `for` loops (see the sketch below)
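A quick sketch of these shared traits, using a simple `squares` generator defined for illustration: both a list iterator and a generator object are instances of `collections.abc.Iterator` and respond to `next()` and `for`:

```python
from collections.abc import Iterator

def squares(n):
    for i in range(n):
        yield i * i

list_iter = iter([1, 4, 9])  # iterator over a list
gen = squares(3)             # generator object

print(isinstance(list_iter, Iterator))  # True
print(isinstance(gen, Iterator))        # True

print(next(list_iter), next(gen))  # 1 0
for value in gen:                  # the generator resumes where it paused
    print(value)                   # 1, then 4
```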
Differences
| Feature | Iterator (class-based) | Generator |
|---|---|---|
| Implementation | Implements the `__iter__` and `__next__` methods | Uses the `yield` statement |
| State Management | Managed manually by the programmer | Managed automatically by Python |
| Memory Usage | Depends on the implementation; often wraps existing data | Keeps only the current execution state |
| Code Simplicity | Relatively verbose | More concise |
Code Comparison
```python
# Iterator implementation
class SquaresIterator:
    def __init__(self, n):
        self.n = n
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.n:
            raise StopIteration
        result = self.current ** 2
        self.current += 1
        return result

# Generator implementation
def squares_generator(n):
    for i in range(n):
        yield i ** 2

# Usage comparison
print("Iterator:", list(SquaresIterator(5)))     # [0, 1, 4, 9, 16]
print("Generator:", list(squares_generator(5)))  # [0, 1, 4, 9, 16]
```
Practical Application Scenarios
1. Processing Large Files
```python
def read_large_file(file_path):
    """Read a large file line by line to avoid exhausting memory"""
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Using the generator to process a large file
for line in read_large_file('large_file.txt'):
    process_line(line)  # Process each line
```
2. Data Pipeline
```python
def read_data(source):
    """Read data"""
    for item in source:
        yield item

def filter_data(data, predicate):
    """Filter data"""
    for item in data:
        if predicate(item):
            yield item

def transform_data(data, func):
    """Transform data"""
    for item in data:
        yield func(item)

# Using the data pipeline
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pipeline = transform_data(
    filter_data(
        read_data(data),
        lambda x: x % 2 == 0  # Keep even numbers
    ),
    lambda x: x * 2  # Double each value
)

print(list(pipeline))  # [4, 8, 12, 16, 20]
```
3. Fibonacci Sequence
```python
def fibonacci():
    """Generate the Fibonacci sequence"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get the first 10 Fibonacci numbers
fib = fibonacci()
fib_sequence = [next(fib) for _ in range(10)]
print(fib_sequence)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```
4. Batch Processing
```python
def batch_generator(data, batch_size):
    """Yield data in batches"""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# Using the batch generator
data = list(range(100))
for batch in batch_generator(data, 10):
    print(f"Processing batch: {batch}")
```
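Note that the sketch above relies on `len()` and slicing, so it only works for sequences. A variant based on `itertools.islice` (the `iter_batches` name is illustrative, not a standard function) would also handle arbitrary iterables such as generators:

```python
from itertools import islice

def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Works even when the input is itself a generator
for batch in iter_batches((x * x for x in range(25)), 10):
    print(batch)  # batches of 10, 10, and 5 squares
```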
Advanced Generator Usage
1. yield from
```python
def sub_generator():
    yield 1
    yield 2

def main_generator():
    yield from sub_generator()  # Delegate to the sub-generator
    yield 3

print(list(main_generator()))  # [1, 2, 3]
```
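Beyond simple delegation, `yield from` also evaluates to the sub-generator's `return` value (PEP 380). A minimal sketch with hypothetical helper names:

```python
def count_up_to(n):
    total = 0
    for i in range(1, n + 1):
        yield i
        total += i
    return total  # becomes the value of the yield from expression

def reporter():
    result = yield from count_up_to(3)          # yields 1, 2, 3
    print(f"Sub-generator returned: {result}")  # Sub-generator returned: 6

print(list(reporter()))  # the report line prints first, then [1, 2, 3]
```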
2. Sending Values to Generator
```python
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value

acc = accumulator()
next(acc)            # Start the generator (advance to the first yield)
print(acc.send(10))  # 10
print(acc.send(20))  # 30
print(acc.send(30))  # 60
```
3. Throwing Exceptions in Generator
```python
def my_generator():
    try:
        while True:
            value = yield
            print(f"Received value: {value}")
    except ValueError:
        print("Caught ValueError")
    finally:
        print("Generator closed")

gen = my_generator()
next(gen)        # Advance to the first yield
gen.send(1)      # Received value: 1
try:
    gen.throw(ValueError)  # Caught ValueError, then Generator closed
except StopIteration:
    pass         # The generator handled the exception and then exited
gen.close()      # Already exhausted, so this is a no-op
```
Performance Comparison
```python
import time
import sys

# Memory usage comparison
def list_comprehension(n):
    return [i ** 2 for i in range(n)]

def generator_expression(n):
    return (i ** 2 for i in range(n))

list_obj = list_comprehension(1000000)
gen_obj = generator_expression(1000000)

print(f"List memory usage: {sys.getsizeof(list_obj)} bytes")
print(f"Generator memory usage: {sys.getsizeof(gen_obj)} bytes")

# Execution time comparison
start = time.time()
sum([i ** 2 for i in range(1000000)])
print(f"List comprehension time: {time.time() - start:.4f} seconds")

start = time.time()
sum(i ** 2 for i in range(1000000))
print(f"Generator expression time: {time.time() - start:.4f} seconds")
```
Best Practices
1. Choose the Right Tool
```python
# Need to access the data multiple times - use a list
data = [1, 2, 3, 4, 5]
result1 = sum(data)
result2 = max(data)

# Only need a single traversal - use a generator
data = (i for i in range(1000000))
result = sum(data)
```
2. Avoid Premature Evaluation
```python
# Bad practice: builds the full list in memory up front
def get_all_data():
    return [process_item(item) for item in large_dataset]

# Good practice: yields items lazily
def get_data_generator():
    for item in large_dataset:
        yield process_item(item)
```
3. Use itertools Module
```python
import itertools

# Infinite counter
counter = itertools.count(start=0, step=2)
print(list(itertools.islice(counter, 5)))  # [0, 2, 4, 6, 8]

# Cycling iterator
cycle = itertools.cycle([1, 2, 3])
print(list(itertools.islice(cycle, 7)))  # [1, 2, 3, 1, 2, 3, 1]

# Chaining iterators
chain = itertools.chain([1, 2], [3, 4], [5, 6])
print(list(chain))  # [1, 2, 3, 4, 5, 6]
```
Summary
Iterator
- Implements the `__iter__` and `__next__` methods
- Manages state manually
- Suitable for complex iteration logic
- A new instance can be created for each traversal
Generator
- Created using the `yield` statement
- Manages state automatically
- More concise code
- Higher memory efficiency
- Suitable for data stream processing
Usage Recommendations
- Small datasets: Use lists or tuples
- Large datasets: Use generators
- Complex logic: Use iterator classes
- Data pipelines: Use generator expressions
- Infinite sequences: Use generator functions
Understanding the difference between iterators and generators helps write more efficient and elegant Python code.