
What is the difference between iterators and generators in Python?

February 17, 21:45

Python Iterators and Generators Explained

Iterator

What is an Iterator

An iterator is an object that implements the iterator protocol, which consists of two methods:

  • __iter__(): Returns the iterator object itself
  • __next__(): Returns the next element, or raises a StopIteration exception when no elements remain
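
As a rough sketch of what the protocol means in practice, a for loop can be expanded by hand into iter() and next() calls. The helper name manual_for is hypothetical and exists only for this illustration; the real loop is implemented by the interpreter, not by a function like this.

```python
def manual_for(iterable, action):
    """Approximate what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)        # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:
            break                    # exhaustion ends the loop quietly
        action(item)


collected = []
manual_for([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```

This is why raising StopIteration from __next__() is the normal way to signal the end of the data rather than an error.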

Iterator Example

python
class MyIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

# Using iterator
my_iter = MyIterator([1, 2, 3, 4, 5])
for item in my_iter:
    print(item)

Iterator Characteristics

  1. Lazy Evaluation: Only computes the next value when needed
  2. Memory Efficient: Doesn't need to store all data at once
  3. Single-pass: Can only traverse forward, cannot go back
  4. One-time Use: Iterator cannot be reused after traversal
python
# One-time use characteristic of iterator
my_list = [1, 2, 3]
my_iter = iter(my_list)

print(list(my_iter))  # [1, 2, 3]
print(list(my_iter))  # [] - iterator exhausted

Iterable

What is an Iterable

An iterable is an object that implements the __iter__() method, which returns an iterator each time it is called. Common iterables include lists, tuples, strings, dictionaries, and sets.

Iterable Example

python
# Built-in iterables
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)
my_string = "hello"
my_dict = {'a': 1, 'b': 2}
my_set = {1, 2, 3}

# Check if iterable
from collections.abc import Iterable

print(isinstance(my_list, Iterable))  # True
print(isinstance(123, Iterable))      # False

Relationship between Iterable and Iterator

python
# Iterable gets iterator through iter() function
my_list = [1, 2, 3]
my_iterator = iter(my_list)

print(next(my_iterator))  # 1
print(next(my_iterator))  # 2
print(next(my_iterator))  # 3
# print(next(my_iterator))  # StopIteration
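
The distinction can be checked directly. The sketch below uses only built-ins and shows that a list hands out a new iterator on every iter() call, while an iterator simply returns itself. The variable names are illustrative.

```python
numbers = [1, 2, 3]

first = iter(numbers)
second = iter(numbers)

# A list is an iterable, not an iterator: each iter() call
# returns a new iterator with its own position.
print(first is second)             # False
print(next(first), next(first))    # 1 2
print(next(second))                # 1

# An iterator is also an iterable, but iter() hands back the same object,
# so its position is shared rather than reset.
print(iter(first) is first)        # True
```

This is the reason a list can be looped over many times while an iterator obtained from it can be consumed only once.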

Generator

What is a Generator

A generator is a special kind of iterator, created by calling a function that contains a yield statement. The function body pauses at each yield, keeping its local state, and resumes from that point on the next call to next().

Generator Function Example

python
def simple_generator():
    yield 1
    yield 2
    yield 3

# Using generator
gen = simple_generator()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3
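
To make the pausing visible, the following sketch records each step in a list. The function traced_generator and the events list are hypothetical names used only for this demonstration.

```python
events = []


def traced_generator():
    events.append("started")
    yield 1
    events.append("resumed after first yield")
    yield 2
    events.append("finished")


gen = traced_generator()
events.append("created")   # calling the function runs none of its body yet

first = next(gen)          # runs up to the first yield
second = next(gen)         # resumes and runs up to the second yield
leftover = list(gen)       # runs to the end; nothing more is yielded

print(first, second, leftover)  # 1 2 []
print(events)
# ['created', 'started', 'resumed after first yield', 'finished']
```

Note that "created" is logged before "started": the body only begins executing on the first next() call.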

Generator Expression

python
# Generator expression (similar to list comprehension)
gen_expr = (x * x for x in range(10))
print(list(gen_expr))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expression saves memory
# List comprehension: builds all one million values up front
list_comp = [x * x for x in range(1000000)]

# Generator expression: a small object that produces values on demand
gen_expr = (x * x for x in range(1000000))

Advantages of Generators

  1. Memory Efficiency: Doesn't need to generate all values at once
  2. Lazy Computation: Only computes values when needed
  3. Infinite Sequences: Can represent infinitely long sequences
  4. Pipeline Processing: Can chain multiple generators
python
# Infinite sequence generator
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# Using infinite sequence
gen = infinite_sequence()
for i in range(10):
    print(next(gen))  # 0, 1, 2, ..., 9

Comparison of Iterator and Generator

Similarities

  • Both implement the iterator protocol
  • Both support lazy evaluation
  • Both work with the next() function to get the next value
  • Both can be used in for loops
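
These shared properties can be verified with the abstract base classes in collections.abc. The sketch below is only a check, and count_up_to is a hypothetical example function.

```python
from collections.abc import Generator, Iterator


def count_up_to(n):
    """Hypothetical generator function used for the checks below."""
    for i in range(n):
        yield i


gen = count_up_to(3)
list_iter = iter([1, 2, 3])

print(isinstance(gen, Iterator))          # True: every generator is an iterator
print(isinstance(gen, Generator))         # True
print(isinstance(list_iter, Iterator))    # True
print(isinstance(list_iter, Generator))   # False: not every iterator is a generator
print(isinstance([1, 2, 3], Iterator))    # False: a list is only an iterable
print(iter(gen) is gen)                   # True: __iter__() returns the object itself
```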

Differences

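| Feature         | Iterator (class-based)                       | Generator                                  |
|-----------------|----------------------------------------------|--------------------------------------------|
| Implementation  | Implement __iter__ and __next__ methods      | Use yield statement                        |
| Code Complexity | Need to manually manage state                | Automatic state management                 |
| Memory Usage    | Holds whatever state the class stores itself | Holds only the paused function's state     |
| Code Simplicity | Relatively complex                           | More concise                               |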

Code Comparison

python
# Iterator implementation
class SquaresIterator:
    def __init__(self, n):
        self.n = n
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.n:
            raise StopIteration
        result = self.current ** 2
        self.current += 1
        return result

# Generator implementation
def squares_generator(n):
    for i in range(n):
        yield i ** 2

# Usage comparison
print("Iterator:", list(SquaresIterator(5)))     # [0, 1, 4, 9, 16]
print("Generator:", list(squares_generator(5)))  # [0, 1, 4, 9, 16]
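
The two styles can also be combined. One possible sketch, using a hypothetical class named Squares: write __iter__ as a generator function, so the class is an iterable that can be traversed repeatedly while the generator handles the state.

```python
class Squares:
    """An iterable, not an iterator: each __iter__ call starts a fresh generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Because this method contains yield, calling it returns a new generator.
        for i in range(self.n):
            yield i ** 2


squares = Squares(5)
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [0, 1, 4, 9, 16] (a second pass works)
```

Whether this is preferable to a plain generator function depends on whether callers need to traverse the same object more than once.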

Practical Application Scenarios

1. Processing Large Files

python
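def read_large_file(file_path):
    """Read a large file line by line to avoid loading it all into memory"""
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

def process_line(line):
    """Placeholder: replace with your own per-line processing"""
    print(line)

# Using the generator to process a large file (assumes large_file.txt exists)
for line in read_large_file('large_file.txt'):
    process_line(line)  # Process each line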

2. Data Pipeline

python
def read_data(source):
    """Read data"""
    for item in source:
        yield item

def filter_data(data, predicate):
    """Filter data"""
    for item in data:
        if predicate(item):
            yield item

def transform_data(data, func):
    """Transform data"""
    for item in data:
        yield func(item)

# Using data pipeline
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pipeline = transform_data(
    filter_data(
        read_data(data),
        lambda x: x % 2 == 0  # Keep even numbers
    ),
    lambda x: x * 2  # Double each value
)

print(list(pipeline))  # [4, 8, 12, 16, 20]

3. Fibonacci Sequence

python
def fibonacci():
    """Generate Fibonacci sequence"""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Get first 10 Fibonacci numbers
fib = fibonacci()
fib_sequence = [next(fib) for _ in range(10)]
print(fib_sequence)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

4. Batch Processing

python
def batch_generator(data, batch_size):
    """Process data in batches (data must support len() and slicing)"""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# Using batch generator
data = list(range(100))
for batch in batch_generator(data, 10):
    print(f"Processing batch: {batch}")
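
The version above relies on len() and slicing, so it needs a sequence. A possible variant for arbitrary iterables, including other generators, is sketched below using itertools.islice. The name batched_iterable is hypothetical. On Python 3.12 and later, itertools.batched offers similar behavior but yields tuples.

```python
from itertools import islice


def batched_iterable(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most batch_size items from the shared iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


stream = (n * n for n in range(7))  # a generator has no len()
print(list(batched_iterable(stream, 3)))  # [[0, 1, 4], [9, 16, 25], [36]]
```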

Advanced Generator Usage

1. yield from

python
def sub_generator():
    yield 1
    yield 2

def main_generator():
    yield from sub_generator()  # Delegate to sub-generator
    yield 3

print(list(main_generator()))  # [1, 2, 3]
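
A common use of yield from is recursion. As a small sketch, the hypothetical flatten function below walks nested lists and tuples and delegates each nested level to a recursive call.

```python
def flatten(nested):
    """Yield the leaf values of arbitrarily nested lists and tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # delegate to the generator for the sub-level
        else:
            yield item


print(list(flatten([1, [2, [3, 4]], (5, 6), 7])))  # [1, 2, 3, 4, 5, 6, 7]
```

Without yield from, the recursive branch would need an explicit inner loop that re-yields every value.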

2. Sending Values to Generator

python
def accumulator():
    total = 0
    while True:
        value = yield total  # Receives the argument passed to send()
        if value is not None:
            total += value

acc = accumulator()
next(acc)  # Start generator (advance to the first yield)
print(acc.send(10))  # 10
print(acc.send(20))  # 30
print(acc.send(30))  # 60

3. Throwing Exceptions in Generator

python
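def my_generator():
    try:
        while True:
            value = yield
            print(f"Received value: {value}")
    except ValueError:
        print("Caught ValueError")
    finally:
        print("Generator closed")

gen = my_generator()
next(gen)    # Advance to the first yield
gen.send(1)  # Received value: 1
try:
    gen.throw(ValueError)  # Caught ValueError, then Generator closed
except StopIteration:
    pass     # The generator returned after handling the exception
gen.close()  # No effect: the generator has already finished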
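
The close() method is mostly useful for cleanup when a consumer stops early. The sketch below uses hypothetical names (resource_reader, log) to show a finally block running when the generator is closed before it is exhausted.

```python
log = []


def resource_reader(items):
    log.append("open")
    try:
        for item in items:
            yield item
    finally:
        # Runs on normal exhaustion, on close(), or when an error propagates.
        log.append("close")


reader = resource_reader([1, 2, 3])
first = next(reader)
print(first)    # 1
reader.close()  # raises GeneratorExit inside the generator at the paused yield
print(log)      # ['open', 'close']
```

After close(), any further next() call on the generator raises StopIteration.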

Performance Comparison

python
import sys
import time

def list_comprehension(n):
    return [i ** 2 for i in range(n)]

def generator_expression(n):
    return (i ** 2 for i in range(n))

# Memory usage (getsizeof counts the container itself, not the integers it references)
list_obj = list_comprehension(1000000)
gen_obj = generator_expression(1000000)

print(f"List memory usage: {sys.getsizeof(list_obj)} bytes")
print(f"Generator memory usage: {sys.getsizeof(gen_obj)} bytes")

# Execution time comparison (perf_counter is intended for measuring short durations)
start = time.perf_counter()
sum([i ** 2 for i in range(1000000)])
print(f"List comprehension time: {time.perf_counter() - start:.4f} seconds")

start = time.perf_counter()
sum(i ** 2 for i in range(1000000))
print(f"Generator expression time: {time.perf_counter() - start:.4f} seconds")
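
Because sys.getsizeof reports only the size of the container object, it understates what the list really costs. One alternative is to measure peak allocation with the standard tracemalloc module, as in the sketch below. The helper peak_bytes and the size N are assumptions chosen for illustration, and the exact numbers vary by Python version and platform.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak traced memory in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak


N = 200_000
list_peak = peak_bytes(lambda: sum([i ** 2 for i in range(N)]))
gen_peak = peak_bytes(lambda: sum(i ** 2 for i in range(N)))

print(f"List comprehension peak: {list_peak} bytes")
print(f"Generator expression peak: {gen_peak} bytes")
```

The list version has to hold every element at once before sum() starts, so its peak should be far higher than the generator's.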

Best Practices

1. Choose the Right Tool

python
# Need to access data multiple times - use list
data = [1, 2, 3, 4, 5]
result1 = sum(data)
result2 = max(data)

# Only need one traversal - use generator
data = (i for i in range(1000000))
result = sum(data)
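
The reason for the first rule shows up as soon as a generator is traversed twice. A minimal sketch of the pitfall, with illustrative variable names:

```python
data = (i for i in range(1, 6))

total = sum(data)                    # consumes the generator completely
largest = max(data, default=None)    # nothing is left, so the default is returned
print(total, largest)                # 15 None

# Materializing the values once allows any number of passes.
values = list(range(1, 6))
print(sum(values), max(values))      # 15 5
```

Without the default argument, max() on the exhausted generator would raise ValueError, which is how this mistake usually surfaces.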

2. Avoid Premature Evaluation

python
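# Placeholders so the example runs; substitute your own data and processing
large_dataset = range(1000000)

def process_item(item):
    return item * 2

# Bad practice: builds the whole result list up front
def get_all_data():
    return [process_item(item) for item in large_dataset]

# Good practice: yields results one at a time
def get_data_generator():
    for item in large_dataset:
        yield process_item(item)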

3. Use itertools Module

python
import itertools

# Infinite counter
counter = itertools.count(start=0, step=2)
print(list(itertools.islice(counter, 5)))  # [0, 2, 4, 6, 8]

# Cycling iterator
cycle = itertools.cycle([1, 2, 3])
print(list(itertools.islice(cycle, 7)))  # [1, 2, 3, 1, 2, 3, 1]

# Chain iterators
chain = itertools.chain([1, 2], [3, 4], [5, 6])
print(list(chain))  # [1, 2, 3, 4, 5, 6]

Summary

Iterator

  • Implements __iter__ and __next__ methods
  • Manually manages state
  • Suitable for complex iteration logic
  • A fresh instance can be created for each new traversal

Generator

  • Created using yield statement
  • Automatically manages state
  • More concise code
  • More memory efficient than building a full list
  • Suitable for data stream processing

Usage Recommendations

  1. Small datasets: Use lists or tuples
  2. Large datasets: Use generators
  3. Complex logic: Use iterator classes
  4. Data pipelines: Use chained generator functions or generator expressions
  5. Infinite sequences: Use generator functions

Understanding the difference between iterators and generators helps you write more efficient and elegant Python code.

Tags: Python