Python: Zero to Hero

Chapter 23: Generators and Iterators

Every time you write a for loop in Python, something called an iterator is doing the work. When you call range(1_000_000), Python doesn't create a list of a million numbers; it creates an object that produces one number at a time, on demand. Strictly speaking, range returns a lazy sequence rather than a generator, but the idea is the same: values are computed only when you ask for them.

Understanding generators and iterators explains how Python loops actually work, lets you process huge datasets without running out of memory, and gives you a tool for writing elegant, lazy pipelines.

How for Loops Actually Work

You've written hundreds of for loops. Here's what Python does behind the scenes:

for item in [1, 2, 3]:
    print(item)

Python silently translates this into something like:

_iter = iter([1, 2, 3])   # get an iterator from the list
while True:
    try:
        item = next(_iter)   # get the next value
        print(item)
    except StopIteration:    # no more values -> stop the loop
        break

Two built-in functions do all the work:

  • iter(obj) — calls obj.__iter__() and returns an iterator
  • next(iterator) — calls iterator.__next__() and returns the next value, or raises StopIteration when exhausted

numbers = [10, 20, 30]
it = iter(numbers)

print(next(it))   # 10
print(next(it))   # 20
print(next(it))   # 30

try:
    next(it)
except StopIteration:
    print("No more values.")

Any object that implements __iter__ and __next__ is an iterator. Lists, tuples, strings, and dicts are all iterable, but they are not iterators themselves; you call iter() on them to get an iterator. (File objects are an exception among the built-ins: a file is its own iterator, implementing __next__ directly.)

nums = [1, 2, 3]
print(hasattr(nums, "__iter__"))    # True  — it's iterable
print(hasattr(nums, "__next__"))    # False — it's NOT an iterator

it = iter(nums)
print(hasattr(it, "__iter__"))    # True
print(hasattr(it, "__next__"))    # True  — now it IS an iterator

Writing an Iterator Class

You can make any class iterable by implementing __iter__ and __next__:

class Countdown:
    """Counts down from start to 0."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self   # the object is its own iterator

    def __next__(self):
        if self.current < 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(5):
    print(n, end=" ")
print()   # 5 4 3 2 1 0

This works — but it's verbose. Generators let you write the same thing in a fraction of the code.

Generator Functions: yield

A generator function looks like a normal function but uses yield instead of return. When called, it returns a generator object — an iterator that produces values one at a time.

def countdown(start):
    while start >= 0:
        yield start
        start -= 1


gen = countdown(5)
print(type(gen))   # <class 'generator'>

print(next(gen))   # 5
print(next(gen))   # 4
print(next(gen))   # 3

# Or just loop over it
for n in countdown(3):
    print(n, end=" ")   # 3 2 1 0
print()

What yield does:

  1. Produces a value (like return)
  2. Pauses the function, saving its entire state (local variables, which line it's on)
  3. On the next next() call, resumes from exactly where it paused

This is the key insight: the function's state is frozen between yields. Local variables keep their values. The instruction pointer doesn't move. Nothing runs until you ask for the next value.

def trace_generator():
    print("Start")
    yield 1
    print("After first yield")
    yield 2
    print("After second yield")
    yield 3
    print("After third yield — generator done")


gen = trace_generator()
print(f"Got: {next(gen)}")   # prints "Start", returns 1
print(f"Got: {next(gen)}")   # prints "After first yield", returns 2
print(f"Got: {next(gen)}")   # prints "After second yield", returns 3

try:
    next(gen)                # prints "After third yield", then raises StopIteration
except StopIteration:
    print("Generator exhausted.")

Output:

Start
Got: 1
After first yield
Got: 2
After second yield
Got: 3
After third yield — generator done
Generator exhausted.

Memory Efficiency: The Big Win

A list stores everything in memory at once. A generator produces one item at a time and immediately discards it.

import sys

# List: all million numbers in memory
big_list = [x ** 2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(big_list):,} bytes")   # roughly 8 MB

# Generator: produces one number at a time
def squares(n):
    for x in range(n):
        yield x ** 2

big_gen = squares(1_000_000)
print(f"Generator size: {sys.getsizeof(big_gen):,} bytes")   # ~112 bytes (varies by version)

The generator stays around a hundred bytes regardless of how many items it will eventually produce. The list grows with n. For large data, this is the difference between a program that works and one that crashes.

# Sum a billion squares — building the list first would exhaust memory
# on most machines; the generator needs almost none (though it does
# take a while of CPU time)
total = sum(x ** 2 for x in range(1_000_000_000))
print(total)   # never more than a few integers in memory at once

Common Generator Patterns

Infinite sequences

def naturals(start=1):
    """Generate 1, 2, 3, 4, ... forever."""
    n = start
    while True:
        yield n
        n += 1


def take(n, iterable):
    """Return the first n items from an iterable."""
    for _, item in zip(range(n), iterable):
        yield item


print(list(take(10, naturals())))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

An infinite generator is safe because values are produced only when requested. As long as you take a finite number, it terminates.

def fibonacci():
    """Generate the Fibonacci sequence forever."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


fibs = fibonacci()
print([next(fibs) for _ in range(12)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

Reading large files line by line

def read_large_file(path):
    """Yield one line at a time — never loads the whole file."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


# Only one line in memory at any time, even for a 10GB file
for line in read_large_file("huge_log.txt"):
    if "ERROR" in line:
        print(line)

This is how you process log files that are larger than your RAM.
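Aggregation works the same streaming way. Here's a quick sketch of counting lines per log level with collections.Counter; since the huge_log.txt path above is hypothetical, this uses a small temporary file as a stand-in:

```python
import tempfile
from collections import Counter

def read_large_file(path):
    """Yield one line at a time without loading the whole file."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

# Write a tiny stand-in log file for demonstration
sample = "INFO boot\nERROR disk full\nINFO ok\nERROR timeout\nWARN slow\n"
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Counter consumes the generator one line at a time
levels = Counter(line.split()[0] for line in read_large_file(path))
print(levels["ERROR"])   # 2
print(levels["INFO"])    # 2
```

The Counter never sees more than one line at a time, so the same code works unchanged on a multi-gigabyte log.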

Filtering and transforming

def where(iterable, predicate):
    """Yield only items where predicate(item) is True."""
    for item in iterable:
        if predicate(item):
            yield item


def select(iterable, transform):
    """Yield transform(item) for every item."""
    for item in iterable:
        yield transform(item)


numbers = range(1, 21)
evens   = where(numbers, lambda n: n % 2 == 0)
doubled = select(evens, lambda n: n * 2)

print(list(doubled))   # [4, 8, 12, 16, 20, 24, 28, 32, 36, 40]

These are your own filter() and map() — written as generators.

Batching

def batches(iterable, size):
    """Yield successive non-overlapping chunks of `size`."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


numbers = range(1, 22)
for batch in batches(numbers, 5):
    print(batch)

Output:

[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
[11, 12, 13, 14, 15]
[16, 17, 18, 19, 20]
[21]

Useful for sending data to an API in batches, writing CSV in chunks, etc.
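For instance, a sketch of uploading records in chunks; send_batch here is a hypothetical stand-in for a real API call:

```python
def batches(iterable, size):
    """Yield successive non-overlapping chunks of `size`."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def send_batch(records):
    """Hypothetical stand-in for a real API upload call."""
    print(f"Uploading {len(records)} records...")

for chunk in batches(range(1, 13), 5):
    send_batch(chunk)
# Uploading 5 records...
# Uploading 5 records...
# Uploading 2 records...
```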

Generator Expressions

You've seen list comprehensions. A generator expression is the same syntax with parentheses instead of brackets:

# List comprehension — all values computed now, stored in memory
squares_list = [x ** 2 for x in range(1_000_000)]

# Generator expression — values produced on demand
squares_gen  = (x ** 2 for x in range(1_000_000))

# Both support iteration
print(next(squares_gen))   # 0
print(next(squares_gen))   # 1
print(next(squares_gen))   # 4

When you pass a generator expression as the sole argument to a function, you can drop the extra parentheses:

# Both are equivalent
total = sum((x ** 2 for x in range(100)))
total = sum(x ** 2 for x in range(100))   # cleaner

# max/min/any/all all work this way
largest_even = max(x for x in range(100) if x % 2 == 0)   # 98
has_negative = any(x < 0 for x in [1, 2, -3, 4])          # True
all_positive = all(x > 0 for x in [1, 2,  3, 4])          # True
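One caveat worth knowing: unlike a list, a generator expression can be consumed only once. After it's exhausted, iterating it again produces nothing:

```python
squares = (x ** 2 for x in range(5))

print(sum(squares))    # 30  (0 + 1 + 4 + 9 + 16)
print(sum(squares))    # 0   (already exhausted, nothing left)
print(list(squares))   # []
```

If you need the values twice, either store them in a list or build the generator expression again.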

yield from — Delegating to Sub-generators

yield from lets one generator delegate to another:

def first_five():
    yield from range(1, 6)

def second_five():
    yield from range(6, 11)

def one_to_ten():
    yield from first_five()
    yield from second_five()


print(list(one_to_ten()))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This is cleaner than looping and re-yielding:

# Instead of this:
def flatten_messy(nested):
    for sublist in nested:
        for item in sublist:
            yield item

# Write this:
def flatten(nested):
    for sublist in nested:
        yield from sublist


data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(list(flatten(data)))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]

Recursive flattening with yield from

def deep_flatten(obj):
    """Flatten arbitrarily nested lists."""
    if isinstance(obj, list):
        for item in obj:
            yield from deep_flatten(item)
    else:
        yield obj


deep = [1, [2, [3, [4, [5]]]]]
print(list(deep_flatten(deep)))   # [1, 2, 3, 4, 5]

Recursive generators with yield from are one of the most elegant patterns in Python.
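The same pattern handles any recursive structure. As a further sketch, an in-order traversal of a simple binary tree (the Node class here is invented for illustration):

```python
class Node:
    """Minimal binary tree node, for illustration only."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def in_order(node):
    """Yield a tree's values: left subtree, then node, then right subtree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.value
        yield from in_order(node.right)

#       4
#      / \
#     2   6
#    / \   \
#   1   3   7
tree = Node(4, Node(2, Node(1), Node(3)), Node(6, None, Node(7)))
print(list(in_order(tree)))   # [1, 2, 3, 4, 6, 7]
```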

Sending Values into a Generator

Generators can receive values too, using the .send() method. This turns them into coroutines — functions that both produce and consume values.

def accumulator():
    """Running total — receives numbers and yields the running sum."""
    total = 0
    while True:
        value = yield total   # yield current total, receive next value
        if value is None:
            break
        total += value


acc = accumulator()
next(acc)         # prime the generator (advance to first yield)

print(acc.send(10))   # 10
print(acc.send(5))    # 15
print(acc.send(20))   # 35
print(acc.send(3))    # 38
acc.close()           # clean shutdown

You must call next(gen) once first to advance the generator to its first yield. After that, .send(value) resumes the generator and passes value back as the result of the yield expression.
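Calling gen.send(None) on a brand-new generator is equivalent to next(gen), so you can also prime with send(None); sending any other value before the first yield raises a TypeError:

```python
def echo():
    """Print every value sent in."""
    while True:
        received = yield
        print(f"Got {received}")

gen = echo()
gen.send(None)      # same as next(gen): advance to the first yield

gen.send("hello")   # Got hello
gen.send(42)        # Got 42

fresh = echo()
try:
    fresh.send("too early")   # value sent before the first yield
except TypeError as e:
    print(e)   # can't send non-None value to a just-started generator
```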

This is the foundation of Python's async/await system — but that's a topic for a later chapter.

itertools — Your Generator Toolkit

Python's itertools module is a collection of fast, memory-efficient generator-based tools. You met a few in Chapter 16; here's a deeper look at the ones you'll use most.

import itertools

# ── Infinite iterators ────────────────────────────────────────────────────────

# count(start, step) — like range but infinite
for n in itertools.islice(itertools.count(0, 5), 5):
    print(n, end=" ")   # 0 5 10 15 20
print()

# cycle(iterable) — repeat forever
colors = itertools.cycle(["red", "green", "blue"])
for _ in range(7):
    print(next(colors), end=" ")   # red green blue red green blue red
print()

# repeat(value, times) — repeat a value
print(list(itertools.repeat("Python", 3)))   # ['Python', 'Python', 'Python']


# ── Slicing and selecting ─────────────────────────────────────────────────────

# islice — slice any iterator (like list slicing, but lazy)
gen = (x ** 2 for x in naturals())
print(list(itertools.islice(gen, 5, 15)))   # squares from index 5 to 14


# ── Combining iterators ───────────────────────────────────────────────────────

# chain — concatenate multiple iterables
combined = itertools.chain([1, 2], [3, 4], [5, 6])
print(list(combined))   # [1, 2, 3, 4, 5, 6]

# zip_longest — zip with a fill value when lengths differ
a = [1, 2, 3]
b = ["a", "b"]
print(list(itertools.zip_longest(a, b, fillvalue="?")))
# [(1, 'a'), (2, 'b'), (3, '?')]

# product — Cartesian product (nested loops replaced)
suits  = ["♠", "♥", "♦", "♣"]
values = ["A", "2", "K"]
cards  = list(itertools.product(values, suits))
print(cards[:4])   # [('A', '♠'), ('A', '♥'), ('A', '♦'), ('A', '♣')]

# combinations and permutations
items = ["a", "b", "c", "d"]
print(list(itertools.combinations(items, 2)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]

print(list(itertools.permutations(["x", "y", "z"], 2)))
# [('x', 'y'), ('x', 'z'), ('y', 'x'), ('y', 'z'), ('z', 'x'), ('z', 'y')]


# ── Grouping ──────────────────────────────────────────────────────────────────

# groupby — group consecutive items by a key (data must be sorted first)
data = [
    {"name": "Alice",  "dept": "Eng"},
    {"name": "Bob",    "dept": "Eng"},
    {"name": "Carlos", "dept": "HR"},
    {"name": "Diana",  "dept": "HR"},
    {"name": "Eve",    "dept": "Sales"},
]
data.sort(key=lambda x: x["dept"])

for dept, members in itertools.groupby(data, key=lambda x: x["dept"]):
    names = [m["name"] for m in members]
    print(f"{dept}: {names}")

Output:

Eng: ['Alice', 'Bob']
HR: ['Carlos', 'Diana']
Sales: ['Eve']
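The sort step matters: groupby only groups consecutive equal keys, so unsorted data produces fragmented groups. A quick demonstration:

```python
import itertools

letters = ["a", "a", "b", "a", "a"]

# Equal keys separated by other values end up in separate groups
groups = [(key, list(grp)) for key, grp in itertools.groupby(letters)]
print(groups)   # [('a', ['a', 'a']), ('b', ['b']), ('a', ['a', 'a'])]
```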

Building a Lazy Data Pipeline

Generators compose beautifully. Each stage produces values one at a time; the next stage consumes them. Nothing runs until you pull from the end of the pipeline.

import csv
import itertools

def read_csv(path):
    """Yield one dict per row."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def where(iterable, predicate):
    return (item for item in iterable if predicate(item))


def select(iterable, *fields):
    return ({f: item[f] for f in fields} for item in iterable)


def convert(iterable, **converters):
    for item in iterable:
        row = dict(item)
        for field, func in converters.items():
            if field in row:
                row[field] = func(row[field])
        yield row


def take(n, iterable):
    yield from itertools.islice(iterable, n)


Some languages chain pipeline stages with a pipe operator (|>). Python has no such operator, so we wrap each generator call around the previous one instead:

def run_pipeline(path):
    rows     = read_csv(path)
    emea     = where(rows, lambda r: r["region"] == "EMEA")
    trimmed  = select(emea, "date", "product", "amount", "region")
    typed    = convert(trimmed, amount=float)
    top10    = take(10, typed)

    for row in top10:
        print(row)

# Each generator is lazy — the file is read one line at a time,
# filtered, projected, converted, and printed, all without ever
# holding more than one row in memory (beyond what the generators need).

This is how production ETL pipelines work — reading gigabytes of CSV without materializing any of it in RAM.

Project: Build Your Own range

Let's build a complete lazy class that matches the core behavior of Python's built-in range:

class MyRange:
    """
    A lazy range, matching Python's built-in range.
    Supports: iteration, len(), the in operator, and negative steps.
    """
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        if step == 0:
            raise ValueError("MyRange() step argument must not be zero.")
        self.start = start
        self.stop  = stop
        self.step  = step

    def __iter__(self):
        current = self.start
        if self.step > 0:
            while current < self.stop:
                yield current
                current += self.step
        else:
            while current > self.stop:
                yield current
                current += self.step

    def __len__(self):
        if self.step > 0:
            return max(0, (self.stop - self.start + self.step - 1) // self.step)
        else:
            return max(0, (self.start - self.stop - self.step - 1) // (-self.step))

    def __contains__(self, value):
        if self.step > 0:
            return self.start <= value < self.stop and (value - self.start) % self.step == 0
        else:
            return self.stop < value <= self.start and (self.start - value) % (-self.step) == 0

    def __repr__(self):
        if self.step == 1:
            return f"MyRange({self.start}, {self.stop})"
        return f"MyRange({self.start}, {self.stop}, {self.step})"


# Basic usage
print(list(MyRange(5)))           # [0, 1, 2, 3, 4]
print(list(MyRange(2, 10, 2)))    # [2, 4, 6, 8]
print(list(MyRange(10, 0, -2)))   # [10, 8, 6, 4, 2]

print(len(MyRange(0, 100, 3)))    # 34
print(6 in MyRange(0, 10, 2))     # True
print(7 in MyRange(0, 10, 2))     # False

# Lazy — no list ever created
total = sum(MyRange(1_000_000))
print(total)   # 499999500000

__iter__ is a generator function — it uses yield. The class doesn't store any list. Every item is computed on demand.
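Because __iter__ returns a fresh generator on every call, a MyRange object is reusable: you can iterate it as many times as you like, unlike a plain generator. A quick check (using a condensed version of the class above, trimmed to just the parts this needs):

```python
class MyRange:
    """Condensed MyRange: just __init__ and the generator-based __iter__."""
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step

    def __iter__(self):
        # A fresh generator is created each time iteration starts
        current = self.start
        while (current < self.stop) if self.step > 0 else (current > self.stop):
            yield current
            current += self.step

r = MyRange(3)
print(list(r))     # [0, 1, 2]
print(list(r))     # [0, 1, 2]  (iterable again)

gen = (x for x in range(3))
print(list(gen))   # [0, 1, 2]
print(list(gen))   # []  (a generator is exhausted after one pass)
```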

What You Learned in This Chapter

  • A for loop calls iter() to get an iterator, then next() repeatedly until StopIteration.
  • Any object with __iter__ and __next__ is an iterator.
  • A generator function uses yield to pause and resume, producing values on demand.
  • Generators use constant memory regardless of how many values they produce.
  • Generator expressions (x for x in ...) are lazy — no parentheses needed when passed alone to a function.
  • yield from delegates to a sub-generator — cleaner than a loop.
  • .send(value) passes a value into a paused generator.
  • itertools provides fast, composable generator tools: count, cycle, chain, islice, groupby, product, combinations, permutations.
  • Generators compose into lazy data pipelines that can process arbitrarily large data with constant memory.

What's Next?

Chapter 24 covers Context Managers — the with statement and how it works. You've been using with open(...) since Chapter 8. Now you'll learn to write your own context managers both with __enter__/__exit__ and with @contextlib.contextmanager, a generator-based shortcut that makes them trivial to write.

© 2026 Abhilash Sahoo. Python: Zero to Hero.