Python: Zero to Hero

Chapter 56: Python Internals — How Python Really Works

You've written Python for 55 chapters. Now let's look under the hood.

Understanding how Python executes your code makes you a better developer. You'll write faster code, understand error messages more deeply, know why the GIL exists, and stop being surprised by Python's quirks. This chapter won't make you a CPython contributor — but it will make the language feel transparent instead of magical.

CPython — The Reference Implementation

Python the language is a specification. CPython is the most popular implementation of that specification — it's what you download from python.org. Other implementations exist:

Implementation   Language          Use case
CPython          C                 The standard; most compatible
PyPy             Python + RPython  Faster (JIT compiler); ~5x speedup for loops
Jython           Java              Runs on the JVM; integrates with Java libraries
MicroPython      C                 Runs on microcontrollers (Raspberry Pi Pico)
GraalPy          Java              Oracle's high-performance Python on GraalVM

Everything in this chapter refers to CPython unless stated otherwise.

How Python Runs Your Code

When you type python hello.py, four things happen:

1. Lexing       -> source code   -> tokens
2. Parsing      -> tokens        -> Abstract Syntax Tree (AST)
3. Compilation  -> AST           -> bytecode
4. Execution    -> bytecode      -> results (CPython VM)
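All four steps can be driven from Python itself: the built-in compile() takes source through lexing, parsing, and bytecode compilation, and exec() hands the resulting code object to the VM. A minimal sketch:

```python
source = "x = 1 + 2"

# Steps 1-3: lex, parse, and compile the source into a code object
code = compile(source, "<demo>", "exec")
print(type(code).__name__)   # code

# Step 4: the VM executes the bytecode
namespace = {}
exec(code, namespace)
print(namespace["x"])        # 3
```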

Step 1 — Lexing (Tokenisation)

The lexer breaks your source into tokens:

import tokenize, io

source = "x = 1 + 2"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tok)

Output:

TokenInfo(type=1  (NAME),   string='x',   ...)
TokenInfo(type=54 (OP),     string='=',   ...)
TokenInfo(type=2  (NUMBER), string='1',   ...)
TokenInfo(type=54 (OP),     string='+',   ...)
TokenInfo(type=2  (NUMBER), string='2',   ...)

Step 2 — Parsing (AST)

The parser converts tokens into an Abstract Syntax Tree:

import ast

source = "x = 1 + 2"
tree   = ast.parse(source)
print(ast.dump(tree, indent=2))

Output:

Module(
  body=[
    Assign(
      targets=[Name(id='x')],
      value=BinOp(
        left=Constant(value=1),
        op=Add(),
        right=Constant(value=2)))],
  type_ignores=[])

The AST is a tree of node objects. Each node represents a syntactic construct: assignments, function calls, loops, conditionals.

Step 3 — Compilation (Bytecode)

The compiler converts the AST into bytecode — a sequence of simple instructions for the Python virtual machine. Use dis to see it:

import dis

def add(a, b):
    return a + b

dis.dis(add)

Output:

  2           0 RESUME          0

  3           2 LOAD_FAST       0 (a)
              4 LOAD_FAST       1 (b)
              6 BINARY_OP      0 (+)
             10 RETURN_VALUE

Each line is one bytecode instruction. The VM reads these instructions one by one and executes them.

Bytecode is cached in __pycache__/ as .pyc files so Python doesn't recompile unchanged modules:

mypackage/
└── __pycache__/
    ├── module.cpython-312.pyc
    └── utils.cpython-312.pyc
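You can trigger this caching by hand with the standard py_compile module. A sketch using a throwaway module in a temporary directory — the exact .pyc filename depends on your interpreter version:

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "module.py")
    with open(path, "w") as f:
        f.write("VALUE = 42\n")

    # Compile to bytecode; by default the .pyc lands in __pycache__/
    pyc = py_compile.compile(path)
    print(pyc)  # .../__pycache__/module.cpython-3XX.pyc (version-dependent)

    # py_compile uses the same cache path the import system expects
    print(pyc == importlib.util.cache_from_source(path))  # True
```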

Step 4 — Execution (CPython VM)

The CPython virtual machine is a stack-based interpreter. It maintains a call stack. Each function call pushes a new frame onto the stack. Each frame has:

  • The bytecode being executed
  • A local variables dictionary
  • A reference to the global namespace
  • A reference to the enclosing scope
  • A data stack for intermediate values

You can inspect the live frame with sys._getframe():

import sys

def outer():
    def inner():
        frame = sys._getframe()
        print(f"Function:  {frame.f_code.co_name}")
        print(f"File:      {frame.f_code.co_filename}")
        print(f"Line:      {frame.f_lineno}")
        print(f"Locals:    {frame.f_locals}")
        print(f"Caller:    {frame.f_back.f_code.co_name}")
    inner()

outer()

Code Objects — The Compiled Unit

Every function, class, and module has a code object (__code__):

def greet(name: str, times: int = 1) -> str:
    message = f"Hello, {name}!"
    return message * times

code = greet.__code__
print(code.co_name)         # greet
print(code.co_varnames)     # ('name', 'times', 'message') — local variable names
print(code.co_argcount)     # 2
print(code.co_consts)       # (None, 'Hello, ', '!') — constants
print(code.co_filename)     # path to the .py file
print(code.co_firstlineno)  # 1

Code objects are immutable. They're created at compile time and never change at runtime.
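Because code objects are immutable, "modifying" one really means building a new object. CodeType.replace() (Python 3.8+) does exactly that — a small sketch:

```python
def greet(name):
    return f"Hello, {name}!"

# replace() returns a NEW code object; the original is untouched
renamed = greet.__code__.replace(co_name="salute")

print(greet.__code__.co_name)     # greet — unchanged
print(renamed.co_name)            # salute
print(renamed is greet.__code__)  # False — a fresh object
```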

The GIL — Global Interpreter Lock

The GIL is a mutex (lock) inside CPython that allows only one thread to execute Python bytecode at a time.

This means:

  • Multithreaded Python cannot run Python code in parallel on multiple CPU cores
  • But it can do I/O in parallel (the GIL is released during I/O operations)

Why does the GIL exist?

CPython's memory management uses reference counting — every object has a counter that tracks how many references point to it:

import sys

a = [1, 2, 3]
print(sys.getrefcount(a))   # 2 (a + the argument to getrefcount)

b = a
print(sys.getrefcount(a))   # 3

del b
print(sys.getrefcount(a))   # 2

When the reference count drops to 0, the object is immediately freed. This is fast and simple — but reference counts are not thread-safe. Without the GIL, two threads could simultaneously decrement the same counter, corrupting memory.

The GIL is a single lock that protects all Python objects instead of thousands of individual locks.

What the GIL means for you

import threading

# I/O-bound: GIL is released, threads run in parallel [x]
# (database queries, HTTP requests, file reads)
def fetch_url(url):
    import urllib.request
    return urllib.request.urlopen(url).read()

urls = ["https://example.com/a", "https://example.com/b"]   # example URLs
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
# These overlap in wall-clock time — the GIL is released while each
# thread waits on network I/O

# CPU-bound: GIL held, threads take turns [-]
# (number crunching, image processing, sorting)
def count_primes(n):
    return sum(1 for i in range(2, n) if all(i % j for j in range(2, i)))

# Use multiprocessing for CPU-bound work instead

Python 3.13+ — The No-GIL Build

Python 3.13 introduced an experimental "free-threaded" build that removes the GIL:

# Run the free-threaded interpreter (installed separately)
python3.13t   # the 't' suffix marks the free-threaded build

# Check if GIL is active
import sys
print(sys._is_gil_enabled())   # False in free-threaded build

The free-threaded build is experimental in 3.13 and will stabilise in future versions. For now, most production code should still use multiprocessing for CPU-bound parallelism.
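Since sys._is_gil_enabled() only exists on 3.13+, portable code should probe for it. A small sketch (the helper name gil_enabled is invented here):

```python
import sys

def gil_enabled() -> bool:
    """Return True if this interpreter runs with the GIL.

    sys._is_gil_enabled() exists only on Python 3.13+;
    on older versions the GIL is always present.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()

print(gil_enabled())   # True on a standard (GIL) build
```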

Memory Management

Reference Counting

Every Python object has a reference count. When it hits 0, the memory is freed immediately:

import sys

x = [1, 2, 3]
print(sys.getrefcount(x) - 1)   # 1 — subtract the temporary reference
                                # held by the getrefcount call itself

y = x
print(sys.getrefcount(x) - 1)   # 2

del y
print(sys.getrefcount(x) - 1)   # 1
# When x goes out of scope, the count hits 0 and the memory is freed

Cyclic Garbage Collector

Reference counting can't handle cycles:

import gc

class Node:
    def __init__(self, value):
        self.value = value
        self.next  = None

a = Node(1)
b = Node(2)
a.next = b
b.next = a   # cycle: a -> b -> a

del a
del b
# Reference counts are now 1 for each — never reach 0
# The cyclic GC detects and cleans this up

Python runs a cyclic garbage collector periodically. You can trigger it manually:

import gc

gc.collect()           # run all three generations
gc.collect(0)          # collect youngest generation only
print(gc.get_count())  # (gen0, gen1, gen2) object counts
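You can watch the cyclic collector reclaim a cycle by holding a weak reference to one of the nodes. A sketch — automatic collection is disabled so only our explicit gc.collect() can free the cycle:

```python
import gc
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

gc.disable()                 # so only OUR gc.collect() frees the cycle

a = Node(1)
b = Node(2)
a.next, b.next = b, a        # cycle: a -> b -> a

watcher = weakref.ref(a)     # weak reference — does not keep a alive
del a, b

print(watcher() is None)     # False — refcounts in the cycle never hit 0
gc.collect()                 # the cyclic collector breaks the cycle
print(watcher() is None)     # True — both nodes were freed

gc.enable()
```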

Object Interning

Python reuses certain objects to save memory. Small integers (-5 to 256) and many short strings are interned — only one copy exists:

# Small integers: same object
a = 100
b = 100
print(a is b)   # True — same object

# Large integers: no guarantee of sharing
a = 1000
b = 1000
print(a is b)   # implementation-defined — typically False in the REPL,
                # though a script may share constants compiled together

# Short identifier-like strings: often interned
a = "hello"
b = "hello"
print(a is b)   # True (interned)

# Strings with spaces: not automatically interned
a = "hello world"
b = "hello world"
print(a is b)   # typically False in the REPL; a script may still reuse
                # the constant — don't rely on either result

This is why is should be reserved for genuine identity checks — in practice, x is None. Use == for all other equality comparisons.
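When you need guaranteed interning — say, for fast identity-based comparison of many repeated strings — sys.intern() forces it:

```python
import sys

a = sys.intern("hello world")
b = sys.intern("hello world")
print(a is b)   # True — both names point at the single interned copy
```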

Memory Profiling

import tracemalloc

tracemalloc.start()

# Your code here
data = [i for i in range(100_000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

print("Top 3 memory consumers:")
for stat in top_stats[:3]:
    print(stat)

The Import System

When you write import mymodule, Python:

  1. Checks sys.modules — if already imported, returns the cached module
  2. Finds the module file using sys.path
  3. Loads and compiles the .py file
  4. Executes the module code in a new namespace
  5. Caches the result in sys.modules

import sys

# See all currently imported modules
print(list(sys.modules.keys())[:10])

# See where Python looks for modules
print(sys.path)

# Reload a module (rarely needed)
import importlib
importlib.reload(mymodule)

__all__ — Controlling What's Exported

# mymodule.py
__all__ = ["PublicClass", "public_function"]

class PublicClass:
    pass

class _PrivateClass:   # won't be imported with "from mymodule import *"
    pass

def public_function():
    pass

def _private_function():
    pass

if __name__ == "__main__"

When Python imports a module, it sets __name__ to the module's name. When you run a file directly, __name__ is "__main__":

# mymodule.py
def useful_function():
    return 42

# This block only runs when the file is executed directly
# NOT when it's imported
if __name__ == "__main__":
    result = useful_function()
    print(f"Result: {result}")

This pattern lets a file be both a reusable module and a runnable script.

Bytecode Deep Dive

Let's trace through a slightly more complex function:

import dis

def fibonacci(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

dis.dis(fibonacci)

Output (Python 3.12):

  2           RESUME          0

  3           LOAD_FAST       0 (n)
              LOAD_CONST      1 (1)
              COMPARE_OP      1 (<=)
              POP_JUMP_IF_FALSE ...

  4           LOAD_FAST       0 (n)
              RETURN_VALUE

  5           LOAD_CONST      2 (0)
              LOAD_CONST      1 (1)
              UNPACK_SEQUENCE 2
              STORE_FAST      1 (a)
              STORE_FAST      2 (b)
...

Key instructions:

  • LOAD_FAST — load a local variable onto the stack
  • LOAD_CONST — load a constant onto the stack
  • COMPARE_OP — compare top two stack items
  • POP_JUMP_IF_FALSE — jump if top of stack is falsy
  • STORE_FAST — pop stack and store in a local variable
  • CALL — call a function
  • RETURN_VALUE — return top of stack to caller

Understanding bytecode explains Python performance characteristics. LOAD_FAST (local variable) is faster than LOAD_GLOBAL (global variable) — that's why moving frequently accessed globals into local variables inside tight loops speeds things up.
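You can verify this with dis.get_instructions(): compare a function that reads globals against one that received the same names as default arguments (a common micro-optimisation pattern, shown here purely for illustration):

```python
import dis

data = list(range(10))

def use_globals():
    return len(data)              # len and data resolve via LOAD_GLOBAL

def use_locals(data=data, len=len):
    return len(data)              # both resolve via fast locals

ops_global = {i.opname for i in dis.get_instructions(use_globals)}
ops_local  = {i.opname for i in dis.get_instructions(use_locals)}

print("LOAD_GLOBAL" in ops_global)   # True
print("LOAD_GLOBAL" in ops_local)    # False — everything is a fast local
```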

__slots__ — Memory-Efficient Classes

By default, Python stores instance attributes in a __dict__ (a hash map):

class Regular:
    def __init__(self, x, y):
        self.x = x
        self.y = y

obj = Regular(1, 2)
print(obj.__dict__)   # {'x': 1, 'y': 2}

Each __dict__ uses ~230 bytes. For classes with millions of instances, this adds up.

__slots__ replaces the dict with fixed-size per-attribute descriptors:

class Slotted:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

obj = Slotted(1, 2)
# obj.__dict__  <- AttributeError: no __dict__
print(obj.x)    # 1

import sys

print(sys.getsizeof(Regular(1, 2)))   # ~48 bytes + ~230 for __dict__
print(sys.getsizeof(Slotted(1, 2)))   # ~56 bytes — no __dict__ overhead

Use __slots__ when you create millions of instances of a class and memory is a concern.
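__slots__ has a side effect worth knowing: because there is no __dict__, assigning an attribute that isn't listed raises AttributeError — which conveniently also catches typos:

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3                     # not declared in __slots__
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```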

__getattr__ vs __getattribute__

These two look similar but behave very differently:

class Demo:
    def __init__(self):
        self.exists = "I exist"

    def __getattr__(self, name: str):
        # Called ONLY when normal attribute lookup fails
        print(f"__getattr__: {name} not found")
        return f"default_{name}"

    def __getattribute__(self, name: str):
        # Called for EVERY attribute access — be careful
        print(f"__getattribute__: accessing {name}")
        return super().__getattribute__(name)


d = Demo()
print(d.exists)        # __getattribute__ called -> "I exist"
print(d.missing)       # __getattribute__ called -> not found -> __getattr__ called

__getattr__ — safe to override. Use for proxy objects, lazy loading, dynamic attributes.

__getattribute__ — called for every attribute access. Override it only if you absolutely must — it's easy to create infinite recursion. Always call super().__getattribute__(name).
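A typical safe use of __getattr__ is lazy loading with caching: the attribute is computed on first access and then stored, so __getattr__ never fires for it again. A sketch — the LazyConfig class and its "loaded:" values are invented for illustration:

```python
class LazyConfig:
    def __getattr__(self, name):
        # Runs only when normal lookup fails — i.e. on the first access
        print(f"computing {name}")
        value = f"loaded:{name}"          # stand-in for real work
        setattr(self, name, value)        # cache it in __dict__
        return value

cfg = LazyConfig()
print(cfg.database_url)   # computing database_url, then loaded:database_url
print(cfg.database_url)   # served from __dict__ — no "computing" line
```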

AST Manipulation — Code That Reads Code

Python's AST can be inspected and transformed:

import ast


def find_function_names(source: str) -> list[str]:
    """Extract all function names defined in source code."""
    tree  = ast.parse(source)
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names.append(node.name)
    return names


source = """
def greet(name):
    return f"Hello, {name}"

def farewell(name):
    return f"Goodbye, {name}"

class MyClass:
    def method(self):
        pass
"""

print(find_function_names(source))
# ['greet', 'farewell', 'method']

def count_operations(source: str) -> dict[str, int]:
    """Count each type of binary operation in source."""
    tree  = ast.parse(source)
    ops   = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp):
            op_name = type(node.op).__name__
            ops[op_name] = ops.get(op_name, 0) + 1
    return ops


source = "result = (a + b) * (c - d) / (e + f)"
print(count_operations(source))   # {'Add': 2, 'Mult': 1, 'Sub': 1, 'Div': 1}

AST manipulation is used by linters (ruff, flake8), formatters (black), type checkers (mypy), and code analysis tools.
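Transformation works the same way: subclass ast.NodeTransformer, rewrite the nodes you care about, then compile the modified tree. A toy transformer (invented here) that turns every addition into a subtraction:

```python
import ast

class AddToSub(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

tree = ast.parse("result = 10 + 3")
tree = AddToSub().visit(tree)
ast.fix_missing_locations(tree)           # new/changed nodes need positions

namespace = {}
exec(compile(tree, "<ast-demo>", "exec"), namespace)
print(namespace["result"])                # 7 — the + became a -
```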

What You Learned in This Chapter

  • CPython is the reference implementation. PyPy is faster via JIT. MicroPython runs on microcontrollers.
  • Python executes your code in four steps: Lex (source -> tokens) -> Parse (tokens -> AST) -> Compile (AST -> bytecode) -> Execute (bytecode -> results via CPython VM).
  • Use tokenize to see tokens, ast.parse() + ast.dump() to see the AST, dis.dis() to see bytecode.
  • Bytecode is cached in __pycache__/*.pyc files. The VM is stack-based.
  • Each function call creates a frame with local variables, the code object, and a data stack. sys._getframe() inspects the current frame.
  • Every object has a reference count. When it hits 0, the object is freed immediately. The cyclic GC handles reference cycles.
  • The GIL allows only one thread to execute Python bytecode at a time. I/O releases the GIL, so threads work well for I/O-bound work. Use multiprocessing for CPU-bound parallelism. Python 3.13 introduced an experimental no-GIL build.
  • Small integers (-5 to 256) and short strings are interned — shared single objects. Use == for equality, is only for None/True/False.
  • __slots__ replaces __dict__ with fixed descriptors, saving 200+ bytes per instance.
  • __getattr__ fires only when normal lookup fails (safe to override). __getattribute__ fires on every access (override with extreme care).
  • The ast module lets you parse, inspect, and transform Python source code programmatically.

What's Next?

Chapter 57 covers The Python Ecosystem Map — a tour of the major libraries and frameworks in every domain: web, data science, machine learning, automation, DevOps, finance, games, and embedded systems. Think of it as a roadmap for wherever you want to go next.

© 2026 Abhilash Sahoo. Python: Zero to Hero.