Chapter 56: Python Internals — How Python Really Works
You've written Python for 55 chapters. Now let's look under the hood.
Understanding how Python executes your code makes you a better developer. You'll write faster code, understand error messages more deeply, know why the GIL exists, and stop being surprised by Python's quirks. This chapter won't make you a CPython contributor — but it will make the language feel transparent instead of magical.
CPython — The Reference Implementation
Python the language is a specification. CPython is the most popular implementation of that specification — it's what you download from python.org. Other implementations exist:
| Implementation | Language | Use case |
|---|---|---|
| CPython | C | The standard; most compatible |
| PyPy | Python + RPython | Faster (JIT compiler); ~5x speedup for loops |
| Jython | Java | Runs on JVM; integrates with Java libraries |
| MicroPython | C | Runs on microcontrollers (Raspberry Pi Pico) |
| GraalPy | Java | Oracle's high-performance Python on GraalVM |
Everything in this chapter refers to CPython unless stated otherwise.
How Python Runs Your Code
When you type python hello.py, four things happen:
1. Lexing -> source code -> tokens
2. Parsing -> tokens -> Abstract Syntax Tree (AST)
3. Compilation -> AST -> bytecode
4. Execution -> bytecode -> results (CPython VM)
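You can drive the first three steps yourself with the built-in compile(), then hand the resulting code object to the VM with exec():

```python
source = "x = 1 + 2"

# Steps 1-3: lex, parse, and compile the source into a code object
code = compile(source, "<demo>", "exec")

# Step 4: the CPython VM executes the bytecode
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 3
```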
Step 1 — Lexing (Tokenisation)
The lexer breaks your source into tokens:
import tokenize, io
source = "x = 1 + 2"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
print(tok)
Output:
TokenInfo(type=1 (NAME), string='x', ...)
TokenInfo(type=54 (OP), string='=', ...)
TokenInfo(type=2 (NUMBER), string='1', ...)
TokenInfo(type=54 (OP), string='+', ...)
TokenInfo(type=2 (NUMBER), string='2', ...)
Step 2 — Parsing (AST)
The parser converts tokens into an Abstract Syntax Tree:
import ast
source = "x = 1 + 2"
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
Output:
Module(
body=[
Assign(
targets=[Name(id='x')],
value=BinOp(
left=Constant(value=1),
op=Add(),
right=Constant(value=2)))],
type_ignores=[])
The AST is a tree of node objects. Each node represents a syntactic construct: assignments, function calls, loops, conditionals.
Step 3 — Compilation (Bytecode)
The compiler converts the AST into bytecode — a sequence of simple instructions for the Python virtual machine. Use dis to see it:
import dis
def add(a, b):
return a + b
dis.dis(add)
Output:
2 0 RESUME 0
3 2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (b)
6 BINARY_OP 0 (+)
10 RETURN_VALUE
Each line is one bytecode instruction. The VM reads these instructions one by one and executes them.
Bytecode is cached in __pycache__/ as .pyc files so Python doesn't recompile unchanged modules:
mypackage/
└── __pycache__/
├── module.cpython-312.pyc
└── utils.cpython-312.pyc
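You can trigger the compile-and-cache step by hand with the standard-library py_compile module. A small sketch (the module name and contents here are made up for illustration):

```python
import py_compile
import tempfile
from pathlib import Path

# Write a throwaway module and byte-compile it explicitly
tmp = Path(tempfile.mkdtemp())
module = tmp / "demo_module.py"
module.write_text("ANSWER = 42\n")

# py_compile.compile returns the path to the cached .pyc file,
# which lands in a __pycache__/ directory next to the source
pyc_path = py_compile.compile(str(module))
print(pyc_path)
```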
Step 4 — Execution (CPython VM)
The CPython virtual machine is a stack-based interpreter. It maintains a call stack. Each function call pushes a new frame onto the stack. Each frame has:
- The bytecode being executed
- A local variables dictionary
- A reference to the global namespace
- A reference to the enclosing scope
- A data stack for intermediate values
import sys
def outer():
def inner():
frame = sys._getframe()
print(f"Function: {frame.f_code.co_name}")
print(f"File: {frame.f_code.co_filename}")
print(f"Line: {frame.f_lineno}")
print(f"Locals: {frame.f_locals}")
print(f"Caller: {frame.f_back.f_code.co_name}")
inner()
outer()
Code Objects — The Compiled Unit
Every function, class, and module has a code object (__code__):
def greet(name: str, times: int = 1) -> str:
message = f"Hello, {name}!"
return message * times
code = greet.__code__
print(code.co_name) # greet
print(code.co_varnames) # ('name', 'times', 'message') — local variable names
print(code.co_argcount) # 2
print(code.co_consts) # (None, 'Hello, ', '!') — constants
print(code.co_filename) # path to the .py file
print(code.co_firstlineno) # 1
Code objects are immutable. They're created at compile time and never change at runtime.
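Because code objects are first-class and immutable, you can build a new function around an existing one's bytecode. A sketch using types.FunctionType:

```python
import types

def greet():
    return "Hello!"

# A new function object that shares greet's (immutable) code object
clone = types.FunctionType(greet.__code__, globals(), name="greet_clone")

print(clone())                           # Hello!
print(clone.__name__)                    # greet_clone
print(clone.__code__ is greet.__code__)  # True: same code object, two functions
```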
The GIL — Global Interpreter Lock
The GIL is a mutex (lock) inside CPython that allows only one thread to execute Python bytecode at a time.
This means:
- Multithreaded Python cannot run Python code in parallel on multiple CPU cores
- But it can do I/O in parallel (the GIL is released during I/O operations)
Why does the GIL exist?
CPython's memory management uses reference counting — every object has a counter that tracks how many references point to it:
import sys
a = [1, 2, 3]
print(sys.getrefcount(a)) # 2 (a + the argument to getrefcount)
b = a
print(sys.getrefcount(a)) # 3
del b
print(sys.getrefcount(a)) # 2
When the reference count drops to 0, the object is immediately freed. This is fast and simple — but reference counts are not thread-safe. Without the GIL, two threads could simultaneously decrement the same counter, corrupting memory.
The GIL is a single lock that protects all Python objects instead of thousands of individual locks.
What the GIL means for you
import threading
# I/O-bound: GIL is released, threads run in parallel (threads help)
# (database queries, HTTP requests, file reads)
def fetch_url(url):
import urllib.request
return urllib.request.urlopen(url).read()
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
# These run in true parallel — the GIL is released during network I/O
# CPU-bound: GIL held, threads take turns (no speedup)
# (number crunching, image processing, sorting)
def count_primes(n):
return sum(1 for i in range(2, n) if all(i % j for j in range(2, i)))
# Use multiprocessing for CPU-bound work instead
Python 3.13+ — The No-GIL Build
Python 3.13 introduced an experimental "free-threaded" build that removes the GIL:
# Run the free-threaded build (installed separately alongside the regular one)
python3.13t # the 't' suffix indicates free-threaded
# Check if GIL is active
import sys
print(sys._is_gil_enabled()) # False in free-threaded build
The free-threaded build is experimental in 3.13 and will stabilise in future versions. For now, most production code should still use multiprocessing for CPU-bound parallelism.
Memory Management
Reference Counting
Every Python object has a reference count. When it hits 0, the memory is freed immediately:
import ctypes

def ref_count(address: int) -> int:
    """Read the reference count stored at a memory address (CPython only)."""
    return ctypes.c_ssize_t.from_address(address).value

x = [1, 2, 3]
print(ref_count(id(x)))  # 1
y = x
print(ref_count(id(x)))  # 2
del y
print(ref_count(id(x)))  # 1
# Passing id(x) (a plain int) avoids the extra references that passing
# the object itself would create. When x goes out of scope, the count
# hits 0 and the memory is freed.
Cyclic Garbage Collector
Reference counting can't handle cycles:
import gc
class Node:
def __init__(self, value):
self.value = value
self.next = None
a = Node(1)
b = Node(2)
a.next = b
b.next = a # cycle: a -> b -> a
del a
del b
# Reference counts are now 1 for each — never reach 0
# The cyclic GC detects and cleans this up
Python runs a cyclic garbage collector periodically. You can trigger it manually:
import gc
gc.collect() # run all three generations
gc.collect(0) # collect youngest generation only
print(gc.get_count()) # (gen0, gen1, gen2) object counts
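You can verify the collector really reclaims the cycle using weakref (a small sketch):

```python
import gc
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

a = Node(1)
b = Node(2)
a.next, b.next = b, a    # cycle: a -> b -> a

probe = weakref.ref(a)   # weak reference: doesn't keep the object alive
del a, b                 # refcounts stay at 1 because of the cycle

gc.collect()             # cyclic GC finds and frees both nodes
print(probe() is None)   # True: the object was reclaimed
```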
Object Interning
Python reuses certain objects to save memory. Small integers (-5 to 256) and many short strings are interned — only one copy exists:
# Small integers: same object
a = 100
b = 100
print(a is b) # True — same object
# Large integers: different objects
a = 1000
b = 1000
print(a is b) # False (implementation-defined, often False)
# Short strings: often interned
a = "hello"
b = "hello"
print(a is b) # True (interned)
# Strings with spaces: not automatically interned
a = "hello world"
b = "hello world"
print(a is b)  # often False (e.g. in the REPL); implementation-defined
This is why is should be reserved for identity checks against singletons like None (and True/False). Use == for all other equality checks.
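When you do need guaranteed sharing (say, millions of repeated dictionary keys), you can intern strings explicitly with sys.intern:

```python
import sys

# sys.intern guarantees both names point at a single shared copy
a = sys.intern("hello world")
b = sys.intern("hello world")

print(a is b)  # True: explicitly interned
print(a == b)  # True: but == is still the right check for equality
```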
Memory Profiling
import tracemalloc
tracemalloc.start()
# Your code here
data = [i for i in range(100_000)]
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
print("Top 3 memory consumers:")
for stat in top_stats[:3]:
print(stat)
The Import System
When you write import mymodule, Python:
1. Checks sys.modules — if already imported, returns the cached module
2. Finds the module file using sys.path
3. Loads and compiles the .py file
4. Executes the module code in a new namespace
5. Caches the result in sys.modules
import sys
# See all currently imported modules
print(list(sys.modules.keys())[:10])
# See where Python looks for modules
print(sys.path)
# Reload a module (rarely needed)
import importlib
importlib.reload(mymodule)
__all__ — Controlling What's Exported
# mymodule.py
__all__ = ["PublicClass", "public_function"]
class PublicClass:
pass
class _PrivateClass: # won't be imported with "from mymodule import *"
pass
def public_function():
pass
def _private_function():
pass
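A quick way to see __all__ in action without creating a file is to build a throwaway in-memory module (mymodule_demo and its functions are made-up names for this sketch):

```python
import sys
import types

# Hypothetical in-memory module standing in for mymodule.py
mod = types.ModuleType("mymodule_demo")
exec(
    "__all__ = ['public_function']\n"
    "def public_function(): return 'public'\n"
    "def _private_function(): return 'private'\n",
    mod.__dict__,
)
sys.modules["mymodule_demo"] = mod

ns = {}
exec("from mymodule_demo import *", ns)
print("public_function" in ns)    # True
print("_private_function" in ns)  # False: filtered out by __all__
```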
if __name__ == "__main__"
When Python imports a module, it sets __name__ to the module's name. When you run a file directly, __name__ is "__main__":
# mymodule.py
def useful_function():
return 42
# This block only runs when the file is executed directly
# NOT when it's imported
if __name__ == "__main__":
result = useful_function()
print(f"Result: {result}")
This pattern lets a file be both a reusable module and a runnable script.
Bytecode Deep Dive
Let's trace through a slightly more complex function:
import dis
def fibonacci(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(n - 1):
a, b = b, a + b
return b
dis.dis(fibonacci)
Output (Python 3.12):
2 RESUME 0
3 LOAD_FAST 0 (n)
LOAD_CONST 1 (1)
COMPARE_OP 1 (<=)
POP_JUMP_IF_FALSE ...
4 LOAD_FAST 0 (n)
RETURN_VALUE
5 LOAD_CONST 2 (0)
LOAD_CONST 1 (1)
UNPACK_SEQUENCE 2
STORE_FAST 1 (a)
STORE_FAST 2 (b)
...
Key instructions:
- LOAD_FAST — load a local variable onto the stack
- LOAD_CONST — load a constant onto the stack
- COMPARE_OP — compare the top two stack items
- POP_JUMP_IF_FALSE — jump if the top of the stack is falsy
- STORE_FAST — pop the stack and store in a local variable
- CALL — call a function
- RETURN_VALUE — return the top of the stack to the caller
Understanding bytecode explains Python performance characteristics. LOAD_FAST (local variable) is faster than LOAD_GLOBAL (global variable) — that's why moving frequently accessed globals into local variables inside tight loops speeds things up.
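A sketch of that trick: binding a global (or builtin) to a local name so the loop body uses LOAD_FAST instead of LOAD_GLOBAL:

```python
import dis

def slow(values):
    total = 0
    for v in values:
        total += len(str(v))  # len and str are looked up globally every pass
    return total

def fast(values, len=len, str=str):  # default args bind them as locals
    total = 0
    for v in values:
        total += len(str(v))  # now LOAD_FAST: no dictionary lookups
    return total

assert slow(range(1000)) == fast(range(1000))

opnames = {i.opname for i in dis.get_instructions(fast)}
print("LOAD_GLOBAL" in opnames)  # False: len/str load via LOAD_FAST
```

The default-argument idiom is a micro-optimisation; reach for it only in genuinely hot loops, since it clutters the signature.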
__slots__ — Memory-Efficient Classes
By default, Python stores instance attributes in a __dict__ (a hash map):
class Regular:
def __init__(self, x, y):
self.x = x
self.y = y
obj = Regular(1, 2)
print(obj.__dict__) # {'x': 1, 'y': 2}
Each per-instance __dict__ carries real overhead (dozens to a few hundred bytes, depending on the Python version). For classes with millions of instances, this adds up.
__slots__ replaces the dict with fixed-size per-attribute descriptors:
class Slotted:
__slots__ = ("x", "y")
def __init__(self, x, y):
self.x = x
self.y = y
obj = Slotted(1, 2)
# obj.__dict__  -> AttributeError: 'Slotted' object has no attribute '__dict__'
print(obj.x) # 1
import sys
print(sys.getsizeof(Regular(1, 2))) # ~48-56 bytes, plus the separate __dict__
print(sys.getsizeof(Slotted(1, 2))) # ~48-56 bytes total — no __dict__ overhead
Use __slots__ when you create millions of instances of a class and memory is a concern.
__getattr__ vs __getattribute__
These two look similar but behave very differently:
class Demo:
def __init__(self):
self.exists = "I exist"
def __getattr__(self, name: str):
# Called ONLY when normal attribute lookup fails
print(f"__getattr__: {name} not found")
return f"default_{name}"
def __getattribute__(self, name: str):
# Called for EVERY attribute access — be careful
print(f"__getattribute__: accessing {name}")
return super().__getattribute__(name)
d = Demo()
print(d.exists) # __getattribute__ called -> "I exist"
print(d.missing) # __getattribute__ called -> not found -> __getattr__ called
__getattr__ — safe to override. Use for proxy objects, lazy loading, dynamic attributes.
__getattribute__ — called for every attribute access. Override it only if you absolutely must — it's easy to create infinite recursion. Always call super().__getattribute__(name).
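A common __getattr__ use is a lazy-loading proxy. In this sketch, LazyConfig and the "loaded:" prefix are made up for illustration:

```python
class LazyConfig:
    def __getattr__(self, name):
        # Runs only when normal lookup fails, i.e. on first access
        value = f"loaded:{name}"
        setattr(self, name, value)  # cache in __dict__; next access is normal
        return value

cfg = LazyConfig()
print(cfg.database_url)  # loaded:database_url  (computed via __getattr__)
print(cfg.database_url)  # loaded:database_url  (served from __dict__ now)
```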
AST Manipulation — Code That Reads Code
Python's AST can be inspected and transformed:
import ast
def find_function_names(source: str) -> list[str]:
"""Extract all function names defined in source code."""
tree = ast.parse(source)
names = []
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
names.append(node.name)
return names
source = """
def greet(name):
return f"Hello, {name}"
def farewell(name):
return f"Goodbye, {name}"
class MyClass:
def method(self):
pass
"""
print(find_function_names(source))
# ['greet', 'farewell', 'method']
def count_operations(source: str) -> dict[str, int]:
"""Count each type of binary operation in source."""
tree = ast.parse(source)
ops = {}
for node in ast.walk(tree):
if isinstance(node, ast.BinOp):
op_name = type(node.op).__name__
ops[op_name] = ops.get(op_name, 0) + 1
return ops
source = "result = (a + b) * (c - d) / (e + f)"
print(count_operations(source)) # {'Add': 2, 'Mult': 1, 'Sub': 1, 'Div': 1}
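Transformation works too. A minimal ast.NodeTransformer that doubles every integer literal before recompiling (purely illustrative):

```python
import ast

class DoubleInts(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Rewrite integer literals (but not booleans) to twice their value
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node

tree = ast.parse("x = 3 + 4")
tree = ast.fix_missing_locations(DoubleInts().visit(tree))

ns = {}
exec(compile(tree, "<ast>", "exec"), ns)
print(ns["x"])  # 14, i.e. (3*2) + (4*2)
```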
AST manipulation is used by linters (ruff, flake8), formatters (black), type checkers (mypy), and code analysis tools.
What You Learned in This Chapter
- CPython is the reference implementation. PyPy is faster via JIT. MicroPython runs on microcontrollers.
- Python executes your code in four steps: Lex (source -> tokens) -> Parse (tokens -> AST) -> Compile (AST -> bytecode) -> Execute (bytecode -> results via CPython VM).
- Use tokenize to see tokens, ast.parse() + ast.dump() to see the AST, dis.dis() to see bytecode.
- Bytecode is cached in __pycache__/*.pyc files. The VM is stack-based.
- Each function call creates a frame with local variables, the code object, and a data stack. sys._getframe() inspects the current frame.
- Every object has a reference count. When it hits 0, the object is freed immediately. The cyclic GC handles reference cycles.
- The GIL allows only one thread to execute Python bytecode at a time. I/O releases the GIL, so threads work well for I/O-bound work. Use multiprocessing for CPU-bound parallelism. Python 3.13 introduced an experimental no-GIL build.
- Small integers (-5 to 256) and short strings are interned — shared single objects. Use == for equality, is only for None/True/False.
- __slots__ replaces __dict__ with fixed descriptors, saving memory on every instance.
- __getattr__ fires only when normal lookup fails (safe to override). __getattribute__ fires on every access (override with extreme care).
- The ast module lets you parse, inspect, and transform Python source code programmatically.
What's Next?
Chapter 57 covers The Python Ecosystem Map — a tour of the major libraries and frameworks in every domain: web, data science, machine learning, automation, DevOps, finance, games, and embedded systems. Think of it as a roadmap for wherever you want to go next.