Chapter 36: Advanced Python Internals
Most developers use Python for years without understanding how it works internally. That's fine — the abstraction holds. But understanding what happens under the hood makes you a better Python developer. You'll know why some things are fast and others slow, why the GIL exists, how memory is managed, and why certain patterns work the way they do.
This chapter goes deep. Take it slowly.
How Python Runs Your Code
When you run python script.py, several things happen:
1. Lexing — the source text is broken into tokens
2. Parsing — tokens are turned into an Abstract Syntax Tree (AST)
3. Compiling — the AST is compiled to bytecode (.pyc files)
4. Executing — the CPython interpreter executes the bytecode
You can inspect every stage:
# Stage 1: Tokens
import tokenize
import io
source = "x = 1 + 2"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tok)
# TokenInfo(type=1 (NAME), string='x', ...)
# TokenInfo(type=54 (OP), string='=', ...)
# TokenInfo(type=2 (NUMBER), string='1', ...)
# TokenInfo(type=54 (OP), string='+', ...)
# TokenInfo(type=2 (NUMBER), string='2', ...)
# Stage 2: Abstract Syntax Tree
import ast
source = "x = 1 + 2"
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
Output (simplified):
Module(
  body=[
    Assign(
      targets=[Name(id='x')],
      value=BinOp(
        left=Constant(value=1),
        op=Add(),
        right=Constant(value=2)
      )
    )
  ]
)
# Stage 3: Bytecode
import dis
def add(a, b):
    return a + b
dis.dis(add)
Output:
  2           0 RESUME                   0
  3           2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE
Each line is a bytecode instruction — a simple operation the Python virtual machine executes. LOAD_FAST pushes a local variable onto the stack. BINARY_OP pops two values, adds them, pushes the result. RETURN_VALUE pops the top of the stack and returns it.
Python is a stack machine — instructions operate on a stack of values.
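You can see the stack-machine model directly in the numbers: dis.stack_effect reports the net change each instruction makes to the stack depth.

```python
import dis

# dis.stack_effect(opcode, oparg) -> net change in stack depth.
# POP_TOP removes one value; LOAD_CONST (which takes an oparg) pushes one.
print(dis.stack_effect(dis.opmap["POP_TOP"]))        # -1
print(dis.stack_effect(dis.opmap["LOAD_CONST"], 0))  # 1
```

This is the same bookkeeping the compiler uses to compute co_stacksize for a code object.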
Inspecting bytecode of any function
import dis
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
dis.dis(fibonacci)
print()
print(f"Constants: {fibonacci.__code__.co_consts}")
print(f"Names: {fibonacci.__code__.co_names}")
print(f"Varnames: {fibonacci.__code__.co_varnames}")
print(f"Stack size:{fibonacci.__code__.co_stacksize}")
The __code__ object contains everything about the compiled function: its bytecode (co_code), local variable names, constants, free variables, and more.
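Because a code object is the complete compiled behavior, you can even construct a new function from one with types.FunctionType. This is a sketch of the mechanism, not something you need in everyday code:

```python
import types

def add(a, b):
    return a + b

# A function is essentially a code object plus a globals dict
# (plus optional defaults and a closure).
clone = types.FunctionType(add.__code__, globals(), "add_clone")
print(clone(2, 3))     # 5
print(clone.__name__)  # add_clone
```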
The CPython Virtual Machine
CPython is the reference implementation of Python — the one you download from python.org. It's written in C.
The core execution loop is in ceval.c — a giant switch statement that dispatches on bytecode opcodes. For every LOAD_FAST, BINARY_OP, or CALL, there's a case in that switch that executes the corresponding C code.
This is why Python is slower than compiled languages like C or Go — each Python operation involves multiple C operations, plus the overhead of the dispatch loop. The GIL (covered next) adds more overhead for multi-threaded programs.
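To get a feel for that dispatch loop, here is a toy sketch of one in Python itself. The opcode names and program format are invented for illustration; the real loop in ceval.c is vastly more involved:

```python
def run(program, env):
    """A miniature stack machine: dispatch on each opcode, mutate a value stack."""
    stack = []
    for op, arg in program:
        if op == "LOAD_CONST":
            stack.append(arg)                  # push a literal
        elif op == "LOAD_NAME":
            stack.append(env[arg])             # push a variable's value
        elif op == "BINARY_ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)         # pop two, push the sum
        elif op == "RETURN_VALUE":
            return stack.pop()                 # pop the result and return it

program = [
    ("LOAD_NAME", "x"),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(program, {"x": 40}))  # 42
```

Every iteration pays for the tuple unpack, the string comparisons, and the list operations — the same kind of per-instruction overhead (in C, but still overhead) that makes interpreted bytecode slower than native code.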
Code objects
Every function, class, and module has a code object (types.CodeType):
import types
def outer():
    x = 10
    def inner():
        return x
    return inner
print(type(outer.__code__)) # <class 'code'>
# Code object attributes
code = outer.__code__
print(f"co_name: {code.co_name}") # outer
print(f"co_filename: {code.co_filename}") # script.py
print(f"co_firstlineno:{code.co_firstlineno}")# 1
print(f"co_varnames: {code.co_varnames}") # ('x', 'inner')
print(f"co_freevars: {code.co_freevars}") # () — outer has no free vars
print(f"co_cellvars: {code.co_cellvars}") # ('x',) — x is captured by inner
inner_code = outer.__code__.co_consts[1] # the inner function's code object (index may vary by Python version)
print(f"inner freevars: {inner_code.co_freevars}") # ('x',) — captured from outer
Frame objects
When a function is called, Python creates a frame object — a runtime snapshot of the execution state:
import sys
def show_frame():
    frame = sys._getframe()  # the current frame
    print(f"Function: {frame.f_code.co_name}")
    print(f"File: {frame.f_code.co_filename}")
    print(f"Line: {frame.f_lineno}")
    print(f"Locals: {frame.f_locals}")
    print(f"Globals: {list(frame.f_globals.keys())[:5]}")
    caller = frame.f_back  # the frame that called us
    if caller:
        print(f"Called by: {caller.f_code.co_name}")
def main():
    x = 42
    show_frame()
main()
The call stack is a linked list of frame objects — each frame has f_back pointing to its caller.
# Walk the entire call stack
import traceback
def deep():
    for frame_info in traceback.extract_stack():
        print(f"  {frame_info.filename}:{frame_info.lineno} in {frame_info.name}")
def middle():
    deep()
def top():
    middle()
top()
The GIL — Global Interpreter Lock in Depth
The GIL is a mutex in CPython that ensures only one thread executes Python bytecode at a time. It's the most controversial aspect of CPython.
Why the GIL exists
CPython's memory management is not thread-safe by design. It uses reference counting — every object has a counter of how many references point to it. When the counter reaches zero, the object is freed. Without the GIL, two threads could simultaneously modify an object's reference count, causing corruption.
The GIL is a coarse solution: instead of fine-grained locks on every object, one global lock protects everything. This is simpler and faster for single-threaded programs — and most programs are mostly single-threaded.
The GIL releases during I/O
The GIL is released around blocking I/O operations — file reads, socket calls, time.sleep — and re-acquired afterwards. This is why threading helps for I/O-bound work:
import threading
import time
def io_bound():
    time.sleep(1)  # GIL is released during sleep
def cpu_bound():
    sum(range(10_000_000))  # GIL is held throughout
# I/O — threads help (GIL releases during sleep)
start = time.perf_counter()
threads = [threading.Thread(target=io_bound) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"I/O 4 threads: {time.perf_counter() - start:.2f}s") # ~1.0s
# CPU — threads don't help (GIL prevents true parallelism)
start = time.perf_counter()
threads = [threading.Thread(target=cpu_bound) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"CPU 4 threads: {time.perf_counter() - start:.2f}s") # ~4.0s (no speedup)
The GIL check interval
No thread holds the GIL forever. Every sys.getswitchinterval() seconds (default: 5 ms), the running thread is asked to release the GIL so that another waiting thread can acquire it.
import sys
print(sys.getswitchinterval()) # 0.005 (5 milliseconds)
sys.setswitchinterval(0.01) # change to 10ms
Python 3.13 — the no-GIL build
Python 3.13 introduced an experimental no-GIL build (--disable-gil). True multi-threading for CPU-bound work is coming to CPython — but the default build still has the GIL for backwards compatibility. Watch this space.
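You can check at runtime which build you're on. Py_GIL_DISABLED is the build-config flag for free-threaded builds, and sys._is_gil_enabled() was added in 3.13, so this sketch guards against older versions:

```python
import sys
import sysconfig

# 1 on free-threaded builds, 0 on regular 3.13+ builds, None on older versions
print(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() is new in 3.13 — guard so this also runs on older Pythons
checker = getattr(sys, "_is_gil_enabled", None)
print(checker() if checker is not None else "GIL status API not available (pre-3.13)")
```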
Memory Management
Reference counting
Every Python object has a reference count — an integer tracking how many names, containers, or other objects hold a reference to it.
import sys
x = [1, 2, 3]
print(sys.getrefcount(x)) # 2 (x itself, and the argument to getrefcount)
y = x
print(sys.getrefcount(x)) # 3 (x, y, and getrefcount's argument)
del y
print(sys.getrefcount(x)) # 2
When the reference count hits zero, CPython immediately deallocates the object — no waiting for a garbage collection cycle.
class MyObject:
    def __init__(self, name):
        self.name = name
        print(f"Created {name}")
    def __del__(self):
        print(f"Destroyed {self.name}")
obj = MyObject("A")
print("About to delete")
del obj # reference count -> 0, __del__ called immediately
print("After delete")
Output:
Created A
About to delete
Destroyed A <- immediate
After delete
Cyclic garbage collector
Reference counting has one weakness: reference cycles. Two objects that reference each other will never reach zero, even when no outside code references either.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
# Create a cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a # cycle: a -> b -> a
del a
del b
# Reference counts are now 1 (each object references the other)
# Not zero — they're never freed by reference counting alone
Python's cyclic garbage collector (in the gc module) detects and breaks these cycles:
import gc
gc.enable() # on by default
gc.collect() # force a collection
print(gc.get_count()) # (generation 0, 1, 2 counts)
print(gc.get_threshold()) # (700, 10, 10) — collection thresholds
# Inspect what's tracked
gc.set_debug(gc.DEBUG_LEAK) # print objects that can't be freed
The cyclic GC runs automatically in three generations. Most objects are young (generation 0) and die young. Objects that survive multiple collections move to older generations and are collected less frequently.
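You can prove the cyclic collector does its job with a weakref: once the cycle is unreachable, a manual gc.collect() frees it. Automatic collection is disabled during the demo so it stays deterministic:

```python
import gc
import weakref

class Node:
    pass

gc.disable()  # prevent an automatic collection mid-demo
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference cycle: a <-> b
    probe = weakref.ref(a)
    del a, b
    print(probe() is not None)   # True — refcounting alone can't free the cycle
    gc.collect()                 # the cycle detector breaks and frees it
    print(probe() is None)       # True — object is gone
finally:
    gc.enable()
```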
Object interning and small integer caching
CPython caches small integers (-5 to 256) and interns some short strings. This is why is comparisons appear to work for small integers but not for larger ones:
a = 256
b = 256
print(a is b) # True — same cached object
a = 257
b = 257
print(a is b) # False (may vary) — different objects
# String interning
a = "hello"
b = "hello"
print(a is b) # True — interned (short strings matching identifier rules)
a = "hello world"
b = "hello world"
print(a is b) # False (may vary) — not interned
This is why you must always use == for value comparison, never is.
Memory pools — pymalloc
CPython doesn't call malloc/free directly for small objects. It uses its own memory allocator (pymalloc) that maintains pools of fixed-size blocks. This dramatically reduces memory fragmentation and allocation overhead for the many small, short-lived objects Python creates.
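One rough way to watch this allocator at work is sys.getallocatedblocks(), which reports how many memory blocks the interpreter currently has allocated:

```python
import sys

before = sys.getallocatedblocks()
objects = [object() for _ in range(10_000)]  # allocate a burst of small objects
after = sys.getallocatedblocks()
print(after - before)  # roughly 10,000+ new blocks
del objects
```

The exact count includes interpreter-internal allocations, so treat it as a trend indicator rather than a precise meter.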
# See memory usage
import tracemalloc
tracemalloc.start()
# Your code here
data = [i ** 2 for i in range(100_000)]
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
print("Top 5 memory allocations:")
for stat in top_stats[:5]:
print(f" {stat}")
tracemalloc.stop()
The Descriptor Protocol
Descriptors are one of Python's most powerful mechanisms. They're how @property, @classmethod, @staticmethod, and super() work internally.
A descriptor is any object that defines __get__, __set__, or __delete__. When you access an attribute on an object, Python checks whether the class has a descriptor with that name.
class Descriptor:
    def __set_name__(self, owner, name):
        self.name = name  # called when the owning class is defined
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed from the class, not an instance
        return obj.__dict__.get(self.name)
    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
    def __delete__(self, obj):
        del obj.__dict__[self.name]

class MyClass:
    x = Descriptor()
    y = Descriptor()
obj = MyClass()
obj.x = 42
print(obj.x) # 42 — goes through Descriptor.__get__
del obj.x # goes through Descriptor.__delete__
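The lookup rule has a precedence twist worth knowing: a data descriptor (one that defines __set__ or __delete__) beats the instance __dict__, while a non-data descriptor (only __get__) loses to it. A minimal demonstration:

```python
class NonData:
    def __get__(self, obj, objtype=None):
        return "from descriptor"

class Data(NonData):
    def __set__(self, obj, value):   # defining __set__ makes it a data descriptor
        raise AttributeError("read-only")

class C:
    nd = NonData()
    d = Data()

c = C()
# Bypass the descriptors and write straight into the instance dict
c.__dict__["nd"] = "from instance dict"
c.__dict__["d"] = "from instance dict"
print(c.nd)  # from instance dict — non-data descriptor loses to __dict__
print(c.d)   # from descriptor — data descriptor wins over __dict__
```

This precedence is exactly why the Descriptor example above can store values in obj.__dict__ under the same name: because it defines __set__, its __get__ still intercepts every read.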
Building a typed attribute descriptor
class TypedAttribute:
    """Descriptor that enforces a type constraint."""
    def __init__(self, expected_type, default=None):
        self.expected_type = expected_type
        self.default = default
    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = f"_{name}"
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name, self.default)
    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.public_name} must be {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private_name, value)

class Person:
    name = TypedAttribute(str)
    age = TypedAttribute(int, default=0)
    score = TypedAttribute(float, default=0.0)
p = Person()
p.name = "Alice"
p.age = 30
p.score = 95.5
print(p.name, p.age, p.score) # Alice 30 95.5
try:
    p.age = "thirty"  # TypeError: age must be int, got str
except TypeError as e:
    print(e)
How @property is implemented
property is a built-in descriptor. Here's a simplified version:
class property_:
    """A simplified version of the built-in property descriptor."""
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc or (fget.__doc__ if fget else None)
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)
    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)
    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)
    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)
    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)
    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
class Temperature:
    def __init__(self, celsius=0.0):
        self._celsius = celsius
    @property_
    def celsius(self):
        return self._celsius
    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("Temperature below absolute zero")
        self._celsius = value
    @property_
    def fahrenheit(self):
        return self._celsius * 9 / 5 + 32
t = Temperature(100)
print(t.celsius) # 100
print(t.fahrenheit) # 212.0
t.celsius = -5
print(t.celsius) # -5
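The same machinery underlies functools.cached_property. It is a non-data descriptor (no __set__), so the value it stores in the instance __dict__ shadows the descriptor on every later access. A minimal sketch of the idea (simplified from the real implementation):

```python
class cached_property_:
    """Non-data descriptor: compute once, then let __dict__ shadow us."""
    def __init__(self, func):
        self.func = func
    def __set_name__(self, owner, name):
        self.name = name
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.name] = value  # next access hits __dict__, not __get__
        return value

class Circle:
    def __init__(self, r):
        self.r = r
    @cached_property_
    def area(self):
        print("computing")
        return 3.14159 * self.r ** 2

c = Circle(2)
print(c.area)  # prints "computing", then the value
print(c.area)  # cached — no "computing" this time
```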
Metaclasses
A metaclass is a class whose instances are classes. The default metaclass for all Python classes is type. When you write class Foo:, Python calls type("Foo", (object,), {...}) under the hood.
# These are equivalent
class MyClass:
    x = 42

MyClass = type("MyClass", (object,), {"x": 42})
print(type(MyClass)) # <class 'type'>
print(MyClass.x) # 42
Writing a metaclass
class SingletonMeta(type):
    """Metaclass that makes a class a Singleton."""
    _instances: dict = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    def __init__(self, url: str):
        self.url = url
        print(f"Connected to {url}")
db1 = Database("postgresql://localhost/myapp") # Connected to ...
db2 = Database("postgresql://localhost/other") # NOT printed — returns existing
print(db1 is db2) # True — same object
print(db2.url) # postgresql://localhost/myapp
Metaclass __new__ and __init__
class ValidatedMeta(type):
    """Metaclass that validates class definitions at creation time."""
    def __new__(mcs, name, bases, namespace):
        # Check that every public method has type hints.
        # Note: every function has an __annotations__ dict, so we must
        # check that it's non-empty, not merely that it exists.
        for attr_name, value in namespace.items():
            if callable(value) and not attr_name.startswith("_"):
                if not getattr(value, "__annotations__", None):
                    raise TypeError(
                        f"{name}.{attr_name} must have type annotations"
                    )
        return super().__new__(mcs, name, bases, namespace)

class StrictAPI(metaclass=ValidatedMeta):
    def get_user(self, user_id: int) -> dict:  # OK — has annotations
        return {}

# class BadAPI(metaclass=ValidatedMeta):
#     def fetch(self):  # TypeError — no annotations
#         pass
__init_subclass__ — the modern alternative to metaclasses
Metaclasses are powerful but complex. For most use cases, __init_subclass__ is simpler:
class Plugin:
    _registry: dict[str, type] = {}
    def __init_subclass__(cls, name: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if name:
            Plugin._registry[name] = cls
            print(f"Registered plugin: {name!r}")

class JSONPlugin(Plugin, name="json"):
    def serialize(self, data):
        import json
        return json.dumps(data)

class CSVPlugin(Plugin, name="csv"):
    def serialize(self, data):
        return ",".join(str(v) for v in data)
print(Plugin._registry)
# {'json': <class 'JSONPlugin'>, 'csv': <class 'CSVPlugin'>}
plugin = Plugin._registry["json"]()
print(plugin.serialize({"hello": "world"}))
__init_subclass__ is called automatically when a class inherits from Plugin. It's the preferred pattern for plugin systems, registries, and framework hooks.
__slots__ Internals
We used __slots__ in Chapter 30 for memory reduction. Now let's understand why it works.
Without __slots__, every instance carries a __dict__ — a hash table that can hold arbitrary attributes. That dict alone costs a couple of hundred bytes (232 bytes in the measurement below; exact sizes vary by Python version).
With __slots__, Python creates slot descriptors in the class — fixed memory offsets in the instance structure. No __dict__ is created. The instance is a small C struct with named slots.
import sys
class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y
d = WithDict(1, 2)
s = WithSlots(1, 2)
print(f"WithDict: {sys.getsizeof(d)} + {sys.getsizeof(d.__dict__)} = "
f"{sys.getsizeof(d) + sys.getsizeof(d.__dict__)} bytes")
print(f"WithSlots: {sys.getsizeof(s)} bytes")
# WithDict: 48 + 232 = 280 bytes
# WithSlots: 56 bytes — 5x smaller
# Verify: WithSlots has no __dict__
print(hasattr(d, "__dict__")) # True
print(hasattr(s, "__dict__")) # False
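Under the hood, each slot is itself a descriptor — __slots__ works through the same descriptor protocol covered earlier in this chapter:

```python
import types

class Point:
    __slots__ = ("x", "y")

# Each slot name becomes a member_descriptor in the class dict,
# reading and writing a fixed offset in the instance's C struct.
slot = Point.__dict__["x"]
print(type(slot))                                        # <class 'member_descriptor'>
print(isinstance(slot, types.MemberDescriptorType))      # True
print(hasattr(slot, "__get__"), hasattr(slot, "__set__"))  # True True — a data descriptor
```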
__getattr__ and __getattribute__
These two methods control attribute access:
- __getattribute__ — called for every attribute access, even if the attribute exists
- __getattr__ — called only when normal lookup fails
class TrackedAccess:
    """Log every attribute access."""
    def __init__(self):
        object.__setattr__(self, "_accessed", set())
    def __getattribute__(self, name):
        if not name.startswith("_"):
            accessed = object.__getattribute__(self, "_accessed")
            accessed.add(name)
        return object.__getattribute__(self, name)
    def accessed_attrs(self):
        return self._accessed

class LazyLoader:
    """Load attributes from a dict only when accessed."""
    def __init__(self, data: dict):
        self._data = data
    def __getattr__(self, name):
        # Only called when normal lookup fails
        if name in self._data:
            value = self._data[name]
            setattr(self, name, value)  # cache it so __getattr__ isn't called again
            return value
        raise AttributeError(f"{type(self).__name__!r} has no attribute {name!r}")
config = LazyLoader({
    "host": "localhost",
    "port": 5432,
    "database": "myapp",
})
print(config.host) # localhost — loaded from _data, cached on instance
print(config.port) # 5432
print(config.database) # myapp
AST Manipulation
You can parse, inspect, and even modify Python's AST — the internal representation of your code before compilation:
import ast
source = """
def greet(name):
return f"Hello, {name}!"
"""
tree = ast.parse(source)
# Walk all nodes
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(f"Function: {node.name}, line {node.lineno}")
    elif isinstance(node, ast.Return):
        print(f"Return statement at line {node.lineno}")
# Transform the AST — add a print to every function
class AddLogging(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # process children first
        # Add: print("Calling <name>") as the function's first statement
        log_stmt = ast.Expr(
            value=ast.Call(
                func=ast.Name(id="print", ctx=ast.Load()),
                args=[ast.Constant(value=f"Calling {node.name}")],
                keywords=[],
            )
        )
        node.body.insert(0, log_stmt)
        return node
transformed = AddLogging().visit(tree)
ast.fix_missing_locations(transformed)
code = compile(transformed, "<ast>", "exec")
exec(code)
greet("Alice")
# Calling greet <- injected by AST transformation
# Hello, Alice! <- original behavior
This is how several Python tools work: pytest, for example, rewrites assert statements by transforming the AST before compilation so failures can show rich diffs. (Other libraries, like dataclasses and attrs, generate their __init__ methods by synthesizing source strings and exec-ing them — a related but distinct code-generation technique.)
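Going the other way, ast.unparse (Python 3.9+) turns a tree, including one you've transformed, back into source text. It is invaluable for debugging a NodeTransformer:

```python
import ast

tree = ast.parse("x = 1 + 2")

# Rename every Name node 'x' to 'y', then render the result as source
class Rename(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id == "x":
            node.id = "y"
        return node

new_tree = ast.fix_missing_locations(Rename().visit(tree))
print(ast.unparse(new_tree))  # y = 1 + 2
```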
sys Module Internals
import sys
# Python implementation details
print(sys.implementation.name) # cpython
print(sys.version) # 3.12.0 (...)
print(sys.version_info) # sys.version_info(major=3, minor=12, ...)
# Memory
print(sys.getsizeof([])) # 56 bytes (empty list)
print(sys.getsizeof({})) # 64 bytes (empty dict)
print(sys.getsizeof("")) # 49 bytes (empty string)
print(sys.getsizeof(0)) # 28 bytes (int)
# Recursion
print(sys.getrecursionlimit()) # 1000
sys.setrecursionlimit(5000) # increase for deep recursion
# Reference counting
x = [1, 2, 3]
print(sys.getrefcount(x)) # 2
# All loaded modules
print(list(sys.modules.keys())[:10])
# Platform
print(sys.platform) # 'win32', 'darwin', 'linux'
# Intern strings (force string to be shared)
a = sys.intern("hello world")
b = sys.intern("hello world")
print(a is b) # True — both refer to the same object
Project: A Simple Bytecode Analyzer
"""
bytecode_analyzer.py — Inspect and report on Python bytecode.
"""
import dis
import sys
import types
from collections import Counter
def analyze(func) -> dict:
    """
    Analyze a function's bytecode and return a report.

    Args:
        func: Any Python function or method.

    Returns:
        A dict with keys: name, file, line, instruction_count,
        opcode_counts, constants, local_variables, free_variables,
        stack_size, complexity_estimate.
    """
    code = func.__code__
    instrs = list(dis.get_instructions(code))
    opcode_counts = Counter(i.opname for i in instrs)
    # Estimate cyclomatic complexity: 1 + number of branches
    branch_ops = {"POP_JUMP_IF_TRUE", "POP_JUMP_IF_FALSE",
                  "JUMP_IF_TRUE_OR_POP", "JUMP_IF_FALSE_OR_POP",
                  "FOR_ITER"}
    complexity = 1 + sum(1 for i in instrs if i.opname in branch_ops)
    return {
        "name": code.co_name,
        "file": code.co_filename,
        "line": code.co_firstlineno,
        "instruction_count": len(instrs),
        "opcode_counts": dict(opcode_counts.most_common(5)),
        "constants": [c for c in code.co_consts if c is not None],
        "local_variables": list(code.co_varnames),
        "free_variables": list(code.co_freevars),
        "stack_size": code.co_stacksize,
        "complexity_estimate": complexity,
    }
def print_report(func) -> None:
    """Print a human-readable bytecode analysis report."""
    report = analyze(func)
    print(f"\n{'━' * 50}")
    print(f" Function: {report['name']}")
    print(f" File:     {report['file']}:{report['line']}")
    print(f"{'━' * 50}")
    print(f" Instructions:    {report['instruction_count']}")
    print(f" Stack depth:     {report['stack_size']}")
    print(f" Complexity est.: {report['complexity_estimate']}")
    if report["constants"]:
        print(f" Constants: {report['constants']}")
    if report["local_variables"]:
        print(f" Locals:    {report['local_variables']}")
    if report["free_variables"]:
        print(f" Free vars: {report['free_variables']}")
    print("\n Top opcodes:")
    for opname, count in report["opcode_counts"].items():
        bar = "█" * count
        print(f"   {opname:<30} {count:>3} {bar}")
    print()
    print(" Disassembly:")
    dis.dis(func)
# ── Demo ──────────────────────────────────────────────────────────────────────
def simple(x, y):
    return x + y

def with_branch(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

def with_loop(items):
    total = 0
    for item in items:
        if item > 0:
            total += item
    return total

def with_closure():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter
if __name__ == "__main__":
    # with_closure() returns the inner counter, whose free variables show up in the report
    for func in [simple, with_branch, with_loop, with_closure()]:
        print_report(func)
What You Learned in This Chapter
- Python source goes through lexing -> parsing -> compilation -> execution. tokenize, ast, and dis let you inspect each stage.
- CPython is a stack machine — bytecode instructions push and pop values on a stack.
- Every function has a code object (__code__) containing bytecode, constants, variable names, and free variables.
- Every function call creates a frame object (f_locals, f_globals, f_back, f_lineno) — the call stack is a linked list of frames.
- The GIL is a coarse mutex protecting CPython's non-thread-safe reference counting. It is released during I/O but prevents true CPU parallelism in threads.
- Reference counting frees objects immediately when their count reaches zero. The cyclic GC handles reference cycles in three generations.
- Small integers (-5 to 256) and short strings are interned — always the same object. Use ==, never is, for value comparison.
- tracemalloc measures memory allocation per line.
- A descriptor defines __get__, __set__, or __delete__. property, classmethod, and staticmethod are all descriptors.
- A metaclass is a class whose instances are classes. type is the default metaclass. Use __init_subclass__ for most plugin/registry patterns instead.
- __slots__ replaces __dict__ with fixed C struct slots — 4-5x smaller per instance.
- __getattribute__ intercepts every access; __getattr__ only fires when normal lookup fails.
- Python's AST is manipulable — parse, inspect, transform, compile, and execute.
What's Next?
Chapter 37 covers Metaprogramming — class factories, dynamic class creation, __class_getitem__, runtime code generation, and exec/eval. You've seen the internals; now you'll use them to write code that writes code.