CPython source code Part 1

The internals of the painless python

Mehedee Hassan
Feb 13, 2022

INTRODUCTION

In this series, I will talk about the internals of CPython (the default Python interpreter) using its source code on GitHub. In each part, I will point to a file in the CPython GitHub repository and explain blocks of code that are easy to understand. You need nothing more than some C and Python to follow along.

I will be using Python 3.7. There is no particular reason for this choice; I started my Python journey with 3.7, that’s all.

Fig 1.1: CPython source tree

I am also assuming that you can navigate source code and enjoy reading it. Otherwise, you may find this boring.

In this article, we will try to understand a simple binary add operation.

Let’s Fire Up

We will start with ceval.c. You can start anywhere, but ceval.c is arguably the easiest entry point.

From here on, I will try to build up a story that will not include everything but will show a complete path from start to end.

PYOBJECT

Most of the time in the Python source code, you will see the term PyObject (_object).

Fig 1.2: PyObject definition

“Nothing is actually declared to be a PyObject, but every pointer to a Python object can be cast to a PyObject*. This is inheritance built by hand.” (comment in the CPython source code)

It is just a plain old C struct. You can find the definition in object.h on line 105. This header file is included in the middle of the Python.h header file, which is in turn included in ceval.c.

It is written in the Python.h header: “No C extension should directly include any header file that is inside Python.h; instead, they should include Python.h.”

In PyObject, we can see three members:

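  1. _PyObject_HEAD_EXTRA is a pair of pointers that link each object [PyObject] into a doubly linked list of all objects on the Python heap, pointing to the next and the previous object. It is only present in special debug builds (when Py_TRACE_REFS is defined); otherwise it expands to nothing. I will explain Python memory management later in this series. You can read about doubly linked lists in [1]. Here _object is PyObject.

  2. ob_refcnt, of type Py_ssize_t, is the object's reference count. Py_ssize_t is the signed counterpart of C's size_t, so it can hold the size of the largest theoretically possible object of any type (including arrays) and can also be negative.

  3. ob_type is a pointer to a PyTypeObject, the structure that describes the object's type. The detailed definition of this structure is in the CPython source [2].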
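The reference count and the type pointer are also visible from the Python side. As a minimal sketch (header_view is a hypothetical helper, not part of CPython), sys.getrefcount and type report the values that CPython keeps in these two header fields. The absolute counts vary between interpreter versions, so only the difference is meaningful here.

```python
import sys


def header_view(obj):
    """Return (reference count, type) as reported by the interpreter."""
    return sys.getrefcount(obj), type(obj)


value = [1, 2, 3]
count_before, kind = header_view(value)
alias = value                       # bind a second name to the same object
count_after, _ = header_view(value)

# The type comes from ob_type; the count grows by one for the new reference.
print(kind.__name__, count_after - count_before)
```

Binding a second name does not copy the list; it only adds one more reference to the same object, which is what the count reflects.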

BACK TO THE CEVAL.C

So we will start from here (line 1064). You can think of the Python interpreter as a stack machine: it pushes objects onto a value stack and pops them off again as it executes each instruction.

Inside this switch on “opcode”, CPython finds the case that matches the current operation and executes it.
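The idea of a value stack plus a switch on the opcode can be sketched in a few lines of Python. This is a toy model, not CPython's loop: the opcode numbers, the run function, and the (opcode, argument) instruction format are all made up for illustration.

```python
# Hypothetical opcode numbers; CPython's real values are defined in its headers.
LOAD_CONST, BINARY_ADD, RETURN_VALUE = 1, 2, 3


def run(code, consts):
    """Evaluate a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for opcode, arg in code:
        if opcode == LOAD_CONST:
            stack.append(consts[arg])      # push a constant
        elif opcode == BINARY_ADD:
            right = stack.pop()            # top of the stack
            left = stack.pop()             # the value below it
            stack.append(left + right)     # push the result
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise RuntimeError("code ended without RETURN_VALUE")


program = [
    (LOAD_CONST, 0),
    (LOAD_CONST, 1),
    (BINARY_ADD, None),
    (RETURN_VALUE, None),
]
result = run(program, consts=(2, 3))
print(result)
```

The if/elif chain plays the role of the switch block: each branch corresponds to one case in ceval.c.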

We will look at a simple example, the binary add operation on line 1272.

Fig 2.1: Binary add operation

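TARGET is a C macro, shown below. It stands in for the “case” label inside the switch block and, when computed gotos are enabled, it also defines a jump label that is listed in the opcode_targets.h file. The opcode numbers themselves are defined in opcode.h.

Fig 2.2: TARGET and DISPATCH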

Let’s talk about the binary add operation, which performs z = b + c.
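To check from the Python side which instruction this statement compiles to, the standard dis module can list the bytecode. This is a small sketch that assumes a CPython interpreter; the function name add is just for illustration. On the 3.7 interpreter used in this article the list should contain BINARY_ADD, while newer releases (3.11 and later) fold the arithmetic opcodes into a single BINARY_OP.

```python
import dis


def add(b, c):
    z = b + c
    return z


# Collect the instruction names and keep only the binary operation(s).
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
binary_ops = [name for name in opnames if name.startswith("BINARY_")]
print(binary_ops)
```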

  • The first two lines take PyObject pointers from the top of the stack: right (c) is popped, and left (b) is read from the new top of the stack [fig 2.1].
  • In the if-block [line 1282] of Fig 2.1, we can see “PyUnicode_CheckExact”, which is defined in unicodeobject.h. It is a type-checking macro that tests whether an object is exactly a str (Unicode string), so this branch handles string concatenation.
  • We are interested in numerical addition, so we will go to the else-block that starts at line 1287 [fig 2.1].
  • In the else-block, we find the PyNumber_Add function, which is defined in abstract.c.
Fig 2.3: PyNumber_Add function
  • We can see that it calls BINARY_OP1 with the nb_add slot at line 1072 [fig 2.3].
Fig 2.3.1: BINARY_OP1 macro
  • BINARY_OP1 is defined in the abstract.c file. It is a C macro that calls the function binary_op1 [fig 2.3.1].
  • The definition of binary_op1 is shown below [fig 2.3.2].
Fig 2.3.2: binary_op1 function
  • In Fig 2.3.2, line 863, we can see “NB_BINOP(Py_TYPE(v)->tp_as_number, op_slot);”. When v is a Python int, Py_TYPE(v) is PyLong_Type, so this reads the tp_as_number member of PyLong_Type in longobject.c.
Fig 2.3.3: tp_as_number
Fig 2.3.4: long_as_number
  • On line 5390 of Fig 2.3.3, we can see that tp_as_number holds a pointer to (the address of) a variable called long_as_number.
  • long_as_number, shown in Fig 2.3.4, is a table of function pointers for the operations on Python ints (longs), including long_add [line 5343].
  • So, in summary, binary_op1 will call the long_add function from longobject.c, shown in Fig 2.4.
Fig 2.4: long_add function
Fig 2.5: x_add function
  • In Fig 2.4, we can see that long_add calls a number of helper functions to perform the addition. It is a bit tricky, but in summary, long_add calls x_add, which adds the two integers digit by digit with a carry in a for-loop, as shown in Fig 2.5.
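The chain described in the list above can be modelled in plain Python. This is a rough sketch under several assumptions, not the real implementation: it uses 30-bit digits (the common case in CPython builds), handles only non-negative integers, and skips the subclass-priority rule of the real binary_op1. The NB_ADD table and the to_digits/from_digits helpers are hypothetical stand-ins for the tp_as_number slot and for CPython's internal digit array.

```python
SHIFT = 30                     # bits per digit (assumed; 15 is also possible)
MASK = (1 << SHIFT) - 1


def to_digits(n):
    """Split a non-negative int into base-2**SHIFT digits, lowest first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits or [0]


def from_digits(digits):
    """Rebuild an int from its digit list."""
    n = 0
    for digit in reversed(digits):
        n = (n << SHIFT) | digit
    return n


def x_add(a, b):
    """Add two digit lists with a carry, in the spirit of x_add."""
    if len(a) < len(b):
        a, b = b, a                        # make a the longer operand
    result, carry = [], 0
    for i in range(len(b)):                # digits present in both operands
        carry += a[i] + b[i]
        result.append(carry & MASK)
        carry >>= SHIFT
    for i in range(len(b), len(a)):        # remaining digits of the longer one
        carry += a[i]
        result.append(carry & MASK)
        carry >>= SHIFT
    if carry:
        result.append(carry)
    return result


def long_add(v, w):
    """Stand-in for long_add: decline non-ints, otherwise add the digits."""
    if not (isinstance(v, int) and isinstance(w, int)):
        return NotImplemented
    return from_digits(x_add(to_digits(v), to_digits(w)))


NB_ADD = {int: long_add}                   # hypothetical nb_add slot table


def binary_op1(v, w, slots):
    """Try the left operand's slot first, then the right operand's."""
    slotv = slots.get(type(v))
    slotw = slots.get(type(w)) if type(w) is not type(v) else None
    for slot in (slotv, slotw):
        if slot is not None:
            result = slot(v, w)
            if result is not NotImplemented:
                return result
    return NotImplemented


total = binary_op1(2**30 - 1, 1, NB_ADD)
print(total == 2**30)
```

In this example the single 30-bit digit overflows, so the carry becomes a second digit: x_add returns [0, 1], which from_digits turns back into 2**30.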

Conclusion

In this article, I wanted to give a summarized picture of how a binary add operation works internally. I hope it helped you understand the internal function calls and the references between structures. Thanks for your patience.

References

[1] https://www.geeksforgeeks.org/doubly-linked-list/

[2] https://github.com/python/cpython
