Python has supported multiple threads for a long time now. In fact, there are two libraries that implement multithreading. The first is the thread library (renamed _thread in Python 3), which provides low-level primitives; I would advise avoiding it unless you have a specific requirement to control threading activities at a low level. The other is the threading library, which provides high-level classes for working with multiple threads, along with helper classes such as Lock, Semaphore, Event, and Condition. The thread-safe Queue class lives in the separate queue module (named Queue in Python 2).
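As a minimal sketch of the high-level threading API, the example below starts a few worker threads with threading.Thread and hands them work through a queue.Queue from the separate queue module. The function names `worker` and `square_all` are hypothetical, and the `None` sentinel is just one common way to tell workers to stop.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Square numbers taken from `tasks` until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this thread
            break
        results.put(item * item)


def square_all(numbers: list[int], num_threads: int = 3) -> list[int]:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker thread
    for t in threads:
        t.join()
    # Results arrive in whatever order the threads finish, so sort them.
    return sorted(results.get() for _ in numbers)


print(square_all([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

Because queue.Queue does its own internal locking, the workers never touch a shared variable directly.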
Thread implementations vary from system to system, but in general threads can be seen as lightweight processes. Threads are usually started from within a process and share its memory address space. Because they share memory, it is very easy for them to communicate: they can access the same variables directly. For the same reason, developers must take extra care when using multiple threads: shared variables must be locked before they are updated, so that other threads do not see inconsistent results. This is not necessarily a bad thing, but you need to keep it in mind when using threads.
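The following sketch shows that kind of locking with a threading.Lock around a shared counter. The `Counter` class and the `hammer` and `run` helpers are hypothetical names chosen for illustration; the point is that the `with self._lock:` block makes each read-modify-write of `value` happen as one unit.

```python
import threading


class Counter:
    """A counter whose increments are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Reading, adding, and storing are separate steps; holding the lock
        # keeps other threads out until the whole update is finished.
        with self._lock:
            self.value += 1


def hammer(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


def run(num_threads: int = 4, times: int = 10_000) -> int:
    counter = Counter()
    threads = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())  # 40000
```

Without the lock, the final count is not guaranteed to be 40000, because two threads can read the same old value before either one stores its update.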
A bigger issue when using threads is the Python interpreter implementation itself. Because CPython's memory management is not thread-safe, it is not possible to (safely) run multiple native threads that interpret Python bytecode at the same time. The mechanism that prevents this, called the Global Interpreter Lock (GIL), ensures that only one thread executes Python bytecode at any given point in time. So although each Python thread maps to a dedicated native system thread, only one of them runs Python code at a time. For CPU-bound code this means your multithreaded application effectively becomes single-threaded, with additional overhead imposed by the GIL, thread scheduling, and context switching. Threads that spend most of their time waiting on blocking I/O are affected much less, because CPython releases the GIL during such calls.
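To illustrate the difference between CPU-bound and I/O-bound threads, the sketch below uses time.sleep as a stand-in for a blocking I/O call (an assumption for demonstration purposes; real code would be waiting on sockets or files). Four threads that each sleep for 0.2 seconds should finish in roughly 0.2 seconds in total rather than 0.8, because a sleeping thread does not hold the GIL. The helper names `fake_io` and `elapsed_with_threads` are hypothetical.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # Stand-in for blocking I/O; CPython releases the GIL while sleeping.
    time.sleep(delay)


def elapsed_with_threads(num_threads: int, delay: float) -> float:
    threads = [
        threading.Thread(target=fake_io, args=(delay,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


print(f"4 threads x 0.2s sleep took {elapsed_with_threads(4, 0.2):.2f}s")
```

Replacing the sleep with a pure-Python computation would not give a comparable speedup in a standard CPython build, since only one thread at a time could execute the bytecode.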
You may wonder why the threading library provides various locking primitives if only one thread runs at a time. The main goal of the GIL is to prevent multiple threads from accessing the interpreter's own object structures at the same time. So it protects the internal memory structures of the interpreter, but not your application data, which you have to protect yourself.
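One way to see why the GIL does not protect your data is to disassemble a simple increment. The sketch below uses the standard dis module on a hypothetical `increment` function. The exact opcode names vary between Python versions, but in CPython the load of `counter` and the store back to it are separate instructions. Nothing in the language guarantees that another thread cannot run between them, so an unprotected `counter += 1` can lose updates.

```python
import dis

counter = 0  # hypothetical shared module-level state


def increment() -> None:
    global counter
    counter += 1  # looks like one step, but compiles to several instructions


# Collect the opcode names; expect LOAD_GLOBAL ... STORE_GLOBAL among them.
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)
```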
This locking behavior is specific to CPython, the original Python implementation, and it has proven hard to change. CPython is heavily optimized, and rewriting it without the GIL would hurt the performance of single-threaded Python applications. Python 3.13 introduced an optional, experimental free-threaded build without the GIL (PEP 703), but the standard build still uses it. Other Python implementations, such as IronPython and Jython, do not have a GIL and can therefore make better use of multiple CPU cores.
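If you are unsure which interpreter your code runs on, a quick check such as the sketch below can help. platform.python_implementation() is a documented standard-library call; sys._is_gil_enabled() is a private helper that only exists on CPython 3.13 and later, so the sketch treats it as optional and reports "unknown" when it is absent. The function name `describe_interpreter` is hypothetical.

```python
import platform
import sys


def describe_interpreter() -> str:
    impl = platform.python_implementation()  # e.g. "CPython", "PyPy", "IronPython"
    # Private, CPython 3.13+ only: look it up defensively instead of calling it directly.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    if gil_check is None:
        gil = "unknown"
    else:
        gil = "enabled" if gil_check() else "disabled"
    return f"{impl} {platform.python_version()}, GIL: {gil}"


print(describe_interpreter())
```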
An alternative to threads is to use processes in the application. The major difference between a thread and a process is that each process has its own completely isolated memory space and stack. Therefore multiple processes cannot share the same objects, which eliminates all the issues with object data being updated by multiple threads at the same time. This comes at a price, though: creating a new process involves considerably more overhead, because the parent process must be copied (or a fresh interpreter started) and new memory allocated. Another consequence is that you cannot reference the same object from two different processes, so processes need other means of communication, such as queues and pipes.
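The sketch below demonstrates that isolation with the multiprocessing library: a child process modifies a module-level variable, and the parent sees the result only through the value sent back over a multiprocessing.Queue. The names `counter`, `child`, and `demo` are hypothetical. The `if __name__ == "__main__":` guard is required on platforms that start processes by spawning a fresh interpreter, such as Windows and macOS.

```python
import multiprocessing as mp

counter = 0  # module-level state; each child process gets its own copy


def child(q) -> None:
    global counter
    counter += 100  # changes only the child's copy
    q.put(counter)  # the queue is how the value gets back to the parent


def demo() -> tuple[int, int]:
    q = mp.Queue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    from_child = q.get()  # read before join() so the child is never left blocked on a full pipe
    p.join()
    return from_child, counter


if __name__ == "__main__":
    print(demo())  # (100, 0)
```

The parent's `counter` stays at 0 whichever start method is used: a forked child works on a copy of the parent's memory, and a spawned child re-imports the module and gets its own fresh copy.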
Support for multiprocessing was added in Python 2.6 with the multiprocessing library, whose API closely mirrors that of the threading library, so porting an existing multithreaded application is a relatively simple task.
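To illustrate how closely the two APIs match, the following sketch runs the same worker function first with threading.Thread and queue.Queue, then with multiprocessing.Process and multiprocessing.Queue, changing only the two classes passed in. The helper names `square` and `run_with` are hypothetical.

```python
import multiprocessing
import queue
import threading


def square(n, results) -> None:
    # Identical worker code for threads and processes.
    results.put((n, n * n))


def run_with(worker_cls, queue_factory, numbers):
    results = queue_factory()
    workers = [worker_cls(target=square, args=(n, results)) for n in numbers]
    for w in workers:
        w.start()  # same start()/join() calls on Thread and Process
    collected = dict(results.get() for _ in numbers)
    for w in workers:
        w.join()
    return [collected[n] for n in numbers]


if __name__ == "__main__":
    nums = [1, 2, 3]
    print(run_with(threading.Thread, queue.Queue, nums))                # [1, 4, 9]
    print(run_with(multiprocessing.Process, multiprocessing.Queue, nums))  # [1, 4, 9]
```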
So, as you can see, true parallelism in Python can be achieved by running your code in processes rather than threads. In some cases this approach is more advantageous: because processes share nothing and are completely independent of each other, they can be decoupled even further and run on different servers. Processes exchange data using the queue and pipe primitives, and the multiprocessing library's connection and manager facilities can also send that data over TCP/IP from one process to another.
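As a sketch of process-based parallelism for CPU-bound work, the example below spreads a pure-Python computation across a multiprocessing.Pool. The function names and the choice of four worker processes are illustrative assumptions; the actual speedup depends on the number of available cores and on how expensive each task is compared with the cost of sending it to a worker.

```python
import multiprocessing


def sum_of_squares(limit: int) -> int:
    # Pure-Python, CPU-bound loop: in a thread it would hold the GIL throughout.
    total = 0
    for i in range(limit):
        total += i * i
    return total


def parallel_sums(limits: list[int], processes: int = 4) -> list[int]:
    # Each worker process has its own interpreter and GIL,
    # so tasks can run on different CPU cores at the same time.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, limits)


if __name__ == "__main__":
    print(parallel_sums([10, 100, 1000]))  # [285, 328350, 332833500]
```

Pool.map returns results in the order of its input, so the output matches a plain serial loop over the same values.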