Parallel and Distributed Computing
This lesson will introduce the concepts of parallel and distributed computing, including how to design algorithms that can be executed in parallel and how to manage communication and synchronization between processes.
Each section explains one strategy and lets you run real Python to see it in action. Check your understanding at the bottom once you've read all three.
A single processor handles one task at a time. Task 2 cannot start until Task 1 finishes. Total time = sum of every task. Simple and predictable, but it doesn't scale.
Code Runner Challenge
Run serial processing and observe the timing. Try changing the unit values and notice that the total time always equals the sum.
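The runnable snippet lives in the embedded code runner and isn't reproduced here; a minimal serial sketch in the same spirit (task names and unit values are illustrative) looks like this:

```python
import time

# Each task takes `units` tenths of a second; try changing the values.
tasks = {"Task A": 3, "Task B": 2, "Task C": 4}

start = time.perf_counter()
for name, units in tasks.items():
    time.sleep(units * 0.1)  # simulate work
    print(f"{name} done at {time.perf_counter() - start:.1f}s")

total = time.perf_counter() - start
print(f"Total: {total:.1f}s (= sum of all tasks)")
```

Whatever unit values you pick, the total is always their sum: each task waits for the previous one to finish.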
Multiple cores inside the same machine work on different tasks simultaneously. Because cores share memory directly, there is zero network overhead. Total time ≈ slowest single task, not the sum.
[Diagram: a task queue dispatches Tasks 1-4 to Cores 1-4; the cores run simultaneously and all feed into a single Result.]
Code Runner Challenge
Run parallel processing with ThreadPoolExecutor. Try making Task B much heavier and notice that other tasks finish while it runs.
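A minimal sketch of the kind of code the runner executes (task names and weights are made up; `time.sleep` stands in for real work and releases the GIL, so the threads genuinely overlap):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work(task):
    name, units = task
    time.sleep(units * 0.1)  # simulate work; sleep releases the GIL
    return name

# Task B is the heaviest; the others finish while it runs.
tasks = [("Task A", 2), ("Task B", 5), ("Task C", 2), ("Task D", 2)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, tasks))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.1f}s")  # ~ the slowest task, not the sum
```

Serially these tasks would take about 1.1 s; with four workers the wall-clock time collapses to roughly the heaviest task, about 0.5 s.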
Separate machines, called nodes, collaborate over a network. Each node works independently, then sends its result back. The network handshake adds latency per task, but you gain scale no single machine can match. Distributed is essential when the problem is too large for one machine.
[Diagram: a task queue dispatches Tasks 1-4 to Nodes 1-4; each node sends its result back over the network to a single Result.]
Code Runner Challenge
Run the distributed simulation with asyncio. Try changing NETWORK_MS to 500 and watch overhead dominate when tasks are light.
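A minimal sketch of such an asyncio simulation, assuming a NETWORK_MS constant like the one the runner uses (task names, weights, and the 100 ms default are illustrative):

```python
import asyncio
import time

NETWORK_MS = 100  # per-task network handshake; try 500 to see overhead dominate

async def run_on_node(name, units):
    await asyncio.sleep(NETWORK_MS / 1000)  # simulated network latency
    await asyncio.sleep(units * 0.1)        # simulated remote work
    return name

async def main():
    tasks = [("Task A", 2), ("Task B", 2), ("Task C", 2), ("Task D", 2)]
    start = time.perf_counter()
    results = await asyncio.gather(*(run_on_node(n, u) for n, u in tasks))
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")  # slowest task + handshake, not the sum
    return elapsed

elapsed = asyncio.run(main())
```

With light 0.2 s tasks and a 100 ms handshake, the handshake is already a third of the wall-clock time; at NETWORK_MS = 500 it dwarfs the work itself.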
Check Your Understanding
Now that you've seen all three paradigms, answer without looking back!
❓ A bakery has one oven and bakes one tray of cookies at a time. Tray 2 goes in only after tray 1 comes out. Which computing paradigm does this model?
❓ You are rendering a 3D film. Every frame is completely independent, and your workstation has a 16-core CPU. Which strategy cuts render time most?
❓ Training a state-of-the-art AI model requires 40 TB of data, far more than the RAM of any single machine. Which approach is necessary?
Scenario Challenges
❓ Scenario A: You have a 200 GB dataset to process on a machine with 12 cores. All tasks are fully independent.
❓ Scenario B: A genomics pipeline must search a 500 TB DNA database. The largest single server available has 2 TB of RAM.
❓ Scenario C: You need to run a single, indivisible 30-second calculation that cannot be broken into sub-tasks at all.
Scenario D β Code Challenge
The script below processes 8 images one at a time, and it's too slow. Convert the serial loop to run in parallel using ThreadPoolExecutor. The built-in checker will measure your speedup and tell you if you passed.
💡 Hint 1: What kind of problem is this?
💡 Hint 2: Which module and class?
The concurrent.futures module is already imported at the top.
The class you want is ThreadPoolExecutor: it manages a pool
of worker threads and distributes tasks across them automatically.
from concurrent.futures import ThreadPoolExecutor
💡 Hint 3: How do you create a pool?
Use a with block so the pool is cleaned up automatically when done.
max_workers controls how many threads run at once; set it to the number
of tasks or your CPU core count, whichever is smaller.
with ThreadPoolExecutor(max_workers=4) as pool:
    # submit work here
💡 Hint 4: How do you map tasks to the pool?
pool.map(fn, iterable) works like the built-in map():
it calls fn once for each item in the iterable, but runs them in parallel
across the worker threads. Wrap it in list() to collect all results.
results = list(pool.map(compress_image, images))
Notice that compress_image already takes a single argument (image_id),
so it plugs straight into pool.map without any changes.
💡 Hint 5: What exactly needs to change?
Replace the for loop (including the results = [] line) with a
single with block. Everything else (the function definition, the
images list, and the checker) stays exactly as-is.
# BEFORE
results = []
for img_id in images:
    results.append(compress_image(img_id))
# AFTER: swap these four lines for two
🔑 Hint 6: Full solution (try on your own first!)
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compress_image, images))
elapsed = time.perf_counter() - start
That's it: two lines of parallel code replace four lines of serial code,
and the checker at the bottom does the rest.
Code Runner Challenge
Convert the serial `for` loop to parallel using `ThreadPoolExecutor`. The checker at the bottom will automatically detect whether your code runs faster than the serial baseline and report your speedup.
Simulating the Difference
Build your own task queue, then watch all three approaches race.
- Click a task to increase its weight (1 → 2 → … → 7 → 1 dots)
- Shift+click or drag to multi-select, then click any selected task to cycle all at once
- Deselect to ungroup
Serial: 1 worker, sequential. Parallel: 4 cores, no overhead. Distributed: 4 nodes, 600 ms handshake but 1.5× faster processing. Try 1-2 dots (parallel wins) vs 3+ dots (distributed wins).
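The race can also be computed by hand. The sketch below estimates all three finish times under the simulator's stated rules (600 ms handshake, 1.5× node speed); the greedy longest-first scheduler is an assumption, so the exact crossover point may differ from the on-page simulation:

```python
def makespan(weights, workers, per_task_overhead=0.0, speedup=1.0):
    """Finish time when weights are greedily assigned (longest first)
    to the least-loaded of `workers` identical workers."""
    loads = [0.0] * workers
    for w in sorted(weights, reverse=True):
        i = loads.index(min(loads))                 # least-loaded worker
        loads[i] += per_task_overhead + w / speedup
    return max(loads)

weights = [3, 3, 3, 3]  # one dot = 1 unit of work (illustrative)

serial = sum(weights)                               # 1 worker, sequential
parallel = makespan(weights, workers=4)             # 4 cores, no overhead
distributed = makespan(weights, workers=4,
                       per_task_overhead=0.6,       # 600 ms handshake
                       speedup=1.5)                 # 1.5x faster nodes
print(serial, parallel, distributed)
```

With light tasks, the fixed handshake dominates and parallel wins; with heavy tasks, the 1.5× node speed outweighs the handshake and distributed pulls ahead.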
TL;DR
- Serial computing executes tasks one after another on a single processor.
- Parallel computing divides tasks across multiple cores on the same machine, allowing simultaneous execution without network overhead.
- Distributed computing spreads tasks across multiple machines, which can provide more resources but incurs network communication overhead.
- The best approach depends on the problem size, resource availability, and performance requirements. For small tasks, parallel computing may outperform distributed due to lower overhead, while for larger tasks, distributed computing can leverage more resources for faster completion.