Compression Quest — Squeeze the Bits
A 5-minute gamified AP CSP lesson on lossy and lossless compression. Compress strings with run-length encoding, then sort real-world file types into the right bin.
AP CSP Big Idea 2 · ~5 minute lesson · Lossy vs Lossless compression
What is Compression?
- The core idea: Compression is an algorithm that re-encodes data into fewer bits than the original. The encoded version takes less storage and travels faster across networks.
- Why it exists: Bandwidth and storage cost real money. Without compression, a 4-minute song at CD quality would be about 40 MB instead of about 4 MB as a typical MP3, and HD video streaming would be impractical on most home connections.
- Lossless compression: The original file can be perfectly reconstructed bit-for-bit. It works by spotting patterns (repeated characters, common byte sequences) and replacing them with shorter codes. Examples: ZIP, PNG, GIF, FLAC, run-length encoding, Huffman coding.
- Lossy compression: The algorithm permanently discards data the human eye/ear is least likely to notice (subtle color shifts, high-frequency sound). The result is much smaller but cannot be restored to the original. Examples: JPEG, MP3, MP4, AAC.
- The big trade-off: File size vs. fidelity. Lossy gives smaller files at the cost of detail; lossless guarantees fidelity but compresses less. Pick by use case — never use lossy on source code, text, financial records, or anything where every bit matters.
- AP exam angle: You'll be asked to identify which type fits a scenario, explain why a file might or might not compress well, and reason about trade-offs between size, quality, and processing time.
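The "~40MB instead of ~4MB" figure can be checked with a little arithmetic. The sketch below assumes CD-quality audio (44,100 samples per second, 16 bits per sample, stereo) and a 128 kbps MP3; those numbers are illustrative assumptions, not values stated in the lesson.

```python
# Assumed figures (not from the lesson): CD-quality PCM audio and a 128 kbps MP3.
SAMPLE_RATE_HZ = 44_100      # samples per second, per channel
BYTES_PER_SAMPLE = 2         # 16-bit samples
CHANNELS = 2                 # stereo
MP3_BITRATE_BPS = 128_000    # bits per second


def uncompressed_bytes(seconds):
    """Size of raw audio: every sample is stored in full."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds


def mp3_bytes(seconds):
    """Size of a constant-bitrate MP3: bitrate times duration, in bytes."""
    return MP3_BITRATE_BPS * seconds // 8


song_seconds = 4 * 60
raw = uncompressed_bytes(song_seconds)
lossy = mp3_bytes(song_seconds)

print(f"raw: {raw / 1_000_000:.1f} MB")    # raw: 42.3 MB
print(f"mp3: {lossy / 1_000_000:.1f} MB")  # mp3: 3.8 MB
print(f"ratio: {raw / lossy:.1f}:1")       # ratio: 11.0:1
```

Under these assumptions the lossy file is roughly one eleventh the size of the raw audio, which matches the order of magnitude quoted above.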
The 30-second briefing
Compression shrinks data so it moves faster and stores smaller. There are two flavors:
Lossless
Original is perfectly reconstructed. Zero data lost. Examples: .zip, .png, .gif, .flac, text files.
Lossy
Throws away "less important" bits. Smaller, but you can't get the original back. Examples: .jpg, .mp3, .mp4, streaming video.
AP rule of thumb: If losing detail is unacceptable (code, bank records, archives), pick lossless. If a smaller file matters more than perfect fidelity (photos, music, video), pick lossy.
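The difference between the two flavors can be seen in a few lines of Python. This is a minimal sketch: the lossless half uses the standard-library `zlib` module, and the lossy half is a toy that simply drops the low 4 bits of each value. Real lossy formats such as JPEG and MP3 use far more sophisticated methods; the toy only illustrates that discarded data cannot be recovered. The sample data is made up for the example.

```python
import zlib

# Lossless: compress, decompress, and get the exact original back.
text = b"the quick brown fox " * 50
packed = zlib.compress(text)
restored = zlib.decompress(packed)
print(len(text), "->", len(packed), "bytes; identical:", restored == text)

# Lossy (toy example): keep only the top 4 bits of each 8-bit sample.
samples = [12, 13, 200, 203, 255, 90]
quantized = [s >> 4 for s in samples]   # each value now fits in 4 bits
approx = [q << 4 for q in quantized]    # best reconstruction we can make
print(approx)                           # [0, 0, 192, 192, 240, 80]
print("identical:", approx == samples)  # identical: False
```

The quantized samples take half the bits, but 12 and 13 both come back as 0: the detail that was thrown away is gone for good.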
Mini-game 1 — Run-Length Encoder
RLE is a classic lossless trick: replace runs of repeated chars with char+count. Try to make a string that compresses well!
Why this matters: RLE shows that compression ratio depends on the data. Lots of repetition = great savings. Random data = often bigger after compression.
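As a rough illustration of the char+count idea, here is a small encoder and decoder. The function names `rle_encode` and `rle_decode` are hypothetical, the output format (character followed by a decimal count) is one assumed variant, and the sketch assumes the input contains no digit characters. The mini-game itself may use a different format.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with char + run length."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode; assumes the original text contained no digits."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        i += 1
        j = i
        # Read all digits of the count that follows the character.
        while j < len(encoded) and encoded[j].isdigit():
            j += 1
        out.append(ch * int(encoded[i:j]))
        i = j
    return "".join(out)


for s in ["AAAAAAAABBBB", "ABCABC"]:
    enc = rle_encode(s)
    assert rle_decode(enc) == s  # lossless: the round trip is exact
    print(s, "->", enc, f"({len(s)} -> {len(enc)} chars)")
# AAAAAAAABBBB -> A8B4 (12 -> 4 chars)
# ABCABC -> A1B1C1A1B1C1 (6 -> 12 chars)
```

The first string has long runs and shrinks to a third of its size; the second has no runs at all, so every character gains a count and the output doubles.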
Mini-game 2 — Sort the Files
Decide whether each file format uses lossless or lossy compression and drop it into the matching bin.
🟢 Lossless
🟠 Lossy
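The sorting task amounts to a lookup by file extension. The sketch below uses only the formats named earlier in this lesson; the `classify` function is a hypothetical helper, and the extension lists are deliberately incomplete. Treating .mp4 as lossy reflects typical use, since it is a container whose video is usually stored with a lossy codec.

```python
import os

# Extensions taken from the examples in this lesson; not an exhaustive list.
LOSSLESS = {".zip", ".png", ".gif", ".flac"}
LOSSY = {".jpg", ".mp3", ".mp4", ".aac"}


def classify(filename):
    """Return 'lossless', 'lossy', or 'unknown' based on the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in LOSSLESS:
        return "lossless"
    if ext in LOSSY:
        return "lossy"
    return "unknown"


for name in ["backup.zip", "vacation.JPG", "song.mp3", "notes.xyz"]:
    print(name, "->", classify(name))
# backup.zip -> lossless
# vacation.JPG -> lossy
# song.mp3 -> lossy
# notes.xyz -> unknown
```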
Lock it in — quick check
- You're emailing the only copy of a contract. Lossy or lossless? (lossless — you can't lose words)
- Streaming a music video over slow Wi-Fi. Lossy or lossless? (lossy — smaller wins)
- RLE on the string "ABCABC": would it shrink? (no, there are no runs to collapse; it could even grow)