Compression Quest: Squeeze the Bits

AP CSP Big Idea 2 · ~5 minute lesson · Lossy vs Lossless compression

Earn a Cruncher Badge · 2 mini-games · Exam-ready

What is Compression?

  • The core idea: Compression is an algorithm that re-encodes data into fewer bits than the original. The encoded version takes less storage and travels faster across networks.
  • Why it exists: Bandwidth and storage cost real money. Without compression, a 4-minute CD-quality song takes about 40 MB instead of the ~4 MB of a typical MP3, and uncompressed HD video would need far more bandwidth than most home connections provide.
  • Lossless compression: The original file can be perfectly reconstructed bit-for-bit. It works by spotting patterns (repeated characters, common byte sequences) and replacing them with shorter codes. Examples: ZIP, PNG, GIF, FLAC, run-length encoding, Huffman coding.
  • Lossy compression: The algorithm permanently discards data the human eye/ear is least likely to notice (subtle color shifts, high-frequency sound). The result is much smaller but cannot be restored to the original. Examples: JPEG, MP3, MP4, AAC.
  • The big trade-off: File size vs. fidelity. Lossy gives smaller files at the cost of detail; lossless guarantees fidelity but compresses less. Pick by use case — never use lossy on source code, text, financial records, or anything where every bit matters.
  • AP exam angle: You'll be asked to identify which type fits a scenario, explain why a file might or might not compress well, and reason about trade-offs between size, quality, and processing time.
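The ~40 MB vs. ~4 MB figures can be checked with quick arithmetic. This sketch assumes CD-quality audio (44,100 samples per second, 16 bits per sample, 2 channels) for the uncompressed song and a 128 kbps constant-bitrate MP3 for the compressed one. Both are common defaults, not values stated in this lesson, and the sketch ignores metadata and container overhead.

```python
# Back-of-the-envelope file sizes for a 4-minute song.
# Assumptions: CD-quality PCM audio (44.1 kHz, 16-bit, stereo) vs. a 128 kbps MP3.

SECONDS = 4 * 60  # 4-minute song

def pcm_size_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Uncompressed audio: every sample of every channel is stored."""
    bits = seconds * sample_rate * bits_per_sample * channels
    return bits // 8

def mp3_size_bytes(seconds, bitrate_bps=128_000):
    """Constant-bitrate MP3: size is simply bitrate times duration."""
    return seconds * bitrate_bps // 8

raw = pcm_size_bytes(SECONDS)
mp3 = mp3_size_bytes(SECONDS)

print(f"Uncompressed: {raw / 1_000_000:.1f} MB")  # 42.3 MB
print(f"MP3:          {mp3 / 1_000_000:.1f} MB")  # 3.8 MB
print(f"Ratio:        {raw / mp3:.1f}x smaller")  # 11.0x
```

Under these assumptions the compressed file is roughly 11 times smaller, so "~40 MB vs. ~4 MB" is a fair round-number comparison.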

The 30-second briefing

Compression shrinks data so it moves faster and stores smaller. There are two flavors:

Lossless

Original is perfectly reconstructed. Zero data lost.

Examples: .zip, .png, .gif, .flac, .gz (common for compressed text).
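Python's standard-library `zlib` module implements DEFLATE, the method most .zip files use and the one inside .png files, so it is a handy way to see the lossless guarantee in action: decompressing gives back exactly the original bytes. The two sample inputs below are made up for illustration.

```python
import os
import zlib

# Illustrative inputs: one very repetitive, one with no patterns at all.
repetitive = b"ABABABAB" * 500   # 4,000 bytes of a repeating pattern
random_bytes = os.urandom(4000)  # 4,000 bytes of random noise

for name, data in [("repetitive", repetitive), ("random", random_bytes)]:
    packed = zlib.compress(data)        # lossless DEFLATE compression
    restored = zlib.decompress(packed)  # undo it
    assert restored == data             # bit-for-bit identical: nothing was lost
    print(f"{name:>10}: {len(data)} -> {len(packed)} bytes")
```

Typically the repetitive input shrinks to a few dozen bytes, while the random input comes out about the same size or slightly larger, because there are no patterns for the algorithm to exploit.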

Lossy

Throws away "less important" bits. Smaller, but can't get original back.

Examples: .jpg, .mp3, .mp4, streaming video.
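Real lossy formats such as JPEG and MP3 rely on far more sophisticated transforms and perceptual models, but the core move of discarding detail can be shown with a toy. This hypothetical sketch uses simple quantization: it keeps only the top 4 bits of each 8-bit sample. It is not how any particular format works.

```python
def quantize(samples, keep_bits=4):
    """Lossy step: drop the low (8 - keep_bits) bits of each 8-bit sample."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]

def dequantize(codes, keep_bits=4):
    """Best-effort reconstruction: shift back up. The dropped bits are gone for good."""
    shift = 8 - keep_bits
    return [c << shift for c in codes]

# Made-up brightness values (0-255) for five neighboring pixels.
original = [200, 201, 203, 90, 95]
codes = quantize(original)  # each code needs only 4 bits instead of 8
restored = dequantize(codes)

print(codes)                 # [12, 12, 12, 5, 5]
print(restored)              # [192, 192, 192, 80, 80]
print(restored == original)  # False: close, but not the original
```

Each sample now takes half the bits, and nearby values blur together: a smaller file in exchange for detail that can never be recovered.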

AP rule of thumb: If losing detail is unacceptable (code, bank records, archives), pick lossless. If a smaller file matters more than perfect fidelity (photos, music, video), pick lossy.

Mini-game 1 — Run-Length Encoder

RLE is a classic lossless trick: replace runs of repeated chars with char+count. Try to make a string that compresses well!

HOW TO PLAY Type any string into the box (or hit a Quick load sample), then click Compress. The output and savings % appear below — try to beat 50% savings by feeding it long repeated runs.

Why this matters: RLE shows that compression ratio depends on the data. Lots of repetition = great savings. Random data = often bigger after compression.
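Here is a minimal RLE sketch in Python. It assumes the char+count format described above (so "AAAB" becomes "A3B1") and assumes the input contains no digits, since digits would make the counts ambiguous. The function names are hypothetical, and the mini-game's own encoder may differ in details.

```python
import re
from itertools import groupby

def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with char + run length."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))

def rle_decode(encoded: str) -> str:
    """Expand each char+count pair back into a run (assumes no digits in the original)."""
    return "".join(ch * int(count) for ch, count in re.findall(r"(\D)(\d+)", encoded))

def savings_percent(text: str) -> float:
    """Percent of characters saved (text must be non-empty); negative means it grew."""
    return 100 * (1 - len(rle_encode(text)) / len(text))

for sample in ["AAAAAABBBBCC", "ABCABC"]:
    packed = rle_encode(sample)
    assert rle_decode(packed) == sample  # lossless: the round trip is exact
    print(f"{sample} -> {packed} ({savings_percent(sample):.0f}% savings)")
```

Running it prints `AAAAAABBBBCC -> A6B4C2 (50% savings)` and `ABCABC -> A1B1C1A1B1C1 (-100% savings)`: runs halve the size, while a string with no runs doubles.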

Mini-game 2 — Sort the Files

Decide whether each file format uses lossless or lossy compression and drop it into the matching bin.

HOW TO PLAY 1. Click a file tile (it gets a yellow outline). 2. Click the 🟢 Lossless or 🟠 Lossy bin to place it. Correct picks turn green ✓, wrong ones red ✗. Goal: all 8 right for the Master Cruncher badge.

🟢 Lossless

🟠 Lossy


Lock it in — quick check

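  1. You're emailing the only copy of a contract. Lossy or lossless? (Lossless: you can't afford to lose a single word.)
  2. Streaming a music video over slow Wi-Fi. Lossy or lossless? (Lossy: a smaller file matters more than perfect fidelity.)
  3. RLE on the string "ABCABC": would it shrink? (No: there are no runs to collapse, so the char+count scheme turns every character into a pair and doubles it to "A1B1C1A1B1C1".)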