The 2am Scenario
⚠️ Incident — 02:17 AM
Your overnight Bronze ingestion job, the one that processes 400,000 web signup records and feeds
your Silver marketing tables, has failed.
The dashboard is empty. The daily email campaign query is pulling from yesterday's data.
Stakeholders are waking up to stale numbers. And you're at 2am, digging through a stack trace,
hunting for the one row that contains a null where your pipeline expected a date.
This is the default behaviour of most data pipelines: one bad record vetoes the entire run. The job either errors out entirely or silently drops bad records with no audit trail. Neither is acceptable in production.
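The fail-loud half of that default is easy to reproduce. A minimal sketch in plain Python (the `process` function and records here are illustrative, not from any library):

```python
# Three records; one has a null where a date string is expected.
records = [
    {"event_date": "2026-02-28"},
    {"event_date": None},          # the one bad row
    {"event_date": "2026-03-01"},
]

def process(row):
    # Expects a date string; a null blows up the whole call.
    return row["event_date"].replace("-", "")

try:
    results = [process(r) for r in records]
except AttributeError as e:
    # One bad record vetoes the entire run: the two good rows are lost too.
    print(f"Job failed: {e}")
```

Note that the two perfectly good rows never make it out either; the single null takes the whole batch down with it.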
The Nuclear Option Problem
The typical "fix" is to wrap everything in try/except and log a warning.
But that just hides the problem — bad data silently disappears and you have no way to
investigate, reprocess, or alert on it.
```python
for row in records:
    try:
        process(row)
    except Exception as e:
        # Bad row silently vanishes. No audit trail.
        # No way to know what failed, or what % of data is bad.
        logger.warning(f"Skipping row: {e}")
```
You have two choices: fail loud (the 2am incident) or fail silent (broken dashboards that nobody notices for days). Neither is the right answer.
The right answer is a third option: route bad records out, keep clean records flowing, and leave a full audit trail of exactly what was rejected and why.
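Stripped of any framework, the pattern is simple: run each record past the rules, and route failures aside with a reason instead of raising or dropping them. A minimal, framework-free sketch (`quarantine_split` and the rule predicates are illustrative, not LakeLogic's API):

```python
def quarantine_split(records, rules):
    """rules: list of (name, predicate) pairs; predicate returns True if OK."""
    good, bad = [], []
    for rec in records:
        reasons = [name for name, check in rules if not check(rec)]
        if reasons:
            # Preserve the record exactly as it arrived, plus the reason(s).
            bad.append({**rec, "_reject_reason": ";".join(reasons)})
        else:
            good.append(rec)
    return good, bad

rules = [
    ("email_format", lambda r: "@" in (r.get("email") or "")),
    ("age_positive", lambda r: r.get("age") is None or r["age"] >= 0),
]

records = [
    {"signup_id": "SU-1", "email": "jane@example.com", "age": 32},
    {"signup_id": "SU-2", "email": "not-an-email", "age": 28},
    {"signup_id": "SU-3", "email": "bob@example.com", "age": -5},
]

good, bad = quarantine_split(records, rules)
print(len(good), len(bad))       # 1 good, 2 quarantined
print(bad[0]["_reject_reason"])  # email_format
```

Everything that follows is this pattern with the bookkeeping (audit columns, storage targets, schema evolution) handled for you.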
The Quarantine Pattern
In LakeLogic, you add three lines to your data contract. That's it.
```yaml
version: "1.0.0"
info:
  title: Bronze Web Signups
  dataset: bronze_web_signups
model:
  fields:
    - name: signup_id
      type: string
      required: true
    - name: email
      type: string
      required: true
    - name: event_date
      type: date
      required: true
    - name: age
      type: int
quality:
  row_rules:
    - name: email_format
      sql: "email LIKE '%@%'"
    - name: age_positive
      sql: "age IS NULL OR age >= 0"
quarantine:
  include_error_reason: true  # adds _reject_reason column
```
LakeLogic now does something different when it encounters a bad row. Instead of raising
an exception, it routes the row to a separate quarantine file — and stamps it with a
_reject_reason column explaining exactly which rule it violated.
Running It
```python
from lakelogic import DataProcessor

result = DataProcessor("contract.yaml").run_source()

# result.good → clean records, ready for Silver
# result.bad  → quarantined records with _reject_reason
print(f"✅ {len(result.good):,} valid rows")
print(f"🔒 {len(result.bad):,} quarantined rows")

# Inspect the reject reasons
print(result.bad["_reject_reason"].value_counts())
```
What the quarantined rows look like
| signup_id | email | event_date | age | _reject_reason |
|---|---|---|---|---|
| SU-10042 | not-an-email | 2026-02-28 | 32 | row_rule:email_format |
| SU-10091 | jane@example.com | 2026-03-01 | -5 | row_rule:age_positive |
| SU-10103 | NULL | 2026-03-01 | 28 | required_field:email |
Every quarantined row is preserved exactly as it arrived, with a clear reason. You can now investigate, fix the upstream source, and reprocess — without any data loss.
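Reprocessing can then be as simple as filtering the quarantine set by reason, applying a targeted fix, and feeding only the affected rows back through the pipeline. A sketch, where `fix_negative_age` stands in for whatever upstream bug you diagnosed (both the fix and the sample rows are hypothetical):

```python
# Rows as they might sit in quarantine, audit column included.
quarantined = [
    {"signup_id": "SU-10091", "email": "jane@example.com", "age": -5,
     "_reject_reason": "row_rule:age_positive"},
    {"signup_id": "SU-10042", "email": "not-an-email", "age": 32,
     "_reject_reason": "row_rule:email_format"},
]

def fix_negative_age(row):
    # Hypothetical fix for a known sign-flip bug in the signup form.
    fixed = {k: v for k, v in row.items() if k != "_reject_reason"}
    fixed["age"] = abs(fixed["age"])
    return fixed

# Only touch rows that failed for the reason we understand.
reprocessable = [r for r in quarantined
                 if r["_reject_reason"] == "row_rule:age_positive"]
repaired = [fix_negative_age(r) for r in reprocessable]
print(repaired)  # ready to re-submit through the normal pipeline
```

The `email_format` failures stay in quarantine untouched until someone diagnoses them, which is exactly the point: nothing is guessed at, nothing is lost.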
The Result: What Changes
✅ With quarantine enabled
The pipeline finishes. The dashboard is populated. The 0.4% of bad records are held in a quarantine file with full audit columns. Your data quality team can review them in the morning — not at 2am.
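One companion check worth adding yourself: an upper bound on the quarantine rate, so a systemic upstream failure (40% bad rows rather than 0.4%) still fails loudly instead of quietly filling the quarantine table. A sketch, assuming you have the good/bad counts from the run (the threshold and function are illustrative, not a LakeLogic feature):

```python
def check_quarantine_rate(n_good, n_bad, max_bad_ratio=0.05):
    """Quarantine tolerates scattered bad rows, but a spike usually means
    a systemic upstream problem, which *should* fail the run after all."""
    total = n_good + n_bad
    ratio = n_bad / total if total else 0.0
    if ratio > max_bad_ratio:
        raise RuntimeError(
            f"Quarantine rate {ratio:.1%} exceeds {max_bad_ratio:.1%}: "
            "likely an upstream schema or source failure."
        )
    return ratio

rate = check_quarantine_rate(398_400, 1_600)  # 0.4% of 400,000 → passes
print(f"{rate:.1%}")  # → 0.4%
```

This keeps the best of both worlds: isolated bad rows are routed out, but a wholesale source breakage still pages someone.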
Writing Quarantine to a Table (Not Just a File)
By default, quarantined rows write to a Parquet file next to your output. For production pipelines, you probably want them in a queryable table your data quality team can monitor:
```yaml
quarantine:
  include_error_reason: true
  target: "table:data_quality.bronze_signups_quarantine"
  # Writes to DuckDB by default (Polars/Pandas pipelines)
  # Writes to a Spark Delta table on Databricks pipelines
  # Schema evolves automatically as your contract changes
```
Now your DQ team can run:
```sql
-- Which rules are failing most often this week?
SELECT
  _reject_reason,
  COUNT(*)                      AS failures,
  DATE_TRUNC('day', event_date) AS day
FROM data_quality.bronze_signups_quarantine
WHERE _run_timestamp > NOW() - INTERVAL '7 days'
GROUP BY 1, 3
ORDER BY 2 DESC;
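For a quick look before wiring up dashboards, the same failure-count breakdown is a one-liner over the `_reject_reason` column in Python. A minimal sketch with fabricated sample values:

```python
from collections import Counter

# Toy stand-in for the quarantine table's _reject_reason column.
reasons = [
    "row_rule:email_format", "row_rule:email_format",
    "row_rule:age_positive", "required_field:email",
    "row_rule:email_format",
]

# Equivalent of: SELECT _reject_reason, COUNT(*) ... ORDER BY 2 DESC
for reason, n in Counter(reasons).most_common():
    print(f"{reason}: {n}")
```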
When to Use Quarantine vs. Fail Fast
| Scenario | Behaviour | Why |
|---|---|---|
| Overnight batch jobs | Quarantine | Bad rows shouldn't cancel a 2-hour run |
| Bronze ingestion layer | Quarantine | Raw data is always dirty — expect and handle it |
| Financial calculations (revenue, payroll) | Fail fast | Partial results are worse than no results |
| Schema contract violations (wrong column types) | Fail fast | Structural problems need immediate attention |
| Silver/Gold quality gates | Quarantine | Route bad rows out, let clean rows continue downstream |
| CI / data contract tests | Fail fast | A broken contract should block the PR |
The Rule of Thumb
If a bad record would cause all the other records to be wrong, fail fast. If a bad record is just itself wrong, quarantine it and keep going.
Most data pipelines deal with the second case far more often than the first. Bronze ingestion especially — you're processing data from external sources you don't control. Quarantine is your safety net.
Add Quarantine to Your Pipeline Today
Three lines of YAML. Works on Polars, Spark, DuckDB and Pandas. Full audit trail included. Open source, MIT licensed.