Open Playbook Data Organization

The Prompt Codex

A practical, role-specific guide to writing effective prompts for every function in a modern data organization. From data engineers to governance leads. Zero fluff.

6 Chapters · 9+ Roles · 50+ Examples · 20+ Templates

Zero-Shot · Few-Shot · Chain-of-Thought · Role Prompting · RAG · Guardrails
Technical Roles

Build & Ship

Data Engineering, ML Engineering, Data Science, QA. You build pipelines, train models, and write code. These prompts help you leverage LLMs for code generation, debugging, documentation, and testing.

Business & Strategy Roles

Analyze & Decide

Product Manager, BI, Data Analyst, Data Governance. You analyze data, define requirements, and set policy. These prompts help you extract insights, write specs, and enforce standards.

What's Inside

Table of Contents
01
Foundations
Core principles, prompt anatomy, common pitfalls
02
Techniques & Patterns
Zero-shot, few-shot, CoT, role prompting, structured output
03
Role-Based Examples
Bad → good prompts for 9+ data org roles
04
Advanced Strategies
Prompt chaining, RAG, evaluation, guardrails
05
Templates & Quick Ref
Copy-paste templates, decision trees, checklists
06
Resources & Next Steps
Tools, further reading, building a team prompt library
Chapter 01

Prompt Engineering Foundations

The core principles that separate effective prompts from wasted tokens. Master these before learning any technique.

The Four Pillars

Core Principle

Clarity

Be unambiguous. State exactly what you want the model to do. Remove words that could be interpreted multiple ways.

Core Principle

Specificity

Constrain the output format, length, and scope. Tell the model what you want and what you do not want.

Core Principle

Context

Provide the background the model needs. Include tech stack, data volumes, constraints, and domain knowledge.

Core Principle

Structure

Use delimiters, sections, and formatting to organize complex prompts. Numbered steps, XML tags, and markdown all work.

Anatomy of a Prompt

# A well-structured prompt has these components:

[ROLE] — Who the model should act as
You are a senior data engineer with expertise in Spark and Airflow.

[CONTEXT] — Background information the model needs
We have a daily ETL pipeline processing 50M rows from S3 to BigQuery.
The pipeline has been failing intermittently for the past week.

[TASK] — The specific instruction
Analyze the error log below and provide a root cause analysis with 3 ranked remediation options.

[FORMAT] — How the output should be structured
Output as:
1. Root Cause (1-2 sentences)
2. Evidence (quote from logs)
3. Remediation Options (ranked by implementation effort)

[CONSTRAINTS] — Boundaries and limitations
- Do not suggest migrating off Spark
- Solutions must be implementable within 1 sprint
- Consider our 4-node r5.xlarge cluster

[EXAMPLES] — Optional: show the expected pattern
Example output format:
Root Cause: Memory pressure from skewed join keys...
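When prompts are assembled in code rather than typed by hand, the anatomy above maps naturally onto a small helper. A minimal sketch — the function name and section labels are ours, not a standard API:

```python
def build_prompt(role, context, task, fmt, constraints=None, examples=None):
    """Assemble a prompt from the anatomy components above.

    Optional sections (constraints, examples) are skipped when absent.
    """
    sections = [
        ("ROLE", role),
        ("CONTEXT", context),
        ("TASK", task),
        ("FORMAT", fmt),
        ("CONSTRAINTS", constraints),
        ("EXAMPLES", examples),
    ]
    # Keep only the sections that were provided, in a fixed order.
    return "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)

prompt = build_prompt(
    role="You are a senior data engineer with expertise in Spark and Airflow.",
    context="Daily ETL pipeline, 50M rows, S3 to BigQuery, failing intermittently.",
    task="Analyze the error log below and give a root cause analysis.",
    fmt="1. Root Cause  2. Evidence  3. Remediation Options",
)
```

Fixing the section order in one place keeps every prompt in a team library structurally identical, which makes them easier to review and diff.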

Common Pitfalls

Pitfall | What Happens | Fix
Vague instructions | Model guesses intent, produces generic output | State the exact task, audience, and purpose
No output format | Inconsistent structure across responses | Specify JSON, markdown, table, or bullet format
Missing context | Toy-level answers that ignore your constraints | Include tech stack, data volume, and constraints
Too many tasks | Model prioritizes some, drops others | One prompt, one task. Chain for multi-step work
No examples | Model invents its own format | Add 1-2 input/output examples for complex tasks
Leading language | Model confirms your bias instead of analyzing | Ask open-ended questions, request counterarguments

Prompt Quality Checklist

  • Clear task instruction — Does the prompt have a single, unambiguous task?
  • Output format specified — JSON, markdown, table, numbered list, etc.
  • Relevant context provided — Tech stack, data volumes, constraints, domain info
  • Constraints defined — What to avoid, boundaries, limitations
  • Examples included (if complex) — 1-2 input/output examples for non-trivial formats
  • Single focused task — One prompt, one job. Multi-step? Use chaining.

Chatbot vs Agent Prompts

There is a fundamental difference between prompting a chatbot (one-shot Q&A) and configuring an AI agent (persistent, tool-using, acting on your behalf). Most people write prompts for chatbots when they actually need agent instructions.

Chatbot Prompt vs Agent Prompt
Chatbot Prompt
One-shot interaction. You ask a question, get an answer. No memory, no tools, no autonomy.

"Summarize this quarterly report."

Works fine for quick tasks where you review every output before acting on it.
Agent Prompt
Persistent instructions for an autonomous system. It acts on your behalf, uses tools, and makes decisions.

"You are a data quality monitor. Check the daily_transactions table every morning. Flag anomalies >2 sigma. If critical, alert #data-oncall in Slack. Never fabricate data points."

Requires: role, scope, tools, escalation rules, and failure instructions.
Key difference: Agent prompts need guardrails, failure modes, and escalation paths that chatbot prompts don't. Treat an agent like a new team member — define what they can and cannot do.
💡
The Agent Mindset When writing agent prompts, ask yourself: "If this agent encounters something unexpected at 3 AM with no human around, what should it do?" If you can't answer that, your prompt isn't ready for production.

Audience-Aware Prompting

The same task often needs different prompts depending on who will consume the output. A data report for an executive looks nothing like one for an engineer.

Same Task, Different Audiences
For Engineers
Summarize yesterday's pipeline failures. Include: error class, affected table, stack trace snippet, root cause hypothesis, and suggested fix with code.

Format: markdown with code blocks. Be technical. Include query IDs.
For Executives
Summarize yesterday's pipeline failures in 3 bullet points. For each: what business process was affected, estimated data delay, and whether it's resolved. Use plain language — no technical jargon. Lead with impact, not cause.
Why this matters: The underlying data is identical. The prompt controls the lens. Always specify your audience in the prompt — it changes vocabulary, detail level, and what counts as "the answer."
⚠️
Iteration is the norm Prompts are not one-and-done. Expect to iterate. The first version is a draft, not a finished product. Test, evaluate, refine. The best prompt engineers treat prompting as an empirical practice, not a one-shot exercise.
Chapter 02

Techniques & Patterns

The toolkit every prompt engineer needs — from basic zero-shot to advanced chaining strategies.

Zero-Shot Prompting

Zero-Shot

Basic
When to Use
Simple, well-defined tasks

Classification, summarization, formatting when the task is unambiguous

Pattern
Instruction only

No examples needed. Just describe the task clearly with constraints

Strength
Fast & cheap

Minimal tokens, quick to write, easy to iterate

Limitation
Format drift

Output format may vary without examples to anchor it

# Zero-shot example

Classify the following customer support ticket into exactly one category: Bug Report, Feature Request, Account Issue, or General Question.

Ticket: "I can't export my dashboard to PDF. The button spins but nothing downloads."

Output only the category name, nothing else.

Few-Shot Prompting

Few-Shot

Intermediate
When to Use
Non-obvious output format

When the format or style is specific and hard to describe in words alone

Pattern
Examples + instruction

Provide 2-5 input/output pairs, then the actual input

Strength
Consistent output

Examples anchor the format, tone, and level of detail

Limitation
Token cost

Each example consumes tokens. Balance quality vs. cost

# Few-shot example: SQL column descriptions

Generate a business-friendly description for each database column.

Example 1:
Column: user_id (INTEGER)
Description: Unique identifier for each registered user in the platform

Example 2:
Column: created_at (TIMESTAMP)
Description: Date and time when the record was first created in the system

Now generate for:
Column: mrr_cents (BIGINT)
Description:
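Few-shot prompts built from stored example pairs stay consistent as the example set grows. A minimal sketch of assembling the prompt above programmatically — the function and variable names are illustrative:

```python
def few_shot_prompt(instruction, examples, query):
    """Build a few-shot prompt: instruction, worked examples, then the real input."""
    blocks = [instruction]
    for i, (col, desc) in enumerate(examples, 1):
        blocks.append(f"Example {i}:\nColumn: {col}\nDescription: {desc}")
    # End with the real input and a trailing label the model completes.
    blocks.append(f"Now generate for:\nColumn: {query}\nDescription:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    "Generate a business-friendly description for each database column.",
    [("user_id (INTEGER)", "Unique identifier for each registered user"),
     ("created_at (TIMESTAMP)", "Date and time the record was first created")],
    "mrr_cents (BIGINT)",
)
```

Storing the example pairs in data rather than in the prompt string makes it cheap to A/B different example sets against the token-cost limitation noted above.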

Chain-of-Thought (CoT)

Chain-of-Thought

Advanced
When to Use
Reasoning tasks

Math, logic, multi-step analysis, debugging, root cause analysis

Pattern
"Think step by step"

Ask the model to show its reasoning before giving the final answer

Strength
Accuracy

Dramatically improves performance on reasoning-heavy tasks

Limitation
Verbose output

More tokens in the response. May need parsing to extract the answer

# Chain-of-thought example

A data pipeline processes 50M rows daily. After adding a new deduplication step, throughput dropped from 50M to 35M rows/day. The dedup step uses a hash-based approach on 3 columns (email, phone, address) with a 10GB lookup table.

Think step by step about what could cause this throughput drop. For each hypothesis:
1. State the hypothesis
2. Explain the mechanism
3. Suggest a diagnostic check
4. Rate likelihood (High/Medium/Low)

Then provide your top recommendation.

Role / Persona Prompting

Role Prompting

Intermediate
When to Use
Domain expertise needed

When you need responses calibrated to a specific skill level or perspective

Pattern
"You are a [role]..."

Set the persona in the system prompt or at the start of the user message

Strength
Calibrated depth

Responses match the expertise level and vocabulary of the assigned role

Limitation
Can over-index

Model may roleplay too hard. Keep the persona relevant and bounded

# Role prompting example

You are a senior data engineer with 10 years of experience building large-scale ETL pipelines. You specialize in Spark, Airflow, and dbt. You value production reliability over cleverness.

Review the following Airflow DAG and identify:
1. Any anti-patterns
2. Missing error handling
3. Scalability concerns

Be direct and specific. Reference line numbers.

Structured Output

# Structured output: JSON schema enforcement

Analyze the following error log and output your findings as JSON matching this exact schema:

{
  "root_cause": string (1-2 sentences),
  "severity": "critical" | "high" | "medium" | "low",
  "affected_tables": [list of table names],
  "remediation_steps": [
    { "step": number, "action": string, "effort": "low" | "medium" | "high" }
  ],
  "requires_immediate_action": boolean
}

Output ONLY valid JSON. No explanatory text before or after.
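Even with "Output ONLY valid JSON", validate the response before feeding it downstream. A minimal sketch using only the standard library; the raw response here is a hand-written stand-in for a real model reply:

```python
import json

# Hypothetical model response — in practice this comes from your LLM call.
raw = (
    '{"root_cause": "Skewed join keys caused memory pressure.",'
    ' "severity": "high", "affected_tables": ["daily_transactions"],'
    ' "remediation_steps": [{"step": 1, "action": "Salt the join key",'
    ' "effort": "medium"}], "requires_immediate_action": false}'
)

REQUIRED = {"root_cause", "severity", "affected_tables",
            "remediation_steps", "requires_immediate_action"}
ALLOWED_SEVERITY = {"critical", "high", "medium", "low"}

result = json.loads(raw)  # raises ValueError if the model added stray text
missing = REQUIRED - result.keys()
assert not missing, f"missing keys: {missing}"
assert result["severity"] in ALLOWED_SEVERITY
```

For production use, a schema library gives richer error messages, but even this thin check catches the two most common failures: prose wrapped around the JSON and dropped keys.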

Iterative Refinement

Prompt Development Cycle
Draft Prompt → Test → Evaluate Output → Identify Gaps → Refine Prompt → Production Prompt
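The cycle above can be sketched as a small evaluation harness: score each prompt draft against a fixed set of test cases, then refine and re-run. `call_llm` is a stub standing in for a real model client, and the test cases are illustrative:

```python
def call_llm(prompt: str) -> str:
    return "Bug Report"  # stub: replace with a real model call

test_cases = [
    {"ticket": "Export to PDF silently fails.", "expected": "Bug Report"},
    {"ticket": "Please add dark mode.", "expected": "Feature Request"},
]

def evaluate(prompt_template: str) -> float:
    """Score a prompt draft: fraction of test cases answered exactly."""
    hits = 0
    for case in test_cases:
        output = call_llm(prompt_template.format(ticket=case["ticket"]))
        hits += output.strip() == case["expected"]
    return hits / len(test_cases)

score = evaluate("Classify this ticket: {ticket}\nOutput only the category.")
```

Keeping the test cases in version control alongside the prompt turns "refine" from a gut feeling into a measurable step.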

Technique Comparison

Technique | Best For | Complexity | Token Cost
Zero-Shot | Simple, well-defined tasks | Low | Minimal
Few-Shot | Format-specific output | Medium | Moderate
Chain-of-Thought | Reasoning & analysis | Medium | Moderate-High
Role Prompting | Domain expertise | Low | Minimal
Structured Output | Machine-readable results | Medium | Low
Prompt Chaining | Multi-step workflows | High | High
Practice Exercise

Try All Three

  1. Pick a task from your daily work (e.g., writing SQL, reviewing a PR, drafting a spec)
  2. Write it 3 ways: zero-shot, few-shot, and chain-of-thought
  3. Run all 3 and compare the outputs for accuracy, format consistency, and usefulness
  4. Note which technique worked best and why — this builds your intuition
Chapter 03

Role-Based Examples

Real-world bad → good prompt transformations for every role in a data organization. Each section shows what not to do, why, and how to fix it.

🔧
Data Engineering

Pipeline Builders & Infrastructure Owners

Builds and maintains data pipelines, ETL/ELT processes, data infrastructure. Owns data movement, transformation, and platform reliability.

Key LLM Use Cases: Pipeline debugging, SQL optimization, schema design, documentation generation, migration planning
Role Prompting · Structured Output · Chain-of-Thought
Example 1 — Pipeline Debugging
✗ Bad Prompt
Fix my Spark job
✓ Good Prompt
I have a PySpark job processing 50M rows from S3 parquet files. It fails with OutOfMemoryError at the groupBy stage. The cluster has 4 r5.xlarge nodes (32GB RAM each). The groupBy key has 10K distinct values. The job ran fine until we added a new column with high cardinality strings (avg 500 chars).

Suggest 3 optimization strategies ranked by implementation effort. For each, explain the mechanism and expected impact.
Why it works: Includes tech stack, data volume, error message, infrastructure specs, and what changed. Requests ranked, structured output.
Example 2 — Airflow DAG Generation
✗ Bad Prompt
Write an Airflow DAG
✓ Good Prompt
Write an Airflow DAG (Airflow 2.7+, TaskFlow API) that:
a) Extracts data from PostgreSQL (connection_id='pg_source')
b) Transforms using a Python function that deduplicates on email column
c) Loads to BigQuery dataset 'analytics.users'

Requirements:
- Retry logic: 3 retries, 5-min delay
- SLA miss callback that sends to Slack
- Schedule: daily at 06:00 UTC
- Use type hints and docstrings
- Include DAG-level tags: ['etl', 'production']
Why it works: Specifies framework version, API style, connection IDs, transformation logic, error handling, schedule, and code quality requirements.
Example 3 — Documentation Generation
✗ Bad Prompt
Document this pipeline
✓ Good Prompt
Generate technical documentation in Markdown for the following dbt model. Include these sections:
1. Purpose (1-2 sentences)
2. Source tables and their owners
3. Transformation logic (step by step)
4. Output schema (table with column name, type, description)
5. Data quality assumptions
6. Downstream dependencies
7. SLA and refresh schedule

Model SQL:
[paste your SQL here]
Why it works: Defines exact documentation sections, output format (markdown), and provides the source material. Produces consistent, comprehensive docs every time.
Pipeline Debugging Template
  • Tech stack — e.g., PySpark 3.4, Airflow 2.7, dbt 1.6
  • Error — Exact error text or stack trace snippet
  • Data scale — Row count, column count, key cardinality
  • Infrastructure — Cluster size, memory, node types
  • Recent changes — New columns, schema changes, volume growth
  • Requested output — N ranked solutions with effort estimates
💡
Pro Tip — Data Engineering Always include your tech stack version, data volumes, and infrastructure constraints. Data engineering prompts without scale context produce toy answers that break at production volumes.

📋
Product Manager

Requirements Definers & Stakeholder Translators

Defines product requirements, prioritizes features, writes PRDs, and communicates across engineering, design, and business stakeholders.

Key LLM Use Cases: PRD drafting, user story generation, competitive analysis, feature prioritization, stakeholder communication
Role Prompting · Chain-of-Thought · Structured Output
Example 1 — PRD Drafting
✗ Bad Prompt
Write a PRD for a search feature
✓ Good Prompt
Write a PRD for adding semantic search to our internal knowledge base. Context:
- 5,000 Confluence pages, 200 DAU
- Current keyword search has 35% query failure rate
- Engineering team: 3 backend, 1 ML engineer

Include these sections:
1. Problem Statement (with metrics)
2. User Personas (3 types)
3. Proposed Solution (PM-level architecture)
4. Success Metrics with targets
5. Risks and Mitigations
6. Phased Rollout Plan (3 phases)

Keep it under 2 pages. Use data-driven language.
Why it works: Includes real metrics (35% failure rate), team constraints, specific sections, and length guidance. Produces a stakeholder-ready document, not a generic template.
Example 2 — User Story Generation
✗ Bad Prompt
Create user stories for our product
✓ Good Prompt
Generate user stories in the format: "As a [role], I want [capability], so that [benefit]" with acceptance criteria for each.

Feature: AI-powered data quality alerts in our warehouse monitoring tool
Roles: data engineer, data analyst, data governance lead

For each story include:
- 3-5 acceptance criteria
- Edge cases
- Non-functional requirements (latency, accuracy)
- Story points estimate (S/M/L)
Why it works: Specifies the story format, names the exact feature and user roles, and requests actionable acceptance criteria with edge cases.
Example 3 — Feature Prioritization
✗ Bad Prompt
Help me prioritize features
✓ Good Prompt
Score these 6 candidate features for Q3 using the RICE framework (Reach, Impact, Confidence, Effort). Output as a ranked table with columns: Feature | Reach | Impact | Confidence | Effort | RICE Score | 1-sentence Rationale.

Context: Team of 5 engineers, 10K monthly active users, goal is to reduce churn by 15%.

Features:
1. Email notification preferences
2. Dashboard sharing via link
3. Scheduled report exports
4. Single sign-on (SSO)
5. In-app onboarding tour
6. API rate limit dashboard
Why it works: Names the framework, provides team and metric context, lists specific features, and defines exact output columns.
PRD Prompt Template
  • Feature — What you're building
  • Current state — What exists today, key numbers, pain points
  • Constraints — Team size, timeline, tech stack
  • Required sections — Problem, personas, solution, metrics, risks, rollout
  • Style — e.g., "Under 2 pages, data-driven, stakeholder-ready"
💡
Pro Tip — Product Manager Include real metrics and constraints. A PM prompt without numbers produces generic output that requires heavy editing. The more specific your context, the closer the first draft is to shippable.

🤖
ML Engineer

Model Builders & MLOps Operators

Trains, deploys, and monitors ML models. Builds inference pipelines, manages model lifecycle, and optimizes for latency and cost.

Key LLM Use Cases: Model debugging, experiment design, MLOps configuration, performance optimization, model card generation
Chain-of-Thought · Role Prompting · Structured Output
Example 1 — Model Debugging
✗ Bad Prompt
My model accuracy is low
✓ Good Prompt
I'm training an XGBoost classifier for customer churn prediction.
- Dataset: 500K rows, 45 features, 8% positive class
- Current AUC-ROC: 0.72 after tuning (max_depth=6, n_estimators=500, lr=0.1)
- I suspect class imbalance and possible feature leakage

Provide a systematic debugging approach:
1. Diagnostic checks to run (with Python code)
2. Resampling strategies to try (with tradeoffs)
3. Feature importance analysis steps
4. Validation strategy review

For each suggestion, explain WHY it might help and what to look for in the results.
Why it works: Includes model type, dataset stats, current metrics, hyperparameters, and hypotheses. Requests structured debugging steps with reasoning.
Example 2 — Training Pipeline Design
✗ Bad Prompt
Write a training pipeline
✓ Good Prompt
Design a Kubeflow pipeline for fine-tuning DistilBERT for multi-label text classification (12 labels, 100K training samples).

Pipeline steps:
1. Data preprocessing (tokenization, train/val/test split 80/10/10)
2. Training with W&B experiment tracking
3. Evaluation (F1-macro, per-label precision/recall)
4. Model registry upload to MLflow

Constraints:
- GPU node pool (T4 GPUs)
- Training must complete in <2 hours
- Pipeline must be idempotent

Output: Python pipeline DSL with comments explaining each step.
Why it works: Specifies orchestration tool, model architecture, dataset size, evaluation metrics, compute constraints, and code format requirements.
Model Debugging Template
  • Model — e.g., XGBoost classifier, DistilBERT, Random Forest
  • Dataset — Rows, features, class distribution, data types
  • Current metrics — AUC, F1, accuracy, loss curves
  • Hyperparameters — Current settings, what you've already tuned
  • Hypotheses — What you think might be wrong
💡
Pro Tip — ML Engineer Include your model architecture, dataset stats, and current metrics. ML prompts need quantitative context. Also state what you've already tried so the model doesn't waste your time with obvious suggestions.

🔬
Data Scientist

Analysts & Experiment Designers

Performs statistical analysis, builds predictive models, designs experiments, and communicates insights to stakeholders.

Key LLM Use Cases: EDA guidance, statistical test selection, visualization code, insight summarization, experiment design
Chain-of-Thought · Structured Output · Few-Shot
Example 1 — Exploratory Data Analysis
✗ Bad Prompt
Analyze this data
✓ Good Prompt
Generate a Python EDA script (pandas + matplotlib) for a dataset of 200K e-commerce transactions with columns: user_id, timestamp, product_category, amount, is_return, channel.

The script should:
a) Compute summary statistics per product category
b) Plot return rate trends over time (monthly, line chart)
c) Segment users by purchase frequency using RFM analysis
d) Test whether mobile vs desktop has a significantly different average order value (Mann-Whitney U, alpha=0.05)

Output: clean, commented Python code. Use seaborn for styling. Include print statements explaining each finding.
Why it works: Names exact columns, specifies statistical tests with parameters, defines visualization types, and requests commented code with interpretive output.
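For reference, the Mann-Whitney U test the script requests is a one-liner in SciPy. A sketch on synthetic order values — the distributions are made up purely for illustration:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic average-order-value samples (illustrative): mobile skews lower.
rng = np.random.default_rng(42)
mobile = rng.lognormal(mean=3.8, sigma=0.6, size=500)
desktop = rng.lognormal(mean=4.0, sigma=0.6, size=500)

# Non-parametric test: no normality assumption, appropriate for skewed
# order-value distributions.
stat, p_value = mannwhitneyu(mobile, desktop, alternative="two-sided")
significant = p_value < 0.05
```

Mann-Whitney is a sensible choice here because order values are typically right-skewed, which violates the normality assumption behind a t-test.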
Example 2 — Statistical Test Selection
✗ Bad Prompt
What statistical test should I use?
✓ Good Prompt
I need to determine if a new recommendation algorithm (variant B) increases click-through rate vs. control (variant A).

Details:
- Binary outcome (click/no-click)
- Sample: 50K users per group
- Baseline CTR: 3.2%
- Minimum detectable effect: 0.5 percentage points
- We need 80% power at alpha=0.05

Questions:
1. Which test is appropriate and why?
2. Are the assumptions met?
3. Calculate required sample size
4. Provide Python code using scipy.stats
5. How should we handle multiple comparisons if we add a third variant?
Why it works: Provides outcome type, sample sizes, baseline metrics, power requirements, and asks specific statistical questions with code output.
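The sample-size calculation in question 3 follows the classic two-proportion formula, which needs only the standard library. A sketch under the prompt's numbers (3.2% baseline, 0.5-point MDE, 80% power, alpha 0.05):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per arm for a two-proportion z-test (classic formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)          # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p2 - p1) ** 2)

n = n_per_group(0.032, 0.037)  # baseline 3.2% CTR, MDE 0.5 points
```

With these inputs the formula lands around 21K users per arm, comfortably within the 50K per group the experiment has available.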
Analysis Request Template
  • Dataset — Rows, columns, types, time range
  • Business question — What decision will this analysis inform?
  • Statistical rigor — Alpha level, power, confidence intervals needed?
  • Output format — Python code, markdown report, executive summary
💡
Pro Tip — Data Scientist Specify your statistical rigor requirements upfront. Without them, you get blog-post-level analysis. Always include alpha levels, power requirements, and what assumptions you need validated.

📊
Business Intelligence

Dashboard Builders & KPI Definers

Builds dashboards, writes SQL for reporting, defines KPIs, and translates business questions into data queries.

Key LLM Use Cases: SQL query generation, dashboard design, KPI definition, report narration, data modeling
Structured Output · Few-Shot · Chain-of-Thought
Example 1 — SQL Query Generation
✗ Bad Prompt
Write a SQL query for revenue
✓ Good Prompt
Write a BigQuery SQL query that calculates monthly recurring revenue (MRR).

Tables:
- subscriptions (id, user_id, plan_id, start_date, end_date, status)
- plans (id, name, monthly_price)

Requirements:
a) Only include active subscriptions (status='active', end_date IS NULL or > CURRENT_DATE)
b) Handle mid-month upgrades/downgrades by prorating
c) Output columns: month, total_mrr, new_mrr, churned_mrr, expansion_mrr
d) Use CTEs for readability
e) Include comments explaining the logic
f) Handle NULLs in end_date gracefully
Why it works: Specifies SQL dialect, table schemas, business logic (proration), output columns, code style (CTEs), and edge cases (NULLs).
Example 2 — Dashboard Design
✗ Bad Prompt
Design a dashboard
✓ Good Prompt
Design a Tableau dashboard layout for an executive weekly business review.

KPIs to include: ARR, Net Revenue Retention, CAC Payback Period, DAU/MAU Ratio, Support Ticket Resolution Time.

For each KPI specify:
1. Visualization type (number, line, bar, gauge)
2. Comparison benchmark (WoW, MoM, vs target)
3. Drill-down dimension
4. Alert threshold (red/yellow/green)

Layout: 3 rows, 2 tiles each. Top row = revenue metrics. Middle = engagement. Bottom = operations. Include filter bar at top (date range, region, product).
Why it works: Names the tool, audience, specific KPIs, visualization requirements, layout structure, and interactive elements. Produces a buildable spec.
SQL Generation Template
  • Dialect — BigQuery, PostgreSQL, Snowflake, Redshift
  • Schema — Table names, columns, types, relationships
  • Business logic — What to calculate, how to handle edge cases
  • Output columns — Exact columns needed in the result
  • Code style — CTEs, comments, formatting preferences
💡
Pro Tip — Business Intelligence Always specify your SQL dialect, table schemas, and edge cases (NULLs, duplicates, timezone handling). A BI prompt without schema context produces SQL that looks right but returns wrong numbers.

📈
Data Analyst

Question Answerers & Insight Framers

Answers ad-hoc business questions, performs cohort analysis, creates reports, and supports decision-making with data.

Key LLM Use Cases: Ad-hoc query help, diagnostic analysis, cohort analysis, report writing, insight framing
Chain-of-Thought · Structured Output · Role Prompting
Example 1 — Diagnostic Analysis
✗ Bad Prompt
Why is revenue down?
✓ Good Prompt
Revenue dropped 12% MoM in February 2026. I have access to tables: orders, users, products, marketing_spend (all in BigQuery).

Generate a diagnostic analysis plan as a numbered list of SQL queries to run. For each query include:
a) The hypothesis being tested
b) The SQL query (BigQuery syntax)
c) How to interpret the result

Cover these dimensions:
1. Channel mix shift
2. Cohort retention change
3. Product mix change
4. Pricing/discount impact
5. Seasonality vs. anomaly
Why it works: Quantifies the problem (12% MoM), names available data sources, specifies output structure per query, and lists the diagnostic dimensions to cover.
Example 2 — Cohort Analysis
✗ Bad Prompt
Do a cohort analysis
✓ Good Prompt
Write a Python script (pandas) for monthly user retention cohort analysis.

Input: CSV with columns user_id, signup_date, activity_date
Output:
a) Cohort pivot table showing retention % for months 0-12
b) Seaborn heatmap visualization with percentage annotations
c) Summary identifying the cohort with best/worst M1 retention
d) Overall trend line (are newer cohorts retaining better or worse?)

Include data cleaning for: duplicate rows, missing dates, users with activity before signup.
Why it works: Defines input schema, exact output artifacts (pivot + heatmap + summary), specific metrics (M1 retention), and data quality handling.
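The core of the cohort pivot the prompt asks for fits in a few lines of pandas. A minimal sketch on a toy event log — real data would come from the CSV described above:

```python
import pandas as pd

# Tiny illustrative event log: user_id, signup date, activity date.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "signup_date": pd.to_datetime(
        ["2026-01-05"] * 3 + ["2026-01-20"] * 2 + ["2026-02-03"]),
    "activity_date": pd.to_datetime(
        ["2026-01-06", "2026-02-10", "2026-03-01",
         "2026-01-21", "2026-03-15", "2026-02-04"]),
})

# Cohort = signup month; month_n = whole months since signup.
df["cohort"] = df["signup_date"].dt.to_period("M")
df["month_n"] = ((df["activity_date"].dt.year - df["signup_date"].dt.year) * 12
                 + df["activity_date"].dt.month - df["signup_date"].dt.month)

# Distinct active users per cohort-month, as a fraction of cohort size (month 0).
active = df.groupby(["cohort", "month_n"])["user_id"].nunique().unstack(fill_value=0)
retention = active.div(active[0], axis=0)
```

From here, `retention` feeds directly into the seaborn heatmap and the best/worst M1 summary the prompt requests.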
Diagnostic Analysis Template
  • Problem — What metric changed, by how much, over what period
  • Data access — Table names, SQL dialect, accessible dimensions
  • Dimensions to cover — Channel mix, retention, pricing, seasonality, etc.
  • Per-query output — Hypothesis, SQL, interpretation guidance
💡
Pro Tip — Data Analyst Quantify the problem before asking for a diagnosis. "Revenue is down" gets generic advice. "Revenue dropped 12% MoM, primarily in the EMEA segment" gets targeted analysis plans.

🧪
QA Engineer

Quality Gatekeepers & Test Designers

Designs test plans, writes test cases, performs regression testing, and validates data quality across the stack.

Key LLM Use Cases: Test case generation, test data creation, bug report analysis, regression planning, data validation rules
Structured Output · Few-Shot · Guardrails
Example 1 — API Test Cases
✗ Bad Prompt
Write test cases for the API
✓ Good Prompt
Generate test cases for REST endpoint POST /api/v2/users.

Input schema:
- name: string (required, 1-100 chars)
- email: string (required, valid email format)
- role: enum ['admin', 'viewer', 'editor']

Cover:
a) Happy path for each role
b) Validation errors (missing fields, invalid email, name too long, invalid role)
c) Duplicate email handling (expect 409 Conflict)
d) Rate limiting (100 req/min, expect 429)
e) Auth: requires Bearer token with 'user:create' scope

Format as table: Test ID | Category | Input | Expected Status | Expected Response Body
Why it works: Provides the exact API contract, field constraints, error scenarios, auth requirements, and output table format. Produces comprehensive, actionable test cases.
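The validation rules in the contract can be mirrored in a small pure-Python checker, which is also a handy way to sanity-check generated test cases. A sketch; the 422 status for validation errors and the simplified email regex are illustrative assumptions, not part of the prompt's contract:

```python
import re

ROLES = {"admin", "viewer", "editor"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified email check

def validate_user(payload: dict) -> tuple:
    """Return (status, message) mimicking the endpoint's validation rules."""
    name = payload.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 100:
        return 422, "name must be 1-100 characters"
    email = payload.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        return 422, "invalid email"
    if payload.get("role") not in ROLES:
        return 422, "invalid role"
    return 201, "created"

# Boundary-focused cases in the (Input, Expected Status) shape of the table.
cases = [
    ({"name": "A", "email": "a@x.io", "role": "admin"}, 201),        # min-length name
    ({"name": "A" * 101, "email": "a@x.io", "role": "admin"}, 422),  # name too long
    ({"name": "Bo", "email": "not-an-email", "role": "viewer"}, 422),
    ({"name": "Bo", "email": "b@x.io", "role": "owner"}, 422),       # invalid role
]
results = [validate_user(payload)[0] for payload, _ in cases]
```

Encoding the expected statuses as data makes it trivial to diff LLM-generated test tables against the contract.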
Example 2 — Data Quality Validation
✗ Bad Prompt
Help me test data quality
✓ Good Prompt
Write a Great Expectations validation suite for a 'daily_transactions' table.

Columns:
- transaction_id: UUID, must be unique
- amount: decimal, 0.01-999999.99
- currency: ISO 4217, only USD/EUR/GBP
- created_at: timestamp, not future, not older than 90 days
- user_id: foreign key to users table

Also include:
- Row count expectation: between 10K-500K per day
- Cross-column: if currency=GBP then amount < 100000
- Null checks on all columns

Output as Python code using the Great Expectations API.
Why it works: Specifies the validation framework, exact column constraints, cross-column rules, volume expectations, and output format.
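The same rules can be smoke-tested in plain pandas before committing to a framework. A sketch of three of the checks plus the cross-column rule — this is not the Great Expectations API, just an equivalent hand-rolled pass on toy data:

```python
import pandas as pd

# Toy slice of daily_transactions with one deliberate violation.
df = pd.DataFrame({
    "transaction_id": ["t1", "t2", "t3"],
    "amount": [19.99, 250000.0, 0.01],
    "currency": ["USD", "GBP", "EUR"],
})

failures = []
if df["transaction_id"].duplicated().any():
    failures.append("transaction_id not unique")
if not df["amount"].between(0.01, 999999.99).all():
    failures.append("amount out of range")
if not df["currency"].isin(["USD", "EUR", "GBP"]).all():
    failures.append("currency not in USD/EUR/GBP")
# Cross-column rule: GBP transactions must stay under 100000.
if ((df["currency"] == "GBP") & (df["amount"] >= 100000)).any():
    failures.append("GBP amount >= 100000")
```

A quick pass like this is useful for validating that generated expectations actually flag the violations you seeded.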
Test Plan Generation Template
  • Target — API endpoint, data table, pipeline, UI component
  • Contract — Fields, types, constraints, valid values
  • Coverage — Happy path, validation, edge cases, security, performance
  • Expected behavior — Status codes, error messages, side effects
  • Output format — Table, pytest code, Great Expectations suite, etc.
💡
Pro Tip — QA Engineer Always specify boundary values, error conditions, and security requirements. Happy-path-only test prompts produce happy-path-only tests. The bugs live at the boundaries.

🛡️
Data Governance

Policy Makers & Compliance Guardians

Defines data policies, manages data catalogs, ensures compliance, handles PII classification, and enforces data quality standards.

Key LLM Use Cases: Policy drafting, PII detection rules, data catalog enrichment, compliance documentation, access control reviews
Role Prompting · Structured Output · Chain-of-Thought
Example 1 — Data Retention Policy
✗ Bad Prompt
Write a data policy
✓ Good Prompt
Draft a data retention policy for our organization (fintech, US-based, SOC 2 Type II certified).

Cover these data categories:
a) Customer PII (name, email, SSN, phone)
b) Transaction records
c) Application logs
d) Marketing analytics data

For each category specify:
1. Retention period with legal justification
2. Storage requirements (encryption, access controls)
3. Deletion procedure (soft delete vs. hard delete, verification)
4. Exceptions (litigation hold, regulatory audit)
5. Responsible data steward role

Format as a numbered policy document. Reference SOC 2, CCPA, and GLBA where applicable.
Why it works: Specifies industry, compliance frameworks, exact data categories, and the structure of each policy section. Produces a near-final policy document.
Example 2 — Data Classification
✗ Bad Prompt
Classify our data
✓ Good Prompt
Classify each column in the following table schema into sensitivity tiers: Public, Internal, Confidential, Restricted.

Classification rules:
- Direct PII (name, email, SSN, phone) = Restricted
- Indirect PII (zip code, birth year, IP address) = Confidential
- Business metrics (revenue, user count) = Internal
- Product metadata (feature names, categories) = Public

Table schema: [paste your schema here]

Output as a table: Column Name | Data Type | Sensitivity Tier | Justification | Required Controls (encryption, masking, access level)
Why it works: Provides explicit classification rules, defines the tiers, and requests justification and required controls for each column. Auditable output.
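Explicit classification rules like these translate directly into code, which makes the policy itself testable. A rule-based sketch; the column lists per tier are illustrative, and real schemas still need human review:

```python
# Sensitivity tiers from the prompt, mapped to example column names.
DIRECT_PII = {"name", "email", "ssn", "phone"}
INDIRECT_PII = {"zip_code", "birth_year", "ip_address"}
BUSINESS_METRICS = {"revenue", "user_count"}

def classify(column: str) -> str:
    """Apply the classification rules in priority order: most sensitive first."""
    col = column.lower()
    if col in DIRECT_PII:
        return "Restricted"
    if col in INDIRECT_PII:
        return "Confidential"
    if col in BUSINESS_METRICS:
        return "Internal"
    return "Public"

tiers = {c: classify(c) for c in ["email", "ip_address", "revenue", "category"]}
```

Checking most-sensitive rules first matters: a column matching two tiers must land in the stricter one.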
Policy Document Template
  • Organization — Industry, size, certifications, regulatory environment
  • Data categories — Types of data to cover (PII, transactions, logs, etc.)
  • Frameworks — GDPR, CCPA, SOC 2, HIPAA, GLBA, etc.
  • Policy sections — Retention, storage, deletion, exceptions, ownership
💡
Pro Tip — Data Governance Always specify your regulatory context (GDPR, CCPA, SOC 2, HIPAA). Generic governance prompts produce generic policies that don't survive an audit. Name the frameworks, and the model will cite them correctly.

Additional Roles

The patterns above apply to every data role. Here are quick examples for three more common positions.

Analytics Engineer

dbt Model Review
✗ Bad Prompt
Review my dbt model
✓ Good Prompt
Review this dbt model for: (1) naming convention violations (we use snake_case, prefix staging models with stg_), (2) missing tests (unique, not_null on PKs), (3) join anti-patterns, (4) opportunities to use incremental materialization. Output findings as a table: Line | Issue | Severity | Suggested Fix.

[paste model SQL]

Platform Engineer

Infrastructure Configuration
✗ Bad Prompt
Set up Kubernetes for ML
✓ Good Prompt
Write a Terraform module for a GKE cluster optimized for ML training workloads. Requirements: 1 CPU node pool (n2-standard-4, 3-10 nodes autoscaling) for orchestration, 1 GPU node pool (T4, 0-4 nodes, scale to zero when idle), preemptible instances for cost, Workload Identity enabled, network policy for namespace isolation. Include comments explaining each decision.

Data Product Manager

Data Product Spec
✗ Bad Prompt
Write a spec for our data product
✓ Good Prompt
Write a data product spec for a "Customer Health Score" API consumed by the sales team. Include: data sources (CRM events, support tickets, usage logs), scoring methodology (weighted composite, 0-100), API contract (REST, JSON, <200ms p99), refresh frequency (hourly), data quality SLAs (99.5% completeness, <5min staleness), and a rollout plan (alpha with 10 reps → beta → GA).
Chapter 04

Advanced Strategies

Multi-step workflows, prompt chaining, RAG patterns, and evaluation frameworks for production-grade prompt engineering.

Multi-Step Workflows

Complex tasks should be broken into atomic prompts. Each step has one clear objective. The output of step N becomes the input to step N+1.

Example: Data Quality Pipeline
Step 1: Profile Step 2: Detect Anomalies Step 3: Root Cause Step 4: Report
# Step 1: Data Profiling Prompt
Profile the following dataset. For each column, output:
- Data type, null count, unique count
- Min/max/mean (numeric) or top 5 values (categorical)
- Distribution shape (normal, skewed, bimodal, uniform)
Output as a JSON array.

# Step 2: Anomaly Detection Prompt (uses Step 1 output)
Given this data profile: [paste Step 1 output]
Identify columns with anomalies: unexpected nulls, outliers beyond 3 sigma, cardinality changes >20% from expected.
Output as a ranked list.

# Step 3: Root Cause Prompt (uses Step 2 output)
These anomalies were detected: [paste Step 2 output]
For each anomaly, hypothesize the root cause. Consider: upstream schema changes, data source outages, ETL bugs, seasonality.
Rank by likelihood.

# Step 4: Report Prompt (uses Steps 1-3)
Generate an executive summary of data quality findings.
Profile: [Step 1]
Anomalies: [Step 2]
Root causes: [Step 3]
Format: 1-page markdown with severity indicators.
💡
Key Principle: Break complex tasks into atomic prompts. Each step should have one clear objective and a well-defined output schema that feeds the next step. This dramatically improves reliability and makes debugging easier.
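The four-step chain can be sketched as a function where each step's output feeds the next. This is a minimal sketch, not a full implementation: `llm` is any callable taking a prompt string and returning the model's text, so you can wire in any client (or a stub for testing). The prompt strings are abbreviated placeholders for the full prompts shown earlier.

```python
# Sketch of the 4-step data quality chain. `llm` is any prompt -> str
# callable (your LLM client, or a stub during testing). Prompts are
# abbreviated placeholders for the full versions above.

def run_quality_chain(dataset_sample: str, llm) -> str:
    """Run profile -> anomalies -> root cause -> report; return the report."""
    profile = llm("Step 1: Profile the dataset below.\n" + dataset_sample)
    anomalies = llm("Step 2: Given this data profile, identify anomalies.\n" + profile)
    causes = llm("Step 3: Hypothesize root causes for these anomalies.\n" + anomalies)
    return llm(
        "Step 4: Summarize.\n"
        "Profile: " + profile + "\n"
        "Anomalies: " + anomalies + "\n"
        "Root causes: " + causes
    )

# Usage with a stub in place of a real client (echoes the prompt's first line):
echo = lambda prompt: prompt.splitlines()[0]
print(run_quality_chain("id,amount\n1,9.99", echo))  # prints "Step 4: Summarize."
```

Keeping the chain in one function makes each handoff explicit and lets you log or validate intermediate outputs between steps.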

RAG-Aware Prompting

When your prompt includes retrieved context (documents, wiki pages, previous outputs), you need special instructions to keep the model grounded.

# RAG prompt pattern
Context: The following documents were retrieved from our internal knowledge base.

[DOCUMENT 1]
{retrieved_document_1}

[DOCUMENT 2]
{retrieved_document_2}

Instructions: Using ONLY the information in the documents above, answer the following question.
If the answer is not found in the provided documents, respond with: "Not found in provided context."
Do NOT use your general knowledge. Cite the document number for each claim.

Question: {user_question}

Format:
Answer: [your answer]
Sources: [Document N, paragraph M]
⚠️
RAG Grounding: Always instruct the model to cite sources and stay within provided context. Without this, the model will confidently blend retrieved facts with its own training data, which is the #1 failure mode in RAG systems.
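In a pipeline, the grounding pattern is usually assembled programmatically from the retriever's results. A minimal sketch, assuming documents arrive as plain strings:

```python
# Assemble the RAG grounding pattern from retrieved documents.
# Assumes the retriever returns plain-text document strings.

def build_rag_prompt(documents: list[str], question: str) -> str:
    """Number each document and wrap it in the grounding instructions."""
    doc_blocks = "\n\n".join(
        f"[DOCUMENT {i}]\n{doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Context: The following documents were retrieved from our internal "
        "knowledge base.\n\n"
        f"{doc_blocks}\n\n"
        "Instructions: Using ONLY the information in the documents above, "
        "answer the following question. If the answer is not found in the "
        'provided documents, respond with: "Not found in provided context." '
        "Do NOT use your general knowledge. Cite the document number for "
        "each claim.\n\n"
        f"Question: {question}\n\n"
        "Format:\nAnswer: [your answer]\nSources: [Document N, paragraph M]"
    )

print(build_rag_prompt(["Our SLA is 99.9% uptime."], "What is our SLA?"))
```

Templating the instructions once, rather than retyping them per query, keeps the grounding clause from silently drifting between callers.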

Prompt Evaluation Framework

Dimension | What to Measure | How to Test
Accuracy | Is the output factually correct? | Compare against known-good answers on a test set
Completeness | Does it cover all required elements? | Checklist scoring: did it include all sections?
Format | Does output match the specified structure? | Schema validation (JSON), regex matching
Relevance | Is it focused on the question asked? | Evaluate ratio of useful vs. filler content
Consistency | Same prompt → same quality across runs? | Run 5x and compare variance in output quality
Safety | No PII leakage, no harmful content? | Adversarial input testing, PII scanning
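Two rows of the framework, Format and Consistency, are cheap to automate. A minimal harness sketch, assuming outputs are expected to be valid JSON; `llm` is any prompt-to-string callable:

```python
# Minimal eval harness for the Format and Consistency dimensions above:
# run the same prompt N times, validate each output as JSON, and check
# run-to-run variance. `llm` is any prompt -> str callable (assumption).
import json

def evaluate_prompt(llm, prompt: str, runs: int = 5) -> dict:
    """Return format pass rate and a byte-identical consistency flag."""
    passed = 0
    outputs = []
    for _ in range(runs):
        out = llm(prompt)
        outputs.append(out)
        try:
            json.loads(out)  # Format: does the output parse as JSON?
            passed += 1
        except json.JSONDecodeError:
            pass
    return {
        "runs": runs,
        "format_pass_rate": passed / runs,
        "consistent": len(set(outputs)) == 1,  # Consistency across runs
    }

# Stub client that always returns valid JSON:
result = evaluate_prompt(lambda p: '{"ok": true}', "profile this table", runs=3)
print(result)  # {'runs': 3, 'format_pass_rate': 1.0, 'consistent': True}
```

Accuracy and completeness need a labeled test set or checklist scoring; start with the automatable checks and layer those on as your eval set grows.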

Guardrails & Safety

# System prompt: Output guardrails
SAFETY INSTRUCTIONS:
1. Never include real PII in outputs. If the input contains PII, replace with placeholders: [NAME], [EMAIL], [PHONE], [SSN].
2. If the query asks for information outside your knowledge, say "I don't have enough information to answer this accurately."
3. Never generate SQL with DROP, DELETE, or TRUNCATE unless explicitly requested and confirmed.
4. All generated code must include input validation.
5. Flag any data that appears to contain test/dummy values mixed with production data.

VALIDATION: Before outputting, verify:
- No email addresses, phone numbers, or IDs from the input appear in your output
- Any SQL is read-only unless write operations were explicitly requested
- Numerical claims are supported by the provided data
🚨
Production Rule: Never deploy a prompt to production without guardrails. Test adversarial inputs. What happens when someone passes malicious SQL through a text field? What happens when the model encounters PII it should redact?
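Prompt-level guardrails should be backed by code-level checks on the model's output. A sketch of two such checks from the pattern above: PII redaction and a read-only SQL gate. The regexes are deliberately simplified for illustration; production PII detection needs far broader patterns.

```python
# Code-level output guardrails: PII redaction and a read-only SQL check.
# Regexes are simplified for illustration, not production-grade detection.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")
WRITE_SQL = re.compile(r"\b(DROP|DELETE|TRUNCATE)\b", re.IGNORECASE)

def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def is_read_only(sql: str) -> bool:
    """Reject generated SQL containing destructive statements."""
    return WRITE_SQL.search(sql) is None

print(redact_pii("Contact jane@example.com or 555-123-4567"))
# prints "Contact [EMAIL] or [PHONE]"
print(is_read_only("SELECT * FROM users"))  # True
print(is_read_only("DROP TABLE users"))     # False
```

Run checks like these on every model output before it reaches a downstream system; the prompt instructs, the code enforces.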

The Four Agent Safety Principles

When your prompt powers an autonomous agent (not just a one-shot chatbot), these four principles are non-negotiable.

Principle 1

Minimal Access

Grant only the permissions the agent needs. An email drafting agent should not have access to your calendar, contacts, and file system. Scope tools and data access to the minimum required for the task.

Principle 2

Plan for Failure

Define explicit fallback behavior. Every agent prompt must answer: "What do you do when you don't know?" The answer should never be "guess." Include: "If you cannot find the information, respond with 'I don't have enough information' and flag for human review."

Principle 3

Human Oversight

Keep approval gates on consequential actions. An agent can draft an email, but a human should send it. An agent can recommend a schema change, but a human should execute it. Define the line explicitly in your prompt.

Principle 4

Honest Limitations

Agents must flag uncertainty, not fabricate confidence. Include constraints like: "Do not fill in gaps with assumptions. If data is missing or ambiguous, say so explicitly and explain what additional information would be needed."

# Agent safety pattern: the "uncertainty clause"
# Add this to EVERY agent system prompt:

CRITICAL CONSTRAINTS:
- Do not fill in gaps with assumptions. If information is missing, say "I don't have enough information to answer this" and specify what data you would need.
- Do not take irreversible actions without human approval. Draft, propose, and recommend — but never execute unless explicitly authorized for that specific action class.
- Do not access data or systems beyond your defined scope. If a task requires access you don't have, flag it and stop.
- Always flag low-confidence outputs. If your confidence in a conclusion is below 80%, say so and explain why.
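Minimal Access and Human Oversight can also be enforced outside the prompt, in the tool-dispatch layer. A sketch of an action gate, where the scope and approval sets are illustrative assumptions standing in for your real tool registry:

```python
# Action gate enforcing Minimal Access and Human Oversight in code.
# The scope and approval sets are illustrative assumptions; in practice
# they would come from your agent's tool registry and policy config.

REQUIRES_APPROVAL = {"send_email", "modify_schema", "delete_data"}
ALLOWED_SCOPE = {"draft_email", "send_email", "read_table"}

def execute_action(action: str, approved: bool = False) -> str:
    """Block out-of-scope actions; hold consequential ones for approval."""
    if action not in ALLOWED_SCOPE:
        return f"BLOCKED: '{action}' is outside this agent's scope"
    if action in REQUIRES_APPROVAL and not approved:
        return f"PENDING: '{action}' drafted; awaiting human approval"
    return f"EXECUTED: {action}"

print(execute_action("draft_email"))    # EXECUTED: draft_email
print(execute_action("send_email"))     # PENDING: awaiting approval
print(execute_action("modify_schema"))  # BLOCKED: outside scope
```

The prompt tells the agent to behave; the gate guarantees it. Defense in depth means doing both.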

Pre-Deployment Checklist

Before any prompt goes into production (powering an agent, API, or automated workflow), run through this checklist.

  • Goal defined: Can you state in one sentence what this prompt should accomplish?
  • Permissions scoped: Does the agent have access only to what it needs? No more.
  • Failure mode tested: What happens with missing data, bad input, or ambiguous requests?
  • Escalation path defined: When should the agent stop and ask a human? Is that path clear?
  • Adversarial inputs tested: What happens with prompt injection, SQL injection, or PII in unexpected fields?
  • Output validated on 10+ examples: Run the prompt against real-world inputs and grade each output.
  • Human approval gate in place: For any action with consequences (sending, deleting, modifying), is there a confirmation step?
💡
Start Small, Ship Something That Works: Don't try to automate everything at once. Pick one workflow, one template. Test it thoroughly. Get it working reliably. Then expand. The teams that succeed with AI agents are the ones that start with a single, well-scoped prompt — not the ones that try to build an "AI platform" on day one.
Practice Exercise

Design a Prompt Chain

  1. Pick your team's most common multi-step workflow (e.g., incident analysis, data onboarding, report generation)
  2. Break it into 3-5 atomic steps with clear input/output contracts
  3. Write the prompt for each step including the handoff format
  4. Test the chain end-to-end with real data
  5. Add guardrails at steps where PII or sensitive data might flow through
Chapter 05

Templates & Quick Reference

Copy-paste prompt templates and decision aids for daily use. Bookmark this chapter.

Universal Prompt Template

The Master Template
Role: You are a [role] with expertise in [domain]. You value [priorities].
Context: Background: [situation]. We have [data/resources]. The constraint is [limitation].
Task: [Specific action verb]: analyze / generate / review / design / debug...
Format: Output as [JSON / markdown table / numbered list / code]. Include [sections].
Constraints: Do not [limitation]. Must [requirement]. Keep under [length].
Examples: Example input: ... → Example output: ...

Role-Specific Quick Templates

Data Engineering

You are a senior data engineer. I have a [technology] pipeline that [problem]. Tech: [version]. Data: [volume]. Infra: [specs]. Error: [message]. Provide [N] solutions ranked by effort. For each: mechanism, code change, risk.

Product Manager

You are an experienced PM. Write a [PRD/user story/spec] for [feature]. Context: [users, current metrics, team size, timeline]. Include: [sections]. Keep under [length]. Use data-driven language.

ML Engineer

You are a senior ML engineer. My [model type] for [task] shows [metric = value]. Dataset: [rows, features, class balance]. Hyperparams: [current settings]. I've tried: [previous attempts]. Provide a systematic debugging approach with code.

Data Scientist

You are a statistician. I need to [analysis goal]. Data: [description, columns, rows]. Statistical requirements: alpha=[val], power=[val]. Output: [Python code / methodology / both]. Include assumption checks.

Business Intelligence

SQL dialect: [BigQuery/Snowflake/Postgres]. Tables: [schemas]. Write a query that [business requirement]. Handle: [edge cases]. Output: CTEs, comments, columns: [list]. Test with: [sample data note].

Data Analyst

Problem: [metric] changed by [amount] over [period]. Available tables: [list] in [SQL dialect]. Generate [N] diagnostic queries. Each: hypothesis, SQL, interpretation. Dimensions to cover: [list].

QA Engineer

System: [API endpoint / data table / pipeline]. Contract: [input schema, constraints, valid values]. Generate test cases covering: [happy path, validation, edge cases, security, performance]. Format: [table / pytest code / Great Expectations suite].

Data Governance

Organization: [industry, size, certifications]. Draft a [policy/classification/audit doc] for [data categories]. Compliance: [GDPR, CCPA, SOC 2, HIPAA]. For each category: [retention, controls, deletion, exceptions, ownership].

Decision Tree: Which Technique?

Technique Selection Guide
Is the task simple? → Yes → Zero-Shot
Need specific format? → Yes → Few-Shot
Requires reasoning? → Yes → Chain-of-Thought
Needs domain expertise? → Yes → Role Prompting
Multi-step workflow? → Yes → Prompt Chaining
Uses external docs? → Yes → RAG Pattern
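The selection guide above is a first-match-wins decision tree, which translates directly into code. A small sketch, checking the questions in the order the guide lists them:

```python
# The technique selection guide above as a first-match-wins function.
# Flags are checked in the same order the decision tree lists them.

def choose_technique(simple=False, needs_format=False, needs_reasoning=False,
                     needs_expertise=False, multi_step=False,
                     uses_docs=False) -> str:
    """Return the first technique whose question is answered 'yes'."""
    if simple:
        return "Zero-Shot"
    if needs_format:
        return "Few-Shot"
    if needs_reasoning:
        return "Chain-of-Thought"
    if needs_expertise:
        return "Role Prompting"
    if multi_step:
        return "Prompt Chaining"
    if uses_docs:
        return "RAG Pattern"
    return "Zero-Shot"  # default: start simple, escalate if needed

print(choose_technique(needs_reasoning=True))  # Chain-of-Thought
```

The techniques also compose: a multi-step workflow over external docs is prompt chaining plus the RAG pattern, not a choice between them.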

Audience-Aware Prompt Template

Use this when the same underlying task needs different outputs for different stakeholders.

Audience-Aware Template
Task: [What to analyze / summarize / report on]
Audience: [Executive / Engineer / PM / External client] — this controls vocabulary and detail level
Focus: [Impact & timeline / Root cause & fix / Scope & priority / Cost & risk]
Tone: [Plain language, no jargon / Technical with code / Data-driven with charts / Formal with recommendations]
Length: [3 bullet points / Detailed with sections / 1-page summary / Slide-ready]

Agent Pre-Launch Checklist Template

Before deploying any prompt that powers an autonomous agent or automated workflow.

Agent Pre-Launch Template
Goal: What does this agent accomplish? e.g., "Monitor data quality and alert on anomalies"
Permissions: What can it access? What is off-limits? e.g., "Read-only on analytics tables. No access to PII tables."
Uncertainty behavior: What does it do when uncertain? e.g., "Flag for human review. Never guess or fabricate."
Escalation path: When and how to involve a human. e.g., "If anomaly severity > high, page #data-oncall in Slack."
Approval gates: Which actions require human confirmation? e.g., "Can draft alerts. Cannot send without approval."
Test scenarios: Happy path, missing data, bad input, adversarial input, edge cases

Prompt Quality Checklist

  • Single, clear task: One prompt = one job
  • Output format specified: JSON, table, markdown, code
  • Context provided: Tech stack, data volume, constraints
  • Constraints defined: What NOT to do, boundaries, limitations
  • Examples included: For complex or non-obvious formats
  • Role assigned: If domain expertise matters
  • Tested with edge cases: What happens with bad input?
  • Guardrails in place: PII handling, safety, error conditions
  • Evaluated on test set: Run 3-5x and check consistency
  • Version controlled: Tracked in your team's prompt library

Token Optimization Tips

Technique | Description
Remove filler | Cut "please", "I would like you to", "could you kindly". Direct instructions use fewer tokens.
Use abbreviations | In structured prompts, use short keys: "ctx:" instead of "context:", "fmt:" instead of "format:"
Minimize examples | Start with 1-2 examples. Only add more if output quality is inconsistent.
Reference, don't repeat | Say "use the same format as above" instead of restating the entire format specification.
Compress context | Summarize large documents before including them. Send the summary, not the full text.
Use system prompts | Move static instructions (role, constraints) to the system prompt. They persist across turns.
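Two of these tips, stripping filler and separating static instructions into a system message, can be sketched in code. The messages list below follows the common system/user chat shape; the exact parameter names vary by client library, so treat the structure as an assumption to adapt.

```python
# Sketch of two tips above: strip filler phrases and keep static
# instructions in a system message. The system/user message shape is
# an assumption; adapt field names to your client library.

FILLER = ("please ", "could you kindly ", "i would like you to ")

def trim_filler(prompt: str) -> str:
    """Remove common politeness filler (both lowercase and capitalized)."""
    out = prompt
    for phrase in FILLER:
        out = out.replace(phrase, "").replace(phrase.capitalize(), "")
    return out

# Static role + constraints live in the system prompt, sent once:
SYSTEM = "You are a senior data engineer. Output JSON only. Never include PII."

def build_messages(user_prompt: str) -> list[dict]:
    """Pair the persistent system message with a trimmed per-turn request."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": trim_filler(user_prompt)},
    ]

msgs = build_messages("Please summarize this pipeline run.")
print(msgs[1]["content"])  # prints "summarize this pipeline run."
```

Token savings per request are small, but they compound across every turn of every conversation in a production workflow.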
Chapter 06

Resources & Next Steps

Where to go from here — tools, further reading, and how to build a team prompt library.

Recommended Tools

LLM Platform

Claude (Anthropic)

Strong reasoning, large context window, excellent for code and analysis tasks. System prompts for persistent instructions.

LLM Platform

ChatGPT (OpenAI)

Broad knowledge, code interpreter, image understanding. Custom GPTs for team-specific workflows.

Framework

LangChain / LangGraph

Build multi-step prompt chains with tooling. Great for RAG pipelines and agent workflows.

Framework

LlamaIndex

Purpose-built for RAG applications. Handles document ingestion, indexing, and retrieval-augmented generation.

Evaluation

Promptfoo / Braintrust

Test prompts against eval sets. Track quality metrics across versions. CI/CD for prompts.

IDE Integration

Cursor / GitHub Copilot

Code-focused AI assistants that use prompt engineering principles under the hood. Learn from how they construct prompts.

Further Reading

Resource | What You'll Learn
Anthropic Docs | Official prompt engineering guide. Best practices for Claude, system prompts, and structured output.
OpenAI Cookbook | Production patterns for GPT-based applications. Code examples for common tasks.
DAIR.AI Prompt Guide | Comprehensive academic reference covering all major prompting techniques with research citations.
Chain-of-Thought Paper | The original research (Wei et al., 2022) showing how step-by-step reasoning improves LLM accuracy.
Few-Shot Learners Paper | GPT-3 paper (Brown et al., 2020) establishing in-context learning and few-shot patterns.

Building a Team Prompt Library

The most effective data teams maintain a shared prompt library. Here's how to start one.

Prompt Library Entry Template
Name: e.g., "Pipeline Debugging - Spark OOM"
Purpose: What task does this prompt solve?
Template: The prompt text with [placeholders] for variable parts
Example: One real-world example showing the prompt in action
Quality criteria: How to judge if the output is good enough
Version & owner: v1.2 | @jane | Last updated 2026-03-15
💡
Library Best Practices: Store prompts in version control (Git). Review prompt changes like code changes. Include eval results with each version. Assign owners per prompt. Retire prompts that haven't been used in 90 days.
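A library entry is just structured data plus a template with placeholders. A minimal sketch of an entry and a renderer; the entry fields mirror the template above, while the in-memory dict is a stand-in for whatever storage format (YAML, JSON, a Git repo of files) your team chooses:

```python
# Sketch of a versioned prompt library entry and a renderer. The dict
# stands in for your real storage format (YAML/JSON files in Git, etc.).

ENTRY = {
    "name": "Pipeline Debugging - Spark OOM",
    "version": "v1.2",
    "owner": "@jane",
    "template": ("You are a senior data engineer. "
                 "Pipeline: {pipeline}. Error: {error}. "
                 "Provide 3 solutions ranked by effort."),
}

def render(entry: dict, **params) -> str:
    """Fill placeholders; str.format raises KeyError on missing params,
    so an incomplete call fails loudly instead of shipping a broken prompt."""
    return entry["template"].format(**params)

print(render(ENTRY, pipeline="daily_orders", error="OOM on stage 7"))
```

Failing loudly on a missing parameter is the point: a prompt with a literal `{error}` left in it would otherwise reach the model silently.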

Prompt Review Process

  • Peer review: Have a colleague run the prompt blind and evaluate the output
  • Edge case testing: Test with unusual inputs, empty data, adversarial queries
  • A/B comparison: Compare your new prompt against the current version on 10+ examples
  • Cost analysis: Measure token usage and compare against simpler alternatives
  • Documentation: Update the library entry with new eval results and learnings
Final Exercise

Build Your Team's First 5-Prompt Library

  1. Identify the 5 tasks your team performs most frequently with LLMs
  2. Write a prompt template for each using the principles from this playbook
  3. Test each prompt on 3 real-world examples and document the results
  4. Get peer review from at least one colleague per prompt
  5. Store in version control with the Library Entry Template above
  6. Set a calendar reminder to review and update the library quarterly
🎯
The Takeaway: The best prompt engineers aren't the ones who memorize techniques — they're the ones who understand their domain deeply and translate that understanding into clear, structured instructions. Domain expertise + prompt structure = consistently great results.