Design Prompts with Explicit, Actionable Criteria
One of the most common mistakes in production prompt design is relying on vague qualitative instructions instead of specifying concrete, categorical criteria the model can evaluate deterministically. The difference between a useful code review agent and a noisy one often comes down to how the review standards are articulated in the prompt.
Concrete Categories Beat Vague Adjectives
Consider the difference between telling the model to "check that comments are accurate" versus instructing it to "flag any comment whose description contradicts the observable behavior of the surrounding code." The first phrasing gives the model latitude to interpret "accurate" however it wants, leading to inconsistent results. The second defines a precise condition the model can check mechanically.
Similarly, instructions like "be conservative" or "only report findings you're highly confident about" do not measurably improve precision. The model has no reliable internal confidence calibration, so these qualifiers add no filtering value. What actually reduces false positives is defining explicit categories of issues to look for — and equally important, specifying what to ignore.
False Positives Undermine Accurate Categories Too
Even when some categories in your review prompt are well-defined, a high false positive rate in other categories erodes user trust across the board. If developers learn to ignore the tool's output because half the warnings are noise, they'll also miss the legitimate findings. The prompt must be tuned holistically — every category needs to earn its place by maintaining a high signal-to-noise ratio.
Specific, categorical review criteria consistently outperform confidence-based filtering. Define exactly what constitutes a finding — don't ask the model to self-assess how sure it is.
Watch for answer choices that include phrases like "be thorough", "find all issues", or "only report high-confidence findings". These sound reasonable but are ineffective in practice. The correct answer will specify concrete, enumerated criteria.
Apply Few-Shot Prompting for Consistency
Few-shot examples are the single most effective technique for getting consistent, predictable output from Claude in production systems. When the task involves ambiguity — edge cases in classification, nuanced formatting requirements, or domain-specific reasoning — a small set of well-chosen examples communicates expectations far more reliably than lengthy written instructions.
How Many Examples and What Should They Show?
Two to four targeted examples generally hit the sweet spot. Each example should demonstrate not just the desired output format but also the reasoning process that leads to that output. For a code review agent, this might mean showing an example where a suspicious pattern is flagged along with an explanation of why it constitutes a real issue — and a counterexample where a similar-looking pattern is correctly classified as benign.
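The pairing of a flagged example with a benign counterexample can be sketched as a few-shot message list. All content here is illustrative; the labels `FINDING` / `NO FINDING` are hypothetical conventions, not required output formats:

```python
# Hypothetical few-shot setup for a code review classifier: one flagged
# finding and one benign counterexample, each including the reasoning that
# distinguishes them.
FEW_SHOT_MESSAGES = [
    {"role": "user",
     "content": "Review: `# returns the user's age` above code that returns a name string."},
    {"role": "assistant",
     "content": "FINDING: the comment claims an age is returned, but the code "
                "returns a name string, so the comment contradicts behavior."},
    {"role": "user",
     "content": "Review: `# may return None for unknown ids` above a lookup that "
                "returns dict.get(user_id)."},
    {"role": "assistant",
     "content": "NO FINDING: dict.get returns None for missing keys, exactly as "
                "the comment describes."},
]

def build_messages(new_input: str) -> list[dict]:
    # The real request appends the new input after the worked examples.
    return FEW_SHOT_MESSAGES + [{"role": "user", "content": new_input}]
```

The counterexample is doing the precision work here: it shows the model where the boundary of the category lies, not just what a positive case looks like.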
Reducing Hallucination and Enabling Generalization
Few-shot examples anchor the model's behavior in concrete precedent rather than abstract instruction. This reduces the tendency to hallucinate findings or invent categories not specified in the prompt. Crucially, well-chosen examples also help the model generalize to novel patterns — when it sees how you've reasoned about edge cases A, B, and C, it can apply analogous reasoning to edge case D even though D wasn't explicitly covered.
Few-shot examples are the most effective technique for achieving consistent output. They demonstrate format, reasoning, and handling of ambiguity simultaneously — something that instructions alone cannot accomplish as reliably.
Enforce Structured Output with Tool Use and JSON Schemas
When your system requires guaranteed schema-compliant output — not "usually valid JSON" but always valid JSON matching a specific schema — the most reliable approach is tool_use combined with a JSON schema definition. This leverages the API's built-in enforcement mechanism rather than relying on prompt instructions alone.
Tool Choice Modes
The tool_choice parameter controls how the model selects tools:
- "auto" — the model decides whether to call a tool or respond with plain text. Useful when tool use is optional.
- "any" — the model must call some tool but can choose which one from the available set.
- Forced tool selection — you specify a particular tool by name, guaranteeing the model calls exactly that tool. This is the strongest guarantee for structured extraction.
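A forced-tool request for structured extraction might be assembled like this. The tool name, schema fields, and model string are hypothetical placeholders; the `tool_choice` shape follows the Messages API convention of `{"type": "tool", "name": ...}`:

```python
# Sketch of a forced-tool request body for structured extraction.
INVOICE_TOOL = {
    "name": "record_invoice",
    "description": "Record structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_total": {"type": "number"},
            "vendor": {"type": "string"},
        },
        "required": ["invoice_total", "vendor"],
    },
}

request_params = {
    "model": "claude-model-placeholder",  # substitute a real model id
    "max_tokens": 1024,
    "tools": [INVOICE_TOOL],
    # Forced selection: the model must call exactly this tool, so the
    # response is guaranteed to match the tool's input_schema.
    "tool_choice": {"type": "tool", "name": "record_invoice"},
    "messages": [{"role": "user", "content": "Extract fields from: ..."}],
}
```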
Structure vs. Semantics: A Critical Distinction
Strict JSON schemas eliminate syntax errors — you'll never get malformed JSON, missing required fields, or wrong data types. But schemas do not prevent semantic errors. For example, an invoice extraction tool might output a perfectly schema-valid response where the individual line items don't sum to the stated total. The structure is correct; the content is wrong.
Schema Design Best Practices
Thoughtful schema design reduces the surface area for errors:
- Required vs. optional fields: Mark fields as required only when the source document reliably contains that information. Making everything required forces the model to fabricate values when data is absent.
- Enums with an "other" escape hatch: When defining categorical fields, include an "other" value paired with a freeform detail string. This prevents the model from force-fitting novel inputs into the wrong category.
- Nullable fields: Explicitly allow null for fields that may not exist in every document. This gives the model a safe way to say "not found" rather than inventing a plausible value.
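All three practices can be seen in a single schema. This is a hypothetical document schema, assuming a pipeline where vendor and category are always present but a due date may not be:

```python
# Hypothetical schema applying the practices above: only reliably present
# fields are required, the category enum has an "other" escape hatch with a
# free-form detail string, and due_date is explicitly nullable.
DOCUMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "category": {
            "type": "string",
            "enum": ["utilities", "supplies", "services", "other"],
        },
        "category_detail": {
            "type": "string",
            "description": "Free-form label when category is 'other'.",
        },
        "due_date": {
            "type": ["string", "null"],
            "description": "ISO date, or null when the document states none.",
        },
    },
    "required": ["vendor", "category"],  # optional fields stay optional
}
```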
tool_use guarantees structural compliance only, not semantic correctness. Your schema will always be syntactically valid, but the values within may still be wrong. Semantic validation requires separate business-logic checks.
Don't fall for answers that imply tool_use output is always correct because it matches the schema. Schema conformance eliminates syntax issues, not content errors. The exam will test whether you understand this distinction.
Your data pipeline needs to extract structured information from unstructured documents and guarantee the output matches a predefined schema. Which approach provides the strongest guarantee of schema-compliant output?
tool_use with a JSON schema and forced tool_choice is the only approach that provides a built-in API-level guarantee of schema compliance. Prompt instructions and few-shot examples improve consistency but can't guarantee compliance, and post-processing retries add latency while still depending on the model eventually producing valid output.
Implement Validation, Retry, and Feedback Loops
Even with structured output guarantees, the extracted data may contain errors that require correction. An effective retry strategy doesn't just say "try again" — it appends specific, actionable error details to the prompt so the model knows exactly what to fix.
Retry with Targeted Error Feedback
When validation fails, the retry prompt should include the exact nature of the error: which field failed, what rule it violated, and what the expected vs. actual values were. For example: "The line_items total ($2,340) does not match the stated invoice_total ($2,430). Please re-extract and verify the amounts." This gives the model a concrete signal it can act on.
In contrast, a generic retry like "There was a validation error. Please try again." provides zero diagnostic information. The model is essentially guessing what went wrong, which often produces the same mistake or introduces new ones.
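The two retry styles can be contrasted in a small validate-and-retry loop. This is a minimal sketch: the model call is stubbed out as a plain function argument, and the invoice field names mirror the example above rather than any fixed schema:

```python
# Sketch of a retry loop that feeds field-level error details back to the
# model. call_model stands in for a real API request.
def validate(extraction: dict) -> list[str]:
    """Return specific, actionable error messages (empty list means valid)."""
    errors = []
    calculated = sum(item["amount"] for item in extraction.get("line_items", []))
    stated = extraction.get("invoice_total")
    if stated is not None and round(calculated, 2) != round(stated, 2):
        errors.append(
            f"line_items sum to ${calculated:.2f} but invoice_total is "
            f"${stated:.2f}. Re-extract and verify the amounts."
        )
    return errors

def extract_with_retry(document: str, call_model, max_retries: int = 2) -> dict:
    prompt = f"Extract invoice fields from:\n{document}"
    extraction = call_model(prompt)
    for _ in range(max_retries):
        errors = validate(extraction)
        if not errors:
            break
        # Append the exact failure details so the retry has a concrete signal,
        # instead of a generic "validation failed, please try again."
        prompt += "\n\nYour previous extraction failed validation:\n" + "\n".join(errors)
        extraction = call_model(prompt)
    return extraction
```

The key design choice is that `validate` returns messages written for the model, not just booleans for the pipeline: each message names the field, the rule, and the expected versus actual values.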
When Retries Don't Help
Retries are ineffective when the required information simply isn't present in the source document. If a field is missing from the input, no amount of retrying will conjure a correct value — the model will either keep producing the same fabrication or switch to a different one. Your validation logic should distinguish between "the model made a correctable error" and "the source data doesn't contain this information."
Tracking Patterns and Self-Correction
For production pipelines processing many documents, consider adding detected_pattern fields to your output schema so the model can flag recurring issues — for example, a specific document type that consistently triggers a particular false positive. This metadata helps you refine the prompt iteratively.

A powerful self-correction technique is to have the model extract both a calculated_total (summed from individual items) and a stated_total (read directly from the document). If these diverge, the system automatically flags the discrepancy for review, catching errors the schema alone can't detect.
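The cross-check on the two totals is a few lines of business logic. The field names follow the text above; the tolerance value is an illustrative assumption:

```python
# Sketch of the calculated_total vs. stated_total cross-check: any
# divergence beyond a small tolerance is flagged for human review.
def check_totals(extraction: dict, tolerance: float = 0.01) -> dict:
    calculated = extraction["calculated_total"]
    stated = extraction["stated_total"]
    flagged = abs(calculated - stated) > tolerance
    return {
        "flagged_for_review": flagged,
        "discrepancy": round(calculated - stated, 2),
    }
```

For instance, an extraction with `calculated_total` of 2340 and `stated_total` of 2430 would be flagged with a discrepancy of -90, even though it passed schema validation.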
Specific error details in retry prompts guide the model toward correction. A generic "try again" message provides no useful signal and typically does not improve results.
Watch for answer choices that describe retry strategies with generic error messages like "validation failed, please correct." The correct approach always includes specific field-level error details — which field, what was wrong, and what was expected.
Design Efficient Batch Processing Strategies
The Message Batches API offers 50% cost savings compared to synchronous requests, but with an important tradeoff: requests are processed within a 24-hour window with no guaranteed latency SLA. This makes batch processing ideal for specific workload types and completely wrong for others.
When Batch Processing Is Appropriate
- Overnight report generation: Summaries, analytics, and dashboards that need to be ready by morning but don't block any real-time process.
- Weekly or nightly audits: Code quality reviews, compliance scans, or documentation checks that run on a schedule.
- Nightly test generation: Creating test cases from production logs or specification documents where results are consumed the next day.
When Batch Processing Is NOT Appropriate
Any workflow that blocks a developer or process from proceeding should use the synchronous API. Pre-merge CI checks are the canonical example — a pull request cannot be merged until the review completes, so a potential 24-hour delay is unacceptable. The cost savings don't matter if they create a bottleneck in the development workflow.
Technical Constraints
Batch requests do not support multi-turn tool calling within a single request. Each batch item is a standalone request-response pair. Use custom_id fields to correlate requests with responses when processing results — this is how you match each output back to the input that generated it.
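The correlation step looks like this. Results here are plain dicts standing in for the SDK's result objects, which carry the same custom_id field, and batch results may arrive in a different order than the requests were submitted:

```python
# Sketch of matching batch outputs back to their inputs via custom_id.
inputs = {
    "doc-001": "First document text...",
    "doc-002": "Second document text...",
}

# Hypothetical results, returned in a different order than submitted.
results = [
    {"custom_id": "doc-002", "output": "summary of second"},
    {"custom_id": "doc-001", "output": "summary of first"},
]

# custom_id is the only reliable join key between request and response.
matched = {r["custom_id"]: {"input": inputs[r["custom_id"]], "output": r["output"]}
           for r in results}
```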
Use the Batch API for latency-tolerant workloads (overnight reports, nightly audits) and the synchronous API for blocking workflows (pre-merge checks, real-time user interactions). The decision hinges entirely on whether anything is waiting on the result.
Your engineering manager proposes using the Message Batches API for two workloads: (1) pre-merge code review checks that block PR merging, and (2) a nightly audit that scans the full codebase for style violations. Which workloads should use the Batch API?
Design Multi-Instance and Multi-Pass Review Architectures
When a single model instance reviews its own output, there's an inherent limitation: it retains the reasoning context from the generation phase. This means it's systematically less likely to question decisions it already justified to itself. This is the fundamental problem with same-session self-review.
Why Independent Review Instances Are More Effective
A separate model instance — running in a fresh session with no memory of the generation process — evaluates the output on its own merits. It doesn't know the reasoning that led to each decision, so it can assess the output more objectively. This is analogous to how code review works on engineering teams: the reviewer wasn't present during implementation and evaluates the code without the author's mental context.
Multi-Pass Architecture for Large Reviews
For complex tasks like reviewing a 14-file pull request, a single pass over all files produces uneven results — some files get detailed feedback while others receive shallow analysis. The solution is to decompose the review into focused passes:
- Per-file local analysis: Each file is reviewed individually in a dedicated pass, ensuring consistent depth and attention across all files.
- Cross-file integration pass: A separate pass examines how the files interact — checking data flow consistency, interface contracts, and architectural coherence that only become visible when considering multiple files together.
This decomposition ensures that neither local detail nor global coherence is sacrificed.
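The two-phase decomposition can be sketched as a small orchestrator. The `review_file` and `review_integration` arguments stand in for independent model calls (run in fresh sessions, per the point above about same-session bias):

```python
# Sketch of a multi-pass review: a dedicated per-file pass for consistent
# local depth, then a separate cross-file pass over the whole changeset.
def run_multi_pass_review(files: dict, review_file, review_integration) -> dict:
    # Pass 1: each file reviewed individually, so no file gets shallow
    # treatment just because it appeared late in a long context.
    local_findings = {path: review_file(path, source)
                      for path, source in files.items()}
    # Pass 2: one pass over all files together for data flow, interface
    # contracts, and architectural coherence.
    integration_findings = review_integration(files)
    return {"local": local_findings, "integration": integration_findings}
```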
Use separate sessions for generation and review. A model reviewing its own output in the same session retains reasoning context that biases the review. Independent instances produce more objective evaluations.