Advanced Prompt Engineering: Chain-of-Thought, ReAct & Tree-of-Thought
Master-level techniques — reasoning chains, self-reflection & structured generation
Basic prompt engineering (giving clear instructions, providing examples, specifying format) gets you most of the way to good LLM output. The rest requires techniques that change how the model works through a problem. These advanced techniques are not marginal improvements; they can be the difference between an AI that handles textbook questions and one that can tackle complex, multi-step reasoning.
This guide covers five research-backed techniques with practical examples you can use immediately.
What You Will Learn
- Chain-of-thought (CoT) prompting and when it matters
- Self-consistency: the voting method for reliable answers
- Tree-of-Thought (ToT): parallel exploration of reasoning paths
- ReAct: combining reasoning with tool use
- Meta-prompting: using the LLM to write better prompts
- How to combine techniques for maximum effectiveness
For the foundations, start with our prompt engineering course and chain-of-thought guide.
Technique 1: Chain-of-Thought (CoT) Prompting
The Research
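In 2022, Wei et al. at Google published "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." Their finding: including worked reasoning steps in the few-shot examples raised accuracy on the GSM8K math benchmark from about 18% to about 57% for the PaLM 540B model. Later that year, Kojima et al. ("Large Language Models are Zero-Shot Reasoners") showed that simply adding "Let's think step by step" to a prompt, with no examples at all, improved accuracy on the MultiArith benchmark from 17.7% to 78.7% for InstructGPT (text-davinci-002).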
The key insight: when models write out intermediate reasoning steps, each step constrains the next, dramatically reducing errors.
Zero-Shot CoT (No Examples Needed)
The simplest form. Just append a reasoning trigger to your prompt.
Without CoT:
A shop in Connaught Place, Delhi offers 25% off on all items. An additional
10% loyalty discount applies after the first discount. If a jacket costs
Rs 4,000, what is the final price?
Models frequently answer Rs 2,600 (incorrectly treating 25%+10% as 35%).
With CoT:
A shop in Connaught Place, Delhi offers 25% off on all items. An additional
10% loyalty discount applies after the first discount. If a jacket costs
Rs 4,000, what is the final price?
Think step by step, showing each calculation.
The model then produces:
Step 1: Original price = Rs 4,000
Step 2: First discount (25%): Rs 4,000 × 0.25 = Rs 1,000
Step 3: Price after first discount: Rs 4,000 - Rs 1,000 = Rs 3,000
Step 4: Loyalty discount (10% on Rs 3,000): Rs 3,000 × 0.10 = Rs 300
Step 5: Final price: Rs 3,000 - Rs 300 = Rs 2,700
Correct answer: Rs 2,700.
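The arithmetic behind both answers can be checked directly. The sketch below is only an illustration and not part of the prompt; the function name `apply_sequential_discounts` is made up for this example. It applies each discount to the price left after the previous one, which is the step the incorrect Rs 2,600 answer skips.

```python
from fractions import Fraction
from typing import Iterable


def apply_sequential_discounts(price: int, percents: Iterable[int]) -> Fraction:
    """Apply each percentage discount to the price left after the previous one."""
    result = Fraction(price)
    for percent in percents:
        result -= result * Fraction(percent, 100)
    return result


# Correct reading: 25% off, then 10% off the reduced price.
stacked = apply_sequential_discounts(4000, [25, 10])
# Common mistake: treating the two discounts as a single 35% discount.
summed = apply_sequential_discounts(4000, [25 + 10])

print(f"Sequential discounts: Rs {stacked}")
print(f"Summed discounts:     Rs {summed}")
```

Exact fractions are used instead of floats so the comparison with the hand calculation is not affected by rounding.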
Few-Shot CoT (With Reasoning Examples)
For harder problems, provide an example that demonstrates the reasoning pattern.
Solve the following income tax calculation.
Example:
Q: Ravi earns Rs 10,00,000 annually. He has Rs 1,50,000 in Section 80C
deductions and Rs 25,000 in Section 80D (health insurance). Calculate
his tax under old regime for AY 2026-27.
Step 1: Gross income = Rs 10,00,000
Step 2: Section 80C deduction = Rs 1,50,000
Step 3: Section 80D deduction = Rs 25,000
Step 4: Taxable income = 10,00,000 - 1,50,000 - 25,000 = Rs 8,25,000
Step 5: Tax calculation (old regime slabs):
- Up to Rs 2,50,000: Nil
- Rs 2,50,001 to Rs 5,00,000: 5% × Rs 2,50,000 = Rs 12,500
- Rs 5,00,001 to Rs 8,25,000: 20% × Rs 3,25,000 = Rs 65,000
Step 6: Total tax = Rs 12,500 + Rs 65,000 = Rs 77,500
Step 7: Add 4% health & education cess: Rs 77,500 × 1.04 = Rs 80,600
Answer: Tax payable = Rs 80,600
Now solve:
Q: Priya earns Rs 15,00,000 annually. She has Rs 1,50,000 in 80C,
Rs 50,000 in NPS (80CCD(1B)), and Rs 30,000 in 80D. Calculate her
tax under old regime for AY 2026-27.
The example teaches the model the exact calculation methodology, slab application order, and cess addition.
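Before putting a worked example into a few-shot prompt, it helps to verify its arithmetic, because the model will copy any mistake in it. The sketch below recomputes Ravi's figure. It is a simplified check under stated assumptions, not a tax calculator: the names `OLD_REGIME_SLABS` and `old_regime_tax` are invented for this example, the 30% slab above Rs 10,00,000 is added as an assumption (the worked example stops at the 20% slab), and the Section 87A rebate, surcharge, and deduction caps are ignored.

```python
# Old-regime slabs as (upper limit in Rs, rate in percent).
# The first three rows come from the worked example above; the last row
# (30% above Rs 10,00,000) is an assumption added so higher incomes are covered.
OLD_REGIME_SLABS = [
    (250_000, 0),
    (500_000, 5),
    (1_000_000, 20),
    (None, 30),
]
CESS_PERCENT = 4  # health & education cess, as in Step 7 of the example


def old_regime_tax(taxable_income: int) -> int:
    """Slab tax plus cess in whole rupees (no rebate, no surcharge)."""
    tax = 0
    lower = 0
    for upper, rate in OLD_REGIME_SLABS:
        if taxable_income <= lower:
            break
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate // 100
        if upper is not None:
            lower = upper
    return tax + tax * CESS_PERCENT // 100


gross_income = 1_000_000
deductions = [150_000, 25_000]  # Section 80C and Section 80D from the example
taxable = gross_income - sum(deductions)

print(f"Taxable income: Rs {taxable}")
print(f"Tax payable:    Rs {old_regime_tax(taxable)}")
```

Integer arithmetic is used throughout so the result matches the hand calculation exactly.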
When CoT Helps Most
| Task Type | CoT Improvement | Example |
|-----------|----------------|---------|
| Math/Calculation | Very High (30-50%+) | Tax calculations, financial analysis |
| Logical Reasoning | High (20-35%) | Legal analysis, compliance checking |
| Multi-step Planning | High | Project planning, debugging |
| Simple Factual Q&A | Minimal | "What is the capital of India?" |
| Creative Writing | Minimal | Story writing, poetry |
Technique 2: Self-Consistency
The Research
Wang et al. (2022) introduced self-consistency: instead of generating one chain of thought, generate multiple independent reasoning paths and take the majority answer. This is like asking 5 experts to solve a problem independently and going with the consensus.
Implementation
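Strictly speaking, you cannot do self-consistency in a single prompt, because the reasoning paths must be sampled independently. It requires either multiple calls or an API that returns several completions per request (for example, an n>1 parameter where the provider supports one).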
API approach (Python):
import os

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GOOGLE_API_KEY"),
    temperature=0.7,  # Higher temperature for diverse reasoning paths
)

question = """
A company in Mumbai imports components worth USD 50,000 from a US supplier.
The current exchange rate is Rs 83.5/USD. Import duty is 10%, IGST is 18%
(applied on assessable value + duty). Calculate the total landed cost in INR.
Think step by step.
"""

# Generate 5 independent reasoning paths
answers = []
for i in range(5):
    response = llm.invoke(question)
    # Store the full response; the final number still has to be parsed out of it
    answers.append(response.content)
    # Print the tail of each response, where the final answer usually appears
    print(f"Path {i+1} answer: {response.content[-100:]}")

# The majority answer is most likely correct
# In production, parse the final number from each and take the mode
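The voting step mentioned in the last comment can be sketched as follows. This is a minimal illustration, assuming each response ends with its final figure; the helper names `extract_final_number` and `majority_answer` and the sample responses are hypothetical and stand in for real model output. Taking the last number in the text is a fragile heuristic, so in practice you may prefer to ask the model for a fixed "Final answer:" line and parse that.

```python
import re
from collections import Counter
from typing import List, Optional

# Matches numbers such as 5419150, 54,19,150 or 5,419,150.00
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number in the text, ignoring thousands separators."""
    matches = NUMBER_RE.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def majority_answer(responses: List[str]) -> Optional[float]:
    """Return the most common final number across all responses."""
    numbers = [n for n in map(extract_final_number, responses) if n is not None]
    if not numbers:
        return None
    value, _votes = Counter(numbers).most_common(1)[0]
    return value


# Hypothetical tails of five reasoning paths for the landed-cost question.
sample_responses = [
    "... Total landed cost = Rs 54,19,150",
    "... so the landed cost is Rs 5,419,150.",
    "... Final answer: Rs 53,44,000",  # a path that applied IGST to the wrong base
    "... Total = 4,592,500 + 826,650 = Rs 5419150",
    "... Final answer: Rs 54,19,150.00",
]

print(majority_answer(sample_responses))
```

For reference, the hand calculation is 50,000 × 83.5 = 41,75,000; plus 10% duty gives 45,92,500; plus 18% IGST on that amount (8,26,650) gives Rs 54,19,150, which is the value four of the five sample paths agree on.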
Manual approach (for single conversations):
I need you to solve this problem 3 times, each time using a different
reasoning approach. Then compare your three answers and give me the one
you are most confident in.
Problem: [your problem here]
Approach 1: Work forward from the given data
Approach 2: Work backward from what the answer should look like
Approach 3: Estimate first, then calculate precisely
After all three, state which answer you are most confident in and why.
When to Use Self-Consistency
- Critical calculations where accuracy matters (financial, legal, medical)
- Problems where different valid approaches exist
- When a single CoT gives you an answer you are not sure about
- NOT for creative tasks (there is no "correct" answer to converge on)
Technique 3: Tree-of-Thought (ToT)
The Research
Yao et al. (2023) published "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." ToT extends CoT by exploring multiple reasoning branches at each step, evaluating which branches are promising, and pruning dead ends.
How ToT Works
             Problem: [Complex task]
                        │
          ┌─────────────┼─────────────┐
          ▼             ▼             ▼
       Path A        Path B        Path C
          │             │             │
        Eval          Eval          Eval
          │             ✗             │   ← Path B pruned (unpromising)
       ┌──┴──┐                     ┌──┴──┐
       ▼     ▼                     ▼     ▼
       A1    A2                    C1    C2
       │     ✗                     │     ✗
       ▼                           ▼
      Eval                        Eval
       │                           │
       ▼                           ▼
   A1→Final                    C1→Final
       └─────────────┬─────────────┘
           Compare → Best answer
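The generate, evaluate, prune loop in the diagram can be sketched as a small beam search. This is a simplified illustration, not the algorithm from the paper: `tree_of_thought_search`, `propose`, and `score` are names invented here, and in a real system `propose` and `score` would each be LLM calls (one to generate candidate next thoughts, one to rate them). A toy problem stands in for the model: build a string of digits whose sum reaches a target.

```python
from typing import Callable, List


def tree_of_thought_search(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    depth: int = 2,
) -> str:
    """Expand every state on the frontier, keep the best few, repeat."""
    frontier = [root]
    for _ in range(depth):
        # Generate: branch every surviving state into candidate next states.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Evaluate and prune: keep only the highest-scoring branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    # Compare the surviving branches and return the best one.
    return max(frontier, key=score)


TARGET = 5  # toy goal: the digits of the state should add up to this


def propose(state: str) -> List[str]:
    """Toy stand-in for 'generate next thoughts': append one digit."""
    return [state + digit for digit in "123"]


def score(state: str) -> float:
    """Toy stand-in for 'evaluate this branch': closer to TARGET is better."""
    return -abs(TARGET - sum(int(ch) for ch in state))


best = tree_of_thought_search("", propose, score, beam_width=2, depth=2)
print(best, score(best))
```

With `beam_width=2`, one of the three first-level branches is dropped after evaluation, mirroring the pruning of Path B in the diagram.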
ToT Prompt Template
I need to solve a complex problem. Use the Tree-of-Thought method:
Problem: Design an optimal delivery route for a food delivery startup
operating in South Delhi that needs to make 8 deliveries from a central
kitchen in Hauz Khas within a 45-minute window during evening peak hours.
STEP 1: GENERATE 3 INITIAL APPROACHES
For each approach, write 2-3 sentences explaining the strategy.
Approach A: [Geographic clustering]
Approach B: [Time-priority based]
Approach C: [Traffic-pattern based]
STEP 2: EVALUATE EACH APPROACH
Rate each approach on:
- Feasibility (1-10)
- Efficiency (1-10)
- Robustness to traffic (1-10)
Select the top 2 approaches to explore further.
STEP 3: DEVELOP THE TOP 2 APPROACHES
For each selected approach, work out the detailed solution.
STEP 4: COMPARE AND SELECT
Compare the two detailed solutions and select the best one.
Explain why it is superior.
STEP 5: REFINE
Take the selected solution and optimize it further.
Present the final answer.
ToT for Code Architecture Decisions
I need to choose a database strategy for a high-traffic Indian e-commerce
platform (10 lakh daily active users, primarily mobile, 60% from
tier-2/tier-3 cities with variable connectivity).
Use Tree-of-Thought reasoning:
Branch A: Single PostgreSQL with read replicas
Branch B: PostgreSQL primary + Redis cache layer
Branch C: MongoDB for product catalog + PostgreSQL for transactions
For each branch, evaluate:
1. Read performance under load
2. Write consistency requirements
3. Operational complexity for a 5-person DevOps team in India
4. Cost on AWS Mumbai region (monthly estimate in INR)
5. Handling of network latency for tier-2/3 users
Prune the weakest branch after evaluation.
Develop the remaining two in detail.
Select and justify the winner.
Technique 4: ReAct (Reason + Act)
The Research
Yao et al. (2022) introduced ReAct in "ReAct: Synergizing Reasoning and Acting in Language Models." The key innovation: interleaving reasoning traces with actions allows the model to update its plan based on real-world observations.
ReAct Pattern
Question: What percentage of India's total electricity generation came
from renewable sources in FY2025-26, and how does this compare to the
government's 2030 target?
Thought 1: I need current data on India's renewable energy generation.
Let me search for the latest statistics.
Action 1: Search("India renewable energy percentage total generation FY2025-26")
Observation 1: According to the Ministry of Power, renewable energy
(including large hydro) contributed approximately 43% of India's total
installed capacity and generated about 31% of total electricity in FY2025-26.
Thought 2: I have the current percentage. Now I need the 2030 target.
Action 2: Search("India renewable energy target 2030 percentage")
Observation 2: India's target under the updated NDC (2022) is to achieve
50% of cumulative electric power installed capacity from non-fossil fuel
sources by 2030. The generation target is approximately 40%.
Thought 3: I now have both numbers. Let me calculate the gap and present
the comparison.
Action 3: Calculate((40 - 31) / (2030 - 2026))
Observation 3: 2.25 percentage points per year needed.
Thought 4: I have all the information. Let me synthesize the answer.
Answer: India generated approximately 31% of its electricity from
renewable sources in FY2025-26. The government's 2030 target is ~40%
generation from renewables. This means India needs to increase renewable
generation by about 9 percentage points over 4 years (2.25pp/year),
which is ambitious but achievable given current growth trends.
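The control flow behind a trace like this is a short loop: ask the model for the next step, run any action it names, append the observation, and stop at an answer. The sketch below shows that loop under stated assumptions. All names (`react_loop`, `calculate`, `scripted_model`) are hypothetical, `scripted_model` is a canned stand-in for a real LLM call, and only a calculator tool is wired in, so it reproduces just the final Calculate step of the trace.

```python
import ast
import operator
import re
from typing import Callable, Dict, Optional

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str) -> str:
    """Evaluate basic arithmetic (+, -, *, /) without calling eval()."""
    def evaluate(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return str(evaluate(ast.parse(expression, mode="eval").body))


# Matches lines such as "Action 3: Calculate((40 - 31) / (2030 - 2026))"
ACTION_RE = re.compile(r"^Action(?: \d+)?: (\w+)\((.*)\)\s*$", re.MULTILINE)


def react_loop(
    model_step: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model steps and tool calls until the model gives an answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model_step(transcript)
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        match = ACTION_RE.search(step)
        if match:
            tool_name, argument = match.groups()
            # The observation comes from the tool, not from the model.
            transcript += f"Observation: {tools[tool_name](argument)}\n"
    return None  # no answer within the step budget


def scripted_model(transcript: str) -> str:
    """Canned stand-in for an LLM: one action, then an answer."""
    if "Observation:" not in transcript:
        return (
            "Thought: I have both percentages and need the yearly gap.\n"
            "Action: Calculate((40 - 31) / (2030 - 2026))"
        )
    last_observation = transcript.strip().splitlines()[-1].split(": ", 1)[1]
    return f"Answer: {last_observation} percentage points per year needed."


answer = react_loop(scripted_model, {"Calculate": calculate}, "What yearly increase is needed?")
print(answer)
```

The design point is that observations are written into the transcript by the loop from tool output, so the model's next thought is grounded in something it did not generate itself.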
Using ReAct in Practice
You do not need a framework to use ReAct. You can structure any prompt to follow this pattern:
Answer the following question using the ReAct framework.
For each step, write:
- Thought: Your reasoning about what to do next
- Action: The action you would take (search, calculate, look up)
- Observation: What you found (use your training knowledge)
Continue the Thought-Action-Observation loop until you have enough
information to answer confidently.
Question: [Your complex question here]
For building actual tools that agents can use with ReAct, see our MCP servers tutorial and AI agents guide.
Technique 5: Meta-Prompting
The Concept
Meta-prompting uses the LLM to improve its own prompts. Instead of manually iterating on a prompt, you ask the model to analyze what makes a prompt effective and generate an optimized version.
The Meta-Prompt Template
You are a prompt engineering expert. I have a prompt that produces
inconsistent results. Analyze it and create an improved version.
CURRENT PROMPT:
"""
Write a market analysis for [industry] in India.
"""
PROBLEMS I AM SEEING:
- Responses vary wildly in depth and format
- Sometimes misses key data points
- Does not always include Indian-specific context
YOUR TASK:
1. Identify 3 specific weaknesses in my current prompt
2. For each weakness, explain why it causes inconsistent results
3. Write an improved prompt that addresses all weaknesses
4. Explain what you changed and why
The improved prompt should produce consistent, comprehensive market
analysis every time it is used.
Automated Prompt Optimization
For production systems, you can automate prompt improvement:
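def evaluate_quality(output: str, expected: str) -> float:
    """Placeholder scorer (an assumption): 1.0 if the expected text appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def format_results(results: list) -> str:
    """Placeholder formatter (an assumption): one text block per test case."""
    return "\n\n".join(
        f"Input: {r['input']}\nOutput: {r['output']}\n"
        f"Expected: {r['expected']}\nQuality: {r['quality']}"
        for r in results
    )


def optimize_prompt(original_prompt: str, test_cases: list, llm) -> str:
    """Use the LLM to optimize a prompt based on test results."""
    # Run test cases with the original prompt
    results = []
    for test in test_cases:
        response = llm.invoke(original_prompt.format(**test["input"]))
        results.append({
            "input": test["input"],
            "output": response.content,
            "expected": test["expected"],
            "quality": evaluate_quality(response.content, test["expected"]),
        })

    # Ask the LLM to analyze failures and improve the prompt
    optimization_prompt = f"""
    Analyze these prompt test results and create an improved prompt.
    Original prompt: {original_prompt}
    Test results:
    {format_results(results)}
    Failures typically involve: [describe patterns you see]
    Create an improved version of the prompt that would handle
    these failure cases correctly. Return ONLY the improved prompt.
    """
    improved = llm.invoke(optimization_prompt)
    return improved.content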
Meta-Prompting for Specific Domains
You are a prompt engineering consultant specializing in Indian legal
technology. A lawtech startup wants to build an AI assistant that
analyzes contracts for Indian companies.
They need a prompt that:
1. Identifies risky clauses in vendor agreements
2. Checks compliance with Indian Contract Act 1872
3. Flags missing standard clauses
4. Provides risk ratings
Write the optimal prompt for this task. Include:
- Role definition
- Specific legal knowledge domains
- Output format specification
- Edge cases to handle (multi-jurisdictional contracts, government contracts)
- Guardrails (what the AI should NOT do)
Test your prompt mentally against these scenarios:
- A standard IT services agreement
- A cross-border SaaS subscription
- A government procurement contract under GeM
Combining Techniques: A Decision Framework
| Problem Type | Recommended Combination |
|-------------|------------------------|
| Complex math/logic | CoT + Self-Consistency |
| Open-ended analysis | ToT + ReAct |
| Production AI systems | Meta-prompting + CoT |
| Research tasks | ReAct + CoT |
| Architecture decisions | ToT alone |
| Reliable classification | CoT + Self-Consistency |
| Dynamic information needs | ReAct + tools |
Combined Example: Financial Analysis
Analyze whether a Rs 50 lakh investment in commercial property in Pune
is better than investing the same amount in an index fund (Nifty 50)
over a 10-year period.
Use the following approach:
STEP 1 (Tree-of-Thought): Generate 3 analysis frameworks
- Framework A: Pure financial return comparison
- Framework B: Risk-adjusted return comparison
- Framework C: Total wealth impact (including tax, liquidity, leverage)
STEP 2 (Evaluate): Rate each framework on comprehensiveness (1-10)
and select the best two.
STEP 3 (Chain-of-Thought): For each selected framework, calculate
step by step:
- Property: rental yield, appreciation, maintenance, loan interest,
tax implications (Section 24, capital gains)
- Index fund: historical Nifty returns, LTCG tax, SIP vs lumpsum,
dividend reinvestment
Show all calculations.
STEP 4 (Self-Consistency): Solve the calculation twice using different
reasonable assumptions for appreciation rate and rental yield. Compare
results.
STEP 5: Synthesize into a recommendation with clear conditions
("Property is better IF... Index fund is better IF...").
Use current Indian tax laws (AY 2026-27), realistic Pune property
market data, and historical Nifty 50 returns.
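Step 4 of this prompt, running the same calculation under different assumptions, is also easy to cross-check outside the model. The sketch below is a deliberately simplified illustration: the function names are invented here, every rate in `scenarios` is a placeholder rather than market data, and taxes, maintenance, loan interest, and reinvestment of rent are all left out.

```python
def index_fund_value(principal: float, annual_return: float, years: int) -> float:
    """Lump sum compounded once a year."""
    return principal * (1 + annual_return) ** years


def property_value(principal: float, appreciation: float, rental_yield: float, years: int) -> float:
    """Appreciated property value plus rent collected (rent is not reinvested)."""
    value = principal
    rent_collected = 0.0
    for _ in range(years):
        rent_collected += value * rental_yield  # rent on the start-of-year value
        value *= 1 + appreciation
    return value + rent_collected


PRINCIPAL = 5_000_000  # Rs 50 lakh
YEARS = 10

# Placeholder assumptions only; replace them with figures you have verified.
scenarios = {
    "scenario_1": {"index_return": 0.10, "appreciation": 0.05, "rental_yield": 0.03},
    "scenario_2": {"index_return": 0.12, "appreciation": 0.07, "rental_yield": 0.04},
}

for name, s in scenarios.items():
    fund = index_fund_value(PRINCIPAL, s["index_return"], YEARS)
    prop = property_value(PRINCIPAL, s["appreciation"], s["rental_yield"], YEARS)
    better = "index fund" if fund > prop else "property"
    print(f"{name}: index fund Rs {fund:,.0f} vs property Rs {prop:,.0f} -> {better}")
```

If the model's step-by-step figures differ widely from a rough check like this under the same assumptions, that is a signal to inspect its calculations before trusting the recommendation.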
Practical Tips for Advanced Prompting
- Start with CoT: it is the highest-ROI technique. If CoT gives good results, you may not need anything more complex.
- Add self-consistency for high-stakes decisions: when an error costs money or reputation, generate 3-5 reasoning paths and take the consensus.
- Use ToT for strategic decisions: architecture choices, business strategies, and design decisions benefit from explicit exploration of alternatives.
- ReAct needs real tools: simulated ReAct (asking the model to pretend to search) is less effective than actual tool use. See our AI agents tutorial for implementation.
- Meta-prompting is for production: if you use a prompt more than 10 times, invest in meta-prompting to optimize it.
- Temperature matters: use low temperature (0.1-0.3) for CoT accuracy, higher temperature (0.7-0.9) for self-consistency diversity, and medium temperature (0.4-0.6) for ToT exploration.
- Model selection: these techniques have the largest impact on capable models. Claude Opus/Sonnet, GPT-4, and Gemini Pro show the biggest improvements. Smaller models benefit primarily from basic CoT.
Further Reading
Research Papers:
- Wei et al. (2022): "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" — the foundational CoT paper
- Kojima et al. (2022): "Large Language Models are Zero-Shot Reasoners"
- Wang et al. (2022): "Self-Consistency Improves Chain of Thought Reasoning in Language Models"
- Yao et al. (2023): "Tree of Thoughts: Deliberate Problem Solving with Large Language Models"
- Yao et al. (2022): "ReAct: Synergizing Reasoning and Acting in Language Models"
- Zhou et al. (2022): "Large Language Models Are Human-Level Prompt Engineers" — on automatic prompt optimization
Related Guides on PromptAndSkills:
- Chain-of-Thought Prompting Guide — focused CoT tutorial with examples
- System Prompts Guide — structuring persistent AI behavior
- AI Agents Tutorial — building agents that use ReAct in practice
- Best System Prompts for Indian Professionals — ready-to-use prompts
- Prompt Engineering Free Course — start from the fundamentals