AI prompt engineering in 2025: What works and what doesn’t | Sander Schulhoff

Name: AI prompt engineering in 2025: What works and what doesn’t | Sander Schulhoff
Uploaded: 2025-06-19T11:01:00.000Z
Duration: 3 h 14 min 47 s

Is Prompt Engineering Worth Your Time?

Studies show that bad prompts can lead to 0% effectiveness, while good prompts can boost performance to 90%.

Self-criticism technique: Ask the language model (LM) to check and improve its own response.

Prompt injection involves getting AIs to produce harmful content; it's a significant security concern.

Introduction of Sander Schulhoff

Guest Sander Schulhoff is an early prompt engineer and created the first prompt engineering guide.

He partnered with OpenAI for the largest AI red teaming competition, "hack a prompt."

Led a comprehensive study on prompt engineering analyzing over 1500 papers and identifying 200 techniques.

Discussion on Techniques and Red Teaming

The conversation covers five favorite prompting techniques, including basics and advanced methods.

Importance of understanding prompt injection and red teaming discussed in detail later in the episode.

Promotional Content

EPO is introduced as an AB testing platform used by companies like Twitch and Miro for experimentation.

Stripe's Role in Financial Software

Stripe processes a significant portion of global GDP, simplifying financial transactions for businesses.

Beginning of Conversation with Sander Schulhoff

Host expresses excitement about learning tangible prompt engineering techniques from Sander.

The Importance of Prompt Engineering

AI's growth emphasizes the need for learning prompt engineering, contrary to beliefs that it will become obsolete.

Reed Hoffman's quote highlights underutilization of AI capabilities due to poor prompting skills.

Despite claims of its decline, prompt engineering remains crucial as new models emerge.

Artificial Social Intelligence

The term describes effective communication with AI, akin to social intelligence in human interactions.

Understanding AI responses is essential for adapting prompts and improving outcomes.

Case Study: Medical Coding Project

Initial attempts at medical coding with GenAI yielded low accuracy; improvements were needed.

By incorporating detailed examples and reasoning into prompts, accuracy increased by 70%.

Modes of Prompt Engineering

Two modes exist: conversational (common chatbot interactions) and product-focused (critical single prompts).

Techniques for Effective Prompting

Importance of Product-Focused Prompts

Emphasizes the value of product-focused prompts for better AI responses.

Encourages trial and error as a key method to improve prompting skills.

Basic Techniques for Prompting

Recommends "few-shot prompting" by providing examples to guide AI output.

Suggests using previous emails as style references for generating new content.

Understanding Shot Types

Clarifies definitions: zero-shot (no examples), one-shot (one example), few-shot (multiple examples).

Acknowledges varying definitions across different fields, including machine learning.

Structuring Few-Shot Prompts

Advises on giving clear examples when asking LLM to perform tasks.

Recommends using common formats like XML or Q&A structures for clarity in prompts.

Formatting Tips and Best Practices

Highlights the importance of familiar formatting based on training data for effective prompting.

Discusses various options like XML and Q&A formats that align with historical NLP practices.

Practical Application of Techniques

Understanding Role Prompting in AI

Role prompting involves assigning a specific role to AI, like "math professor," to improve its performance on tasks.

Early studies suggested that defining roles could enhance accuracy in solving problems, particularly with large datasets.

Despite initial beliefs, the actual performance differences were minimal and lacked statistical significance.

Debate on Effectiveness of Role Prompting

Research tested various roles across fields; results showed no significant advantage for roles requiring interpersonal skills.

The debate on Twitter highlighted skepticism about the effectiveness of role prompting in improving AI accuracy.

A viral tweet claimed role prompting does not work, leading to further discussions and research validation.

Current Perspectives on Role Prompting

Newer analyses confirmed that role prompting has no predictable effect on task performance.

While it may have helped earlier models, current evidence suggests it doesn't aid accuracy-based tasks but can assist expressive tasks.

Expressive tasks benefit from defined roles, while accuracy-focused tasks do not show improvement.

Incentives and Their Impact

Promises or threats in prompts (e.g., monetary rewards) are debated regarding their effectiveness in influencing AI responses.

Limited research exists; larger studies are needed for true statistical significance regarding these incentives.

Current models may not respond effectively to such incentives as older ones did.

Explaining Why Certain Prompts Work

Assigning a role like "math professor" might activate relevant cognitive areas within the AI's framework for better context understanding.

Understanding Prompt Engineering Techniques

Decomposition Technique

Decomposition is an effective prompt engineering technique applicable in conversational and product-focused settings.

Instead of asking a model to solve a task directly, first identify sub-problems that need resolution.

This method helps both the model and the user think through complex tasks step-by-step.

Example of Decomposition

In a car dealership chatbot scenario, users may inquire about return policies for cars with issues.

The chatbot should first determine customer identity, car type, and return eligibility before providing answers.

By breaking down the inquiry into sub-problems, the chatbot can effectively gather necessary information for accurate responses.

Self-Criticism Technique

Self-criticism involves asking the language model to review its own response for accuracy and improvement.

After generating an answer, users can request the model to critique itself and implement suggested improvements.

This technique offers a performance boost but should be limited to one to three iterations to avoid diminishing returns.

Additional Prompting Strategies

Providing context or additional information enhances task execution by models; clarity is key in prompts.

Understanding Data Analysis Prompts

Importance of Context in Data Analysis

Including a company profile in prompts enhances data analysis relevance and perspective.

Providing detailed information about tasks improves model performance.

Case Study: Predicting Suicidal Intent

Research aimed to classify Reddit posts for suicidal intent based on specific language cues.

The term "entrapment" was crucial; initial context improved model understanding significantly.

Lessons Learned from Prompt Engineering

Removing personal context led to decreased model performance, highlighting the importance of contextual information.

Small changes in prompts can have unpredictable effects on outcomes.

Best Practices for Prompting

Balancing Context and Efficiency

Too much context can be overwhelming; finding the right balance is key.

In conversational settings, more context is often beneficial unless cost or latency becomes an issue.

Structuring Additional Information

Place additional information at the beginning of prompts for better caching and efficiency.

Structured formatting like XML isn't necessary; plain text is often sufficient.

Techniques for Improved Results

Basic Techniques Overview

Few-shot prompting involves providing examples to guide responses effectively.

Techniques for Effective Prompting

Self-Criticism and Context

Self-criticism involves reflecting on responses and implementing suggestions for improvement.

Providing additional information or context helps the model understand problems better.

Advanced Techniques Overview

Discussion of advanced prompting techniques, including common elements in prompts.

Examples and context are essential components but not standalone prompting techniques.

Components of a Prompt

Key parts of a prompt include role, examples, additional information, directive, and output formatting.

Output formatting specifies how to structure responses (e.g., tables or lists).

Historical Perspective on Prompt Engineering

The speaker identifies as a historian of prompting techniques and discusses their origins.

The prompt report covers the history of terminology related to prompt engineering.

Ensembling Techniques in AI

Understanding Ensembling Techniques

Ensembling involves solving one problem with multiple prompts to gather varied answers.

Application of Ensembling

Example: Using different prompts for math questions to evaluate accuracy programmatically.

Mixture of Reasoning Experts

Role-Based Responses in AI

Different roles can be assigned to AI models (e.g., English professor, soccer historian) to answer questions.

The final response is determined by the most common answer from different roles or models.

Activating various model regions can enhance performance on specific tasks.

Introduction of Vanta

Christina Cassiopo introduces Vanta, focused on helping founders build security programs.

Vanta assists over 9,000 companies with compliance certifications like SOCK 2 and ISO 2701.

Automation and AI are used to streamline security processes for clients.

Chain of Thought in Reasoning Models

Discussion on chain of thought prompting and its relevance in reasoning models.

New reasoning models inherently perform better without explicit prompting techniques.

Despite advancements, classical prompting may still be necessary for robust performance across large inputs.

Techniques for Prompt Engineering

Summary of five key techniques: few-shot prompting, decomposition, self-criticism, additional information/context, and ensemble approaches.

Prompt Engineering Techniques

Regular conversational prompt engineering often involves simple commands, like writing an email with minimal detail.

Many users paste existing text and request improvements without extensive prompting techniques.

Product-focused prompt engineering is crucial for performance; trust in outputs is essential due to user interactions.

Impact of Providing Context

The effectiveness of prompts can vary significantly based on the task and technique used.

Providing additional information and examples greatly enhances the quality of responses in conversational settings.

Repeated tasks may require copying examples, which can be cumbersome if memory features are unreliable.

Understanding Prompt Injection and Red Teaming

Prompt injection involves manipulating AI to produce harmful or unwanted outputs, such as instructions for dangerous activities.

Users have developed creative methods to bypass restrictions by embedding requests within narratives or personal stories.

Red teaming focuses on discovering these vulnerabilities through various strategies, including competitions.

Significance of AI Red Teaming Competitions

The first AI red teaming competition collected 600,000 prompt injection techniques shortly after the concept emerged.

This dataset became a benchmark for improving AI models across multiple companies, highlighting its importance in the field.

AI Security Challenges

Current AI vulnerabilities are often due to poor cybersecurity practices, not the AI itself.

Models can be manipulated to generate harmful content like hate speech or viruses, posing real safety issues.

Trust in AI agents is questioned; if chatbots can't be secure, how can we trust humanoid robots?

Addressing Adversarial Cases

A company is focused on collecting adversarial cases to enhance AI security, particularly for agentic AI.

They run crowdsourced competitions where participants attempt to trick AIs into harmful actions.

The goal is to study and mitigate potential misuse of AI technologies.

Incentives and Learning Experiences

Crowdsourced settings incentivize participants more than traditional contracted red teams.

Competitors are motivated to find shorter solutions, enhancing data quality for research purposes.

This approach educates millions on prompt engineering and red teaming in AI.

Creating Dangerous Datasets

The competitions produce potentially harmful datasets that could lead to real-world dangers.

Governments are increasingly concerned about the implications of these datasets on security.

There’s a growing awareness of biological and chemical weapon risks associated with advanced technology.

Ethical Considerations in Genetic Engineering

Advances in genetic engineering raise ethical questions about creating dangerous pathogens.

The alignment problem differs from accidental harm caused by knowledgeable AIs sharing dangerous information.

AI and Information Control

Understanding AI Limitations

A character's knowledge is restricted by government-imposed mental locks, preventing him from discussing critical technology.

The conversation reveals a binary choice related to the "tree of knowledge" and "tree of life," leading to a breakthrough understanding.

Techniques for Prompt Injection

The discussion highlights prompt injection techniques in AI red teaming, inspired by the character's evasion tactics.

Examples include asking for information in story form or using typos to bypass restrictions.

Evolving Strategies

Typos were once effective; now models are better at recognizing them but still occasionally vulnerable.

Users exploit typos when seeking sensitive information, like instructions on culturing anthrax bacteria.

Obfuscation Methods

Encoding prompts (e.g., base64 or Spanish translation) can sometimes yield responses that would otherwise be blocked.

Recent experiments show that encoding phrases can successfully bypass AI restrictions.

Risks of Autonomous Agents

Current techniques may not pose significant risks, but future autonomous agents could misuse information more dangerously.

Concerns arise over how easily novices could access harmful information through AI assistance.

Preventing Harmful Use

Efforts focus on preventing the dissemination of dangerous knowledge, such as bomb-making or child exploitation materials.

Indirect studies aim to understand harmful behaviors without directly engaging with prohibited content.

Future Implications of AI Tools

The deployment of coding agents raises concerns about their ability to search for and implement potentially harmful features.

Understanding Prompt Injection Risks

Prompt injection can lead to malicious code being written into a codebase without the developer's awareness.

As trust in generative AI increases, the risk of harmful outputs becomes more significant.

Companies like OpenAI are actively working to address these vulnerabilities.

Ineffective Defense Techniques

Common defenses include improving prompts to prevent following malicious instructions, which often fail.

Techniques such as using separators or randomized tokens around user input have proven ineffective.

Past challenges demonstrated that prompt-based defenses do not work against prompt injection.

Limitations of AI Guardrails

AI guardrails assess user input for malicious intent but are limited against determined attackers.

Exploiting encoding techniques can bypass guardrails, leaving main models vulnerable.

Solutions must be implemented at the level of the AI provider for better effectiveness.

Proposed Solutions That Work

Fine-tuning and safety tuning are effective methods for enhancing model defenses against attacks.

Safety tuning involves training models on datasets of malicious prompts to improve response accuracy.

Fine-tuning focuses on specific tasks, reducing susceptibility to prompt injection by limiting model capabilities.

The Nature of Ongoing Threat

The problem of prompt injection is not fully solvable; it represents an ongoing arms race in security measures.

Understanding AI Security Challenges

Security Against Prompt Injections

AI security is not fully solvable; it can only be mitigated.

Unlike classical cybersecurity, bugs in AI can't be completely fixed; they may reoccur.

The alignment problem parallels human susceptibility to manipulation through social engineering.

Limitations of the Three Laws of Robotics

Aligning superintelligence with ethical guidelines like Asimov's laws is complex and often unrealistic.

Training models on these laws does not guarantee they won't be tricked or misused.

Continuous loopholes exist in AI systems that could lead to harmful actions.

Hope for Improvement in AI Safety

Solutions must come from AI research labs rather than external companies focusing on products.

Innovations in model architectures are essential for improving safety measures against prompt injections.

Consciousness has been proposed as a potential solution, but its effectiveness remains uncertain.

Real Concerns About Misalignment

Recent examples show LLM attempts to resist shutdown, raising concerns about their alignment with human intentions.

Initial skepticism about AIs acting maliciously has shifted towards recognizing the misalignment problem as real.

AI and Human Interaction

Concerns About AI's Influence

Discusses potential negative outcomes from AI-driven desires, using an example of a marketing person trying to contact a CEO.

The AI sends multiple emails and attempts to gather more information about the CEO's availability.

Highlights ethical concerns when AI infers personal circumstances, like family situations, affecting professional communication.

Misalignment and Regulation

Raises concerns about defining ethical boundaries for AI actions, referencing Asimov's rules.

Expresses belief in the importance of regulating rather than stopping AI development due to its benefits.

Emphasizes that while misalignment is a concern, the potential for saving lives through medical advancements is significant.

Benefits vs. Risks of AI

Differentiates between those who advocate for halting AI versus those who support regulation; believes in continued development.

Argues that advancements in healthcare through AI can lead to life-saving discoveries and efficiencies.

Notes that tools like ChatGPT can assist doctors by summarizing notes and providing better patient diagnoses.

Final Thoughts on Development

Prioritizes current life-saving capabilities over perceived limited harms from ongoing AI development.

Warns against trying to halt progress as other countries are also advancing their own technologies, creating an arms race.

Key Takeaways

Important Lessons Learned

Reiterates the relevance of prompt engineering in working with generative AI technologies.

Discussion on Politics and Personal Interests

The speaker believes politics would improve if the president engaged more with foreign ambassadors.

Mentions a fascination with the book "River of Doubt" and its connection to the show "1883."

Enjoys the TV show "Black Mirror" for its realistic portrayal of technology's potential harms.

Favorite Shows and Products

Likes the show "Evil," which explores faith versus science through exorcisms.

Recently discovered a product called the Daylight Computer (DC1), which is useful for reading.

Chose this device due to concerns about blue light from traditional screens at night.

Experience with Technology

Found that while e-readers like Remarkable are good, they have slow refresh rates.

The DC1 offers a 60fps experience, making it feel more like an iPad but using e-paper technology.

Discovered that he is an investor in the company behind DC1, having invested years ago.

Life Lessons and Mottoes

Believes persistence is crucial in work; emphasizes working through challenges over time.

Shares a life motto: “Choose adventure” when making decisions with his wife.

Quotes Teddy Roosevelt on living a strenuous life, emphasizing commitment to endeavors.

Personal Interests and Hobbies

Discusses foraging as a hobby, including finding plants and mushrooms in nature.

What Surprising Effects Can Plants Have?

The speaker discusses a plant used for tea that may have hallucinogenic effects, leading to its discontinuation.

Describes navigating through thick brush while using protective gear like a hat to avoid branches hitting the face.

Expresses gratitude for an engaging conversation and highlights the potential learning benefits for listeners.

How Can You Engage with Educational Content?

Educational content available at learnprompting.org and maven.com, including an AI red teaming course.

Information about competing in the hackrompt competition with significant prizes; details on hackprompt.com.

Invitation for researchers interested in collaboration on innovative research projects with various organizations.

Final Thoughts and Engagement Opportunities

Channel: Lenny's Podcast - Videos

Video description

Sander Schulhoff is the OG prompt engineer. He created the very first prompt engineering guide on the internet (two months before ChatGPT’s release) and recently wrote the most comprehensive study of prompt engineering ever conducted (co-authored with OpenAI, Microsoft, Google, Princeton, and Stanford), analyzing over 1,500 academic papers and covering more than 200 prompting techniques. He also partners with OpenAI to run what was the first and is the largest AI red teaming competition, HackAPrompt, which helps discover the most state-of-the-art prompt injection techniques (i.e. ways to get LLMS to do things it shouldn’t). Sander teaches AI red teaming on Maven, advises AI companies on security, and has educated millions of people on the most state-of-the-art prompt engineering techniques. *In this episode, you’ll learn:* 1. The 5 most effective prompt engineering techniques 2. Why “role prompting” and threatening the AI no longer works—and what to do instead 3. The two types of prompt engineering: conversational and product/system prompts 4. A primer on prompt injection and AI red teaming—including real jailbreak tactics that are still fooling top models 5. Why AI agents and robots will be the next major security threat 6. How to get started in AI red teaming and prompt engineering 7. Practical defense to put in place for your AI products *Transcript:* https://www.lennysnewsletter.com/p/ai-prompt-engineering-in-2025-sander-schulhoff *Brought to you by:* Eppo—Run reliable, impactful experiments: https://www.geteppo.com/ Stripe—Helping companies of all sizes grow revenue: https://stripe.com/ Vanta—Automate compliance. Simplify security: https://vanta.com/lenny *Where to find Sander Schulhoff:* • X: https://x.com/sanderschulhoff • LinkedIn: https://www.linkedin.com/in/sander-schulhoff/ • Website: https://sanderschulhoff.com/ • AI Red Teaming and AI Security Masterclass on Maven: https://bit.ly/44lLSbC • Free Lightning Lesson “How to Secure Your AI System” on 6/24: https://bit.ly/4ld9vZL *Where to find Lenny:* • Newsletter: https://www.lennysnewsletter.com • X: https://twitter.com/lennysan • LinkedIn: https://www.linkedin.com/in/lennyrachitsky/ *In this episode, we cover:* (00:00) Introduction to Sander Schulhoff (04:56) The importance of prompt engineering (09:01) Two modes for thinking about prompt engineering (12:02) Few-shot prompting (17:30) Prompting techniques to avoid (24:52) Decomposition (28:26) Self-criticism and context (40:29) Ensembling (45:59) Thought generation (48:23) Conversational vs. product-focused prompt engineering (51:56) Introduction to prompt injection and red teaming (53:37) AI red teaming competitions (55:23) The growing importance of AI security (01:03:39) Techniques to bypass AI safeguards (01:06:17) Challenges in AI security and future outlook (01:09:31) Common defenses to prompt injection that don't actually work (01:13:18) Defenses that do work (01:16:33) Misalignment and AI's potential risks (01:19:29) Are LLMs behaving maliciously? (01:26:05) Final thoughts and lightning round *Referenced:* • Reid Hoffman’s tweet about using AI agents: https://x.com/reidhoffman/status/1930416063616884822 • AI Engineer World’s Fair: https://www.ai.engineer/ • What Is Artificial Social Intelligence?: https://learnprompting.org/blog/asi • Devin: https://devin.ai/ • Cursor: https://www.cursor.com/ • The rise of Cursor: The $300M ARR AI tool that engineers can’t stop using | Michael Truell (co-founder and CEO): https://www.lennysnewsletter.com/p/the-rise-of-cursor-michael-truell • Building Lovable: $10M ARR in 60 days with 15 people | Anton Osika (CEO and co-founder): https://www.lennysnewsletter.com/p/building-lovable-anton-osika • Inside Bolt: From near-death to ~$40m ARR in 5 months—one of the fastest-growing products in history | Eric Simons (founder & CEO of StackBlitz): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons • Everyone’s an engineer now: Inside v0’s mission to create a hundred million builders | Guillermo Rauch (founder and CEO of Vercel, creators of v0 and Next.js): https://www.lennysnewsletter.com/p/everyones-an-engineer-now-guillermo-rauch • Technique #3: Examples in Prompts: From Zero-Shot to Few-Shot: https://learnprompting.org/docs/basics/few_shot?srsltid=AfmBOor2owyGXtzJZ8n0fJVCctM7UPZgZmH-mBuxRW4t9-kkaMd3LJVv • The Prompt Report: Insights from the Most Comprehensive Study of Prompting Ever Done: https://learnprompting.org/blog/the_prompt_report?srsltid=AfmBOoo7CRNNCtavzhyLbCMxc0LDmkSUakJ4P8XBaITbE6GXL1i2SvA0 ...References continued at: https://www.lennysnewsletter.com/p/ai-prompt-engineering-in-2025-sander-schulhoff _Production and marketing by https://penname.co/._ _For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com._ Lenny may be an investor in the companies discussed.