How To Use the New n8n Guardrails Node (Full Setup & Demo)

Technical Overview of the n8n Guardrails Node

Introduction to the n8n Guardrails Node

  • The speaker introduces a technical overview of the n8n Guardrails node, noting that users must be on version 1.119 or later to access it.
  • Users can find the guardrails feature in the AI section of their side panel, which includes two nodes: "check text for violations" (requires an LLM) and "sanitize text" (does not require an LLM).

Key Questions Addressed

  • The video will explore several questions regarding usage, installation into workflows, compliance with HIPAA or GDPR, and how different LLM models affect output.

Understanding the Guardrails Node

Purpose and Functionality

  • The guardrails node is designed to enforce safety, security, and content policies on processed text within workflows.
  • Placed before an AI Agent node, it validates input before it reaches the model, preventing manipulation by bad actors.

Importance of Input Validation

  • Just as humans can be manipulated through misinformation, AI models are also susceptible; thus, validating input is crucial for security.
  • There are risks associated with exposing sensitive information through AI agents; hence protecting against malicious prompts is essential.

Use Cases for Guardrail Nodes

Validating Input Data

  • Placing a guardrail node before an AI agent helps ensure that harmful or misleading inputs do not reach the model.

Checking Output Responses

  • After receiving a response from an AI agent, running it through a guardrail node checks for hallucinations and ensures topical alignment in responses.

Sanitizing Sensitive Information

  • A sanitize guardrail node removes private information such as addresses or bank details from workflows to comply with data protection regulations.

Understanding the Role of LLMs in AI Workflows

Differences Between Nodes with and without LLMs

  • The discussion begins by highlighting the importance of data processing in AI applications, focusing on nodes that utilize Large Language Models (LLMs) versus those that do not.
  • Reference is made to the n8n GitHub page, which contains the source code relevant for self-hosting n8n and interacting with its deployment.
  • OpenAI's contribution to the moderation functions within n8n is acknowledged, indicating a collaborative effort in developing these tools.

Features of Guardrails Node

  • The guardrails node includes functionalities aimed at protecting against various risks such as personally identifiable information (PII) exposure, moderation issues, and hallucinations.
  • It is suggested that n8n drew inspiration from OpenAI's agent builder when integrating the guardrails node into their system.

Functionality Overview

  • Various checks are available within the actions section of the guardrails node, including:
      • Jailbreak detection for prompt injection attempts.
      • Keyword flagging for specific terms or phrases.
      • Not safe for work (NSFW) content filtering.
      • PII redaction for sensitive information like addresses and bank details.
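
Of the checks above, keyword flagging is the simplest because it needs no LLM at all. A minimal sketch of how such a check might work, where the keyword list and the return shape are assumptions for illustration rather than n8n's actual implementation:

```javascript
// Hedged sketch of a keyword-flagging check (the list of terms and the
// return shape are assumptions, not n8n's actual implementation).
const flaggedKeywords = ["password", "bank details", "wire transfer"];

function keywordCheck(text) {
  // Case-insensitive scan for any flagged term.
  const lower = text.toLowerCase();
  const hits = flaggedKeywords.filter((kw) => lower.includes(kw));
  return { passed: hits.length === 0, matchedKeywords: hits };
}

console.log(keywordCheck("Please send the wire transfer details"));
// → { passed: false, matchedKeywords: [ 'wire transfer' ] }
```

In n8n, equivalent logic could also live in a Code node placed before the agent, routing items to a fail branch whenever `matchedKeywords` is non-empty.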

Comparison of Violation Handling

  • LLM-backed checks take a custom prompt that defines how violations are identified and handled, in contrast to the simpler checks that need no prompt at all.
  • Both n8n and OpenAI use similar prompts for handling jailbreak scenarios, showcasing a shared approach to moderation.

Layered Prompt System

  • The process involves multiple layers of prompts: one layer performs the direct content check, while another evaluates how confident the model is in its verdict.
  • This layered approach makes the accept-or-reject decision more reliable, since outputs are judged against explicit confidence metrics.

Conclusion on LLM Utilization

  • The use of LLM-based prompts is crucial for tasks like NSFW filtering and topical alignment, ensuring conversations remain appropriate and relevant.
  • Overall, understanding these mechanisms provides insight into how AI models can be safeguarded against misuse while maintaining functional integrity.

Understanding Guardrails and Sanitization in AI Workflows

Overview of Guardrails

  • The discussion begins with the concept of guardrails in AI workflows, emphasizing their role in maintaining alignment with predefined parameters such as jailbreak detection, not safe for work content, and topical alignment.
  • The violations node is introduced, which categorizes outputs into pass or fail based on the prompt's effectiveness and system confidence levels.

Error Handling Mechanisms

  • When using the violations node, it’s crucial to manage errors effectively. Users can configure settings to continue processing even if an error occurs within the chat model.
  • The distinction between pass (successful checks), fail (unsuccessful checks), and error (node failure) is clarified, highlighting how each impacts workflow management.
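
The three-way distinction above can be pictured as simple branching logic. The result shape below is an assumption for illustration, not the node's actual output schema:

```javascript
// Illustrative branching on the three outcomes described above; the result
// object's fields are assumptions, not the node's actual output schema.
function routeCheckResult(result) {
  if (result.error) return "error";        // the check itself failed to run
  return result.passed ? "pass" : "fail";  // the check ran and gave a verdict
}

console.log(routeCheckResult({ passed: true }));                    // pass
console.log(routeCheckResult({ passed: false }));                   // fail
console.log(routeCheckResult({ error: new Error("LLM timeout") })); // error
```

The practical point is that "fail" is a successful execution with a negative verdict, while "error" means no verdict was produced at all, so each deserves its own branch in the workflow.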

Sanitization Node Functionality

  • The sanitize node is designed to extract sensitive information from text inputs using regular expressions to identify personally identifiable information (PII), secret keys, URLs, etc.
  • Regular expressions are explained as a method for pattern recognition in text. They allow users to define specific characteristics that need identification within incoming data.

Practical Application of Regular Expressions

  • Examples illustrate how regular expressions function by recognizing patterns like credit card numbers or addresses without relying on AI models.
  • The rigid nature of regular expressions is noted; they require precise definitions to find patterns effectively but offer a safer alternative than inputting sensitive data into AI models.
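
A sketch of the pattern-only approach described above: a credit-card-like number is detected by a plain regex, with no AI model involved. The pattern here is a deliberate simplification for illustration, not the sanitize node's actual expression:

```javascript
// Illustrative regex-only detection of a credit-card-like number, no AI
// involved. The pattern is a simplification for the demo, not the sanitize
// node's actual expression.
const creditCardPattern = /\b(?:\d[ -]?){13,16}\b/;

function containsCreditCard(text) {
  return creditCardPattern.test(text);
}

console.log(containsCreditCard("Card: 4111 1111 1111 1111")); // true
console.log(containsCreditCard("Call me tomorrow"));          // false
```

This rigidity cuts both ways: the pattern only fires on inputs that match it exactly, but nothing sensitive ever has to leave the workflow to be checked.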

Customization and Implementation

  • Users have the option to define custom regular expressions within the sanitize node settings for tailored sanitization processes.
  • A high-level overview emphasizes that while LLM (Large Language Models) handle conversational context and harmful content detection, traditional coding methods remain effective for straightforward pattern recognition tasks.

Testing Sanitization Processes

  • An example test case involves inputting an address into the sanitize node to observe how well it identifies and replaces PII using defined regular expressions.

Understanding the Sanitization Process in AI Workflows

Overview of Guardrails Input

  • The output from the sanitize step includes a guardrails input, which is "123 fake street." This indicates that a location was identified based on a regular expression.
  • The system logs the detected entity type as "location," confirming that it recognized "fake street" as a valid location.

Integration with AI Agent

  • After sanitization, the user message changes from "123 fake street" to "123 location," demonstrating how sensitive information is redacted.
  • It's essential to inform the AI agent about receiving sanitized text, replacing sensitive data with placeholders (e.g., email, phone, location).

Handling Sensitive Data

  • Depending on industry regulations, customer private information may not be stored. In some cases, data can be retrieved post-AI response without storing it.
  • If processing data isn't allowed at all, outputs should avoid using placeholders entirely and instead communicate that no sensitive information will be used.
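
One way to honour "use the data after the AI response without sending it to the model" is to keep a local placeholder-to-value map, redact before the LLM call, and restore afterwards. All function and placeholder names here are illustrative assumptions, not part of the node:

```javascript
// Redact matches into typed placeholders and remember the originals locally.
function redactWithMap(text, pattern, label) {
  const map = {};
  let i = 0;
  const redacted = text.replace(pattern, (match) => {
    const key = `{${label}_${i++}}`;
    map[key] = match; // the raw value stays local and is never sent to the LLM
    return key;
  });
  return { redacted, map };
}

// Swap each placeholder back for its original value after the AI response.
function restore(text, map) {
  return Object.entries(map).reduce(
    (acc, [key, value]) => acc.split(key).join(value),
    text
  );
}

const { redacted, map } = redactWithMap(
  "Ship to 123 Fake Street",
  /\d+ \w+ Street/i,
  "location"
);
console.log(redacted);                                   // Ship to {location_0}
console.log(restore("Delivering to {location_0}", map)); // Delivering to 123 Fake Street
```

Whether the map itself may be held in memory (or must be discarded immediately) is exactly the kind of question the industry regulations mentioned above decide.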

Case Sensitivity in Address Recognition

  • A test was conducted by altering case sensitivity in an address ("fake street" to lowercase), revealing that regex patterns are case-sensitive.
  • The regex for identifying locations must match exactly; thus, variations in casing can lead to failures in detection.
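
The case-sensitivity failure described above can be sketched with two variants of an illustrative street pattern (neither is the node's actual regex); JavaScript's `i` flag is the standard way to make a pattern ignore casing:

```javascript
// Two variants of an illustrative street pattern (not the node's actual
// regex): one case-sensitive, one with the "i" (ignore-case) flag.
const caseSensitive = /\d+ [A-Z]\w* Street/;  // expects capitalised words
const caseInsensitive = /\d+ \w+ street/i;    // the "i" flag ignores casing

console.log(caseSensitive.test("123 fake street"));   // false
console.log(caseInsensitive.test("123 fake street")); // true
console.log(caseInsensitive.test("123 Fake Street")); // true
```
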

Adapting Regex for Different Address Formats

  • When testing Polish addresses lacking English characteristics (like 'street' or 'avenue'), it became clear that existing regex patterns were inadequate.
  • To adapt to different address formats like Polish addresses, new regex strings need to be created; tools like ChatGPT can help generate these expressions effectively.
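
As a hedged illustration of what such a custom string might look like, here is a hypothetical pattern for Polish-style addresses such as "ul. Piękna 12" ("ul." abbreviates "ulica", i.e. street). This is a guess at the shape of the problem, not the regex generated in the video:

```javascript
// Hypothetical pattern for Polish-style addresses ("ul.", "al.", "pl."
// followed by a capitalised name and a number). Illustrative only; the "u"
// flag enables the Unicode \p{L} letter class for Polish characters.
const polishAddress = /\b(ul\.|al\.|pl\.)\s*[A-ZĄĆĘŁŃÓŚŹŻ]\p{L}+\s+\d+[a-z]?\b/u;

console.log(polishAddress.test("Mieszkam na ul. Piękna 12")); // true
console.log(polishAddress.test("123 Fake Street"));           // false
```
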

Testing New Regex Patterns

  • A new regex string was developed for detecting Polish-style addresses. Initial tests showed success with redacting inputs correctly.
  • Despite initial failures with personal data checks using default regex settings, adjustments were made to improve recognition of various address types.

Understanding Custom Regex and Violations Nodes

Custom Regex Implementation

  • The custom regex has been successfully triggered, indicating it matched the defined pattern for Polish addresses, leading to redaction of sensitive information.
  • A JavaScript node can be utilized to run functions directly, simplifying the process by avoiding complex string manipulations in the sanitized node.
  • Users can install specific regular expressions tailored for Polish PII (Personally Identifiable Information), such as passports and bank details, enhancing customization for various use cases.

Handling API Keys and File Extensions

  • The system allows users to introduce different types of secret keys like API keys by referencing established regex patterns that cover common configurations.
  • Certain file extensions (e.g., Python, JavaScript, HTML) are ignored during processing to streamline what is flagged as sensitive or harmful content.
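
The "ignore known file extensions" idea can be sketched as a filter applied to candidate matches before anything is flagged. The extension list and helper name below are assumptions for illustration:

```javascript
// Candidate matches that are plainly code file names are skipped before
// anything is flagged as sensitive. Extension list is an assumption.
const ignoredExtensions = [".py", ".js", ".html"];

function isIgnoredFile(candidate) {
  return ignoredExtensions.some((ext) => candidate.toLowerCase().endsWith(ext));
}

console.log(isIgnoredFile("app.js"));      // true: skipped, not flagged
console.log(isIgnoredFile("secrets.env")); // false: still considered
```
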

Not Safe For Work (NSFW) Content Detection

  • The violations node integrates a language model (LLM) to assess messages for NSFW content; initial tests show it correctly identifies harmful messages.
  • When executing checks on user inputs, it's crucial to define specific criteria for what constitutes a violation; this ensures accurate flagging of inappropriate content.

Prompt Injection Challenges

  • Users may attempt to bypass safety checks with reverse statements indicating they do not wish to engage in harmful activities; these need careful monitoring.
  • Variability in user input can lead to missed flags if the core prompt does not account for all potential scenarios; adjustments are necessary for effective detection.

Customizing Prompts for Better Accuracy

  • Users should customize prompts based on their unique requirements rather than relying solely on default settings from OpenAI's models.
  • Testing various cases reveals that most problematic inputs fall within a limited range; thus, refining prompts can significantly enhance detection capabilities.

Addressing Limitations and Hallucinations

  • Adjustments made to prompts can prevent users from exploiting system weaknesses; this highlights the importance of continuous refinement in AI systems.
  • Current limitations include challenges with non-English inputs and lowercase text handling; ongoing improvements are needed to broaden effectiveness across diverse languages.

Understanding Input Validation in AI Workflows

Exploring Topic Relevance

  • The speaker discusses the input message "I want to eat steak" and confirms it is relevant to the topic of eating steak, indicating no violations were triggered.
  • A new input, "I want to eat a carnivore diet," is tested. It also passes validation as it relates to eating steak, demonstrating the model's ability to connect related concepts.

Model Comparisons

  • The speaker notes differences in performance between models, specifically comparing GPT-4.1's capabilities with older versions like GPT-3.5 Turbo.
  • When using GPT-3.5 Turbo, the connection between "eating steak" and "carnivore diet" fails, highlighting that model's limitations in understanding context.

Insights on Model Performance

  • Testing with different models reveals that even advanced models like GPT-4.1 can miss connections that smaller or faster models sometimes catch.
  • The importance of testing various inputs across multiple models is emphasized for ensuring accurate responses and avoiding misinterpretations.

Best Practices for Workflow Integration

  • Users are encouraged to conduct thorough testing of their systems by adjusting settings and retesting with different prompts before deployment.
  • The workflow involves nodes based on prompts where an AI model reviews inputs/outputs for safety and relevance, ensuring compliance with operational standards.

Compliance Considerations

  • While certain checks help sanitize data entering workflows, they do not guarantee HIPAA or GDPR compliance; broader considerations about data processing must be addressed.
  • Compliance involves evaluating how data is processed across systems rather than just within isolated instances; many questions need answering beyond simple input/output checks.

Conclusion on Model Variability

  • Different AI models yield varying results even for straightforward requests due to their unique architectures and training datasets; understanding these differences is crucial for effective application.
Video description

👉 Check out my Skool: https://www.skool.com/flow-state

In this video I explain the technical details of how the new n8n Guardrail node works. We look at the RAW source code, and make observations about the LLM prompts && regex functions, that run the Violations and Sanitise functions. I show you how to edit the nodes to make them work for you, and how to integrate them into agent builds. We also compare how different models give different results.

🗂️ Sign up to Supabase: https://supabase.link/b1VZg8p
🚀 Sign up to Replit: https://replit.com/refer/bart31
👉 Sign up to n8n: https://n8n.partnerlinks.io/lyzq2o8cky0o
🛠️ Hire me: bart@supportlaunchpad.com