Simplifying RAG agents with the OpenAI Responses API in n8n means moving from a multi-node, infrastructure-heavy setup to a small number of OpenAI nodes with built-in File Search and Web Search tools.

By Nick Puru

Overview: From Old Stack to Native Tools

Traditionally, a RAG stack in n8n required several components: an embeddings node, a vector store (like Supabase), one or more chat models, and an agent node to orchestrate the tools. The attached workflow shows this “old way” using an Embeddings OpenAI node feeding a Supabase Vector Store node, which is then wired into an AI Agent node alongside a standard OpenAI Chat Model.

The same workflow also demonstrates an old approach to web search via a Perplexity Tool node plus another OpenAI Chat Model and agent, which adds more moving parts and more third-party dependencies. In contrast, the new approach uses a single OpenAI Chat Model node configured for the Responses API with built‑in tools for File Search and Web Search, then plugs that directly into an agent.

Download the n8n workflow: https://romhub.io/n8n/OpenAI_Responses_API_Agent

The New Core: OpenAI Responses API in n8n

The heart of the updated architecture is n8n's lmChatOpenAi node (version 1.3), configured to use the Responses API with builtInTools. The attached workflow contains three key OpenAI nodes (a trimmed wiring sketch follows the list):

  • “Web” node (Web Search only) using gpt-5-mini with webSearch enabled and searchContextSize: "medium" plus allowedDomains: "google.com".
  • “File Search” node (File Search only) using gpt-4.1-mini with fileSearch enabled and a vectorStoreIds array plus a simple filters JSON string.
  • “OpenAI Combined” node (Web + File Search together) using gpt-5-mini with both webSearch and fileSearch turned on and sharing a single configuration object.

Each of these language model nodes connects to an agent node (for example, “RAG”, “Research”, or “New”) which adds a system message such as “You are a helpful assistant” and exposes the model as an agent for your chat trigger or other workflow entry points.
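
To make the wiring concrete, here is a trimmed sketch of what the relevant fragment of such a workflow export could look like. Treat it as illustrative rather than a verbatim export: the agent's type string and the exact parameter layout are assumptions based on n8n's LangChain node naming, while the field names match the settings listed above.

```json
{
  "nodes": [
    {
      "name": "Web",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "typeVersion": 1.3,
      "parameters": {
        "model": "gpt-5-mini",
        "builtInTools": {
          "webSearch": {
            "searchContextSize": "medium",
            "allowedDomains": "google.com"
          }
        }
      }
    },
    {
      "name": "Research",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "parameters": {
        "options": { "systemMessage": "You are a helpful assistant" }
      }
    }
  ],
  "connections": {
    "Web": {
      "ai_languageModel": [
        [{ "node": "Research", "type": "ai_languageModel", "index": 0 }]
      ]
    }
  }
}
```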

Step-by-Step: Implementing the BMW Dealership Agent

1. Create and Populate the Vector Store

The first step still happens on the OpenAI platform, but it is much simpler than a custom ingestion pipeline. You go to the Storage → Vector Stores section of the OpenAI dashboard and create a new store.

  • Create a Vector Store (e.g., “BMW X7 Manual”) and upload the 444‑page owner’s manual PDF.
  • OpenAI automatically handles chunking, embeddings, and vector storage; there is no separate embeddings model or vector DB to manage.
  • Copy the generated Vector Store ID (for example, vs_6930761d7e20819199d554cb28876218) for use inside n8n.

At this point, the entire “database + embedding pipeline” for the manual is encapsulated in that single Vector Store ID.
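
If you prefer to script this step rather than click through the dashboard, the same store can be created via OpenAI's REST API: upload the PDF to the /v1/files endpoint with purpose "assistants", then POST a body like the one below to https://api.openai.com/v1/vector_stores (the file-... ID is a placeholder for whatever the upload returns). The response contains the vs_... ID you will paste into n8n.

```json
{
  "name": "BMW X7 Manual",
  "file_ids": ["file-REPLACE_WITH_UPLOADED_PDF_ID"]
}
```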

2. Configure File Search in n8n

Inside n8n, File Search is wired through an OpenAI Chat Model node configured for Responses API tools. In the attached workflow, this is represented by the “File Search” and “OpenAI Combined” nodes.

Key parameters for a File Search–only node look like this (conceptually):

  • model: gpt-4.1-mini or gpt-5-mini.
  • builtInTools.fileSearch.vectorStoreIds: a JSON array string containing your Vector Store ID (e.g., ["vs_6930814fe41c8191b934ac05de8d97a1"]).
  • builtInTools.fileSearch.filters: a JSON string such as { "filters": [], "type": "and" } to define how filters are combined.
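
Put together, the parameters block of such a node looks roughly like this. Because both vectorStoreIds and filters are entered as JSON strings in the node UI (as noted above), they appear here as escaped string values; this is a conceptual sketch, not a verbatim export:

```json
{
  "model": "gpt-4.1-mini",
  "builtInTools": {
    "fileSearch": {
      "vectorStoreIds": "[\"vs_6930814fe41c8191b934ac05de8d97a1\"]",
      "filters": "{ \"filters\": [], \"type\": \"and\" }"
    }
  }
}
```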

That node is then connected via the ai_languageModel output to an agent node, such as the “RAG” node in the workflow, which runs with a simple “You are a helpful assistant” system message. When a user asks, “How do I adjust the air suspension on an X7?”, the agent automatically uses File Search to query the BMW manual vector store and generate a grounded answer.

3. Configure Web Search in n8n

For real-time pricing and availability, you enable Web Search on a separate OpenAI node or on the same combined node. The “Web” node in the workflow shows a dedicated Web Search configuration (sketched after the list):

  • model: gpt-5-mini.
  • builtInTools.webSearch.searchContextSize: "medium" (or "low" / "high" depending on how deep you want the research to go).
  • builtInTools.webSearch.allowedDomains: "google.com" in the example, but in a BMW use case you could restrict this to bmwusa.com and trusted dealer domains.
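
For the BMW scenario, the same node's parameters might look like this (a sketch; bmwusa.com is substituted for the example's google.com):

```json
{
  "model": "gpt-5-mini",
  "builtInTools": {
    "webSearch": {
      "searchContextSize": "medium",
      "allowedDomains": "bmwusa.com"
    }
  }
}
```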

This node is wired to a “Research” agent node, which exposes a web-augmented agent that can answer time-sensitive questions. For example, “What is the current price of a 2024 BMW X7 in California?” can be served by restricting the search to specific domains and regions through the Web Search configuration.

The Combined Agent: File + Web in One Node

The most powerful pattern in the workflow is the “OpenAI Combined” node feeding into the “New” agent. Here, the same lmChatOpenAi node enables both built-in tools:

  • builtInTools.webSearch.searchContextSize: "medium".
  • builtInTools.fileSearch.vectorStoreIds: ["vs_6930761d7e20819199d554cb28876218"] with filters set to an "and" filter group.
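
Conceptually, the combined node simply merges the two single-tool configurations into one parameters block (a sketch in the same spirit as the examples above):

```json
{
  "model": "gpt-5-mini",
  "builtInTools": {
    "webSearch": { "searchContextSize": "medium" },
    "fileSearch": {
      "vectorStoreIds": "[\"vs_6930761d7e20819199d554cb28876218\"]",
      "filters": "{ \"filters\": [], \"type\": \"and\" }"
    }
  }
}
```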

This configuration allows a single agent to decide dynamically when to query the internal BMW manual via File Search and when to query the web for real-time prices. When a user asks, “What is the current price of a 2024 BMW X7 and how does the parking assistant work?”, the agent can:

  • Use Web Search to pull up-to-date pricing from allowed domains.
  • Use File Search against the BMW manual Vector Store to retrieve and summarize the parking assistant instructions.

The result is a single coherent answer that merges authoritative internal documentation with fresh market data, all through one OpenAI node plus one agent node.

Old vs New: Architecture Comparison

The attached workflow explicitly contrasts the “old way” and the “new way” with sticky notes and separate node groups.

Dimension | Old RAG/Web Stack | New Responses API Stack
RAG implementation | Embeddings OpenAI → Supabase Vector Store → AI Agent | lmChatOpenAi with builtInTools.fileSearch → Agent
Web search | Perplexity Tool → OpenAI Chat Model → Agent | lmChatOpenAi with builtInTools.webSearch → Agent
Combined capabilities | Multiple nodes wired manually | Single OpenAI Combined node with both tools enabled
External dependencies | Supabase, Perplexity, separate embeddings pipeline | Only OpenAI API (vector store, embeddings, retrieval included)
Node count per agent | 4–5 nodes plus external services | 2–3 nodes inside n8n
Maintenance burden | High: custom plumbing, multiple APIs, schema coordination | Low: one platform handles storage, embeddings, search, and RAG

In practice, this means you can build a production-ready BMW dealership agent in minutes: upload documents to a Vector Store, configure one combined OpenAI node in n8n, connect it to an agent, and expose it via the chat trigger node. The heavy lifting—embeddings, vector storage, retrieval, chunking, and web search—is all managed for you, letting you focus on prompts, business logic, and user experience.
