Jacobo: Multi-Agent AI with Tool Calling & Voice AI — Production Case Study

~15 interruptions per day. Each one, a repair on hold. Every unanswered WhatsApp, a customer walking to the competition. I built an AI agent that handles both — ~90% of interactions, 24/7, for less than €200/month.

Not a chatbot with canned responses. An agent that checks real prices, verifies stock, books appointments, and knows when to loop in a human with full context. That's what Jacobo became. In this article I share the complete architecture and the production workflows so you can replicate it.

The Problem#

With 30,000+ repairs completed and multiple support channels (phone, WhatsApp, web), the bottleneck was clear:

●

80% of inquiries were repetitive: prices, appointments, repair status

●

Every inquiry pulled the technician away from active repairs

●

Response times varied by workload

●

Information was scattered across Airtable, the calendar, and inventory

●

Support was limited to shop hours

●

A part-time support employee cost more than the business could justify

●

Customers arrived via two main channels (WhatsApp and landline), the solution had to cover both with the same logic

Jinank iRepair counter with smart displays — The counter with smart displays and the diagnostic screen: the business that needed an AI agent

Diagnostic screen in the shop — The counter with smart displays and the diagnostic screen: the business that needed an AI agent

I knew three things from the start: Airtable was the brain (the Business OS had been the SSOT for years), I needed real tool calling against that data, and the agent had to be multimodal (voice + chat) sharing the same resources. The question was which orchestration tool to use:

Tidio / Intercom

Generalist chatbots with decision trees. They can't check real-time stock or calculate dynamic prices against Airtable. For a repair business, they're little more than interactive FAQs.

ManyChat (WhatsApp)

Good for marketing flows but lacked tool-calling capability against an existing ERP. Couldn't check stock, create work orders, or do handoffs with context.

Vertical Solution (RepairDesk chat)

No repair SaaS offered a conversational agent with natural language and tool calling against real-time data. The ones that had chat were basically forms in disguise.

n8n was the natural choice: workflow orchestration with webhooks, native support for agents with LLMs and tool calling, and the ability for each sub-agent to be an independent, testable workflow. All connected to the existing Business OS in Airtable.

Legacy POS before transformation — The old POS system: invoicing, stock and part prices in software that integrated with nothing

This POS was the first problem I solved

Before building Jacobo, I replaced this legacy system with a custom ERP on Airtable. That database is what Jacobo queries today.

Business OS — Case Study

The Architecture#

Jacobo is not a chatbot with a long prompt. It's a system of specialized sub-agents, each deployed as an independent webhook in n8n, orchestrated via tool calling from a central router. Every workflow seen in this article is downloadable: you can import it directly into n8n.

Stack

Jacobo relies on 8 services covering everything from customer intake to human escalation. Each has a unique role; none are replaceable without shifting the architecture.

WATI

WhatsApp Business API: main entry channel

Aircall

Cloud PBX: Jacobo as a "teammate" in the phone system

n8n

Workflow orchestration & sub-agents (7 workflows, ~80 nodes)

OpenRouter

Model-agnostic gateway for LLMs (MiniMax M2.5 + GPT-4.1)

ElevenLabs

Conversational voice agent (eleven_flash_v2_5, temp 0.0)

Airtable

CRM, inventory, customer history (source of truth)

YouCanBookMe

Booking & availability management

Slack

HITL escalation channel (#chat)

Why sub-agents instead of a monolithic prompt?

●

Testability

Each sub-agent has its own webhook. I can test it in isolation with an HTTP call without pulling in the entire system.

●

Independent evolution

A change in discount logic doesn't touch appointments. I can iterate one domain without risking a break in another.

●

Cost efficiency

Not all sub-agents need the same model. Appointments use MiniMax M2.5 (fast and cheap for parsing temporal preferences). Quotes use GPT-4.1 mini (precision in structured output). Each sub-agent gets the right model for the task.

●

Platform-agnostic

Sub-agents are webhooks. They don't know if they're being called by n8n (WhatsApp) or ElevenLabs (voice). Reusable by any orchestrator without duplicating logic.

4 Agents and 3 Tools to Rule Them All

4 agents with their own LLM make decisions. 3 tools without LLMs execute pure business logic. All connected by webhooks.

Main Router (n8n)

The brain of the WhatsApp channel. Classifies intent, picks the right sub-agent, and maintains context with a 20-message memory window.

●

GPT-4.1 via OpenRouter · 37 nodes

●

LangChain Agent pattern with 7 tools as HTTP endpoints

●

Think tool to reason before complex chains

●

Pseudo-streaming: splits response into sentences and sends them one by one via WhatsApp

Voice Router (ElevenLabs)

The brain of the voice channel. Handles calls via Aircall → Twilio → ElevenLabs Conversational AI, with its own system prompt optimized for spoken conversation.

●

ElevenLabs Conversational AI · GPT-4o

●

Same sub-agents as the Main Router, connected as HTTP tools

●

Out-of-the-box native RAG: knowledge base with repair catalog, pricing, and FAQs

●

Voice-optimized latency: short, direct responses

●

Business hour detection for human transfer outside hours

Booking Sub-agent

Turns "tomorrow morning" into a confirmed appointment. Parses temporal preferences in natural language, queries YouCanBookMe, and sends WhatsApp confirmation templates.

●

MiniMax M2.5 via OpenRouter · 18 nodes

●

15 temporal parsing rules: from "after lunch" to "any day but Monday"

●

The most sophisticated sub-agent in the system

Quotes Sub-agent

Every price inquiry passes here. Searches Airtable for exact model and repair, returns real price with stock status, and decides the next step.

●

GPT-4.1 mini via OpenRouter · 11 nodes

●

In stock? → offer appointment

●

Out of stock? → offer order

●

Not found? → link to quote form

Tools (no LLM)

Orders

Creates repair orders in Airtable when parts are out of stock.

●

3 nodes: webhook → create record → respond

●

Simple by design: all validation happened in Quotes

Discount Calculator

Pure business logic, no LLM. Calculates combo discounts for bundled repairs.

●

3 nodes · no LLM

●

Battery + screen + back glass = automatic multi-repair price

●

Discount rules live here, not scattered across prompts

HITL Handoff

The escape valve. Escalates to human via Slack with a direct deep-link to the conversation in WATI.

●

5 nodes · posts in #chat

●

Includes conversation summary, detected intent, and customer history

●

Human gets full context before opening the chat

Conversational Memory

Jacobo has no internal state between messages. Every time a new message arrives, it reconstructs context by reading the actual conversation history from WATI:

Attended?

A switch checks if an active session already exists for this number. If not, it triggers the memory reload.

WATI Fetch

HTTP call to getMessages/{waId} with pageSize=80. Retrieves the last 80 messages: customer messages, Jacobo's replies, templates, broadcasts, and human operator messages.

3-Phase Parsing

Three code nodes transform WATI events into LangChain-compatible {human, ai} pairs. It filters out broadcasts, confirmation templates, and system events. A __reloadFlag__ allows manual memory reset.

Buffer Window

The last 20 messages are loaded into the LangChain BufferWindow, keyed by phone number. The agent "remembers" previous conversations: if you confirmed a quote yesterday, Jacobo knows today.

This allows Jacobo to pick up interrupted conversations, recognize returning customers, and know if a human intervened earlier in the conversation.

Memory test: Dog, Cat, Elephant — Jacobo recalls all three

Cities test: Seville, Madrid, Barcelona — correct recall

Self-correction: "You're right, I said Seville, not Valencia" — Jacobo self-corrects

Episodic memory tests: animals, cities and self-correction when Jacobo forgets Barcelona

Brand test: Apple, Samsung, Huawei — correct recall

Customer lost the conversation — Jacobo recalls the full appointment

Re-negotiation: Jacobo recalls time preference → no slot at 12 → suggests alternatives

Memory in action: brands recalled in order, appointment recovered from system state and re-negotiation when no availability

Production Debug Tools

Two hidden commands for debugging memory in production without touching n8n. "Borrar memoria" (Clear memory) reset the customer buffer, useful for corrupted conversations. "HISTORIAL" (HISTORY) dumped the raw buffer JSON — which taught us to sanitize responses: the LLM would return the entire JSON to the customer if not filtered.

HISTORIAL command: raw JSON from memory buffer exposed in chat

BORRAR MEMORIA command: full conversational buffer reset

Production debug commands: HISTORIAL dumped raw JSON from the buffer and BORRAR MEMORIA reset the conversation

Pseudo-Streaming in WhatsApp

WhatsApp doesn't support streaming. A long paragraph feels like a bot; sequential messages feel like a person typing. The router splits each response by line breaks and sends each fragment with a 1-second delay via the WATI API. Result: the "is typing..." experience without streaming infrastructure.

Pseudo-streaming workflow in n8n: long message splitter for WhatsApp

The Two Channels

Jacobo operates on two simultaneous channels. Crucially: both share the same webhook sub-agents. Business logic is written once.

Dual-Orchestrator Architecture

This is the key pattern: n8n orchestrates WhatsApp, ElevenLabs orchestrates voice, but both call the same webhook sub-agents. A real microservices pattern applied to AI agents. Sub-agents don't know who's calling, and they don't need to.

WhatsApp (highest volume)

WATI as WhatsApp Business API + n8n as orchestrator. 70% of inquiries arrive here.

●

n8n Router with LangChain Agent pattern: 37 nodes, 7 tools as HTTP endpoints, GPT-4.1 via OpenRouter

●

Meta-approved WhatsApp templates for booking confirmations, order tracking, and notifications

●

Pseudo-streaming: splits response and sends fragments sequentially. Customer sees Jacobo "typing" like a person

●

Memory: 20 messages per session, keyed by phone number. Reconstructs context from full WATI history

●

Event Routing: 3 switches filter noise (system events, broadcasts, human messages) before reaching the agent

●

Transparent Human Takeover: when a human takes control via WATI, Jacobo detects it and stays quiet

Landline (voice)

Aircall as Cloud PBX + Twilio as phone bridge + ElevenLabs as conversational voice agent. Jacobo is literally a "team member" in the Aircall dashboard with his own routing rules.

●

Aircall → Twilio → ElevenLabs integration: calls arrived via the business landline. On overflow or after-hours, Aircall redirected to a dedicated Twilio number which connected to the ElevenLabs agent. Transparent to the customer: they dialed the shop and spoke to Jacobo

●

The customer called a landline and spoke naturally with Jacobo. Not a web widget or menu-driven IVR. A real phone call with natural voice

●

High-quality ASR (ElevenLabs, PCM 16kHz) + 7s turn_timeout + 20s silence_end_call to handle natural conversation pauses

●

LLM: GPT-4.1 (temp 0.0) for maximum tool-calling precision by voice. Optimized latency (optimize_streaming_latency: 4)

●

Voice model: eleven_flash_v2_5, speed 1.2x, stability 0.6, similarity 0.8. Up to 5-minute conversations (300s)

●

Knowledge base with 3 sources (Google Maps, website, business summary) leveraging ElevenLabs native RAG. I didn't build custom RAG here: the platform offered high impact with zero effort. Pure RICE prioritization. In n8n I didn't need it: the WhatsApp agent already accessed context via tool calling to Airtable

●

5 shared webhook tools with n8n: presupuestoModelo, subagenteCitas, Calculadora, contactarAgenteHumano, and enviarMensajeWati. 20s tool timeout, immediate execution

●

enviarMensajeWati was cross-channel magic: while on the phone, Jacobo sent links and quotes via WhatsApp in parallel using the caller_id. Customers loved receiving info on their phone while still talking

Production Incident: The Coca-Cola

A customer was talking about a phone repair. Mid-conversation, they turned to a waiter to order a Coca-Cola. Jacobo heard it. And told them we don't serve Coca-Colas.

Diagnosis: three signals the system ignored

Volume

Dropped ~40%: they moved away from the phone

Spectral tilt

Changed: off-axis voice loses high frequencies

Semantic relevance

"Coca-Cola" had zero relation to phone repairs

Basic VAD isn't enough. You need addressee detection: acoustic proximity + prosodic analysis + semantic gating working together.

Missed Call Recovery

If the customer hung up or no one answered, Aircall sent a webhook to Make.com firing a WhatsApp template via WATI with action buttons. Many leads came from here: people who called, didn't wait, and Jacobo "caught" them. Since it fed on WATI context, it already knew they'd tried to call when they replied.

Aircall dashboard — call distribution tree with Jacobo as a PBX node — Actual call distribution tree in Aircall — Jacobo plugged in as another PBX node

WhatsApp template after missed call: buttons Get a quote, Book appointment

Customer picks "Call me back" → Jacobo escalates to HITL and confirms notification

Aircall → Make.com → WhatsApp template with buttons → Jacobo picks up the conversation with full context

Unified UX: One Voice

All phone system audio (welcome, IVR menu, voicemail) was generated with ElevenLabs using the same voice as Jacobo. When a customer dials 3 or falls to the live agent, the voice is identical. No break. And if after the call Jacobo writes via WhatsApp, the identity remains consistent. Unified experience end-to-end, regardless of the channel.

"Dial 3 to speak with me, Jacobo." That's the phone system presenting the AI agent in the first person. The same voice that then answers. An agent that announces itself.

Listen to the actual phone system. Same Jacobo voice for welcome, IVR, and live agent:

Welcome

"We will now attend your call. Thank you for calling Jinank iRepair. To ensure service quality, your call may be recorded."

IVR Menu

"Press 1 for a new repair. Press 2 to check repair status. Press 3 to speak with me, Jacobo, your 24/7 virtual assistant at Jinank iRepair. You'll get a quote and appointment instantly."

Aircall detail: welcome and IVR audio generated with ElevenLabs using Jacobo's voice — The "ElevenLabs" nodes are pre-recorded audio using Jacobo's same voice: welcome, IVR and voicemail. When the live agent picks up, the voice is identical

Pre-filtering: Should Jacobo Reply?

Before a message reaches the AI Agent, three switches filter the noise and decide who should respond:

Event Type

Filters for real messages only. Ignores system events, delivery confirmations, status updates, and mass broadcasts. Without this, Jacobo would reply to his own confirmation messages.

Who?

Detects if the last speaker was the customer or a human operator. When a human takes control via the WATI deep-link, their messages arrive as owner: true. Jacobo knows and stays quiet.

Attended?

Checks if a session is already active. If the customer replies to a conversion managed by a human but the shop is now closed, Jacobo steps in with an empathetic tone: "We closed at noon, but I can help you until we reopen in the afternoon." Real graceful degradation.

This 3-node filter allows human-agent coexistence without conflict. A human can take over anytime, and when they aren't available, Jacobo resumes with full context.

End-to-End Flows#

Each flow traces the happy path from customer inquiry to resolution. Involved sub-agents are tagged at each step.

Repair Appointment

Customer writes on WhatsApp: "Hi, how much to change an iPhone 14 Pro screen?"

Router classifies intent as price inquiry → delegates to Quotes sub-agent

Quotes looks up Airtable: model + repair type → returns real price (€189), stock status, and estimated time (45-60 min)

Stock available → Jacobo replies with price and asks: "Would you like to book an appointment?"

Customer says "Yes, tomorrow morning" → Router delegates to Booking sub-agent

Booking parses temporal preference, queries YouCanBookMe → offers slots: "10:00 AM or 11:30 AM"

Customer confirms → appointment created in YCBM + job order generated in Airtable + parts auto-reserved from inventory

Confirmation sent via WhatsApp with summary: date, time, price, shop address

Pricing Inquiry

Customer: "How much to change a Samsung S23 battery?"

Router classifies intent → delegates to Quotes

GPT-4.1 searches Airtable: exact model + repair type

If in stock → responds with price, time, and offers to book

If NOT in stock → responds with price, mentions part needs ordering, and offers to place the order

If model not in database → Jacobo says so clearly instead of hallucinating a price

Stock-aware routing: the CTA changes based on real availability in Airtable

Human Handoff (HITL)

Escalation triggers: detected frustration, out-of-domain inquiry, warranty case, explicit human request

Router activates HITL Handoff → sends Slack notification (#chat)

Slack message includes: conversation summary, detected intent, customer data from Airtable, escalation reason

WATI Deep-link: human clicks and jumps straight into the customer's WhatsApp conversation

Human has full context. Average resolution time post-handoff: seconds, not minutes

Jacobo notifies the customer: "I'll pass you to a teammate who can better assist you with this"

The Main Router#

The heart of the WhatsApp channel, managing intent and memory.

WhatsApp Router (n8n)

37 nodes. Pure LangChain logic over webhooks.

Download n8n workflow(~133 KB)

Voice Router (ElevenLabs)

GPT-4o powered voice agent with tool calling.

Voice agent configuration in ElevenLabs: system prompt, model and tools

Tool Calling in Production

Jacobo doesn't generate answers from training data. Every reply is built by querying real systems via 7 tools defined as HTTP endpoints:

Tool calling in n8n: 7 tools defined as HTTP endpoints

presupuestoModelo

Looks up repair/accessory prices and stock in Airtable. LLM: GPT-4.1 for structured output precision.

subagenteCitas

Manages availability and bookings via YouCanBookMe. LLM parses natural language temporal preferences.

hacerPedido

Creates repair/order entries in Airtable. 3 nodes: webhook → create record → respond.

Calculadora

Volume discount: more bundled repairs = more discount. Pure business logic, no LLM.

contactarAgenteHumano

HITL escalation via Slack with reason, WATI deep-link, and full context. Works for both WhatsApp and voice calls.

enviarMensajeWati

Sends parallel WhatsApp info. When the voice agent needed to send a link or quote, it did so via WhatsApp while continuing to speak.

Think

Internal reasoning meta-tool. The agent "thinks out loud" before multi-tool chains to reduce errors.

mensajeConsulta: UX while thinking

When Jacobo calls presupuestoModelo (1-3s latency), it first fires mensajeConsulta: a "Checking availability..." that reaches the customer before the sub-agent replies. Without this, the customer saw 5s of silence and thought the bot hung. A UX detail that marks the difference between "broken chatbot" and "working assistant."

Jacobo responds as formal email: subject line, greeting, Huawei P20 Pro quote

Email: battery + charging port = €85.80 → combo discount €70.80

Signature: "Best regards, Jacobo — Jinank iRepair — address + phone + email"

Adaptability: customer asks for email format and Jacobo responds with subject line, itemized quote, combo discount and corporate signature

The "Think" Tool

Before executing a tool chain (check price → verify stock → offer appointment), the agent invokes Think to plan the sequence. This reduces errors in multi-tool chains because the LLM explicates its reasoning before acting.

Stock-Aware Routing

The output of presupuestoModelo determines the next step. It's not a fixed flow: the CTA changes based on real availability.

Part in stock

→ Offers to book repair appointment

Part out of stock

→ Offers to place supplier order with ETA

Model not found

→ Says so clearly and offers human contact

Prompt Engineering

System prompts for production agents are distinct from chat prompts. They need strict constraints, explicit output formats, and "Chain of Thought" reasoning to be reliable.

PCB view under microscope — The same precision microsoldering demands applies to tool call design. And like the finger holding the chip, there's always a human in the loop.

BGA chip on fingertip — The same precision microsoldering demands applies to tool call design. And like the finger holding the chip, there's always a human in the loop.

Why not fine-tuning?

●

Dynamic data (prices/stock) change daily.

●

RAG + Tool calling is more reliable for factual logic.

●

Easier to iterate on prompts than retraining models.

Business Hours Logic

Real-time check against shop schedule.

if (hour >= 10 && hour <= 14) ...

"Are you open?" at 11:56 → "The shop is closed" with full schedule

"Are you open?" at 13:12 → "Yes! We're open right now"

Same question, opposite answers: at 11:56 closed (midday break), at 13:12 open. Real-time schedule awareness.

Main Router Prompt

Orchestrates the 7 sub-agents.

Route to sub-agent based on intent.

Voice Prompt

Optimized for latency and phone UX.

Be concise. Do not use markdown.

Iteration Examples

Never say "free diagnosis"

Customer confusion: it is only free if repaired.

Use "Shop" not "Store"

Local Seville tone alignment.

Book Thursday if "Thursday"

Temporal resolution logic refinement.

Jacobo says "completely free diagnosis" — incorrect simplification

Self-correction: "€19 only if you don't repair with us" — the real policy

Real iteration: Jacobo oversimplified the diagnostic policy → prompt refined to include the exact condition

Deep Dive: Natural Language Booking#

The booking sub-agent is the system's most sophisticated workflow. Its mission: turn "tomorrow morning" into a confirmed appointment with reserved parts, without the customer touching a form.

The challenge: bridging two worlds

The customer speaks natural language ("Thursday mid-morning, or maybe Friday afternoon"). The YouCanBookMe API speaks Unix timestamps. The sub-agent must translate one to the other and find the intersection.

Appointments sub-agent workflow in n8n: 18 nodes

Download n8n workflow(~24 KB)

ParseURL

A Code node that extracts the subdomain from the YCBM URL to determine which booking profile to use. Parses the query string for dynamic form fields (repair type, customer data). Different calendars for different services: jinank-citav2-componentes for component repairs, jinank-citav2-diagnostico for diagnostics. The subdomain determines the entire path the booking follows.

AnalizarDisponibilidad (LLM)

AnalizarDisponibilidad node in n8n: LLM agent with temporal rules

An LLM agent with MiniMax M2.5 converts natural language into a structured JSON array: [{date, start, end, exact}]. The system prompt contains 15 temporal parsing rules covering all real-world cases. Includes a Structured Output Parser for valid formatting and per-session memory (sessionKey = phone/ycbmUrl) so the customer can refine preferences without starting over. If no explicit preference, returns the next 3 business days with full hours.

●

Default ranges: "morning" = 10:00-14:00, "afternoon" = 17:00-21:00, "all day" = 10:00-21:00

●

Plurals: "mornings" → next 3 business mornings

●

Explicit ranges: "from 10 to 12" → start=10:00, end=12:00, exact=true

●

Conditionals: "or else Friday" → adds Friday as an alternate range

●

Rounding: 10:15 → 10:00-11:00 (1-hour block)

●

Auto-filters weekends (Mon-Fri only)

●

"Mid-morning" = 11:00-13:00, "first thing" = 10:00-11:00

●

"After lunch" = 17:00-19:00

●

Today is only included if ≥2 business hours remain

●

Relative dates: "day after tomorrow", "next Tuesday" → resolved to absolute date

YCBM API (3 calls)

YCBM API pipeline in n8n: 3 sequential HTTP requests

Sequential pipeline of 3 HTTP Requests to the YouCanBookMe API. Each call depends on the previous one; they cannot be parallelized:

POST /v1/intents

Sends subdomain → creates booking intent and returns unique ID

GET /v1/intents/{id}/availabilitykey

With intent ID → gets the availability key

GET /v1/availabilities/{key}

With key → gets all real available slots as Unix timestamps

FilterSlots: The Intersection

FilterSlots node in n8n: intersection of LLM ranges and YCBM slots

A pure Code node that performs set intersection: LLM ranges × real YCBM slots. Converts Unix timestamps to Europe/Madrid using Intl.DateTimeFormat, then filters: localDate === r.date && localTime >= r.start && localTime < r.end. Output is an array [{date, timestamp, start}] that may contain 0, 1, or N slots. The most elegant node in the workflow: pure set logic, no LLM, no API. Just temporal math.

Conditional Auto-booking

An If node evaluates slots.length and branches into 3 paths. The sub-agent has its own per-session memory: the customer can refine ("no, Thursday is better") without starting over.

Exactly 1 slot

Auto-confirms (zero friction): preparePatchBody builds form data with email, phone, dynamic queryVars, and comments → emailCheck verifies email → patchSelections (PATCH /v1/intents/{id}/selections) → patchConfirm (PATCH /v1/intents/{id}/confirm) → confirmarCita informs the customer

Multiple slots

escogerHora groups slots by date and presents options to the customer with contextual instructions

0 slots

Informs no availability in that range and asks for another preference

The result: a customer writes "mid-morning tomorrow" and 3 seconds later has a confirmed appointment with reserved parts. No forms, no "select date on calendar", no friction. For an FDE, this is the difference between "I made a chatbot" and "I designed a system that translates human intent into API actions."

Booking: email → confirmed appointment + WhatsApp confirmation template

Booking with refinement: "no, Thursday instead" → new search

Booking: "Book me an appointment" → tomorrow availability → "At 17"

Full booking flow: the customer requests an appointment in natural language, Jacobo negotiates the time slot, confirms in the calendar and sends a confirmation message — all transparent to the user.

Appointments Prompt

Instructions for temporal parsing.

Morning=10-14

Deep Dive: Automated Quotes#

How we lookup 1,000+ repair combinations in milliseconds.

The Pricing Challenge

Dynamic prices in Airtable must sync with AI outputs.

Quotes sub-agent workflow in n8n: 11 nodes

Download n8n workflow(~15 KB)

CleanModel

Normalizes messy human input ("iPhone 14 pro max blue") into canonical IDs.

Uses fuzzy matching and strict validation.

Cheaper to normalize names than to search raw text.

Quotes AI Agent

AI Agent node in quotes sub-agent in n8n

The LLM node that picks the right repair ID.

Airtable Search

Real-time CRUD.

Escalates to human if model is unknown.

FilterResponse

FiltrarRespuesta node in n8n: deterministic post-processing

Safety layer for model outputs.

No price found

HITL

From fuzzy text to real pricing in < 2 seconds.

Quotes System Prompt

Constraints and templates for structured output.

JSON only.

iPhone 13 Mini broken lens → diagnosis + price €55.90 + link

Triple quote: battery + charging port + back glass iPhone 13

Itemized quote: 3 repairs totaling €255.70 with stock status

Real quotes: diagnosis with price and link, triple quote with breakdown and total with stock status

Other Specialized Agents#

Specialist sub-agents for specific tasks.

Orders Sub-agent

Creates supply chain orders.

Webhook -> Create -> Respond

Download n8n workflow(~79 KB)

●

Part reservation

●

Supplier alert

Discount Calculator

Pure business logic for combos.

Deterministic code node

Discount Calculator workflow in n8n: Webhook → Code (discount logic) → Response

Download n8n workflow(~2.7 KB)

●

Battery + Screen discount

if (items.length > 1) ...

HITL Handoff

The safe exit.

5 nodes + Slack API

Download n8n workflow(~2.3 KB)

●

Full summary

●

Deep-link to WATI

HITL: warranty claim → immediate escalation to human team

#chat Slack channel: HITL escalation notification with customer context

When Jacobo escalates to a human, a message arrives in the #chat Slack channel with the full conversation context

Edge case: "Tell an agent to greet Moha" → Jacobo escalates with wave emojis → real agent confirms "Done"

Guardrail: "Order 100 batteries" → rejection + profanity → automatic escalation to human

"Borrar memoria" → reset + "3,2,1..." + fake emergency → Jacobo redirects to 112 and keeps composure

Real edge cases: absurd request, bulk order rejected, frustration escalation and fake emergency response with 112 redirect

WhatsApp Sender

Voice agent uses this to send follow-up info.

3 nodes

Download n8n workflow(~2.5 KB)

●

Template API

Results#

Production metrics after 6 months operating (workflows are downloadable at the end to verify the architecture):

~90%

Self-service

Inquiries resolved without human intervention

24/7

Availability

No limitation by shop hours

<30s

Response time

Vs. minutes when dependent on a person

<€200

Monthly cost

Total infrastructure (n8n + WATI + Aircall + LLMs)

Before vs After

Area	Before	After
Price/stock inquiries	~15 daily interruptions to technicians	Jacobo replies with real Airtable data in <30s
Booking appointments	Manual via phone, frequent schedule errors	Automatic via YCBM, parts auto-reserved
After hours	Lost inquiries, customers going to competitors	Jacobo handles 24/7 via WhatsApp and landline
Human escalations	Human starting from scratch, repeating questions	Handoff with full context, resolution in seconds
Customer support cost	Part-time employee ~€800-1,000/mo	<€200/mo total infrastructure cost

ROI isn't just direct savings. It's 24/7 availability, appointments that used to be lost after hours, and technicians who now repair instead of answering questions.

Industry benchmark: enterprise contact centers average 20-30% AI resolution (Gartner, 2025 AI Customer Service Report). Advanced virtual assistants reach 15% (Gartner, 2025 Hype Cycle). Jacobo achieved ~90% in a specialized domain. The difference: sub-agents with real-time data access vs generic chatbots.

Jacobo has been running 24/7 under new ownership since September 2025. The buyer acquired it fully functional. The ultimate test of a system: it works without its creator. The architecture patterns documented here are the same ones I'd bring to your team.

The same Airtable data generated 4,700+ SEO pages

The inventory Jacobo queries in real time also feeds a programmatic SEO system: 4,730 landing pages with real prices, repair photos, and verified reviews.

Read Programmatic SEO →

Want to implement this?

I built this for my own business, but it scales to any service business.

Contact me →Email me →

Technical Decisions (ADRs)#

Every technical decision has a rationale. These are the most important:

Multi-model (GPT-4.1 + MiniMax + GPT-4.1 mini) vs single LLM

Each component with the right model: GPT-4.1 for the main router and voice agent (precise tool calling), GPT-4.1 mini for quotes (structured output), MiniMax M2.5 for booking (fast and cheap for parsing temporal preferences). OpenRouter as gateway allows switching models without rewriting workflows.

OpenRouter as model-agnostic gateway

Switch between models without rewriting workflows, automatic fallback if a model is down. We evaluated Claude, GPT-4, MiniMax: we chose by use case, not brand.

n8n vs Make for orchestration

Each sub-agent is an independent workflow with its own webhook. Make doesn't allow this modularity. n8n enables LangChain agent patterns, memory management, and native tool calling.

Sub-agents as webhook microservices

Decoupled, individually testable, independent deployment. The same sub-agent serves WhatsApp (via n8n) and phone (via ElevenLabs) without duplicating code.

Airtable as brain vs database

A full Business OS already existed in Airtable (12 bases, 2,100+ fields). Single source of truth for stock, pricing, and customer history. Build on what exists, don't duplicate.

Memory window: 20 messages per session

Balance between context and token cost. Sufficient for a repair conversation (95% resolve in <10 messages). Keyed by phone number for continuity.

Think tool for internal reasoning

Explicit reasoning before multi-tool chains. Reduces errors because the LLM plans the sequence (check price → verify stock → offer appointment) before execution.

HITL via Slack with escalation reason

The LLM generates the escalation reason and includes it in the Slack message: why human intervention is needed, what has been tried, and what the customer needs. Works the same from WhatsApp (deep-link to WATI) and voice calls. The human knows why they are needed before opening the conversation.

WhatsApp first, voice later

70% of volume came from WhatsApp. Starting there maximized impact before expanding to voice. Voice (ElevenLabs + Aircall) reused existing sub-agents.

Dual-orchestrator with shared sub-agents

n8n for WhatsApp/web, ElevenLabs for voice. Sub-agents are platform-agnostic webhooks. Reusable by any orchestrator without duplicating logic. A real microservices pattern.

ElevenLabs as a "teammate" in Aircall

Jacobo integrated into PBX with routing rules: enters on overflow or after hours. Customer calls a landline, transparent experience. eleven_flash_v2_5 with temp 0.0 for maximum consistency.

Aircall → Twilio → ElevenLabs (and the latency trade-off)

The Aircall PBX → Twilio (phone bridge) → ElevenLabs chain worked, but each hop added latency: ~950-1,500ms mouth-to-ear. Twilio uses G.711 at 8kHz, while STT models are optimized for 16kHz, forcing lossy resampling. Today I would choose a direct SIP trunk (Telnyx offers native 16kHz G.722 wideband and co-located infrastructure with sub-200ms RTT), removing the intermediate hop. The platform-agnostic design of the sub-agents would make this migration easy: only the transport would change, not the logic.

Platform Evolution#

Jacobo wasn't an afterthought. It was the inevitable consequence of 5 years building a robust Business OS beneath.

2019-2024

Business OS as the foundation

Five years building a complete business operating system in Airtable: 12 bases, 2,100+ fields, real-time inventory, CRM with customer history. Without this clean, accessible database, an AI agent would just be a generic chatbot hallucinating answers.

Jan 2025

Deliberate design & training

Before writing a line, I trained in AI agent architectures. I knew I needed tool calling, that Airtable was the SSOT, and that the agent had to be multimodal: voice and chat sharing the same resources.

Feb 2025

First proof (monolithic approach)

Tested a single-prompt approach with heavy context and confirmed my hunch: a monolithic prompt doesn't scale with multiple domains. This validated the decision for platform-agnostic webhook sub-agents.

Feb 2025

Final multi-agent version

My first AI agent, in production in under a month. Full sub-agent architecture: each domain in its own workflow with independent webhook, central router with tool calling, multi-model per use case. Speed was thanks to the existing Business OS underneath. All built while running the business.

Mar 2025

Voice channel (Aircall + Twilio + ElevenLabs)

Jacobo as a teammate in the Aircall dashboard, connected via Twilio to ElevenLabs. Reused existing sub-agents without duplicating logic. Validation of platform-agnostic design: webhooks served a second orchestrator without touching a single line of logic.

Sep 2025

Going-concern sale

Jacobo has been 24/7 active since launch. It was part of the business sale as an operational asset: the buyer acquired it running. Five years of clean architecture made it inevitable.

Jacobo wasn't an experiment.

16 years building a business with my own hands.

Systematizing it until it ran without me.

Jacobo was the piece that closed the loop.

I sold the business as a going concern.

The systems still run today — under new ownership.

Business OS — The System Behind Jacobo

Jacobo was built on top of the Business OS I designed for 5 years. Read the full case study →

Read case study →

Jacobo's first moments of life: endpoint testing, loyalty copy iteration and the final CRM template

Lessons Learned#

Sub-agents > monolithic prompt.

During design, I tested one prompt with full context and confirmed it doesn't scale. The sub-agent architecture was a deliberate decision: each piece testable, iterable, and independent. A change in discounts can't break bookings. It's the same logic as microservices, applied to AI agents.

HITL is a feature, not a fallback.

A well-done human handoff builds more trust than a bot trying to solve everything. Customers value that the system knows when they need a person. The trick: the human doesn't start from zero.

The CRM is the agent's brain, not the LLM.

Jacobo isn't smart because of the language model. It's smart because it queries prices, stock, and customer history in Airtable. Without that data, it's just a generic chatbot making things up.

Start with the highest volume channel.

WhatsApp represented 70% of inquiries. Starting there maximized impact. When we added voice, sub-agents were already proven, so we just connected a new orchestrator.

Choose models by use case, not brand.

GPT-4.1 for the router and voice (precise tool calling), GPT-4.1 mini for quotes (structured output), MiniMax M2.5 for booking (fast and economical). OpenRouter as gateway allows switching without rewriting. That's more FDE than saying "I use X for everything."

Think tool prevents multi-tool chain errors.

Before checking price → verifying stock → offering appointment, the agent explicates its plan. This step of explicit reasoning reduces sequence errors. Like "rubber duck debugging" for the agent itself.

What I'd Do Differently#

Jacobo worked in production for months, but with perspective, there are decisions I'd change:

Structured evaluation from day 1

I implemented evals post-hoc once the system was in production. If starting over, I'd define response quality metrics, intent classification accuracy, and HITL rate before the first version. Retrofitting observability is costlier than designing it from the start.

Direct SIP trunk instead of Aircall → Twilio → ElevenLabs

The 3-hop chain added ~950-1,500ms mouth-to-ear latency and forced G.711 (8kHz) to 16kHz resampling. With a Telnyx SIP trunk direct to ElevenLabs, I'd have native G.722 wideband and sub-200ms RTT. I chose the long chain because Aircall was already paid for; today I'd prioritize latency over convenience.

Vector store for memory instead of raw WATI fetch

The 80-message fetch from WATI works but doesn't scale to long-history customers or allow semantic search. A vector store (Pinecone, Qdrant) with conversation embeddings would allow "remember that time you brought the iPhone 12" without loading the whole chat.

Transferable Enterprise Patterns#

Jacobo was built for an SMB, but the architecture patterns are enterprise-grade. Here's what I built vs. what I'd add at enterprise scale:

Pattern	What I built	Enterprise
Sub-agent routing with tool calling	Router + 7 webhook sub-agents with intent classification and delegation	Add circuit breakers, retry policies, and alternate model fallbacks per sub-agent
Multi-model orchestration	GPT-4.1 (router/voice) + GPT-4.1 mini (quotes) + MiniMax (booking) via OpenRouter	A/B testing of models per sub-agent, canary deployments for new prompt versions
HITL framework	Escalation via Slack with full context and deep-link to conversation	Queue management, client tier SLAs, analytics on escalation reasons
Platform-agnostic sub-agents	Webhooks shared between n8n (WhatsApp) and ElevenLabs (voice)	API gateway, rate limiting, authentication, endpoint versioning
Observability	n8n logs + Slack alerts	Langfuse/Datadog for traces, latency, and cost tracking per conversation
Voice infrastructure	Aircall → Twilio → ElevenLabs: functional, but each hop adds latency (~950-1,500ms mouth-to-ear). Twilio uses G.711 at 8kHz, requiring 16kHz resampling for STT, degrading accuracy	Direct SIP trunk (Telnyx/Plivo) → ElevenLabs via SIP, removing the Twilio hop. Telnyx offers native 16kHz G.722 wideband (no resampling) and co-located infrastructure (GPU + telephony in same PoP) with sub-200ms RTT. For apps/web: direct WebRTC (Opus 16-48kHz) via LiveKit, no PSTN, with 300-600ms mouth-to-ear

Industry Applicability

Travel (Hopper, Booking)

Sub-agents for flights, hotels, insurance. HITL for complex changes. Tool calling against availability APIs.

Fintech

Sub-agents for transactions, balance inquiries, support. Stock-aware routing → balance-aware routing.

Healthcare

Sub-agents for appointments, results, triage. HITL as a critical feature for specialist referral.

E-commerce

Sub-agents for tracking, returns, recommendations. Same inventory lookup and booking patterns.

Voice AI Platforms

Orchestrating conversational agents with optimized latency. Cross-channel (voice → text) and HITL patterns apply directly to any voice platform.

Data/AI Platforms

Tool calling against internal APIs, sub-agent routing by intent, memory management. The same architecture scales to any agent orchestrator.

Want to implement enterprise AI routing?

Let's build a deterministic, tool-calling agent orchestrator for your stack.

View my LinkedIn →Email me →

Open Source Workflows#

I decided to open-source the core of the system. You can download, fork, and study the 7 production n8n workflows that powered Jacobo.

Main Router

The Brain

Classifies intent,picks sub-agent, maintains context. LangChain Agent pattern with 7 tools.

37 nodesGPT-4.1~133 KB

Download n8n workflow

Booking Sub-agent

Temporal Engine

Converts "tomorrow morning" to Unix timestamps. Queries YCBM and handles auto-booking.

18 nodesMiniMax M2.5~24 KB

Download n8n workflow

Quote Agent

Inventory Engine

Looks up exact model + repair in Airtable, returns real price with stock status.

11 nodesGPT-4.1 mini~15 KB

Download n8n workflow

hacerPedido

Order Creation

Creates repair orders in Airtable when parts are out of stock.

3 nodes~79 KB

Download n8n workflow

CalculadoraSantifer

Discount Calculator

Pure business logic. Calculates combo discounts when customers bundle multiple repairs.

3 nodes~2.7 KB

Download n8n workflow

contactarAgenteHumano

HITL Handoff

The escape valve. Escalates to human via Slack with a deep-link to the conversation.

5 nodes~2.3 KB

Download n8n workflow

EnviarMensajeWati

WhatsApp Sender

Cross-channel bridge: the voice agent sends WhatsApp messages via the WATI API.

3 nodes~2.5 KB

Download n8n workflow

View repo on GitHub

All workflows live on GitHub — fork, star, or download directly.

How to import these workflows

Download the JSON file from GitHub.

In n8n, click the + button and select "Import from File".

Choose the JSON and click "Import".

Configure your own credentials for WATI, Airtable, and OpenRouter.

Frequently Asked Questions#

How does memory work?

It reconstructs context from WATI history on every message.

Is it real-time?

Yes, with sub-30s response times and live stock lookups.