Agent UX · Behavior Design · Research

When the Agent Knows How to Use the Tools But Not How to Think

Designing reasoning behavior for the Pipefy MCP — without a single screen to design.

A fully functional MCP that didn't actually work

At Pipefy, AI isn't a team; it's a company-wide mandate. Everyone is expected to understand, test, and contribute to how AI capabilities evolve across the product. When the Pipefy MCP (Model Context Protocol) server launched, I started using it.

The MCP was technically complete. It had 128 tools covering every part of the platform — creating processes, configuring phases and fields, setting up automations, querying data. An AI agent connected to it could, in theory, build a full process from scratch.

The processes it built were broken.

Not broken in the sense of errors or failed API calls. Broken in a subtler way: the agent would create automations referencing fields that didn't exist yet. It would produce a structure that looked complete but was missing the connective logic — the conditions, the dependencies, the sequencing — that makes a Pipefy process actually work in practice.

The problem wasn't capability. The agent had all the tools. What it didn't have was judgment — knowing what to ask before building, which order things need to be created in, and how to verify that what it built matched what was requested.

This was a design problem. Not a technical one.

Independent initiative. Two days. No dedicated team.

I identified the problem through direct use — not from user feedback, not from a brief. I proposed and executed the solution independently, in two days, with no dedicated team.

The key reframe that shaped everything: the UX of the MCP isn't a screen. It's the quality of what gets built. When a user asks the agent to create a process, the experience is the result. A process that looks complete but fails in practice is a failed experience — regardless of how the agent behaved along the way.

That meant the design problem wasn't visual. It was behavioral. I needed to design how the agent reasons, not how it looks.

My approach had three phases:

1. Synthesis first

I had three sources of existing knowledge: technical documentation of the MCP tools, product descriptions of Pipefy's features, and design principles already documented internally. I used these — combined with my own deep product knowledge as a designer who works inside Pipefy — to map the gap between what the agent could do and what it needed to understand.

2. Document as design artifact

The output wasn't a prototype or a flow. It was a set of six behavioral documents — each covering a specific moment in the agent's reasoning cycle. Writing them required making explicit what's usually tacit: what questions to ask before acting, what order things need to happen in, how to infer versus when to ask, how to know when something is actually done.

3. Validate with domain experts

After building, I brought the documents to the SE (Solution Engineer) team, the people whose expertise the documents were meant to encode. Their role was to validate whether what I'd written matched how they actually reason about Pipefy processes. They reviewed the documents, confirmed the reasoning, and adopted them.

A system of six documents. A complete reasoning loop.

The solution is a system of six behavioral documents — not a single prompt, not a style guide. Each covers a distinct moment in the agent's reasoning cycle, and together they form a complete loop: from the first contact with a user request to verification of what was built.

Before designing any of the documents, one reframe was necessary: the real user of the MCP isn't the agent — it's whoever asks the agent to build something. A customer, an internal SE, an ops team member. The agent is the medium. That shifted the entire brief: the documents couldn't just describe how to use the tools. They needed to help the agent understand what the user actually needs — which is often not the same as what they asked for.

1. Process Model: the foundation

Establishes what each Pipefy element means beyond the API, and who the different user types are. The agent needs to understand the domain before it can reason about it.

2. Discovery Protocol: mandatory first move, no exceptions

The agent's core failure was building without understanding — receiving a request and immediately creating. The Discovery Protocol makes exploration mandatory before any write operation: understand what already exists, confirm the goal, surface ambiguities. This mirrors what a good SE does before touching anything. Destructive operations require explicit confirmation, always.
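The Discovery Protocol is written as prose for the agent, but its core rule is a gate, and a gate is easy to sketch. The Python below is a minimal, hypothetical illustration, not part of the Pipefy MCP: the operation categories `READ`, `WRITE`, and `DESTRUCTIVE` and the `DiscoveryGate` class are invented names, and a real agent applies the same rule through reasoning rather than by raising exceptions.

```python
from dataclasses import dataclass, field

# Hypothetical operation categories (not real Pipefy MCP tool names).
READ, WRITE, DESTRUCTIVE = "read", "write", "destructive"


class DiscoveryIncomplete(Exception):
    """Raised when a write is attempted before discovery is finished."""


class ConfirmationRequired(Exception):
    """Raised when a destructive operation lacks explicit user confirmation."""


@dataclass
class DiscoveryGate:
    explored_existing: bool = False  # checked what already exists
    goal_confirmed: bool = False     # restated the goal and got agreement
    open_ambiguities: list[str] = field(default_factory=list)  # unresolved questions

    def discovery_done(self) -> bool:
        return (self.explored_existing
                and self.goal_confirmed
                and not self.open_ambiguities)

    def check(self, kind: str, confirmed_by_user: bool = False) -> None:
        """Reads are always allowed; anything else waits for discovery."""
        if kind == READ:
            return
        # Any non-read is treated as a write, so unknown operations are gated too.
        if not self.discovery_done():
            raise DiscoveryIncomplete("finish discovery before any write operation")
        if kind == DESTRUCTIVE and not confirmed_by_user:
            raise ConfirmationRequired("destructive operations require explicit confirmation")


gate = DiscoveryGate()
gate.check(READ)  # exploring is always allowed
gate.explored_existing = True
gate.goal_confirmed = True
gate.check(WRITE)  # allowed now that discovery is complete
```

Treating every non-read as a write is the conservative choice: an operation the gate doesn't recognize is held back rather than let through.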

3. Creation Protocol: fixing the construction order

One of the most concrete failures was sequencing: the agent would create automations before the fields they referenced existed. This isn't a bug; it's a knowledge gap. The Creation Protocol enforces the correct order: phases → fields → connections → automations, because each layer depends on the previous one already existing.

It also encodes a key design decision: inference by default. Rather than asking 20 questions upfront, the agent infers what it can from context, proposes a structure, marks its assumptions explicitly, and only stops to ask when a wrong assumption would require destructive changes to fix.
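Both rules in the Creation Protocol translate into a small dependency check. The sketch below is hypothetical: `PlannedItem`, `build_order`, `stated_assumptions`, and the example names (`Request`, `Approval`, `amount`, `notify_manager`) are illustrative, not Pipefy MCP objects. It sorts a plan into the phases → fields → connections → automations order, refuses to proceed if an item references something that won't exist yet, and keeps inferred choices in an `assumption` note so they can be reported rather than silently applied.

```python
from dataclasses import dataclass, field
from typing import Optional

# Construction order from the Creation Protocol: each layer depends on the one before it.
LAYER_ORDER = ["phase", "field", "connection", "automation"]


# Hypothetical plan structure; names used below are illustrative, not Pipefy MCP objects.
@dataclass
class PlannedItem:
    name: str
    layer: str                                           # one of LAYER_ORDER
    references: list[str] = field(default_factory=list)  # elements this item depends on
    assumption: Optional[str] = None                     # set when inferred, not stated by the user


def build_order(items: list[PlannedItem],
                existing: frozenset[str] = frozenset()) -> list[PlannedItem]:
    """Sort items by layer, then check every reference exists before the item that needs it."""
    ordered = sorted(items, key=lambda item: LAYER_ORDER.index(item.layer))
    available = set(existing)  # elements already present in the pipe
    for item in ordered:
        missing = [ref for ref in item.references if ref not in available]
        if missing:
            raise ValueError(f"{item.name} references {missing} before they exist")
        available.add(item.name)
    return ordered


def stated_assumptions(items: list[PlannedItem]) -> list[str]:
    """Collect inferred choices so they can be shown to the user instead of hidden."""
    return [f"{item.name}: {item.assumption}" for item in items if item.assumption]


plan = [
    PlannedItem("notify_manager", "automation", references=["Approval", "amount"]),
    PlannedItem("amount", "field", references=["Request"]),
    PlannedItem("Request", "phase"),
    PlannedItem("Approval", "phase", assumption="approval step inferred from 'purchase request'"),
]
print([item.name for item in build_order(plan)])
# → ['Request', 'Approval', 'amount', 'notify_manager']
print(stated_assumptions(plan))
```

The plan is deliberately listed out of order: sorting by layer fixes the sequence, and the reference check catches the case the original agent missed, an automation pointing at a field that was never created.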

4. Design Principles: the institutional knowledge

Encodes what lives in SEs' heads rather than in any documentation: the anti-patterns, the workarounds, and the quality criteria that distinguish a process that works operationally from one that's merely technically valid.

5. Self-Evaluation: closing the build loop

The agent's previous behavior was to build and declare done — without verifying. The Self-Evaluation document makes a post-build check mandatory: compare what was requested against what was created, verify structural integrity at every layer, report gaps explicitly. "Done" means verified, not just finished.
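The post-build check amounts to comparing two descriptions of the process: what the user asked for and what actually exists after the build. A minimal sketch, assuming both are available as sets of element names grouped by layer (a hypothetical input shape; in practice the agent would read the created structure back through the MCP's query tools). `self_evaluate` is an illustrative name, not an existing function.

```python
# Hypothetical input shape: element names grouped by layer (e.g. "phase", "field"),
# plus a map from each element to the elements it references.
def self_evaluate(requested: dict[str, set[str]],
                  created: dict[str, set[str]],
                  references: dict[str, set[str]]) -> list[str]:
    """Compare the request with what was read back after the build; empty list means verified."""
    gaps = []
    # Layer by layer: everything requested must actually exist.
    for layer, wanted in requested.items():
        for name in sorted(wanted - created.get(layer, set())):
            gaps.append(f"missing {layer}: {name}")
    # Structural integrity: every reference must point at something that exists.
    everything_created = set().union(*created.values())
    for item, refs in references.items():
        for ref in sorted(refs - everything_created):
            gaps.append(f"{item} references missing element: {ref}")
    return gaps


requested = {"phase": {"Request", "Approval"}, "field": {"amount"}}
created = {"phase": {"Request"}, "field": {"amount"}}
references = {"notify_manager": {"Approval", "amount"}}
gaps = self_evaluate(requested, created, references)
print("verified" if not gaps else gaps)
# → ['missing phase: Approval', 'notify_manager references missing element: Approval']
```

Returning the gaps as a list, rather than a pass/fail flag, matches the protocol's requirement to report what is missing explicitly instead of only declaring the build incomplete.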

6. Diagnosis Protocol: a second mode of operation

Enables analyzing broken or underperforming processes that already exist. A symptom-to-cause map guides the agent from observation to root cause to concrete recommendations — a distinct role from building, requiring a distinct reasoning path.

Tying it together is a routing table — a decision layer that determines which protocol applies to each incoming situation, using a silent search_pipes call as the tiebreaker when context is ambiguous. This prevents the agent from treating a partial process as a new one, or a diagnostic request as a build.
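A routing table like this can be sketched as a small function. Everything below is an illustrative assumption: the keyword hints stand in for the routing table's situation descriptions, the situation names are invented, and `search_pipes` is passed in as a plain callable that takes a search term and returns matching pipe names, which is not a claim about the real tool's parameters. The sketch also runs the silent lookup for clear build requests, an added assumption so that a partially built process isn't mistaken for a new one.

```python
from typing import Callable

# Hypothetical keyword hints standing in for the routing table's situation descriptions.
DIAGNOSE_HINTS = ("broken", "not working", "fix", "why")
BUILD_HINTS = ("create", "build", "set up")

# Protocol sequence per situation; Discovery always comes first.
ROUTES = {
    "build_new": ["discovery", "creation", "self_evaluation"],
    "continue_existing": ["discovery", "creation", "self_evaluation"],
    "diagnose": ["discovery", "diagnosis"],
}


def classify(request: str) -> str:
    """Return 'diagnose', 'build', or 'ambiguous' from the wording alone."""
    text = request.lower()
    diagnose = any(hint in text for hint in DIAGNOSE_HINTS)
    build = any(hint in text for hint in BUILD_HINTS)
    if diagnose and not build:
        return "diagnose"
    if build and not diagnose:
        return "build"
    return "ambiguous"


def route(request: str, topic: str,
          search_pipes: Callable[[str], list[str]]) -> tuple[str, list[str]]:
    """Pick a situation and its protocols, using a silent pipe search as the tiebreaker."""
    kind = classify(request)
    if kind == "diagnose":
        situation = "diagnose"
    else:
        # Silent lookup (stand-in for search_pipes): an existing pipe means no blank slate.
        existing = search_pipes(topic)
        if not existing:
            situation = "build_new"
        elif kind == "build":
            situation = "continue_existing"  # partial process: extend it, don't start over
        else:
            situation = "diagnose"  # unclear request about a pipe that already exists
    return situation, ROUTES[situation]


def no_pipes(term: str) -> list[str]:  # hypothetical stub: search finds nothing
    return []


print(route("Create a purchase request process", "purchase", no_pipes))
# → ('build_new', ['discovery', 'creation', 'self_evaluation'])
```

Returning a sequence of protocols rather than a single one keeps Discovery as the first step on every path, consistent with the rule that nothing is written before exploration.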

Adopted by the SE team. ~50% drop in internal modeling time.

The SE team reviewed the documents, validated that the reasoning they encoded matched how they actually think about Pipefy processes, and adopted them for internal use. External deployment is currently limited by compliance requirements — this is in-house use only for now.

The practical effect: tasks that previously required SEs to manually configure dozens of automations and conditionals per process are now handled by the agent. Internal modeling time has dropped by approximately 50%.

~50%: drop in internal process modeling time
120h: average SE hours clients contract for initial process modeling
6: behavioral documents covering the full agent reasoning cycle

To put that in context: clients typically contract an average of 120 hours of SE time for initial process modeling. Integrations are still configured manually, so the share of that work the agent can take on should grow as integrations are brought into scope.

UX design without a visual surface is still UX design

Key insight

The design process here was recognizable — define the user, identify the gap between what the system does and what the user needs, research the domain knowledge that closes that gap, translate it into something the system can apply. The medium shifted from screens to behavior. The logic didn't.

Key insight

Tacit expertise is the hardest thing to design for. What makes a great SE isn't documented anywhere. It lives in the questions they ask before building, the anti-patterns they recognize, the workarounds they've learned. Making that explicit — turning judgment into criteria an agent can apply consistently — is a design problem as much as a knowledge management one.

What's next

A template library for common process types — procurement, HR onboarding, support — so the agent can load a starting structure from context rather than building from zero every time. And integration coverage: the remaining manual step that currently limits what the agent can fully handle end-to-end.
