The answer: in my experience, 5% of the time goes into building the actual AI agent.
30% goes into building the MCP servers and MCP tools that act as a connector between the agent and the data.
And a whopping 65% goes into making sure the data the agent needs sits somewhere the agent can access, structured in a way the agent can understand and interpret. That means building a semantic layer for exactly that purpose.
I've been building AI agents for e-commerce brands for over two years now. CRO agents, influencer agents, ad-making agents, data analysis agents, retention agents, support agents.
Looking back, I can tell you the one thing that was true on every single project.
The model was never the bottleneck. The data was.
Every time we hooked an agent up to a brand's real data, the quality jump was massive. The agent went from confidently wrong to actually useful. But getting to that point was the entire job. The agent part was the easy part.
Let me walk you through what that actually looked like, because the pattern is the story.
Every agent we built started with a data rescue mission
The CRO agent. The fun part was Claude analyzing session behavior, diagnosing drop-offs, and rewriting the landing page code. But before any of that could happen, we had to get a brand's Hotjar data – 50K daily sessions' worth of clicks and scrolls – out of Hotjar and into an S3 + Athena pipeline the agent could actually query. The agent work took days. The data plumbing took weeks.
The influencer agents. The matching magic everyone saw was semantic vibe matching between a brand and a creator. What made it possible was unglamorous: before any matching could happen, we had to embed the brand's essence, along with each creator's last 25 posts, into a vector database covering millions of creators and their contact emails. No vectors, no vibe match.
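For anyone who hasn't built one: vibe matching of this kind usually comes down to nearest-neighbor search over embeddings. Here is a toy sketch, assuming the brand and creator embeddings already exist. The three-dimensional vectors and creator handles are made up for illustration; real embeddings have hundreds of dimensions and live in a vector database, not a Python dict.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_matches(brand_vec: list[float], creators: dict[str, list[float]], k: int = 2):
    """Rank creators by how close their embedding is to the brand embedding."""
    scored = [(handle, cosine(brand_vec, vec)) for handle, vec in creators.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy data: one embedding per creator (for example, an
# aggregate of their recent posts; how you aggregate is a design choice).
brand = [0.9, 0.1, 0.3]
creators = {
    "@trail_runner": [0.8, 0.2, 0.4],
    "@gamer_daily": [0.1, 0.9, 0.2],
    "@outdoor_mom": [0.7, 0.0, 0.5],
}
print(top_matches(brand, creators))
```

The ranking itself is trivial. The hard part, as above, is getting millions of creators embedded and stored before this function has anything to search.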
The ad-making agents. Ten agents assembling a video ad in six and a half minutes. But Part 1 of that series wasn't about ads at all. It was the Scene Prep Agent – slicing raw footage into scenes, analyzing each one, embedding it all into a searchable library. Because raw video sitting in an asset folder is invisible to an agent.
Then the support and analytics work. Agents kept making stuff up because they couldn't see the brand's actual docs, policies, and past tickets. So we built kb, a vector knowledge base.
Brands kept uploading CSVs to ChatGPT and getting wrong numbers back, because LLMs can't see your whole file and are bad at math. So we built AI DB – a real database the AI writes SQL against.
Klaviyo learnings were trapped in dashboards that show one campaign at a time. So we built campaign memory for that.
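The point about math is easy to demonstrate. Instead of asking a model to add up rows it may only partially see, the model writes a query and the database does the arithmetic over every row. Below is a minimal sketch using Python's built-in sqlite3, assuming a tiny made-up orders CSV. It is not how AI DB is implemented; the hard-coded query simply stands in for the SQL an LLM would generate.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; a real one would have thousands of rows.
CSV_DATA = """order_id,channel,revenue
1001,meta,120.50
1002,google,80.00
1003,meta,45.25
1004,email,60.00
"""

def load_csv(conn: sqlite3.Connection, text: str) -> None:
    """Load the CSV into a typed table so SQL can aggregate it exactly."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, channel TEXT, revenue REAL)")
    rows = csv.DictReader(io.StringIO(text))
    # Named placeholders match the CSV header; column affinity turns the
    # numeric text into INTEGER / REAL values on insert.
    conn.executemany("INSERT INTO orders VALUES (:order_id, :channel, :revenue)", rows)

conn = sqlite3.connect(":memory:")
load_csv(conn, CSV_DATA)

# The kind of query an LLM would write instead of doing the math itself.
query = (
    "SELECT channel, ROUND(SUM(revenue), 2) FROM orders "
    "GROUP BY channel ORDER BY channel"
)
result = conn.execute(query).fetchall()
print(result)
```

The model's job shrinks to writing the query; every row is counted, no matter how large the file.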
Here's the thing I eventually had to admit.
Every one of these was the same product. Each one took data that was trapped somewhere – a SaaS dashboard, an asset folder, a spreadsheet, someone's head – and put it in a shape an agent could work with.
We weren't building agents. We were freeing data, over and over, one silo at a time.
What this looks like inside a 9-figure brand
If you operate a brand at scale, you already know where your data lives. Everywhere.
Orders and customers in Shopify. Campaigns in Klaviyo. Reviews in Yotpo. Support tickets in Gorgias. Session behavior in Hotjar. Ad performance split between Meta and Google. Brand guidelines in a PDF somewhere. Decisions in Slack threads. Creative assets in a DAM.
Every system speaks its own dialect. Nothing holds the whole picture.
Now you want agents. Everyone loves to build agents right now. They are hot and they are the future.
And every useful agent question crosses those silos. "What's my actual ROAS when I factor in returns?" touches ad platforms and orders. "Is this complaint a pattern or a one-off?" touches support tickets and reviews. "What rate did we pay the last creator like this one?" lives in an email thread from May.
An agent that can only see one silo can answer one-silo questions. Those are the questions your dashboards already answer.
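To make the cross-silo point concrete: "ROAS when you factor in returns" needs spend from the ad platforms and refunds from the order system in the same calculation. Here is a small sketch with made-up numbers, assuming net ROAS is defined as (attributed revenue minus refunded amount) divided by ad spend. Brands define this differently, so treat the formula as one reasonable choice, not a standard.

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    campaign: str    # campaign the order was attributed to
    revenue: float
    refunded: float  # amount refunded on returns (0 if none)

# Hypothetical exports from two silos: spend from the ad platforms,
# orders (with refunds) from the store.
ad_spend = {"spring_sale": 1000.0, "retargeting": 500.0}
orders = [
    Order("A1", "spring_sale", 1800.0, 0.0),
    Order("A2", "spring_sale", 1200.0, 600.0),
    Order("B1", "retargeting", 1500.0, 0.0),
]

def roas(campaign: str, net: bool) -> float:
    """Gross ROAS = revenue / spend; net ROAS subtracts refunds first."""
    attributed = [o for o in orders if o.campaign == campaign]
    revenue = sum(o.revenue for o in attributed)
    if net:
        revenue -= sum(o.refunded for o in attributed)
    return revenue / ad_spend[campaign]

for c in ad_spend:
    print(c, "gross:", roas(c, net=False), "net:", roas(c, net=True))
```

Both campaigns show a 3.0 gross ROAS, but the spring sale drops to 2.4 once returns are counted. Neither silo on its own can show you that.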
"Just connect an MCP server" doesn't fix this
I've been the loudest MCP advocate you'll find, and probably one of the protocol's earliest adopters, to the point that I was helping framework providers fix their bugs in its early days.
MCP is how agents reach data, and brands should absolutely be building on it.
But here's what I've learned watching brands wire up vendor connectors: every vendor's MCP server (if they even have one) gives your agent another isolated interface. A window into Klaviyo. A window into Shopify. A window into your analytics tool.
Windows, not memory.
Your agent can look through each window and answer a question. What it can't do is hold what it learned. The research one agent did this morning isn't there for the agent writing creative this afternoon. The landing page an agent generated last week has no connection to the performance data that came back. Every workflow starts from zero, and everything an agent produces evaporates into a chat thread.
That's the real blocker. Not model capability. Not even tool access.
It's that your agents have nowhere shared to work.
You can't be an agentic brand if every new agent means rebuilding data access and memory from scratch. That's why so many agent pilots produce one impressive demo and then stall. The demo agent had its data hand-fed to it. The second agent has to start the rescue mission all over again.
So we built the thing all our agents needed
After enough rounds of this, the conclusion was hard to avoid. Every agent we built needed the same foundation: one place to read the brand's data, search its knowledge, write its results, and leave something behind for the next agent.
So we built it. It's called GenticDB.
The simplest way I can describe it: the database your agents share.
Commerce and operational data – products, orders, customers, reviews, campaigns, analytics events – lives in structured tables. Brand knowledge – guidelines, briefs, transcripts, the Brand Brain and wiki – lives right alongside it. And the tables hold relational data and embeddings together, so the same store answers a precise SQL query and a semantic search. No separate vector database bolted on the side.
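Here is what "one store answers both" can look like in miniature. This sketch uses Python's built-in sqlite3 purely as a stand-in engine, not GenticDB's DuckDB-based stack. Embeddings are stored as JSON text next to ordinary columns, and a cosine function is registered so that a single SQL query can apply exact filters and rank by semantic similarity. The table, rows, and three-dimensional vectors are all hypothetical.

```python
import json
import math
import sqlite3

def cosine(a_json: str, b_json: str) -> float:
    """Cosine similarity between two embeddings stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

conn = sqlite3.connect(":memory:")
conn.create_function("cosine", 2, cosine)  # make similarity callable from SQL
conn.execute(
    "CREATE TABLE products (sku TEXT, name TEXT, price REAL, in_stock INTEGER, embedding TEXT)"
)

# Hypothetical rows: structured fields plus a toy embedding per product.
rows = [
    ("SKU1", "Trail running shoe", 129.0, 1, [0.9, 0.1, 0.2]),
    ("SKU2", "Road running shoe", 99.0, 1, [0.8, 0.3, 0.1]),
    ("SKU3", "Leather loafer", 89.0, 1, [0.1, 0.9, 0.3]),
    ("SKU4", "Ultra trail shoe", 189.0, 1, [0.95, 0.05, 0.25]),
    ("SKU5", "Hiking sock", 19.0, 0, [0.7, 0.1, 0.6]),
]
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(s, n, p, i, json.dumps(e)) for s, n, p, i, e in rows],
)

# Pretend this is the embedding of a shopper asking for "something for muddy trails".
query_vec = json.dumps([0.9, 0.1, 0.3])

# One query: exact filters on relational columns, then semantic ranking.
names = [
    name
    for (name,) in conn.execute(
        """SELECT name FROM products
           WHERE price < ? AND in_stock = 1
           ORDER BY cosine(embedding, ?) DESC
           LIMIT 2""",
        (150, query_vec),
    )
]
print(names)
```

"Ultra trail shoe" is actually the closest semantic match, but the price filter drops it. Getting that interplay right is harder when the filter runs in one system and the similarity search in another.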
It's not just rows and documents either. The creative work your agents produce lands there too. When a Gentic Creative agent generates a static image or edits a video, or an agent builds a landing page, the output gets saved with a catalog record that makes it findable by the next agent.
Remember the Scene Prep Agent slicing footage into a searchable library? That asset memory is now part of the same store as everything else. Your agents get a DAM they can actually search, not a folder of untagged files. Which is why "database" undersells it.
It's an all-in-one data store built for AI agents: operational data, knowledge, and creative assets, one layer.
Every agent reaches it through a single Gentic MCP interface spanning 12 MCP servers and 250+ ecom-specific MCP tools.
Your in-house agents, Gentic Computer (Mother), n8n automations, Claude, ChatGPT – they all read and write against the same store. What one agent writes is what the next agent reads. A research insight can feed a creative brief. A generated landing page can come back with performance data attached.
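Here is the "what one agent writes is what the next agent reads" idea reduced to its core: a shared table that both agents use, instead of each one leaving its results in its own chat thread. This is a toy sketch with sqlite3 as a stand-in store. The `artifacts` table, its columns, the helper functions, and the agent names are all hypothetical and do not reflect GenticDB's schema or its MCP tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY,
        kind TEXT,          -- e.g. 'insight', 'brief', 'landing_page'
        produced_by TEXT,   -- which agent wrote it
        topic TEXT,
        content TEXT,
        source_id INTEGER   -- artifact this one was built from, if any
    )"""
)

def write_artifact(kind, produced_by, topic, content, source_id=None):
    """Persist an agent's output so later agents can find it."""
    cur = conn.execute(
        "INSERT INTO artifacts (kind, produced_by, topic, content, source_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (kind, produced_by, topic, content, source_id),
    )
    conn.commit()
    return cur.lastrowid

def latest(kind, topic):
    """Fetch the most recent artifact of a given kind for a topic."""
    return conn.execute(
        "SELECT id, content FROM artifacts WHERE kind = ? AND topic = ? "
        "ORDER BY id DESC LIMIT 1",
        (kind, topic),
    ).fetchone()

# Morning: a research agent stores what it found.
write_artifact("insight", "research_agent", "trail_shoe", "Reviews praise grip on wet rock.")

# Afternoon: a creative agent starts from that insight instead of from zero,
# and records which artifact its brief was built from.
insight_id, insight = latest("insight", "trail_shoe")
brief_id = write_artifact(
    "brief", "creative_agent", "trail_shoe",
    f"Lead the ad with wet-rock grip. ({insight})", source_id=insight_id,
)
print(latest("brief", "trail_shoe"))
```

The `source_id` link is the piece that lets you later attach performance data to a brief or landing page and trace it back to the research that produced it.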
Underneath it all is an open lakehouse. DuckDB for compute, DuckLake managing the table state, open Parquet files on object storage.
That's the same architecture class Snowflake and Databricks built the big-data era on – and it matters at 9-figure scale, where "your data" means millions of order lines, session events, and embeddings, not a tidy spreadsheet.
This isn't a lookup table with a vector index stapled on. It's built to chew through warehouse-scale analytical queries.
The lakehouse also delivers the part I personally care about most, because I've spent two years telling brands their vectorized data is intellectual property: everything is stored in open formats. Query your tables with DuckDB. Export CSV, Parquet, or JSON. You don't need Gentic running to read your own files.
You can even connect your own Cloudflare R2 bucket, and your structured and vector data files get written into storage you control – inspect them, export them, revoke our access whenever you want.
To be clear about what it's not: it doesn't replace Shopify or your systems of record. It's the shared working layer your agents operate in – the layer that, until now, every brand had to duct-tape together one pipeline at a time.
We just took it live, hosted version first, with bring-your-own-bucket in early access.
Where this leaves you
If you've been following this newsletter and building your own agents at work – keep going. The skills transfer. But check where your time actually goes. If most of it is spent getting data into a usable shape before your agent can do anything, you've found the same wall we did.
GenticDB is live at gentic.co/db. If you're wrestling with this problem in-house, I'm genuinely happy to compare notes. Reply to this email or DM me on LinkedIn.

