I work with 9-figure e-commerce brands. And I'm spending more and more of my time building high-volume video editing agents for them.

Thousands of videos per month. Mostly UGC and creator clips that need small edits before they can run as ads. Cut the off-brand sentence. Move the hook to the front. Burn captions. Drop a discount badge. Speed it up 1.5x. Repeat 200 times this week.

The creative teams I work with used to live in CapCut and Premiere. They barely open them anymore.

Instead they talk to an AI agent in Slack. In plain English. The agent uses the Gentic Creative MCP server to do the actual edits.

The New Behavior

Here's what a typical session looks like at one of these brands.

A creator clip lands in their content library. The editor opens Slack and types something like:

"Take that new UGC clip from Sarah, cut the part where she mentions the discount because we don't run that promo anymore, swap it for our launch offer in her voice, move the hook about morning routines to the front, speed it up 1.5x, burn captions, drop a 50% off badge in the corner. Then give me an 8-second cinematic intro shot of a sunrise over a beach and stack it on top of her video as a 9:16 Reel with background music."

That's it. The agent does the rest.

What used to take a video editor 1-2 hours in CapCut or Premiere now happens in 3 minutes. The editor approves the result, sends it to the media buyer, and moves on to the next clip.

I demoed this exact workflow from Claude Code instead of Slack. Same toolkit, different surface. One session, five tool calls, a full ad pipeline from raw creator clip to finished Reel:

1. understand_video_contents – pulls the transcript and scene tags from the raw UGC clip so the agent knows what's actually inside the video before editing it.

2. edit_video – cuts the non-compliant sentences, speeds it up 1.5x, burns captions, drops the 50% off badge, moves the hook to the front, and lowers the volume. All in one tool call.

3. replace_voice – swaps the old discount line for the new launch promo using the original creator's AI-cloned voice. Stays on-brand. Stays in their voice.

4. generate_video_clip – renders the 8-second cinematic hook shot using Veo 3.1.

5. split_screen – stacks the AI shot over the speaker as a 9:16 Reel with background music. Ready for Meta.

The user describes the outcome. The agent handles the edit.
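Written out as code instead of prose, the same pipeline looks roughly like this. It's a minimal sketch: `call_tool` stands in for whatever MCP client is driving the Gentic Creative server, and the tool argument names and response keys are assumptions based on the descriptions above, not the server's actual schema.

```python
# Hypothetical sketch of the five-call pipeline. call_tool(name, args)
# stands in for an MCP client call to the Gentic Creative server;
# argument names and response keys are illustrative, not the real schema.

def build_reel(call_tool, raw_clip_url: str) -> str:
    # 1. Pull transcript and scene tags so the agent knows what's in the clip.
    contents = call_tool("understand_video_contents", {"video_url": raw_clip_url})

    # 2. One edit pass: cuts, 1.5x speed, captions, badge, hook to front, volume.
    edited = call_tool("edit_video", {
        "video_url": raw_clip_url,
        "cut_segments": contents["off_brand_segments"],
        "speed": 1.5,
        "burn_captions": True,
        "badge_text": "50% off",
        "move_to_front": contents["hook_segment"],
        "volume": 0.7,
    })

    # 3. Swap the old discount line for the launch offer in the creator's cloned voice.
    revoiced = call_tool("replace_voice", {
        "video_url": edited["video_url"],
        "old_line": "the discount mention",
        "new_line": "our launch offer",
    })

    # 4. Render the 8-second cinematic intro shot.
    intro = call_tool("generate_video_clip", {
        "prompt": "cinematic sunrise over a beach",
        "duration_seconds": 8,
    })

    # 5. Stack the AI shot over the speaker as a 9:16 Reel with music.
    reel = call_tool("split_screen", {
        "top_video_url": intro["video_url"],
        "bottom_video_url": revoiced["video_url"],
        "aspect_ratio": "9:16",
        "background_music": True,
    })
    return reel["video_url"]
```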

You can imagine what this does to the demand for video editing SaaS.

The Infrastructure Behind It

Video editing is heavy. CPU-intensive. You can't just call an LLM and get a finished MP4.

The actual editing runs on Modal servers. Modal handles the compute. The LLM handles the reasoning. The MCP server bridges the two.
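A minimal sketch of that bridge, assuming a Python MCP server built with the official SDK's FastMCP helper (Gentic's actual server may be built very differently): the tool itself does no rendering, it just forwards the job to a deployed Modal function and returns the result.

```python
import modal
from mcp.server.fastmcp import FastMCP

# Sketch of the bridge, not Gentic's actual server. The Modal app and
# function names are placeholders for a deployed worker.
mcp = FastMCP("creative")

@mcp.tool()
def edit_video(source_key: str, speed: float = 1.5) -> str:
    """Speed up a clip on cloud compute and return a URL to the render."""
    worker = modal.Function.lookup("video-edit-worker", "edit_video")
    return worker.remote(source_key, speed)

if __name__ == "__main__":
    mcp.run()
```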

When the agent calls edit_video, Modal spins up a worker, pulls the source clip from S3, runs the edit, uploads the new file, and returns the URL. All async. All scalable.

This part matters because it's what makes the whole thing work at scale. You're not editing one video at a time on someone's MacBook. You're editing 200 at once on cloud compute.
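The worker on the other side might look roughly like this. Again a sketch under assumptions, not Gentic's implementation: the bucket name, secret name, and the single ffmpeg speed-up filter are placeholders, but the shape (download from S3, render, upload, hand back a URL) is the flow described above.

```python
import subprocess
import uuid

import boto3
import modal

app = modal.App("video-edit-worker")
image = modal.Image.debian_slim().apt_install("ffmpeg").pip_install("boto3")

# Placeholder AWS credentials secret and bucket name.
@app.function(image=image, secrets=[modal.Secret.from_name("aws-creds")], timeout=600)
def edit_video(source_key: str, speed: float = 1.5) -> str:
    s3 = boto3.client("s3")
    bucket = "creative-assets"
    src, out = "/tmp/src.mp4", "/tmp/out.mp4"

    # Pull the source clip from S3.
    s3.download_file(bucket, source_key, src)

    # Run the edit (here just a speed-up) with ffmpeg.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-filter:v", f"setpts=PTS/{speed}",
         "-filter:a", f"atempo={speed}",
         out],
        check=True,
    )

    # Upload the render and return a URL the agent can hand back to the user.
    result_key = f"renders/{uuid.uuid4()}.mp4"
    s3.upload_file(out, bucket, result_key)
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": result_key},
        ExpiresIn=3600,
    )
```

Fanning out to 200 clips is then a batch call rather than a loop on a laptop: something like edit_video.map(list_of_keys) queues the jobs and lets Modal scale the workers.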

You can sign up at gentic.co/creative, connect it to your Slack or Claude, and let your AI agent edit your videos in minutes.
