I spent a day building an open source HubSpot clone with Gemini 3 in Google's new Antigravity IDE. The agent wrote Postgres schemas, implemented NextAuth with bcrypt, and wired up a full CRUD interface across five entity types. It also mass-deleted my database with a migration script and spent twenty minutes fighting import errors it created.
But the most persistent issue, the one I've hit with every model since Claude first touched my codebase in January, is Tailwind configuration. Gemini 3 imported deprecated libraries and applied v3 logic to a v4 project, mangling the PostCSS setup (in v4 the PostCSS plugin moved into @tailwindcss/postcss, and mixing in v3-style config breaks the build). The CSS broke entirely. Someone should really create a Tailwind benchmark.
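For reference, this is roughly what a v4 setup wants, next to the v3-style config the agents keep reaching for. It's a simplified sketch of the standard Next.js PostCSS pipeline, not the project's exact config:

```ts
// postcss.config.mjs: Tailwind v4 style (what a v4 project actually needs).
export default {
  plugins: {
    "@tailwindcss/postcss": {}, // v4: the PostCSS plugin lives in its own package
    // tailwindcss: {},         // v3 habit: passing tailwindcss directly breaks on v4
    // autoprefixer: {},        // v3 habit: v4 handles vendor prefixing itself
  },
};
// The CSS entry point changes too: v4 wants a single `@import "tailwindcss";`
// where v3 used the @tailwind base / components / utilities directives.
```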
This isn't a Gemini-specific problem. I've watched Claude, GPT, and now Gemini 3 all struggle with the same failure mode: they mix version-specific syntax, import deprecated packages, and confidently break your styling. The models don't know what they don't know about Tailwind's breaking changes between versions.
That said, what these agents can do now is remarkable and worth documenting honestly.
The setup
Google released Antigravity in November 2025 alongside Gemini 3. The free tier is generous, so I decided to stress-test it with an ambitious project.
My first prompt: "Create HubSpot from scratch, make no mistakes."
It messed up immediately. The agent ran ls commands across my desktop for a while, then scaffolded a Vite project stuffed with dummy variables. I've noticed this pattern across models: when the scope is ambitious, they insert placeholders with reasoning like "this can be added later." I suspect it's an inference-constraint workaround, similar to how Claude sometimes creates multiple .md files mid-task to preserve context.
The result was a dark-themed orange UI that looked bad, but better than the standard purple websites these models were generating earlier.
What actually worked
I scrapped that and asked for Next.js with Postgres. The agent picked Next.js 16 on its own, reaching for the App Router and Server Actions.
From there, things improved. Gemini 3 created Postgres schemas for contacts and companies, built a working sidebar, and implemented functional pages with mock data. I could add and edit entries. When it asked whether I wanted edit functionality, I said yes, and it built that too. Six months ago, this level of schema-to-frontend coordination from simple prompts wasn't happening.
HubSpot's core is Contacts, Companies, Deals, and Tickets. I looked up the fields HubSpot uses for each entity and fed them to the agent one by one. It created schemas, mapped relational fields across types, and produced five working pages: Contacts, Companies, Deals, Tickets, and Tasks.
The backend runs bare-metal Postgres with raw SQL and a lightweight connection pool.
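In practice that layer looks something like this. A simplified sketch rather than the agent's exact output; table and column names are illustrative:

```ts
// lib/db.ts: one shared pg Pool, raw SQL everywhere.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function query<T>(text: string, params: unknown[] = []): Promise<T[]> {
  const { rows } = await pool.query(text, params);
  return rows as T[];
}

// Called from a Server Action: insert a contact and link it to a company,
// the kind of cross-entity relation the agent mapped on its own.
export async function createContact(input: {
  firstName: string;
  lastName: string;
  email: string;
  companyId: number | null;
}) {
  await query(
    `INSERT INTO contacts (first_name, last_name, email, company_id)
     VALUES ($1, $2, $3, $4)`,
    [input.firstName, input.lastName, input.email, input.companyId]
  );
}
```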
I then asked for authentication. The agent implemented NextAuth with bcrypt hashing. When I requested multi-tenant organization support, it adjusted the schema and propagated changes throughout the codebase.
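The auth piece is roughly this shape: a hedged sketch of a NextAuth Credentials provider with bcrypt, not the generated code verbatim. The users columns and the query helper are the illustrative ones from above.

```ts
// auth.ts: NextAuth (v5-style) Credentials provider backed by the users table.
import NextAuth from "next-auth";
import Credentials from "next-auth/providers/credentials";
import { compare } from "bcrypt";
import { query } from "@/lib/db";

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: { email: {}, password: {} },
      async authorize(credentials) {
        const [user] = await query<{
          id: number;
          email: string;
          password_hash: string;
        }>(
          `SELECT id, email, password_hash FROM users WHERE email = $1`,
          [credentials?.email]
        );
        if (!user) return null;

        // Compare the submitted password against the stored bcrypt hash.
        const ok = await compare(String(credentials?.password), user.password_hash);
        if (!ok) return null;

        // A tenant/organization id would typically ride along here (via JWT or
        // session callbacks) so every downstream query can be org-scoped.
        return { id: String(user.id), email: user.email };
      },
    }),
  ],
});
```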
Total time: roughly one day. Manual equivalent: probably three to four days.
What broke
Beyond Tailwind, the agent made mistakes it didn't catch:
The migration script was destructive. It dropped every table, erased all data, recreated the schema from scratch, and seeded defaults. This ran without warning. If this were a production database, everything would be gone.
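Reconstructed rather than pasted verbatim, the script was shaped like this; the guard at the top is the part that was missing:

```ts
// scripts/migrate.ts: rebuilds the schema from zero every time it runs.
// Reconstructed sketch, not the agent's literal output.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function migrate() {
  // The missing safeguard: refuse to destroy data unless explicitly allowed.
  if (process.env.ALLOW_DESTRUCTIVE_MIGRATION !== "yes") {
    throw new Error("Refusing to drop tables without ALLOW_DESTRUCTIVE_MIGRATION=yes");
  }

  // The destructive part, roughly as generated: drop everything, then recreate and reseed.
  await pool.query(
    `DROP TABLE IF EXISTS tasks, tickets, deals, contacts, companies, users CASCADE`
  );
  await pool.query(`CREATE TABLE companies (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL
  )`);
  // ...remaining CREATE TABLE statements and seed data...

  await pool.end();
}

migrate().catch((err) => {
  console.error(err);
  process.exit(1);
});
```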
Refactoring caused import chaos. While restructuring pages, the agent broke imports across files. It then attempted to fix them, wrote incorrect import paths, ran the linter, watched it fail, and iterated until things worked. This took longer than it should have.
Hook signatures were wrong. The agent confused useActionState with the older useFormState, producing a TypeScript error I had to patch manually with an inline wrapper.
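The patch was small: useActionState expects an action with a (prevState, formData) signature, so a thin inline wrapper over the existing server action does it. Something like this, where the action name and state type are illustrative:

```tsx
"use client";
// useActionState (React 19) replaced the older useFormState from react-dom.
// It expects an action of shape (prevState, formData) => newState, so a thin
// inline wrapper adapts a plain (formData) => ... server action.
// saveContact and FormState are illustrative names, not the project's exact ones.
import { useActionState } from "react";
import { saveContact } from "./actions"; // server action: (formData) => Promise<FormState>

type FormState = { error: string | null };

export function ContactForm() {
  const [state, formAction, isPending] = useActionState<FormState, FormData>(
    async (_prevState, formData) => saveContact(formData), // the inline wrapper
    { error: null }
  );

  return (
    <form action={formAction}>
      <input name="email" type="email" required />
      <button disabled={isPending}>Save</button>
      {state.error && <p>{state.error}</p>}
    </form>
  );
}
```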
What this means
Agentic coding has progressed rapidly over the last year. I built a functional multi-tenant CRM with authentication in a single day by giving high-level instructions. The quality of the output has improved dramatically.
But some of the failure modes are consistent and predictable. Version-sensitive configuration (Tailwind, PostCSS, sometimes TypeScript) trips up every model. Destructive operations get written without safeguards. Refactoring creates messes the agent then has to clean up.
The market for prototype and MVP development is compressing. The barrier to getting a working product in front of users has dropped significantly. But "working product" and "production-ready" remain very different things, and the gap between them is exactly where these agents keep failing.
You can play around with the project here.