July 3, 2026 · 18 min read
Your AI-Generated Database Schema Is Probably Wrong
Why ChatGPT, Claude, Copilot, and AI Tools Get Database Design Wrong
When a client comes to me with a schema, my first question is always the same: show me how and what you query. Or give me your monitoring so I can figure out where to start solving the mystery.
Because here is the thing most people miss: database design is driven by access patterns, not entity relationships. AI only sees the entities. It looks at your prompt, figures out what tables you need, draws relationships between them, and gives you a CREATE TABLE dump. Whether it is PostgreSQL, MySQL, or any other database, the schema looks clean. It compiles. It runs.
And then production happens.
1. No Indexing Strategy: Why Your Queries Are Slow
AI creates tables but almost never adds indexes beyond the primary key. And honestly, there is nothing wrong with that initially. When you are starting out, you don't have query patterns yet. Indexing is mostly a post-facto decision. But the problem is AI does not come back and look at it on its own. Unless you explicitly trigger it to check database optimizations, it will never revisit what it generated.
Queries work fine with 100 rows. They fall apart at 100k. But before you jump to partitioning and sharding, ask the basic questions first. Did we normalize the table? Are we doing LIKE queries across wide columns? Are we joining with limitless rows and struggling at the end? Is it an N+1 query nightmare firing hundreds of calls where one would do? And if indexes are added, are they added only for what is actually needed? Because indexing everything and anything can shoot your resource usage and eventually your bills.
Composite, Partial, and Covering Indexes Explained
Composite indexes are for where clauses that consider more than one column in the search, and they need to be created exactly as per the query patterns. The column order matters. If any column is missing from the order or a new clause column is added, your index is broken. Do individual indexes on each column help when you have multiple columns in a WHERE clause? May not really.
Partial indexes are for columns with limited values. Think true/false, or status being active/inactive. They have a much smaller resource footprint, so blindly creating a full normal index on such columns is not useful. Covering indexes will fetch some required columns together in the index itself rather than spending time fetching them from the row. Consider you need user name and phone number where you search by the indexed column email. The covering index gives you those values without touching the actual table row.
Full Table Scan on 1 Million Rows: A Real Example
Without an index, a query like:
SELECT * FROM users
WHERE status = 'active'
AND created_at > ?;This forces the database to perform a full table scan, checking every row to find matches. On a table with 1 million rows, that means examining all 1 million records with O(n) time complexity. Depending on hardware, storage, and database engine, this can take 100ms to over 1 second. Adding a composite index on (status, created_at) allows the database to locate matching rows efficiently, often reducing execution time to just a few milliseconds.
2. Missing Foreign Keys, UNIQUE, and CHECK Constraints
In RDBMS, we keep foreign keys to make sure we don't end up with data inconsistency. There is a strong reference between related tables. But for AI, everything comes out loosely coupled. No foreign keys, just an ID stored as a column with no constraint. Unless you specifically want it that way and your deletion logic is aware of which and all places to delete and update, this is a recipe for orphaned data. Also, mostly there is no knowledge base for AI on how your tables are connected in relation to each other.
Then there is the question of ON DELETE CASCADE vs SET NULL vs RESTRICT. AI does not think about which behavior fits your business logic.
Missing UNIQUE constraints are another one. If a business rule says one vote per user per post, and that constraint exists only in your application code but not in the database, then someone writing a new dependent function can abuse that functionality. They don't know the limit exists, they don't reuse the existing function, and suddenly your data integrity is gone.
Same with CHECK constraints. Request validation catches format issues at the API level, but if it is not enforced at the database level too, nothing stops negative prices, future birth dates, or impossible values from getting in through a different code path.
3. Wrong Data Types That Cost You Later
Wrong data types always bite at a later stage. Initially we ask for unnecessary space for a given column despite the real value being a fixed size. Getting this right can literally improve storage requirements, and if the column is indexed, the index takes less space and fewer resources too.
VARCHAR(255) for everything. That is the AI default. But is it the exact need for you? Or do you want your QA or end user to come and say it is not sufficient when they try to enter a longer value?
Storing money as FLOAT instead of DECIMAL. This is a specific use case but it matters. It is better to operate with the required precision that your system needs, because floating point can result in inaccuracies depending on the business context. Financial calculations with rounding errors is not something you want to explain to your finance team.
Using TEXT where VARCHAR with a limit makes sense. Tough to justify TEXT as it can allow anything and everything. Some MySQL versions may restrict it too. It is not good practice. Do you want to go into the nightmare of migration later when users are already running in production?
Storing booleans as strings. Humans read "true" or "false" but do computers need to understand it literally? No. It can better be 0 or 1.
Timestamps without timezone awareness. TIMESTAMP vs TIMESTAMPTZ. Do not set yourself up for the painful migration when your application goes global and you never thought about why timezone is important.
UUIDs as VARCHAR(36) instead of the native UUID type. Using the out-of-the-box right data type is never old fashioned. Use UUID type where needed instead of storing it as a string.
4. No Migration Plan: ALTER TABLE on 10 Million Rows
AI gives you a CREATE TABLE dump. No versioning, no migration files. No Flyway, no Liquibase, no Alembic, no migration framework at all. That is completely fine if you are starting from scratch, but did you ever think about how migration works when your schema needs to evolve?
What happens when you need to add a column to a table with 10 million rows? Adding a column to a 10M-row table requires careful planning to avoid locking your production database. The performance and downtime depend heavily on the database engine, whether the column has a default value, and the constraints applied. Do you think AI is asking you these questions while you simply added a prompt saying "add a new column"? In production, users will be affected because you have not even run this migration in a lower environment with the same data volume as production. You never knew how long the outage would be due to locked tables.
Zero consideration for zero-downtime deployments. Adding a NOT NULL column without a default locks the table. If your users are on the application 24x7, how do you think migration can run while users still use the application with no downtime? Who likes to lose business just for a database migration?
5. Normalization vs Denormalization: Getting It Wrong Both Ways
AI gives you either fully normalized schemas with 7 joins for a user profile, or fully denormalized god tables with everything crammed into one place. If we keep repeating information across tables in a denormalized way, updates become a nightmare. If fully normalized, are the tables linking to each other without repeating info at multiple places? These are the important questions to address.
AI does not understand when to denormalize because it is a performance decision, not a default. You don't always need an axe where a knife will do.
Storing comma-separated values in a single column instead of a junction table. Typically it happens as a quick fix and becomes a nightmare to manage. Querying, updating, deleting individual values from a comma-separated string is painful and error-prone.
Duplicating data without any sync strategy. If the propagation of an update requires duplicating data across tables, you are waiting for gaps to appear in how people handle that sync on their own. Unless someone who built it or has both business and table knowledge can tell you how the sync strategy needs to work, it is a ticking bomb. And this is not even a need in the first place if you take care to avoid duplication properly.
6. localStorage and SQLite in Production: The AI Default
AI tools default to the simplest storage. localStorage, JSON files, SQLite in production. It is easy to figure out on your own shortly, but is it worth going in that direction where you do not know that ephemeral storage is being used? Everything will be wiped on restart. That restart might not happen for months, and then you wake up to the nightmare of losing all data that may be important.
The Reddit horror stories of production apps with localStorage as the "backend" are real. That kind of experience does the unpleasant job of destroying user trust, and once trust is gone, gaining it back is another problem entirely.
No concept of concurrent access, ACID, or what happens when two users write at the same time. "It works on my machine" taken to the extreme. Because local machines do not talk about production data volumes.
7. No Soft Deletes, No Audit Trail, No Compliance
AI does hard DELETE by default. Data just vanishes. It is irreversible unless you want to retrieve from backup and backfill, but does that really seem like the right thing to do?
No deleted_at, no created_by, no updated_at. These housekeeping fields help you understand who did what and when. Absolutely needed for critical business data modifications.
In regulated industries like finance and healthcare, this is a compliance violation. It needs to be handled for adherence to policies and terms set by the compliance framework.
And it is not just the CEO asking what happened to that record last Tuesday. Some day a customer runs to you asking why their critical data got deleted. Was it a mistake? Did some employee leaving the organization do it? Without audit trails, you have no answers.
8. Inconsistent Table and Column Naming
Mixed naming in the same schema: userId, user_id, UserID. And please do not tell me you have spaces in your table names.
Singular vs plural table names inconsistently applied. It creates inconsistency across the business logic and does not convey meaning by itself. Someone new has to dig deep on what to use and keeps second-guessing even if their choice is right.
Generic names like data, info, details, type, status with no context. It is better to convey meaning from the name rather than having developers wonder what thought process went behind it.
Junction tables named randomly instead of user_roles or order_items. Names that do not match the parent tables. A developer new to the team with 100 other tables has to figure out which one is the junction table if they only know the schema. Now consider during a production issue: if your AI has written code with no developer oversight, the first mystery to solve is understanding and correlating table information before you can even start debugging.
9. No Multi-Tenancy: One Missing WHERE Clause Leaks Data
AI builds everything single-tenant. When multi-tenancy becomes an afterthought, it requires significant rework at all levels including the database.
No tenant_id, no row-level security, no schema separation strategy. Just a mess of data on its own.
"Just add a WHERE clause" they say. Until someone forgets it and leaks data across tenants. That is not just a bug, that is a security incident and possibly a legal one.
10. Read-Heavy vs Write-Heavy: AI Does Not Ask
AI does not ask "is this read-heavy or write-heavy?" Because you may have missed ingesting all of the business context that a human naturally thinks about. AI will not consider it unless you explicitly tell it or ask it to think about it. Each type of workload needs to be weighed against the CAP theorem and you need to decide which trade-off to pick on a use case by use case basis. Though do not take this as strictly pre-facto. It can very well be post-facto if significant growth comes later before you were able to think about how your read/write pattern would look.
No materialized views, no read replicas consideration. Because everything works perfectly, right? No monitoring to tell you how much P99 and P95 latencies are burning and what heat the user is facing. Determining and addressing optimized reads is the most basic thing an application should handle and implement at the database level.
Schema optimized for neither reads nor writes. Just "store the data somewhere." That is quite immature because we are missing the craftsmanship that separates a working prototype from a production system.
Database Design Is Architecture, Not a Generated File
I covered database design as architecture in my system design post: schema decisions haunt you for years. The table structure you choose in month one determines how painful your queries are in year two. These are architecture decisions, not database admin tasks.
AI gives you a schema that looks correct on paper. It has the right tables, the right column names, and it runs without errors. But it does not think about how your application will actually use that data. It does not think about what happens at scale, what happens during migrations, what happens when multiple teams are building on top of the same tables.
Database design is driven by access patterns, not entity relationships. AI only sees the entities. The access patterns, the growth trajectory, the business constraints, the compliance requirements, the operational reality of running this in production? That takes an engineer who asks the right questions before writing the first CREATE TABLE.
Need a second opinion on your database design?
I help engineers review schemas, fix query performance, and make database architecture decisions that hold up in production. 726+ sessions, 5.0 rating.
Get notified when I publish new posts
Architecture, system design, and production engineering.