Normalization — 1NF, 2NF, 3NF, BCNF with real examples
Intermediate5 min read·eng-06-005
sqlinterview
Concept
Normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity. Each normal form (NF) adds progressively stricter rules.
Why normalize: If user email is stored in both users and orders, and the user changes their email, you need two updates — and they can become inconsistent.
First Normal Form (1NF):
- Every column contains atomic (indivisible) values.
- No repeating groups or arrays in a column.
- Each row is uniquely identifiable (has a primary key).
- Violation:
tags = "php,laravel,sql"— storing multiple values in one column. Fix: create a separatetagstable.
Second Normal Form (2NF):
- Must be in 1NF.
- Every non-key column is fully dependent on the ENTIRE primary key (no partial dependencies).
- Only applies to tables with composite primary keys.
- Violation:
order_items(order_id, product_id, product_name)—product_namedepends only onproduct_id, not the full (order_id, product_id) key. Fix: putproduct_namein a separateproductstable.
Third Normal Form (3NF):
- Must be in 2NF.
- No transitive dependencies — non-key columns must depend ONLY on the primary key, not on other non-key columns.
- Violation:
employees(id, department_id, department_name)—department_namedepends ondepartment_id, not onid. Fix: separatedepartmentstable.
BCNF (Boyce-Codd Normal Form):
- A stricter 3NF. Every determinant is a candidate key.
- Handles edge cases in 3NF with multiple overlapping candidate keys.
Denormalization: Intentionally adding redundancy for performance. Pre-computed order_total on orders table to avoid summing order_items every time. Trade-off: faster reads, more complex writes.
Code Example
sql
-- ❌ VIOLATES 1NF — multiple values in one column
CREATE TABLE posts_bad (
id INT PRIMARY KEY,
title VARCHAR(200),
tags VARCHAR(500) -- "php,laravel,sql" — not atomic!
);
-- ✅ 1NF — separate table for tags
CREATE TABLE posts (id INT PRIMARY KEY, title VARCHAR(200));
CREATE TABLE post_tags (
post_id INT REFERENCES posts(id),
tag VARCHAR(50),
PRIMARY KEY (post_id, tag)
);
-- ❌ VIOLATES 2NF — product_name depends only on product_id (partial dependency)
CREATE TABLE order_items_bad (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- depends only on product_id!
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- ✅ 2NF — product name lives in products table
CREATE TABLE products (id INT PRIMARY KEY, name VARCHAR(100), price DECIMAL(10,2));
CREATE TABLE order_items (order_id INT, product_id INT REFERENCES products(id), quantity INT, PRIMARY KEY(order_id, product_id));
-- ❌ VIOLATES 3NF — department_name depends on department_id (transitive dependency)
CREATE TABLE employees_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
department_name VARCHAR(100) -- depends on department_id, not directly on id!
);
-- ✅ 3NF — departments in their own table
CREATE TABLE departments (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100), department_id INT REFERENCES departments(id));
-- DENORMALIZATION example — order total pre-computed for performance
ALTER TABLE orders ADD COLUMN total DECIMAL(10,2) NOT NULL DEFAULT 0.00;
-- Update when items change (via application logic or trigger)
-- Trade-off: avoid summing order_items on every read