0

Normalization — 1NF, 2NF, 3NF, BCNF with real examples

Intermediate5 min read·eng-06-005
sqlinterview

Concept

Normalization is the process of organizing a relational database to reduce data redundancy and improve data integrity. Each normal form (NF) adds progressively stricter rules.

Why normalize: If user email is stored in both users and orders, and the user changes their email, you need two updates — and they can become inconsistent.

First Normal Form (1NF):

  • Every column contains atomic (indivisible) values.
  • No repeating groups or arrays in a column.
  • Each row is uniquely identifiable (has a primary key).
  • Violation: tags = "php,laravel,sql" — storing multiple values in one column. Fix: create a separate tags table.

Second Normal Form (2NF):

  • Must be in 1NF.
  • Every non-key column is fully dependent on the ENTIRE primary key (no partial dependencies).
  • Only applies to tables with composite primary keys.
  • Violation: order_items(order_id, product_id, product_name)product_name depends only on product_id, not the full (order_id, product_id) key. Fix: put product_name in a separate products table.

Third Normal Form (3NF):

  • Must be in 2NF.
  • No transitive dependencies — non-key columns must depend ONLY on the primary key, not on other non-key columns.
  • Violation: employees(id, department_id, department_name)department_name depends on department_id, not on id. Fix: separate departments table.

BCNF (Boyce-Codd Normal Form):

  • A stricter 3NF. Every determinant is a candidate key.
  • Handles edge cases in 3NF with multiple overlapping candidate keys.

Denormalization: Intentionally adding redundancy for performance. Pre-computed order_total on orders table to avoid summing order_items every time. Trade-off: faster reads, more complex writes.

Code Example

sql
-- ❌ VIOLATES 1NF — multiple values in one column
CREATE TABLE posts_bad (
    id INT PRIMARY KEY,
    title VARCHAR(200),
    tags VARCHAR(500)  -- "php,laravel,sql" — not atomic!
);

-- ✅ 1NF — separate table for tags
CREATE TABLE posts (id INT PRIMARY KEY, title VARCHAR(200));
CREATE TABLE post_tags (
    post_id INT REFERENCES posts(id),
    tag     VARCHAR(50),
    PRIMARY KEY (post_id, tag)
);

-- ❌ VIOLATES 2NF — product_name depends only on product_id (partial dependency)
CREATE TABLE order_items_bad (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),   -- depends only on product_id!
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);

-- ✅ 2NF — product name lives in products table
CREATE TABLE products    (id INT PRIMARY KEY, name VARCHAR(100), price DECIMAL(10,2));
CREATE TABLE order_items (order_id INT, product_id INT REFERENCES products(id), quantity INT, PRIMARY KEY(order_id, product_id));

-- ❌ VIOLATES 3NF — department_name depends on department_id (transitive dependency)
CREATE TABLE employees_bad (
    id              INT PRIMARY KEY,
    name            VARCHAR(100),
    department_id   INT,
    department_name VARCHAR(100)  -- depends on department_id, not directly on id!
);

-- ✅ 3NF — departments in their own table
CREATE TABLE departments (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE employees   (id INT PRIMARY KEY, name VARCHAR(100), department_id INT REFERENCES departments(id));

-- DENORMALIZATION example — order total pre-computed for performance
ALTER TABLE orders ADD COLUMN total DECIMAL(10,2) NOT NULL DEFAULT 0.00;
-- Update when items change (via application logic or trigger)
-- Trade-off: avoid summing order_items on every read