Normalization in DBMS

Published On: 31 March 2023.By Shivam Goyal.

Data, AI & Analytics
Quality Engineering

Normalization in DBMS

Database Normalization is the process of organizing a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd.

“Normal Forms” (NF) are the different stages of normalization

1 NF (First Normal Form)
2 NF (Second Normal Form)
3 NF (Third Normal Form)
BCNF (Boyce -Codd Normal Form)
4 NF (Fourth Normal Form)
5 NF (Fifth Normal Form)
6 NF (Sixth Normal Form)

4NF to 6NF applies to multivalued dependencies and complex table scenarios. In this article, we discuss up to BCNF.

1 NF

A relation R is said to be in 1 NF (First Normal) if and only if:

All the attributes of R are atomic.
There should not be any multi-valued attributes.

In the above-taken example of the Retail_Outlets table, we have stored multiple values in an address field, such as street name, city name, and pin code.

What if we want to know about all retail outlets in a given city? We may need to perform some string operations on the address field, which is not preferable. So we need to make all these values be stored atomic in separate fields.

A multi-valued attribute is an attribute that can have multiple values like Contact numbers. They should also be separated like ContactNo1, ContanctNo2,.. to achieve 1st Normal form.

Advantage :

1 NF allows users to use the database queries effectively as it removes ambiguity by removing the non-atomic and multi-valued attributes, which creates major issues in the future while updating and extracting the data from the database.

Limitation :

Even after 1st Normal form, the data redundancy still exists, so we need further normalization.

2 NF

A relation R is said to be in 2 NF (Second Normal) form if and only if:

R is already in 1 NF
There is no partial dependency in R between non-key attributes and key attributes.

Suppose we have a composite primary or candidate key in our table. Partial dependency occurs when a part of the primary key (Key attribute) determines the non-key attribute.

In the Retail Outlets table, the Item_Code and Retail_Outlet_ID are key attributes. The item description is partially dependent on Item_Code only. Outlet_Location depends on Retail_Outlet_ID. These are partial dependencies.

To achieve normalization, we need to eliminate these dependencies by decomposing the relations.

From the above decomposition, we eliminated the partial dependency.

Advantage :

2 NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in a new table(s), and creating relationships between those tables.

Limitation :

There are still some anomalies as there might be some indirect dependencies between Non-Key attributes, leading to redundant data.

3 NF

A relation R is said to be in 3 NF (Third Normal Form) if and only if:

R is already in 2 NF
There is no transitive dependency that exists between key attributes and non-key attributes through other non-key attributes

A transitive dependency exists when another non-key attribute determines a non-key attribute. In other words, If A determines B and B determines C, then automatically, A determines C.

Some other examples :

The Year of birth determines the Age of the person
The price of an Item determines the class of the Item
The ZIP code of a city determines the City’s Name

Advantage :

3 NF ensures data integrity. It also reduces the amount of data duplication.

Boyce-Codd Normal Form

It is an upgraded version of the 3rd Normal form it is also called as 3.5 Normal Form.

A relation R is said to be in 3 NF (Third Normal Form) if and only if:

R is already in 3 NF
For any dependency A –> B then A should be Super key.

In simple words if A –> B then A cannot be a non-prime Attribute if B being a prime attribute which means that A non prime attribute cannot determine a prime attribute.

You must be wondering how this is possible? but Yes, there can be some cases in which the Non Prime attribute will determine the prime attributes even if the relation was in 3rd Normal form. BCNF does not allow this kind of dependency.

let us take an example :

Look at the below Relation of Student Enrolments:

Student_ID	Course_Name	Professor
101	JAVA	Prof. Java
102	C++	Prof. CPP
101	Python	Prof. Python
103	JAVA	Prof. Java_2
104	Python	Prof. Python_2

In the above relation:

One student can enroll in multiple courses.
Multiple professors can teach one course.
One professor can be assigned only one course.

So the primary key will be formed by ( Student_ID & Course_Name ), so these 2 will compositely determine all other attributes in the Relation. In our case, it is only the professor.

The Relation is clearly in 1st Normal Form as there are No Multivalued attributes and all attributes have atomic values.
The Relation is in 2nd Normal Form as there are No Partial dependencies.
Student_Id cannot determine Course_Name as one student can enroll multiple courses.
Course_Name can not determine the professor as one course can be taught by multiple professors.
The relation is in 3rd normal form as there are no transitive dependencies.

If we observe here, The “Professor” Attribute, a non prime attribute, can determine the Course_Name as each professor could be assigned only one course. But Course_Name is a prime attribute, and Professor is not a Super Key. That means a non-prime attribute determines the prime attribute.

This is Not Allowed in BCNF. So, how do we decompose this relation? :

Until here, while normalizing the database, we have seen normal forms up to BCNF, and some guidelines are to be followed. Some of them are listed below.

Guidelines for using normalization:

Depending on the business requirements, the tables can be normalized up to the 2nd normal form or the 3rd normal form.
Tables in 3 NF are preferred in applications with extensive data modifications.
Tables in 2 NF are preferred in applications with extensive data retrieval.
Reason: retrieving data from multiple tables is a costly operation.
Converting the tables from higher normal form to lower normal form is called “Denormalization”.

The below picture summarizes how to reach the third normal form from an unnormalized form :

Any relational database without normalization may lead to problems like large tables, difficulty maintaining the database as it involves searching many records, poor disk space utilization, and inconsistencies. If we failed to eliminate this kind of problem, it would lead to data integrity and redundancy problems. Normalization of a relational database helps to solve these problems. Normalization applies by a series of transformations in terms of normal forms. To get efficient access to the database, we can normalize any data in the database. Each Normal form eliminates each type of dependency and improves the data integrity.