Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, updates and deletion Anomalies.
It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used for mainly two purposes.
- Eliminating redundant (useless) data
- Ensuring data dependencies make sense i.e. data is logically stored.
Problem without Normalization becomes difficult to handle and update the database without facing data loss.
Insertion, updation and deletion anomalies are very frequent if database is not normalized.
To understand those anomalies, let us take an example of student table.
S_ID | S_Name | S_Address | Subject_opted |
401 | Adam | ||
402 | Alex | ||
403 | Smith | ||
404 | Adam |
- Updation Anomaly: To update address of a student who occurs twice or more than twice in a table, we will have S_address column in all rows, else data will become inconsistent
- Insertion Anomaly: Suppose for a new admission, we have a student id (S_id), name and address of a student but if student has not opted for any subjects yet then, we have to insert NULL there, leading to insertion Anomaly.
- Deletion Anomaly: If (S_id) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.
Normalization Rule
Normalization rules are divided into following normal form
- First Normal Form 1NF
- Second Normal Form 2NF
- Third Normal Form 3NF
First Normal Form (1NF)
As per first Normal Form, no two rows of data must contain repeating group of information i.e. each set of column must have a unique value, such that multiple columns cannot be used to fetch the same row. Each table should be organized into rows and each row should have a primary key that distinguishes in a unique way.
The primary key is usually a single is usually a single column but sometimes more than one column can be combined to create a single primary key. E.g. consider a table which is not first normal form
Student table
Student | Age | Subject |
Adam | 15 | Biology, Maths |
Alex | 14 | Maths |
Smith | 17 | Maths |
In First Normal Form, any row must not have a column in which more than one value is saved, like separated with commas.
Rather than that, we must separate such data into multiple rows.
Student table following 1NF will be;
Student | Age | Subject |
Adam | 15 | Biology |
Adam | 15 | Maths |
Alex | 14 | Maths |
Smith | 17 | Maths |
Using the First Normal Form, data redundancy increase, as there will be many columns with same data in multiple rows but each row as a whole will be unique.
Second Normal Form (2NF)
As per the Second Normal Form there must not be partial dependency of any column on primary key. It means that for a table that has concatenated primary key each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second Normal Form.
In example of First Normal Form, there are two rows for Adam, to include multiple subjects that he has opted for. While this is searchable and follows First Normal Form, it is an inefficient use of space. Also in the above table in First Normal Form, while the candidate key is {student, subject}, Age of student only depends on students column which is incorrect as per Second Normal Form.
To achieve Second Normal Form, it would be helpful to split out the subjects into an independent table and match them up using the student names as foreign key.
New Student Table following 2NF will be:
Student | Age |
Adam | 15 |
Alex | 14 |
Smith | 17 |
In Student Table the candidate key will be student column, because all other columns i.e. Age is dependent on it.
New Subject Table introduced for 2NF will be
Students | Subject |
Adam | Biology |
Adam | Maths |
Alex | Maths |
Smith | Maths |
In Subject Table, the candidate key will be (Student, Subject) column. Now, both the above tables qualify for Second Normal Form and will never suffer from update anomalies. Although there are a few complex cases in which table in Second Normal Form suffers update anomalies and to handle those scenarios, Third Normal Form is there.
Third Normal Form (3NF)
This applies that every no prime attribute of table must be dependent on primary key, or we can say that, there should not be case that a non-prime attribute.
So this transitive functional dependency should be removed from the table and also the table must be Second Normal Form for e.g. Student_Detail_Table
Std_id Std_name DOB Street City State Zip
In this table, student_id is primary, but street, city and state depends upon upon zip.
The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to new table with zip as primary key.
New Student_Detail_Table
Student_id student_name DON zip address Table.
Advantages of removing transitive dependency
- Data integrity achieved
- Amount of data duplication is reduced
Objectives of Normalization
According to E.F Codd, objectives of normalization are stated below:
- To free the collection of relations from undesirable insertion, update and deleting dependencies
- To reduce the need for restructuring the collection of relations as new types of data are introduced and thus increase the life span of application programs.
- To make the relational model more informative to users.
- To make the collection of relations neutral to the query statistics where these statistics are liable to change as time goes by.
Determinant of Normal Forms
1st Normal Form
- Each table must have a primary key, minimal set of attributes/fields which can uniquely identify records
- The values in each column of a table cannot be divided i.e. atomic values
- There are no repeating group
2nd Normal Form
- All requirement for 1st NF must be met
- Redundant data across multiple rows of a table must be moved to a separate table. The resulting tables must be related to each other by use of foreign key.
Foreign key is a field that links one table to another table. A table can have an unlimited number of foreign keys linking to other tables.
3rd Normal Form
- All requirement for 2NF must be met
- Eliminate fields that do not depend on the primary key
- That is any field that is dependent not only on the primary key but also on another field must be moved to another table.
Evaluation
- Draw a table in an normalized form & convert it to 1NF
- Mention five (4) objectives of Normalization
- What is normalization of database.