Normalization Interview Questions

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.

Data modification anomalies can be categorized into three types:

  • Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple into a relationship due to lack of data.
  • Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results in the unintended loss of some other important data.
  • Updatation Anomaly: The update anomaly is when an update of a single data value requires multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called Normal forms. The normal forms apply to individual relations. The relation is said to be in particular normal form if it satisfies constraints.

Following are the various types of Normal forms:

DBMS Normalization
Normal FormDescription
1NFA relation is in 1NF if it contains an atomic value.
2NFA relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key.
3NFA relation will be in 3NF if it is in 2NF and no transition dependency exists.
BCNFA stronger definition of 3NF is known as Boyce Codd’s normal form.
4NFA relation will be in 4NF if it is in Boyce Codd’s normal form and has no multi-valued dependency.
5NFA relation is in 5NF. If it is in 4NF and does not contain any join dependency, joining should be lossless.

Advantages of Normalization

  • Normalization helps to minimize data redundancy.
  • Greater overall database organization.
  • Data consistency within the database.
  • Much more flexible database design.
  • Enforces the concept of relational integrity.

Disadvantages of Normalization

  • You cannot start building the database before knowing what the user needs.
  • The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
  • It is very time-consuming and difficult to normalize relations of a higher degree.
  • Careless decomposition may lead to a bad database design, leading to serious problems.

Normalization

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.

  • Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
  • Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.
  • Insert anomalies − We tried to insert data in a record that does not exist at all.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

First Normal Form

First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

unorganized relation

We re-arrange the relation (table) as below, to convert it to First Normal Form.

Relation in 1NF

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form

Before we learn about the second normal form, we need to understand the following −

  • Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime attribute.
  • Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute. That is, if X → A holds, then there should not be any proper subset Y of X, for which Y → A also holds true.

Relation not in 2NF

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

Relation  in 2NF

We broke the relation in two as depicted in the above picture. So there exists no partial dependency.

Third Normal Form

For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy −

  • No non-prime attribute is transitively dependent on a prime key attribute.
  • For any non-trivial functional dependency, X → A, then either −
    • X is a superkey or,
    • A is a prime attribute.
Relation not in 3NF

We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City so there exists transitive dependency.

To bring this relation into the third normal form, we break the relation into two relations as follows −

Relation in 3NF

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF states that −

  • For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

This confirms that both the relations are in BCNF.

Database Normalization with Examples

Database Normalization Example can be easily understood with the help of a case study. Assume, a video library maintains a database of movies rented out. Without any normalization in database, all information is stored in one table as shown below. Let’s understand Normalization database with normalization example with solution:

Database Normalization With Example

Here you see Movies Rented column has multiple values. Now let’s move into 1st Normal Forms:

1NF (First Normal Form) Rules

  • Each table cell should contain a single value.
  • Each record needs to be unique.

The above table in 1NF-

1NF Example

Example of 1NF in DBMS

Example of 1NF in DBMS

Before we proceed let’s understand a few things —

What is a KEY in SQL?

A KEY in SQL is a value used to identify records in a table uniquely. An SQL KEY is a single column or combination of multiple columns used to uniquely identify rows or tuples in the table. SQL Key is used to identify duplicate information, and it also helps establish a relationship between multiple tables in the database.

Note: Columns in a table that are NOT used to identify a record uniquely are called non-key columns.

What is a Primary Key?

Primary Key in DBMS

Primary Key in DBMS

A primary is a single column value used to identify a database record uniquely.

It has following attributes

  • A primary key cannot be NULL
  • A primary key value must be unique
  • The primary key values should rarely be changed
  • The primary key must be given a value when a new record is inserted.

What is Composite Key?

A composite key is a primary key composed of multiple columns used to identify a record uniquely

In our database, we have two people with the same name Robert Phil, but they live in different places.

Composite key in Database

Composite key in Database

Hence, we require both Full Name and Address to identify a record uniquely. That is a composite key.

Let’s move into second normal form 2NF

2NF (Second Normal Form) Rules

  • Rule 1- Be in 1NF
  • Rule 2- Single Column Primary Key that does not functionally dependant on any subset of candidate key relation

It is clear that we can’t move forward to make our simple database in 2nd Normalization form unless we partition the table above.

2NF Example
2NF Example in DBMS

We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1 contains member information. Table 2 contains information on movies rented.

We have introduced a new column called Membership_id which is the primary key for table 1. Records can be uniquely identified in Table 1 using membership id

Database – Foreign Key

In Table 2, Membership_ID is the Foreign Key

Foreign Key Example
Foreign Key in DBMS

Foreign Key in DBMS

Foreign Key references the primary key of another Table! It helps connect your Tables

  • A foreign key can have a different name from its primary key
  • It ensures rows in one table have corresponding rows in another
  • Unlike the Primary key, they do not have to be unique. Most often they aren’t
  • Foreign keys can be null even though primary keys can not
Foreign Key in Database

Why do you need a foreign key?

Suppose, a novice inserts a record in Table B such as

Why Foreign Key is important in Database

You will only be able to insert values into your foreign key that exist in the unique key in the parent table. This helps in referential integrity.

The above problem can be overcome by declaring membership id from Table2 as foreign key of membership id from Table1

Now, if somebody tries to insert a value in the membership id field that does not exist in the parent table, an error will be shown!

What are transitive functional dependencies?

A transitive functional dependency is when changing a non-key column, might cause any of the other non-key columns to change

Consider the table 1. Changing the non-key column Full Name may change Salutation.

Transitive functional dependencies in Database

Let’s move into 3NF

3NF (Third Normal Form) Rules

  • Rule 1- Be in 2NF
  • Rule 2- Has no transitive functional dependencies

To move our 2NF table into 3NF, we again need to again divide our table.

3NF Example

Below is a 3NF example in SQL database:

3NF Example
3NF Example
Example of 3NF in Database

We have again divided our tables and created a new table which stores Salutations.

There are no transitive functional dependencies, and hence our table is in 3NF

In Table 3 Salutation ID is primary key, and in Table 1 Salutation ID is foreign to primary key in Table 3

Now our little example is at a level that cannot further be decomposed to attain higher normal form types of normalization in DBMS. In fact, it is already in higher normalization forms. Separate efforts for moving into next levels of normalizing data are normally needed in complex databases. However, we will be discussing next levels of normalisation in DBMS in brief in the following.

BCNF (Boyce-Codd Normal Form)

Even when a database is in 3rd Normal Form, still there would be anomalies resulted if it has more than one Candidate Key.

Sometimes is BCNF is also referred as 3.5 Normal Form.

4NF (Fourth Normal Form) Rules

If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.

5NF (Fifth Normal Form) Rules

A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.

6NF (Sixth Normal Form) Proposed

6th Normal Form is not standardized, yet however, it is being discussed by database experts for some time. Hopefully, we would have a clear & standardized definition for 6th Normal Form in the near future…

That’s all to SQL Normalization!!!

Summary

  • Database designing is critical to the successful implementation of a database management system that meets the data requirements of an enterprise system.
  • Normalization in DBMS is a process which helps produce database systems that are cost-effective and have better security models.
  • Functional dependencies are a very important component of the normalize data process
  • Most database systems are normalized database up to the third normal forms in DBMS.
  • A primary key uniquely identifies are record in a Table and cannot be null
  • A foreign key helps connect table and references a primary key

Leave a comment

Comments (

0

)

Design a site like this with WordPress.com
Get started