Introducing database normalization with PostgreSQL and Prisma

November 13, 2023

Database normalization is a common topic in database design discussions. However, it’s usually explained using complicated terms, making it hard to understand. In this article, we’ll explain what normalization means and give examples using Prisma and PostgreSQL.

The goal of database normalization is to enhance the accuracy and minimize the duplication of our data. We do that by arranging the data according to specific rules known as normal forms. These rules might sound complex initially, but they are logical and easy to understand when explained in plain language.

1NF – first normal form#

The main rule of the first normal form is that each field in a table should hold just a single piece of information. Let’s look at an example to understand this:

schema.prisma#

model User {
  id       Int    @id @default(autoincrement())
  fullName String
  address  String
}

Upon closer inspection, it becomes clear that the entity mentioned violates the first normal form rule.

In the given example, we’re putting more than one piece of information in a single column. For instance, the fullName field holds both first and last names. It’s important to consider whether we’ll need to access just part of that information. A good example is when we want to find everyone with the last name “Williams”.

Watch out, because some names have prefixes. A common example is van in the name Ludwig van Beethoven.

Similarly, keeping it as one string could be fine if we only need to show the address. But, if we ever need to retrieve users from a specific country, we’d have a problem. That’s why we must carefully consider how we set up our columns. It might be wise to play it safe and divide the data into several fields.

schema.prisma#

model User {
  id              Int    @id @default(autoincrement())
  firstName       String
  lastName        String
  buildingNumber  String
  apartmentNumber String
  city            String
  zipCode         String
  country         String
}

Designing for scalability#

It’s important to steer clear of making groups of columns with very similar names and purposes. For instance, consider a situation with articles that can have multiple authors.

schema.prisma#

model Article {
  id             Int    @id @default(autoincrement())
  title          String
  content        String
  firstAuthor    User   @relation(fields: [firstAuthorId], references: [id])
  firstAuthorId  Int
  secondAuthor   User   @relation(fields: [secondAuthorId], references: [id])
  secondAuthorId Int
}

This approach doesn’t scale well, unfortunately. For instance, if an article requires a third author, we’d need to add another column. Also, we’d have to check each column to find articles written by a specific user.

To address these problems, we can implement a many-to-many relationship.

schema.prisma#

model User {
  id              Int       @id @default(autoincrement())
  firstName       String
  lastName        String
  buildingNumber  String
  apartmentNumber String
  city            String
  zipCode         String
  country         String
  articles        Article[]
}
 
model Article {
  id      Int     @id @default(autoincrement())
  title   String
  content String?
  authors User[]
}

Using Array and JSON Columns with PostgreSQL#

While using JSON and array columns can come in handy, they can be perceived as breaking the first normal form rule. However, they don’t automatically mean bad database design, because we should choose the appropriate tool for the task. In some cases, it’s better to use an array instead of creating separate tables, as this can make the database smaller and simplify queries by eliminating the need for joins.

2NF – second normal form#

The first requirement of the second normal form is that the data must already comply with the first normal form. Additionally, the table should not have any partial dependencies.

Let’s imagine a table with a composite primary key. If some other column relies only on a part of our primary key, it means a partial dependency.

A primary key is a unique identifier for each record in a database table, ensuring that no two rows have the same key value. If the primary key consists of more than one column, we refer to it as a composite primary key.

schema.prisma#

model ProductPurchase {
  clientName   String
  productName  String
  productPrice Int
 
  @@id([clientName, productName])
}

In our table, we have a composite primary key that consists of the clientName and productName columns. We can see a partial dependency because the productPrice depends only on one of the primary key fields – the productName. This can lead to duplicating our data.

Above, two rows describe people who purchased microwaves at 100 per unit. This shows a violation of the second normal form, leading to repeated information about the microwave’s price. This uses extra space in the database and creates problems when updating data. For instance, if the price of the microwave changes, it might require changes in several rows.

3NF – third normal form#

To achieve the third normal form, a table must first satisfy the requirements of the second normal form. Additionally, all attributes should be functionally dependent only on the primary key.

schema.prisma#

model ProductPurchase {
  clientName      String
  productName     String
  productPrice    Int
  courierCarBrand String
  courierCarModel String
 
  @@id([clientName, productName])
}

In the above example, we add the courierCarBrand and courierCarModel properties that depend on each other. If a model is “Prius”, it tells us the brand is “Toyota”. Unfortunately, this breaks the third normal form.

Having the data set up like that could cause problems with keeping the information consistent. For example, if we change the brand of a car in just one row and not everywhere, the same car model might show up with different brands in different spots.

Key takeaways from 2NF and 3NF#

The second and third normal forms tell us each table should describe one specific entity. We can make several related tables instead of putting all the data in one table. Also, it’s usually better to have an automatically created ID column than to use keys made from combining several columns.

Let’s use this knowledge to create multiple tables for the products, clients, card brands, and car models. We can then use those tables through a many-to-one relationship with the ProductPurchaseModel.

schema.prisma#

model Product {
  id              Int               @id @default(autoincrement())
  name            String
  price           Int
  ProductPurchase ProductPurchase[]
}
 
model Client {
  id              Int               @id @default(autoincrement())
  name            String
  ProductPurchase ProductPurchase[]
}
 
model CarBrand {
  id       Int        @id @default(autoincrement())
  name     String
  CarModel CarModel[]
}
 
model CarModel {
  id              Int               @id @default(autoincrement())
  name            String
  brand           CarBrand          @relation(fields: [brandId], references: [id])
  brandId         Int
  ProductPurchase ProductPurchase[]
}
 
model ProductPurchase {
  id              Int      @id @default(autoincrement())
  client          Client   @relation(fields: [clientId], references: [id])
  product         Product  @relation(fields: [productId], references: [id])
  courierCarModel CarModel @relation(fields: [courierModelId], references: [id])
  clientId        Int
  productId       Int
  courierModelId  Int
}

Thanks to this approach, the ProductPurchase model only contains IDs that relate to various other tables.

Summary#

In this article, we’ve covered database normalization basics, using examples with Prisma and exploring different normal forms. Even though the official definitions can seem complex, the fundamental ideas behind data normalizations are pretty simple. Remembering these concepts will help ensure our database stays efficient and is scalable.