Introducing database normalization with PostgreSQL and Prisma
Database normalization is a common topic in database design discussions. However, it’s usually explained using complicated terms, making it hard to understand.…
Database normalization is a common topic in database design discussions. However, it’s usually explained using complicated terms, making it hard to understand. In this article, we’ll explain what normalization means and give examples using Prisma and PostgreSQL. The goal of database normalization is to enhance the accuracy and minimize the duplication of our data. We do that by arranging the data according to specific rules known as normal forms. These rules might sound complex initially, but they are logical and easy to understand when explained in plain language. 1NF – first normal form The main rule of the first normal form is that each field in a table should hold just a single piece of information. Let’s look at an example to understand this: schema.prisma model User { id Int @id @default(autoincrement()) fullName String address String }Upon closer inspection, it becomes clear that the entity mentioned violates the first normal form rule. In the given example, we’re putting more than one piece of information in a single column. For instance, the fullName field holds both first and last names. It’s important to consider whether we’ll need to access just part of that information. A good example is when we want to find everyone with the last name “Williams”. Watch out, because some names have prefixes. A common example is van in the name Ludwig van Beethoven. Similarly, keeping it as one string could be fine if we only need to show the address. But, if we ever need to retrieve users from a specific country, we’d have a problem. That’s why we must carefully consider how we set up our columns. It might be wise to play it safe and divide the data into several fields. schema.prisma model User { id Int @id @default(autoincrement()) firstName String lastName String buildingNumber String apartmentNumber String city String zipCode String country String } Designing for scalability It’s important to steer clear of making groups of columns with very similar names and purposes. For instance, consider a situation with articles that can have multiple authors. schema.prisma model Article { id Int @id @default(autoincrement()) title String content String firstAuthor User @relation(fields: [firstAuthorId], references: [id]) firstAuthorId Int secondAuthor User @relation(fields: [secondAuthorId], references: [id]) secondAuthorId Int } If you’re interested in learning more about setting up relationships, take a look at API with NestJS #33. Managing PostgreSQL relationships with Prisma This approach doesn’t scale well, unfortunately. For instance, if an article requires a third author, we’d need to add another column. Also, we’d have to check each column to find articles written by a specific user. To address these problems, we can implement a many-to-many relationship. schema.prisma model User { id Int @id @default(autoincrement()) firstName String lastName String buildingNumber String apartmentNumber String city String zipCode String country String articles Article[] } model Article { id Int @id @default(autoincrement()) title String content String? authors User[] } Using Array and JSON Columns with PostgreSQL In this series, we have articles about how to store arrays, and the JSON data with PostgreSQL and Prisma. While using JSON and array columns can come in handy, they can be perceived as breaking the first normal form rule. However, they don’t automatically mean bad database design, because we should choose the appropriate tool for the task. In some cases, it’s better to use an array instead of creating separate tables, as this can make the database smaller and simplify queries by eliminating the need for joins. 2NF – second normal form The first requirement of the second normal form is that the data must already comply with the first normal form. Additionally, the table should not have any partial dependencies. Let’s imagine a table with a composite primary key. If some other column relies only on a part of our primary key, it means a partial dependency. A primary key is a unique identifier for each record in a database table, ensuring that no two rows have the same key value. If the primary key consists of more than one column, we refer to it as a composite primary key. schema.prisma model ProductPurchase { clientName String productName String productPrice Int @@id([clientName, productName]) }In our table, we have a composite primary key that consists of the clientName and productName columns. We can see a partial dependency because the productPrice depends only on one of the primary key fields – the productName. This can lead to duplicating our data. Above, two rows describe people who purchased microwaves at 100 per unit. This shows a violation of the second normal form, leading to repeated information about the microwave’s price. This uses extra space in the database and creates problems when updating data. For instance, if the price of the microwave changes, it might require changes in several rows. 3NF – third normal form To achieve the third normal form, a table must first satisfy the requirements of the second normal form. Additionally, all attributes should be functionally dependent only on the primary key. schema.prisma model ProductPurchase { clientName String productName String productPrice Int courierCarBrand String courierCarModel String @@id([clientName, productName]) }In the above example, we add the courierCarBrand and courierCarModel properties that depend on each other. If a model is “Prius”, it tells us the brand is “Toyota”. Unfortunately, this breaks the third normal form. Having the data set up like that could cause problems with keeping the information consistent. For example, if we change the brand of a car in just one row and not everywhere, the same car model might show up with different brands in different spots. Key takeaways from 2NF and 3NF The second and third normal forms tell us each table should describe one specific entity. We can make several related tables instead of putting all the data in one table. Also, it’s usually better to have an automatically created ID column than to use keys made from combining several columns. Let’s use this knowledge to create multiple tables for the products, clients, card brands, and car models. We can then use those tables through a many-to-one relationship with the ProductPurchaseModel. schema.prisma model Product { id Int @id @default(autoincrement()) name String price Int ProductPurchase ProductPurchase[] } model Client { id Int @id @default(autoincrement()) name String ProductPurchase ProductPurchase[] } model CarBrand { id Int @id @default(autoincrement()) name String CarModel CarModel[] } model CarModel { id Int @id @default(autoincrement()) name String brand CarBrand @relation(fields: [brandId], references: [id]) brandId Int ProductPurchase ProductPurchase[] } model ProductPurchase { id Int @id @default(autoincrement()) client Client @relation(fields: [clientId], references: [id]) product Product @relation(fields: [productId], references: [id]) courierCarModel CarModel @relation(fields: [courierModelId], references: [id]) clientId Int productId Int courierModelId Int }Thanks to this approach, the ProductPurchase model only contains IDs that relate to various other tables. Summary In this article, we’ve covered database normalization basics, using examples with Prisma and exploring different normal forms. Even though the official definitions can seem complex, the fundamental ideas behind data normalizations are pretty simple. Remembering these concepts will help ensure our database stays efficient and is scalable. The post API with NestJS #133. Introducing database normalization with PostgreSQL and Prisma appeared first on Marcin Wanago Blog - JavaScript, both frontend and backend.