Full-text search with the Drizzle ORM and PostgreSQL

August 26, 2024

With PostgreSQL’s full-text search feature, we can quickly find documents that contain a particular word or phrase. It can also sort the results to show the most relevant matches first. In this article, we learn how to implement it with the Drizzle ORM, PostgreSQL, and NestJS.

Column types used with the text-search feature#

To implement the full-text search with PostgreSQL, we need two data types. They allow us to search through a set of texts and find the ones that best match a given query.

tsvector#

With the tsvector column, we can store the text in a format optimized for searching. Unfortunately, the Drizzle ORM does not support it natively yet. Therefore, we must understand how it works from the ground up and use some raw SQL in our code.

To convert a regular string to the tsvector format, we need to use the to_tsvector function.

SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog');

“The quick brown fox jumps over the lazy dog” is a common sentence that contains all the letters of the alphabet.

When we examine the result of the above query, several optimizations become apparent. The most apparent is duplicate grouping. Using the English dictionary, PostgreSQL recognized that “quick” and “quickly” are different forms of the same word.

Besides that, the tsvector type filters the stop words. These common words appear in almost every sentence but don’t add much value when searching through text. In the example above, since we used the English dictionary, PostgreSQL automatically filtered out words like “the” and “over.” This is a test.

tsquery#

The tsquery data type is designed to store the text we want to search for. To easily convert a string into the tsquery format, we must use the to_tsquery function.

SELECT to_tsquery('fox');

To verify if the tsvector data matches a tsquery, we must use the @@ operator.

SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ to_tsquery('dog');

We can use the boolean operators such as &, |, and !. For example, the ! operator helps ensure that a particular word is not included in the text.

SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ to_tsquery('!cat');

Check out the official documentation for a description of all operators.

Another helpful function is plainto_tsquery. It converts an unformatted phrase to a query by inserting the & operator between words, making it a great option for handling the user’s input.

SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ plainto_tsquery('brown fox');

Working with existing data#

In the previous parts of this series, we’ve worked with the following database schema.

database-schema.ts#

import { pgTable, serial, text } from "drizzle-orm/pg-core"
 
export const articles = pgTable("articles", {
  id: serial("id").primaryKey(),
  title: text("title"),
  content: text("content"),
})
export const databaseSchema = {
  articles,
}

Since our table does not contain a tsvector column, we must find another solution. The most straightforward approach is to convert our text to tsvector on the fly.

articles.service.ts#

import { Injectable } from "@nestjs/common"
import { DrizzleService } from "../database/drizzle.service"
import { databaseSchema } from "../database/database-schema"
import { sql } from "drizzle-orm"
@Injectable()
export class ArticlesService {
  constructor(private readonly drizzleService: DrizzleService) {} // ...
  searchByQuery(query: string) {
    return this.drizzleService.db
      .select()
      .from(databaseSchema.articles)
      .where(
        sql`to_tsvector('english', ${databaseSchema.articles.content}) @@ plainto_tsquery(${query})`,
      )
  }
}

To use this approach in our NestJS application, we should use an optional query param that allows the user to provide the search query.

articles.controller.ts#

import { Controller, Get, Query } from "@nestjs/common"
import { ArticlesService } from "./articles.service"
@Controller("articles")
export class ArticlesController {
  constructor(private readonly articlesService: ArticlesService) {}
  @Get()
  getAll(@Query("search") searchQuery: string) {
    if (searchQuery) {
      return this.articlesService.searchByQuery(searchQuery)
    }
    return this.articlesService.getAll()
  } // ...
}

We can combine the title and content columns to search through them both.

articles.service.ts#

import { Injectable } from "@nestjs/common"
import { DrizzleService } from "../database/drizzle.service"
import { databaseSchema } from "../database/database-schema"
import { sql } from "drizzle-orm"
@Injectable()
export class ArticlesService {
  constructor(private readonly drizzleService: DrizzleService) {} // ...
  searchByQuery(query: string) {
    return this.drizzleService.db.select().from(databaseSchema.articles).where(sql`
          to_tsvector(
            'english',
             ${databaseSchema.articles.content} || ' ' || ${databaseSchema.articles.title}
          ) @@ plainto_tsquery(${query})
        `)
  }
}

The key problem with this approach is that it forces PostgreSQL to convert the text from every record in the articles table, which can be very time-consuming. Instead, we can define a generated column that automatically transforms the data into the tsvector format.

Unfortunately, the Drizzle ORM does not support the tsvector columns. To deal with that, we have to define a custom type.

database-schema.ts#

import { sql, SQL } from "drizzle-orm"
import { customType, index, pgTable, serial, text } from "drizzle-orm/pg-core"
 
const tsvector = customType<{ data: unknown }>({
  dataType() {
    return "tsvector"
  },
})
export const articles = pgTable(
  "articles",
  {
    id: serial("id").primaryKey(),
    title: text("title"),
    content: text("content"),
    textTsvector: tsvector("text_tsvector").generatedAlwaysAs(
      (): SQL => sql`
      to_tsvector('english', ${articles.title} || ' ' || ${articles.content})
    `,
    ),
  },
  (table) => {
    return {
      textTsvectorIndex: index("text_tsvector_index").using("gin", table.textTsvector),
    }
  },
)
export const databaseSchema = {
  articles,
}

Above, we create a stored generated column. PostgreSQL automatically updates it whenever the content or title columns change.

We also create a Generalized Inverted Index (GIN). This index type is highly effective for text searching and is ideal when a column contains multiple values. Implementing a GIN index can significantly enhance the speed of our SELECT queries.

Thanks to this approach, we can now simplify our query.

articles.service.ts#

import { Injectable } from "@nestjs/common"
import { DrizzleService } from "../database/drizzle.service"
import { databaseSchema } from "../database/database-schema"
import { sql } from "drizzle-orm"
@Injectable()
export class ArticlesService {
  constructor(private readonly drizzleService: DrizzleService) {}
  searchByQuery(query: string) {
    return this.drizzleService.db
      .select()
      .from(databaseSchema.articles)
      .where(sql`${databaseSchema.articles.textTsvector} @@ plainto_tsquery(${query})`)
  } // ...
}

Ordering our results#

Until now, we haven’t focused on the order of the results in our query. Sorting the search results by relevance can significantly improve the user experience.

For instance, we can prioritize the text from the title column over the content column. To achieve this, we can modify how we create the text_tsvector column and use the setweight function.

database-schema.ts#

import { sql, SQL } from "drizzle-orm"
import { customType, index, pgTable, serial, text } from "drizzle-orm/pg-core"
 
const tsvector = customType<{ data: unknown }>({
  dataType() {
    return "tsvector"
  },
})
export const articles = pgTable(
  "articles",
  {
    id: serial("id").primaryKey(),
    title: text("title"),
    content: text("content"),
    textTsvector: tsvector("text_tsvector").generatedAlwaysAs(
      (): SQL => sql`
      setweight(to_tsvector('english', ${articles.title}), 'A') ||
      setweight(to_tsvector('english', ${articles.content}), 'B')
    `,
    ),
  },
  (table) => {
    return {
      textTsvectorIndex: index("text_tsvector_index").using("gin", table.textTsvector),
    }
  },
)
export const databaseSchema = {
  articles,
}

With the setweight function, we assign a weight to each term in the tsvector column. The A indicates the highest weight, and the D indicates the lowest.

Thanks to this, we can use the ts_rank function to order the results based on the weight of each column.

import { Injectable } from "@nestjs/common"
import { DrizzleService } from "../database/drizzle.service"
import { databaseSchema } from "../database/database-schema"
import { sql } from "drizzle-orm"
@Injectable()
export class ArticlesService {
  constructor(private readonly drizzleService: DrizzleService) {}
  searchByQuery(query: string) {
    return this.drizzleService.db
      .select()
      .from(databaseSchema.articles)
      .where(sql`${databaseSchema.articles.textTsvector} @@ plainto_tsquery(${query})`)
      .orderBy(
        sql`ts_rank(${databaseSchema.articles.textTsvector}, plainto_tsquery(${query})) DESC`,
      )
  } // ...
}

With this approach, if our query matches the title of one article and the content of another, the article with the title match will be ranked higher.

Summary#

In this article, we implemented the full-text search functionality in an application that uses PostgreSQL, NestJS, and the Drizzle ORM. To do that, we had to understand the tsvector and tsquery data types and learn how to use them in the Drizzle ORM. To boost the performance, we created a generated column and set up a Generalized Inverted Index. Thanks to this, we achieved a fast and efficient search mechanism, even though the Drizzle ORM does not natively support some of the necessary features.