6 min read
Original source

Generating statistics using aggregate functions in raw SQL

So far, we’ve been mostly writing SQL queries that either store or retrieve the data from the database. Besides that, we can rely on PostgreSQL to process the…

So far, we’ve been mostly writing SQL queries that either store or retrieve the data from the database. Besides that, we can rely on PostgreSQL to process the data and get the computed results. By doing that, we can learn more about the rows in our tables. In this article, we look into how we can use aggregate functions to generate statistics about our data. For the code from this article check out this repository. The purpose of aggregate functions The job of an aggregate function is to compute a single result from multiple input rows. One of the most popular aggregate functions is count(). When used with an asterisk, it measures the total number of rows in the table.SELECT count(*) FROM users When we provide the count function with a column name, it counts the number of rows with a non-NULL value for that column.SELECT count(address_id) AS number_of_users_with_address FROM users The count() function was handy in the previous article when we implemented pagination.import { Injectable } from '@nestjs/common'; import DatabaseService from '../database/database.service'; import PostModel from './post.model'; @Injectable() class PostsRepository { constructor(private readonly databaseService: DatabaseService) {} async get(offset = 0, limit: number | null = null, idsToSkip = 0) { const databaseResponse = await this.databaseService.runQuery( ` WITH selected_posts AS ( SELECT * FROM posts WHERE id > $3 ORDER BY id ASC OFFSET $1 LIMIT $2 ), total_posts_count_response AS ( SELECT COUNT(*)::int AS total_posts_count FROM posts ) SELECT * FROM selected_posts, total_posts_count_response `, [offset, limit, idsToSkip], ); const items = databaseResponse.rows.map( (databaseRow) => new PostModel(databaseRow), ); const count = databaseResponse.rows[0]?.total_posts_count || 0; return { items, count, }; } // ... } export default PostsRepository; Grouping data in the table Aggregate functions work great when we perform them on groups of dataSELECT author_id, count(*) FROM posts GROUP BY author_idWhen we do the above, PostgreSQL divides the data into groups and runs the aggregate function on each group individually. We could make our query even more helpful and order our results. By doing that, we can ensure the authors with the highest number of posts are at the top of the list.SELECT author_id, count(*) AS posts_count FROM posts GROUP BY author_id ORDER BY posts_count DESCLet’s create a model that can hold the above data. postAuthorStatistics.model.ts export interface PostAuthorStatisticsModelData { author_id: number; posts_count: number; } class PostAuthorStatisticsModel { authorId: number; postsCount: number; constructor(postAuthorStatisticsData: PostAuthorStatisticsModelData) { this.authorId = postAuthorStatisticsData.author_id; this.postsCount = postAuthorStatisticsData.posts_count; } } export default PostAuthorStatisticsModel;Let’s create a separate statistics repository to prevent our PostsRepository class from getting too big. postsStatistics.repository.ts import { Injectable, } from '@nestjs/common'; import DatabaseService from '../database/database.service'; import PostAuthorStatisticsModel from './postAuthorStatistics.model'; @Injectable() class PostsStatisticsRepository { constructor(private readonly databaseService: DatabaseService) {} async getPostsAuthorStatistics() { const databaseResponse = await this.databaseService.runQuery( ` SELECT author_id, count(*)::int AS posts_count FROM posts GROUP BY author_id ORDER BY posts_count DESC `, [], ); return databaseResponse.rows.map( (databaseRow) => new PostAuthorStatisticsModel(databaseRow), ); } } export default PostsStatisticsRepository; The count() function returns the value using the bigint data type. Because of that, we convert it to a regular integer. If you want to know more, check out the previous article. We also need to point to our new repository in the PostsService class. posts.service.ts import { Injectable } from '@nestjs/common'; import PostsRepository from './posts.repository'; import PostsStatisticsRepository from './postsStatistics.repository'; @Injectable() export class PostsService { constructor( private readonly postsRepository: PostsRepository, private readonly postsStatisticsRepository: PostsStatisticsRepository, ) {} getPostAuthorStatistics() { return this.postsStatisticsRepository.getPostsAuthorStatistics(); } // ... }The last step is to use it in the controller. posts.controller.ts import { ClassSerializerInterceptor, Controller, Get, Query, UseInterceptors, } from '@nestjs/common'; import { PostsService } from './posts.service'; import GetPostsByAuthorQuery from './getPostsByAuthorQuery'; import PaginationParams from '../utils/paginationParams'; @Controller('posts') @UseInterceptors(ClassSerializerInterceptor) export default class PostsController { constructor(private readonly postsService: PostsService) {} @Get() getPosts( @Query() { authorId }: GetPostsByAuthorQuery, @Query() { offset, limit, idsToSkip }: PaginationParams, ) { return this.postsService.getPosts(authorId, offset, limit, idsToSkip); } @Get('statistics') getStatistics() { return this.postsService.getPostAuthorStatistics(); } // ... } Other aggregate functions There are more aggregate functions besides count(). Let’s go through them. max and min Using the max() function, we can find the largest value of the selected column. Respectively, the min() function returns the smallest value of the column. Since we don’t have any numerical columns in our posts, let’s pair the above functions with length(). This way, we can get the longest and shortest posts of a particular author. The length() function returns the length of a string. postsStatistics.repository.ts import { Injectable } from '@nestjs/common'; import DatabaseService from '../database/database.service'; import PostAuthorStatisticsModel from './postAuthorStatistics.model'; @Injectable() class PostsStatisticsRepository { constructor(private readonly databaseService: DatabaseService) {} async getPostsAuthorStatistics() { const databaseResponse = await this.databaseService.runQuery( ` SELECT author_id, count(*)::int AS posts_count, max(length(post_content)) AS longest_post_length, min(length(post_content)) AS shortest_post_length FROM posts GROUP BY author_id ORDER BY posts_count DESC `, [], ); return databaseResponse.rows.map( (databaseRow) => new PostAuthorStatisticsModel(databaseRow), ); } } export default PostsStatisticsRepository; Running max() or min() on a text column returns a string based on the alphabetical order. sum With the sum() function, we can return a total sum of a particular column. Since it only works with numerical values, we also need the length() function. Since the sum() function also returns the value in the bigint format, we transform it to a regular integer. We can do it because we don’t expect values bigger than 2³¹⁻¹. postsStatistics.repository.ts import { Injectable } from '@nestjs/common'; import DatabaseService from '../database/database.service'; import PostAuthorStatisticsModel from './postAuthorStatistics.model'; @Injectable() class PostsStatisticsRepository { constructor(private readonly databaseService: DatabaseService) {} async getPostsAuthorStatistics() { const databaseResponse = await this.databaseService.runQuery( ` SELECT author_id, count(*)::int AS posts_count, max(length(post_content)) AS longest_post_length, min(length(post_content)) AS shortest_post_length, sum(length(post_content))::int AS all_posts_content_sum FROM posts GROUP BY author_id ORDER BY posts_count DESC `, [], ); return databaseResponse.rows.map( (databaseRow) => new PostAuthorStatisticsModel(databaseRow), ); } } export default PostsStatisticsRepository; avg The avg() function calculates the average of the values in a group. Let’s combine it with the length() function to calculate the average length of all posts of a particular author. postsStatistics.repository.ts import { Injectable } from '@nestjs/common'; import DatabaseService from '../database/database.service'; import PostAuthorStatisticsModel from './postAuthorStatistics.model'; @Injectable() class PostsStatisticsRepository { constructor(private readonly databaseService: DatabaseService) {} async getPostsAuthorStatistics() { const databaseResponse = await this.databaseService.runQuery( ` SELECT author_id, count(*)::int AS posts_count, max(length(post_content)) AS longest_post_length, min(length(post_content)) AS shortest_post_length, sum(length(post_content))::int AS all_posts_content_sum, avg(length(post_content))::real AS average_post_content_length FROM posts GROUP BY author_id ORDER BY posts_count DESC `, [], ); return databaseResponse.rows.map( (databaseRow) => new PostAuthorStatisticsModel(databaseRow), ); } } export default PostsStatisticsRepository;A significant thing about the avg() function is that it returns the data in the numeric type. It can store many digits and is very useful when exactness is crucial. Parsing this data type to JSON converts it to a string by default. Since we don’t need many digits after the decimal, we convert it to the real data type. Aggregating data from more than one table So far, we’ve been grouping and aggregating data in

Generating statistics using aggregate functions in raw SQL | NestJS.io