Complete Guide to Calculating Request Totals in Time Windows Using PromQL

Keywords: Prometheus | PromQL | Grafana | increase function | counter monitoring

Abstract: This article provides a comprehensive guide on using Prometheus Query Language to calculate HTTP request totals within specific time ranges in Grafana dashboards. Through in-depth analysis of the increase() function mechanics and sum() aggregation operator applications, combined with practical code examples, readers will master the core techniques for building accurate monitoring panels. The article also explores Grafana time range variables and addresses common counter type selection issues.

Fundamental Concepts of Prometheus Counters

In monitoring systems, accurately calculating request counts within specific time windows is crucial for building effective dashboards. Prometheus provides a dedicated Counter metric type to track cumulative counts, but querying a counter directly returns the running total accumulated since the instrumented process started (or since the counter last reset), not the increment within a specific period.

Counter metrics like http_requests_total monotonically increase over time, recording all requests processed since system startup. This design makes direct queries unsuitable for the requirement of "displaying request counts within selected time periods," necessitating the use of special PromQL functions.

Core Functionality of the increase() Function

The increase() function is the central tool for calculating growth within a time window. It is designed for counter metrics and takes a range vector such as http_requests_total[24h]. Conceptually, it measures how much the counter grew between the first and last samples in the window, compensates for any drops caused by counter resets, and extrapolates the result to cover the full window. Because of that extrapolation, the returned value is an estimate and can be a non-integer even when the counter only ever increments by whole numbers.

When calculating the total number of requests over the past 24 hours, the basic query expression is:

increase(http_requests_total[24h])

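This query returns one result per matching series (for example, one per instance): the growth of that series' http_requests_total counter over the 24 hours before the evaluation time. Resets caused by process restarts are compensated for automatically, but because samples rarely fall exactly on the window edges, the value should be read as a close estimate rather than an exact count.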
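The reset handling described above can be illustrated with a small Python simulation. This is a simplified sketch, not Prometheus's implementation: the function name simple_increase and the sample values are hypothetical, and the extrapolation to the window edges that the real increase() performs is deliberately left out.

```python
from typing import Sequence


def simple_increase(samples: Sequence[float]) -> float:
    """Sum the growth between consecutive counter samples.

    A drop between two samples is treated as a counter reset to zero,
    so the value after the drop counts entirely as new growth.
    Extrapolation to the window boundaries is NOT modeled here.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counter went backwards: assume the process restarted from 0.
            total += curr
    return total


# Hypothetical scrapes of http_requests_total; the process restarts after 130.
samples = [100, 120, 130, 5, 25]

print(simple_increase(samples))   # 55.0
print(samples[-1] - samples[0])   # -75 (naive last-minus-first is wrong)
```

The naive difference goes negative across the restart, while the reset-aware sum recovers the 55 requests that were actually served in the window.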

Aggregation Processing in Multi-Instance Environments

In real production environments, applications are typically deployed across multiple instances, each independently reporting its own http_requests_total counter. To obtain the total request count for the entire system, the sum() aggregation operator must be used to sum incremental values from all instances.

The complete aggregation query expression is as follows:

sum(increase(http_requests_total[24h]))

This query first calculates the request increment for each instance over the past 24 hours, then sums the increments from all instances to obtain the total request count for the entire system during that period. Used without a by or without clause, sum() drops every label and returns a single series; adding a clause such as by (job) keeps one total per distinct value of that label instead.
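As a rough illustration of the aggregation step, the following Python sketch models per-instance increase() results as label sets with values and sums them two ways. The label names, instance addresses, and numbers are hypothetical, and the helper functions only mimic the behavior of sum(...) and sum by (label) (...).

```python
from collections import defaultdict

# Hypothetical per-instance results of increase(http_requests_total[24h]).
# Each entry pairs a label set with the growth over the window.
per_instance = [
    ({"job": "api", "instance": "10.0.0.1:8080"}, 1200.0),
    ({"job": "api", "instance": "10.0.0.2:8080"}, 800.0),
    ({"job": "web", "instance": "10.0.0.3:8080"}, 500.0),
]


def sum_all(series):
    """Mimic sum(...): drop every label and return one total."""
    return sum(value for _, value in series)


def sum_by(series, label):
    """Mimic sum by (label) (...): one total per distinct label value."""
    groups = defaultdict(float)
    for labels, value in series:
        groups[labels[label]] += value
    return dict(groups)


print(sum_all(per_instance))        # 2500.0
print(sum_by(per_instance, "job"))  # {'api': 2000.0, 'web': 500.0}
```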

Grafana Time Range Integration

In Grafana dashboards, users can dynamically adjust the displayed time range using the time selector in the upper right corner. To ensure queries automatically adapt to the selected time window, special variables provided by Grafana can be utilized.

Earlier versions required workarounds, but Grafana 5.3 introduced the $__range variable to simplify this process. Grafana automatically replaces this variable with the dashboard's currently selected time range, so the query adapts to whatever window the user picks.

Example query using time range variables:

sum(increase(http_requests_total[$__range]))

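The advantage of this approach is that no time range is hardcoded in the query: the window follows the dashboard's time selection, which makes the panel far more flexible. To display a single total, the query is typically run as an instant query evaluated at the end of the selected range, so the panel shows one number rather than a graph.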
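To make the substitution concrete, here is a minimal Python sketch of what happens conceptually when the dashboard window is six hours. The helpers render_range and instant_query_url are hypothetical, the base URL http://localhost:9090 is an assumed local Prometheus, and the exact duration string Grafana produces may differ. The /api/v1/query endpoint with query and time parameters is Prometheus's instant-query API. No request is sent; the code only builds the URL.

```python
from urllib.parse import urlencode


def render_range(template: str, window_seconds: int) -> str:
    """Replace the $__range placeholder with a PromQL duration in seconds."""
    return template.replace("$__range", f"{window_seconds}s")


def instant_query_url(base_url: str, query: str, end_ts: int) -> str:
    """Build an instant-query URL evaluated at the end of the window."""
    params = urlencode({"query": query, "time": end_ts})
    return f"{base_url}/api/v1/query?{params}"


template = "sum(increase(http_requests_total[$__range]))"
query = render_range(template, 6 * 3600)  # assumed 6-hour dashboard window
url = instant_query_url("http://localhost:9090", query, 1700000000)

print(query)  # sum(increase(http_requests_total[21600s]))
print(url)
```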

Considerations for Counter Type Selection

Metric type selection is a common source of trouble with these queries. When a Gauge is used instead of a Counter to record request counts, the queries shown so far no longer behave as intended.

Gauge types represent instantaneous values, suitable for recording current state (such as memory usage or active connections), but inappropriate for cumulative events. If a Gauge is mistakenly used to record request counts, increase() produces misleading results: it interprets every decrease as a counter reset, whereas a Gauge can legitimately rise and fall. PromQL provides delta() for measuring the change in a Gauge over a window, but that reports net change, not the number of events that occurred.

The correct approach is to clearly define metric types during system design: for cumulative events like request counts, always use Counter types; for instantaneous state values, use Gauge types. This type selection directly impacts subsequent query feasibility and accuracy.
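The difference can be seen in a short Python sketch that applies counter-style reset handling to a gauge-like series. The function names and the sample values (a hypothetical active-connections gauge) are assumptions for illustration, and extrapolation is again ignored.

```python
def reset_aware_increase(samples):
    """Counter-style growth: any drop is assumed to be a reset to zero."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr
    return total


def net_change(samples):
    """Gauge-style change: last value minus first value."""
    return float(samples[-1] - samples[0])


# Hypothetical gauge: active connections rising and falling normally.
active_connections = [40, 55, 30, 45]

print(reset_aware_increase(active_connections))  # 60.0
print(net_change(active_connections))            # 5.0
```

The ordinary dip from 55 to 30 is misread as a restart, so the counter-style calculation reports 60 where the gauge only moved by 5 overall. Neither number is a request count, which is why request totals belong in a Counter.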

Practical Application Scenario Analysis

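In actual API monitoring scenarios, complete queries typically combine label filters to target specific services or endpoints. For example, assuming the metric carries a product_id label and the dashboard defines a $product_id template variable, the query can be extended to count requests for the selected product IDs. The =~ operator performs a regular-expression match, which lets a multi-value selection match several IDs at once: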

sum(increase(http_requests_total{product_id=~"$product_id"}[$__range]))

This query structure allows operations teams to create granular monitoring panels for different business dimensions (such as product lines, environments, regions, etc.), providing accurate data support for capacity planning, performance analysis, and troubleshooting.
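As a sketch of how such a filtered query comes together, the hypothetical Python helper below joins a list of selected product IDs into a regex alternation, roughly the shape Grafana's Prometheus data source is understood to produce for multi-value variables. The IDs are invented, and the helper assumes they are plain alphanumerics with no regex metacharacters that would need escaping.

```python
def requests_by_product(product_ids, window="24h"):
    """Build a PromQL query counting requests for the given product IDs.

    Assumes IDs contain no regex metacharacters or quotes.
    """
    pattern = "|".join(product_ids)
    selector = f'http_requests_total{{product_id=~"{pattern}"}}'
    return f"sum(increase({selector}[{window}]))"


print(requests_by_product(["p100", "p200"]))
# sum(increase(http_requests_total{product_id=~"p100|p200"}[24h]))
```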

By combining PromQL functions with Grafana variables, teams can build accurate and flexible monitoring panels that cover a wide range of business monitoring requirements.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.