Fluent Bit is one of the most widely used open source data collection agents for logs, metrics and traces. It’s lightweight, high-performance and easily extensible, making it ideal for modern observability pipelines.
At its core, Fluent Bit is a simple data pipeline comprising various stages, as illustrated in the diagram below.
However, even the most efficient pipeline can hit a bottleneck known as backpressure, which occurs when data is ingested at a rate that exceeds the system’s ability to process and flush it. Backpressure causes problems such as high memory usage, service downtime and data loss.
Let’s explore how to monitor and alert on backpressure in Fluent Bit, enabling you to maintain a healthy logging pipeline.
Prerequisites
Docker: Installed on your system.
Elasticsearch: We will send logs to Elasticsearch. To follow along, refer to this guide.
Familiarity with Fluent Bit concepts: Such as inputs, outputs, parsers and filters. If you’re unfamiliar with these concepts, please refer to the official documentation.
Understanding Backpressure in Fluent Bit
In high-throughput logging pipelines, Fluent Bit can ingest data faster than downstream outputs (HTTP endpoints, databases, storage backends) are able to accept it. This mismatch between the input rate and the output rate gives rise to backpressure, a condition in which buffers grow, memory consumption increases and inputs must be throttled or paused.
Backpressure Example
A classic example is reading from large log files (or having big backlogs) and trying to dispatch events to a backend over the network. If the backend is slow or unavailable, buffered data accumulates in Fluent Bit.
If unchecked, backpressure can lead to excessive memory usage, performance degradation or even data loss.
Mechanisms To Control Backpressure
Fluent Bit implements several controls to limit how much data an input plugin can feed into the pipeline under strain:
| Control | When Applicable | Behavior / Effect |
| --- | --- | --- |
| `Mem_Buf_Limit` | Only when `storage.type` is memory (the default) | Sets an upper bound for how much in-memory data can be queued. When memory usage exceeds this limit, Fluent Bit triggers a pause callback on the input, preventing new data from being ingested until the buffer is drained. |
| `storage.max_chunks_up` | When using `storage.type filesystem` or in hybrid (memory + filesystem) mode | Controls how many memory “chunks” can be held before transitions or limits are enforced. Once the limit is reached, Fluent Bit may stop buffering new data in memory and switch to filesystem-only buffering (if enabled). |
For more information about backpressure in Fluent Bit, refer to the official documentation.
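As a rough illustration of where the two controls from the table live, here is a minimal sketch in Fluent Bit's YAML configuration format. The file paths, tags and limit values are hypothetical and chosen only for illustration; they are not taken from the demo repository.

```yaml
# Sketch only: paths, tags and sizes below are illustrative assumptions.
service:
  # Directory for filesystem-backed chunks (hypothetical path).
  storage.path: /var/lib/fluent-bit/storage
  # Maximum number of chunks kept "up" in memory at once.
  storage.max_chunks_up: 128

pipeline:
  inputs:
    # Memory-only buffering (the default storage.type):
    # mem_buf_limit caps queued data, and the input is paused above it.
    - name: tail
      tag: app.memory
      path: /var/log/app/memory-buffered.log
      mem_buf_limit: 50M

    # Filesystem-backed buffering: chunks beyond the in-memory
    # allowance are kept on disk instead of pausing at mem_buf_limit.
    - name: tail
      tag: app.filesystem
      path: /var/log/app/disk-buffered.log
      storage.type: filesystem
```

The first input trades durability for simplicity and will pause under sustained backpressure, while the second keeps accepting data at the cost of disk usage.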
Key Metrics To Monitor for Fluent Bit Backpressure
Fluent Bit exposes its internal state using Prometheus metrics, which are essential for Fluent Bit monitoring and alerting on backpressure. The following table provides an overview of the essential metrics to monitor for detecting and troubleshooting backpressure in Fluent Bit:
| Metric Name | Description | Backpressure Relevance |
| --- | --- | --- |
| Input Metrics | | |
| `fluentbit_input_ingestion_paused` | Set to 1 when inputs are paused | Detect when inputs are paused due to backpressure |
| `fluentbit_input_storage_overlimit` | Set to 1 when inputs are over storage limits | |
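For these metrics to be available, Fluent Bit's built-in HTTP server has to be enabled. A minimal sketch of the `service` section follows; the listen address and port are the commonly used defaults and are assumptions here, not values confirmed from the demo repository.

```yaml
# Sketch only: address and port are assumed defaults.
service:
  # Enable the built-in HTTP server that serves internal metrics.
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  # Also expose storage-layer (chunk and buffer) metrics.
  storage.metrics: on
```

With this in place, recent Fluent Bit versions typically serve Prometheus-format metrics at `/api/v2/metrics/prometheus` on the configured port. Check the monitoring documentation for your version to confirm the path and the exact metric names.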
To understand how backpressure works in practice, let’s set up a scenario that allows us to observe it in action.
Here’s an explanation of each component:
1. High-Volume Input
This represents a source generating logs at a high rate. For testing purposes, you could use the `tail` input plugin reading from a file that’s being rapidly written to, or the `dummy` input plugin configured to generate messages at a high rate.
The internal buffer holds chunks until they can be delivered.
Output plugins attempt to send the data to destinations.
3. Elasticsearch
This represents a destination that can’t keep up with the input rate. To control the number of requests processed by Elasticsearch, we will add a proxy server between Fluent Bit and Elasticsearch and configure the proxy with rate limiting.
Grafana dashboards visualize the metrics for analysis and alerts.
This monitoring setup enables us to observe backpressure as it occurs and understand its causes and effects.
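To give a sense of how the metrics reach the dashboards, a Prometheus scrape job for Fluent Bit might look like the sketch below. The job name, the target `fluent-bit:2020` and the scrape interval are assumptions for illustration; the repository's actual Prometheus configuration may differ.

```yaml
# Sketch only: job name, target and interval are hypothetical.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: fluent-bit
    # Fluent Bit's Prometheus-format endpoint (v2 metrics API).
    metrics_path: /api/v2/metrics/prometheus
    static_configs:
      # Assumes a container named "fluent-bit" with the HTTP server on 2020.
      - targets: ["fluent-bit:2020"]
```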
Let’s see the above setup in action.
Instructions
1. Clone the Repository
Start by cloning the repository that contains the necessary configuration files.
git clone https://github.com/sharadregoti/fluent-bit-backpressure-monitoring.git
cd fluent-bit-backpressure-monitoring
2. Start Elasticsearch (Optional)
We will run Elasticsearch in a Docker container. If you already have Elasticsearch running, you can skip this step.
cd elasticsearch/elastic-start-local
docker compose up -d
cd -
It will take a couple of minutes to set up Elasticsearch and Kibana. The default username and password are `elastic` and `rslglTS4`.
3. Modify Configuration Files
In the `nginx/nginx.conf` file, replace `http://<your-ip-addr>:9200` with your Elasticsearch host and port. Note: If you are running Elasticsearch locally in a Docker container as described above, use the public IPv4 address assigned to your machine instead of localhost.
In the `fluent-bit/config/fluent-bit.yaml` output section, replace `<your-username>` and `<your-password>` with your Elasticsearch authentication credentials.
The dummy input plugin is used to generate artificial log events at a high rate:
`rate: 350`: Produces 350 log records per second. This ensures the pipeline is stressed enough to trigger backpressure.
`samples: -1`: Runs indefinitely, so the log generation doesn’t stop.
`mem_buf_limit: 2M`: Sets a very small memory buffer limit of just 2MB. Since each log event is enriched later with additional fields, this buffer fills quickly, which helps simulate a backpressure scenario.
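Put together, the input settings described above could be expressed roughly as follows. This is a sketch, not a copy of the repository's file: the `tag` value and the `dummy` message body are hypothetical.

```yaml
# Sketch only: tag and dummy payload are illustrative assumptions.
pipeline:
  inputs:
    - name: dummy
      tag: demo.logs
      # Base record emitted on every tick (hypothetical content).
      dummy: '{"message": "synthetic log event"}'
      # 350 records per second, as described in the article.
      rate: 350
      # -1 means generate records indefinitely.
      samples: -1
      # Deliberately tiny in-memory buffer so the input pauses quickly.
      mem_buf_limit: 2M
```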
Filters
The Lua filter simulates processing overhead and inflates each record:
A small loop (`for i=1,1000`) introduces CPU work, mimicking real-world processing delays.
New fields are added (`hostname`, `environment`), and a `data` field containing 1KB of repeated characters is injected. This increases the payload size for each log.
Together, the extra CPU load and message size increase the stress on the pipeline. The filter ensures that Fluent Bit consumes resources while handling logs, not just passing them through.
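A filter along these lines could be written with inline Lua, as sketched below. The function name `enrich`, the loop body and the field values are assumptions; only the loop bound, the field names and the 1KB payload come from the description above.

```yaml
# Sketch only: function name, loop body and field values are hypothetical.
pipeline:
  filters:
    - name: lua
      match: '*'
      call: enrich
      code: |
        function enrich(tag, timestamp, record)
          -- Busy loop to simulate per-record processing cost.
          local acc = 0
          for i = 1, 1000 do
            acc = acc + i
          end
          -- Enrich the record and inflate it with a 1KB payload.
          record["hostname"] = "demo-host"
          record["environment"] = "demo"
          record["data"] = string.rep("x", 1024)
          -- Return code 1: the record was modified; keep this timestamp.
          return 1, timestamp, record
        end
```

Because the filter runs on every record, its cost scales linearly with the input rate, which is what makes it useful for stressing the pipeline.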
Outputs
The Elasticsearch output plugin sends logs to an external system; however, in this demo, it’s intentionally routed through an NGINX proxy instead of being sent directly to Elasticsearch.
`host: nginx-proxy` / `port: 9000`: The NGINX proxy is configured with rate-limiting rules. This acts as a bottleneck, slowing down Fluent Bit’s ability to offload logs.
The result is that Fluent Bit starts buffering records, eventually hitting memory limits and demonstrating backpressure in action.
In a real-world environment, backpressure may occur when Elasticsearch (or another storage system) slows down due to a heavy indexing load. Here, the NGINX proxy is used to mimic that controlled slowdown.
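The output side of this arrangement might be configured as in the following sketch. The host and port come from the description above; `suppress_type_name` and `retry_limit` are additions assumed here for illustration and may not match the repository's file.

```yaml
# Sketch only: suppress_type_name and retry_limit are assumptions.
pipeline:
  outputs:
    - name: es
      match: '*'
      # Traffic goes to the rate-limited NGINX proxy, not to Elasticsearch directly.
      host: nginx-proxy
      port: 9000
      http_user: "<your-username>"
      http_passwd: "<your-password>"
      # Typically needed for Elasticsearch 8.x, which removed mapping types.
      suppress_type_name: on
      # Give up on a chunk after a few failed attempts (hypothetical value).
      retry_limit: 2
```

A low retry limit makes dropped records show up sooner in the metrics, which can be convenient in a demo but is rarely what you want in production.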
5. Start the Services
Start all services using the command below. Note: Ensure Elasticsearch is up and running.
`docker compose up -d`
Wait for a few moments to allow all services to initialize correctly.
6. Access Grafana
Open your web browser and navigate to `http://localhost:3000`. Log in with the default credentials (`admin/admin`) and skip the new password generation step when prompted.
7. Import the Dashboard
In Grafana, import the provided dashboard JSON file to visualize Fluent Bit metrics.
Go to the Dashboards section from the left sidebar.
After importing the dashboard, wait for a couple of minutes for the data to populate.
Backpressure propagates from Output to Input. You will start seeing an increase in the number of retried and dropped records from the output plugin (Elasticsearch).
You will also see the difference between the number of input records (yellow) processed vs. the output records (green).
Going to the next stage, when backpressure occurs, the input plugin’s memory buffer (`mem_buf_limit`) fills up, which causes Fluent Bit to pause ingestion of new records (`fluentbit_input_ingestion_paused` is set to 1). A paused input is a clear indication of backpressure.
9. Clean Up
The command below will stop all services.
docker compose down -v
If you started Elasticsearch locally, also run these commands.
cd elasticsearch/elastic-start-local
docker compose down -v
Setting Up Fluent Bit Alerts on Backpressure
Configuring Fluent Bit alerts ensures you’re notified the moment backpressure begins to affect your data pipeline:
1. Input Paused Alert
Condition: `fluentbit_input_ingestion_paused > 0`
Evaluation: Every 1m for 2m
Notification message: “Input {{$labels.name}} is paused due to backpressure”
Notification message: “Output {{$labels.name}} experiencing errors at rate {{$value}}/s”
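If you manage alerts as Prometheus rule files rather than in the Grafana UI, the alerts above could be approximated as in the sketch below. The group name, alert names and severity label are hypothetical. The expression for the output alert is an assumption built on the `fluentbit_output_errors_total` counter, since the original condition is not shown here.

```yaml
# Sketch only: names, labels and the second expression are assumptions.
groups:
  - name: fluent-bit-backpressure
    # Evaluate every minute, matching "Every 1m" above.
    interval: 1m
    rules:
      - alert: FluentBitInputPaused
        expr: fluentbit_input_ingestion_paused > 0
        # Fire only if the condition holds for two minutes.
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Input {{ $labels.name }} is paused due to backpressure"

      - alert: FluentBitOutputErrors
        # Assumed condition: any sustained error rate on an output.
        expr: rate(fluentbit_output_errors_total[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Output {{ $labels.name }} experiencing errors at rate {{ $value }}/s"
```

The `for` clause keeps short, self-resolving pauses from paging anyone, while still catching backpressure that persists.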
Conclusion
Monitoring and alerting on backpressure in Fluent Bit is essential for maintaining a healthy logging pipeline. By understanding the backpressure mechanisms, configuring appropriate limits and setting up monitoring and alerts, you can ensure that your Fluent Bit deployment handles high volumes of data efficiently and reliably.
The key takeaways are:
Configure appropriate memory and storage limits for your input plugins.
Monitor the key metrics related to backpressure.
Set up alerts to be notified when backpressure occurs.
Use the visualizations to understand the behavior of your Fluent Bit deployment.
By following these guidelines, you can effectively manage backpressure in Fluent Bit and ensure that your logging infrastructure remains robust and reliable.
Chronosphere, a Palo Alto Networks company, is the observability platform built for control in the modern, containerized world. Recognized as a leader by major analyst firms, Chronosphere empowers customers to focus on the data and insights that matter to reduce data complexity, optimize costs, and remediate issues faster. Visit chronosphere.io.