Monitoring Nginx Logs
And increase the stability of your web application
Are you using Nginx? Are you monitoring your Nginx logs?
Did you know that you can increase the stability and reliability of your web application by monitoring Nginx logs?
I will show you how to monitor Nginx logs, use them as metrics, and make the most of them.
Why monitor Nginx logs?
To ensure the availability of our system: A system is available when it has no failures, or when its failure rate stays below a certain threshold. By analyzing HTTP status codes, we can determine the percentage of failed requests and check that the system stays available.
Latency: Availability is one metric, but latency is also crucial. We will lose customers if our system takes too long to respond.
Removing deprecated endpoints: Sometimes we would like to remove old, deprecated endpoints, but we can't remove them outright because some of our customers might still be using them. By analyzing the usage history of each endpoint, we can decide when it is safe to remove.
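As a rough illustration of the first and third points, the sketch below parses access log lines in Nginx's default "combined" format and computes the share of 5xx responses and the number of hits per endpoint. The sample lines, the regular expression, and the function names are assumptions made up for this example; they are not part of the proof of concept.

```python
import re
from collections import Counter

# Matches the request and status fields of Nginx's default "combined" log format.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical sample lines, invented for this example.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2022:10:00:00 +0000] "GET /api/v1/users HTTP/1.1" 200 512 "-" "curl/7.68.0"',
    '10.0.0.2 - - [01/Jan/2022:10:00:01 +0000] "GET /api/v2/users HTTP/1.1" 200 498 "-" "curl/7.68.0"',
    '10.0.0.3 - - [01/Jan/2022:10:00:02 +0000] "POST /api/v2/orders HTTP/1.1" 502 0 "-" "curl/7.68.0"',
    '10.0.0.1 - - [01/Jan/2022:10:00:03 +0000] "GET /api/v2/users HTTP/1.1" 404 153 "-" "curl/7.68.0"',
]


def parse(lines):
    """Return (path, status) pairs for every line that matches the pattern."""
    pairs = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            pairs.append((match.group("path"), int(match.group("status"))))
    return pairs


def failure_ratio(pairs):
    """Share of requests answered with a 5xx status; 0.0 when there are no requests."""
    if not pairs:
        return 0.0
    failures = sum(1 for _, status in pairs if 500 <= status <= 599)
    return failures / len(pairs)


def endpoint_usage(pairs):
    """Number of requests per path, useful for spotting endpoints nobody calls."""
    return Counter(path for path, _ in pairs)


pairs = parse(SAMPLE_LOG)
print(f"failure ratio: {failure_ratio(pairs):.2%}")
print(endpoint_usage(pairs).most_common())
```

In the setup described in this post, Filebeat and Elasticsearch do this parsing and counting for us; the sketch only shows what is being measured.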
Basic workflow
The workflow is as follows:
- We collect Nginx metrics using Filebeat's Nginx module [1].
- Filebeat publishes the collected metrics to Elasticsearch.
- Using Kibana, we visualize these metrics.
- We inform our team using watchers [2].
Proof of concept
Source code — monitor-nginx-logs
Collect Nginx metrics
We collect Nginx metrics using the following code.
filebeat.modules:
  - module: nginx
    access:
      var.paths: ["/var/log/nginx/host.access.log"]
Metrics are published to Elasticsearch using the following code.
output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
You can view the collected metrics using Discover in Kibana at localhost:5601. The field of interest is http.response.status_code.
Visualize HTTP status code in Kibana
- Select Visualize Library and create a visualization.
- Select TSVB.
Group the HTTP status codes by class. The original screenshot was hard to read even when zoomed in, so here are the values for the 2xx class:
Left-hand side: http.response.status_code >= 200 and http.response.status_code <= 299
Right-hand side: 200
Adjust the range and the value in the same way for the other response classes.
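To avoid typing each range by hand, the pairs for the other status classes can be generated. This is a small sketch that assumes the left-hand value is a query string and the right-hand value is a label named after the lower bound of the class, as in the 2xx example; the function name is hypothetical.

```python
def status_class_filters(classes=(2, 3, 4, 5)):
    """Return (query, label) pairs, one per HTTP status class."""
    rows = []
    for hundreds in classes:
        low = hundreds * 100
        high = low + 99
        query = (
            f"http.response.status_code >= {low} "
            f"and http.response.status_code <= {high}"
        )
        # Label each class by its lower bound, mirroring "200" for 2xx.
        rows.append((query, str(low)))
    return rows


for query, label in status_class_filters():
    print(f"{label}: {query}")
```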
Finally, we can see the HTTP status code over time.
Inform the team upon failed requests
You can use the watcher API [3] to inform your team. The watcher will retrieve data at regular intervals, and inform our team when a condition is met. Upon receiving notifications, the team acts accordingly.
Suppose we want the watcher to run every hour.
"trigger": {
  "schedule": {
    "interval": "1h"
  }
}
Because the metrics are stored in Filebeat indices, the watcher searches the following indices.
"indices": [
  "<filebeat-*-{now/d}>"
]
We can determine the number of failed requests using the following query. The "to" bound of a range aggregation is exclusive, so it is set to 600 to cover status codes 500 through 599.
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h"
            }
          }
        },
        {
          "exists": {
            "field": "http.response.status_code"
          }
        }
      ]
    }
  },
  "aggs": {
    "response_code_ranges": {
      "range": {
        "field": "http.response.status_code",
        "keyed": true,
        "ranges": [
          {
            "key": "server_errors",
            "from": 500,
            "to": 600
          }
        ]
      }
    }
  }
}
The following condition triggers our watcher and informs our team when the share of failed requests exceeds 0.3%. You can adjust this threshold to suit your needs.
"condition": {
  "script": {
    "source": "(double) ctx.payload.aggregations.response_code_ranges.buckets.server_errors.doc_count/(double) ctx.payload.hits.total > params.threshold",
    "lang": "painless",
    "params": {
      "threshold": 0.003
    }
  }
}
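To make the condition easier to reason about, here is the same check sketched in Python against a hand-written payload. The payload values and the function name are assumptions for illustration; only the field names come from the query and condition above.

```python
def should_alert(payload, threshold=0.003):
    """Mirror the watcher condition: failed requests / all requests > threshold."""
    buckets = payload["aggregations"]["response_code_ranges"]["buckets"]
    failed = buckets["server_errors"]["doc_count"]
    total = payload["hits"]["total"]
    # Guard added for Python only: dividing by zero raises here, so an empty
    # window is treated as "no alert".
    if total == 0:
        return False
    return failed / total > threshold


# Hypothetical payload: 4 server errors out of 1000 requests in the last hour.
sample_payload = {
    "hits": {"total": 1000},
    "aggregations": {
        "response_code_ranges": {
            "buckets": {"server_errors": {"doc_count": 4}}
        }
    },
}

print(should_alert(sample_payload))
```

With these made-up numbers the ratio is 0.4%, which is above the 0.3% threshold, so the check fires. Note the strict comparison: a ratio of exactly 0.3% does not trigger.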
To inform the team via Slack, add an action like the following, copied from https://www.elastic.co/guide/en/elasticsearch/reference/7.17/actions-slack.html
"actions" : {
  "notify-slack" : {
    "transform" : { ... },
    "throttle_period" : "5m",
    "slack" : {
      "message" : {
        "to" : [ "#admins", "@chief-admin" ],
        "text" : "..."
      }
    }
  }
}
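The fragments above belong to a single watch definition. The sketch below shows one way they might fit together, assuming the standard layout of a watch with a search input. The function name, the message text, and the default channel are placeholders invented for this example, and the optional transform is left out.

```python
import json


def build_watch(interval="1h", threshold=0.003, channel="#admins"):
    """Assemble trigger, input, condition and action into one watch body."""
    query_body = {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"range": {"@timestamp": {"gte": f"now-{interval}"}}},
                    {"exists": {"field": "http.response.status_code"}},
                ]
            }
        },
        "aggs": {
            "response_code_ranges": {
                "range": {
                    "field": "http.response.status_code",
                    "keyed": True,
                    # The "to" bound is exclusive, so 600 covers 500-599.
                    "ranges": [{"key": "server_errors", "from": 500, "to": 600}],
                }
            }
        },
    }
    condition_source = (
        "(double) ctx.payload.aggregations.response_code_ranges.buckets"
        ".server_errors.doc_count/(double) ctx.payload.hits.total > params.threshold"
    )
    return {
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "search": {
                "request": {"indices": ["<filebeat-*-{now/d}>"], "body": query_body}
            }
        },
        "condition": {
            "script": {
                "source": condition_source,
                "lang": "painless",
                "params": {"threshold": threshold},
            }
        },
        "actions": {
            "notify-slack": {
                "throttle_period": "5m",
                "slack": {
                    "message": {
                        "to": [channel],
                        # Placeholder text, made up for this example.
                        "text": "Share of failed requests exceeded the threshold",
                    }
                },
            }
        },
    }


print(json.dumps(build_watch(), indent=2))
```

The printed JSON could then be sent as the body of a put-watch request [3] under a watch ID of your choosing.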
Thanks for reading.
Resources
[1] filebeat-module-nginx.html
[2] watcher-ui.html
[3] watcher-api-put-watch.html