What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
What is Grafana?
Grafana is open source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics no matter where they are stored. In plain English, it provides you with tools to turn your time-series database (TSDB) data into beautiful graphs and visualizations.
What is Helm?
Helm is the best way to find, share, and use software built for Kubernetes.
We will use Helm to install the Prometheus & Grafana monitoring tools in this chapter.
# add prometheus Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# add grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts
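After adding both repositories, refresh the local chart cache so the new charts are searchable:
# refresh the local cache of available charts
helm repo update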
Deploy Prometheus and Grafana on K8S with Helm
Search the repos for prometheus charts
helm search repo prometheus
In this chapter, we will use the chart prometheus-community/kube-prometheus-stack, version 39.12.1.
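To confirm that version 39.12.1 is available, you can list every published version of the chart:
# list all published versions of the chart
helm search repo prometheus-community/kube-prometheus-stack --versions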
Pull the chart
helm pull prometheus-community/kube-prometheus-stack --version 39.12.1
tar -xvf kube-prometheus-stack-39.12.1.tgz
cp kube-prometheus-stack/values.yaml values-prometheus-grafana-stack.yaml
Configure values-prometheus-grafana-stack.yaml
Alertmanager
Ingress
At line 221, in the ingress section of Alertmanager, configure the ingress as shown in the sketch below.
- Don't forget to enable the ingress and set hosts.
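A minimal sketch of the relevant values; the hostname and ingress class are assumptions, adjust them to your cluster:
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx # assumption: an NGINX ingress controller is installed
    hosts:
      - alertmanager.example.com # assumption: replace with your own host
    paths:
      - /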
Persistence on EKS
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
Prometheus
Ingress
At line 1997, in the ingress section of Prometheus, apply the same configuration: enable the ingress and set hosts, as in the sketch below.
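A minimal sketch under the same assumptions:
prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx # assumption: an NGINX ingress controller is installed
    hosts:
      - prometheus.example.com # assumption: replace with your own host
    paths:
      - /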
Persistence on EKS
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
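Related to sizing this volume: the chart also exposes a retention setting that caps how long samples are kept (10d is, to our knowledge, the chart default), so the PVC size above should cover that window:
prometheus:
  prometheusSpec:
    retention: 10d # keep samples for 10 days; size the PVC to cover this window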
Additional Rules
The chart already ships with many rules for monitoring and alerting. We can add our own rules in the values file as follows:
additionalPrometheusRules:
  - name: custom.rules
    groups:
      - name: 360tovisit
        rules:
          - record: my_record
            expr: sum by (cpu) (rate(node_cpu_seconds_total{mode!='idle'}[1m])) > 6
- When the number of rules is large, the Helm values file becomes very large, cumbersome and difficult to manage.
In that case we recommend creating a separate PrometheusRule object instead, as follows:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-grafana-kube-pr-test.rules
  namespace: prometheus
  labels:
    app: kube-prometheus-stack # this label matches the ruleSelector configuration so the rule is automatically loaded into Prometheus
spec:
  groups:
    - name: test.rules # the group name shown in the PrometheusRule listing on K8S
      rules:
        - alert: Testing # rule name displayed in the Rules section of the Prometheus web UI
          annotations:
            description: Too many requests
            summary: Too many requests
          # comparison condition that generates the alert
          expr: |-
            sum by (cpu) (rate(node_cpu_seconds_total{mode!='idle'}[1m])) > 6
          for: 10m # how long the condition must hold before the alert fires
          labels:
            severity: warning
Then, we apply it with the command
kubectl apply -f AlertRule.yaml
We can list the rules with the command
kubectl -n prometheus get PrometheusRule
We must configure ruleSelector so that the rules are automatically loaded into Prometheus:
prometheus:
  prometheusSpec:
    ruleSelector:
      matchExpressions:
        - key: app
          operator: In
          values:
            - kube-prometheus-stack
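As an alternative to a custom ruleSelector, the chart exposes ruleSelectorNilUsesHelmValues; setting it to false makes Prometheus pick up every PrometheusRule it can see, instead of only those carrying the chart's release label:
prometheus:
  prometheusSpec:
    # alternative: select all PrometheusRule objects instead of matching a label
    ruleSelectorNilUsesHelmValues: false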
Grafana
Ingress
At line 677, in the ingress section of Grafana, apply the same configuration, as in the sketch below.
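A minimal sketch under the same assumptions; note that the Grafana sub-chart takes a single path rather than a paths list:
grafana:
  ingress:
    enabled: true
    ingressClassName: nginx # assumption: an NGINX ingress controller is installed
    hosts:
      - grafana.example.com # assumption: replace with your own host
    path: /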
AdminPassword
At line 670, change the default adminPassword.
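A minimal sketch; the value is a placeholder:
grafana:
  adminPassword: use-a-strong-password # placeholder: never keep the chart's default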
Persistence on EKS
Because Grafana is not persistent by default, we enable persistence so dashboards and settings survive pod restarts:
grafana:
  persistence:
    type: pvc
    enabled: true
    storageClassName: gp2
    accessModes:
      - ReadWriteOnce
    size: 50Gi
    finalizers:
      - kubernetes.io/pvc-protection
Notification
To receive notifications, we configure Alertmanager (under the alertmanager: key in the values file) as follows:
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['namespace', 'job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'telegram'
      routes:
        - receiver: 'telegram'
          group_wait: 10s
          matchers:
            - namespace="prometheus"
    receivers:
      - name: 'telegram'
        telegram_configs:
          - send_resolved: true # boolean, not a quoted string
            bot_token: '<YOUR_BOT_TOKEN>' # placeholder: never commit a real bot token
            chat_id: -1001641231215 # numeric chat id, not a quoted string
            api_url: 'https://api.telegram.org'
            parse_mode: 'MarkdownV2'
    templates:
      - '/etc/alertmanager/config/*.tmpl'
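The templates entry above expects *.tmpl files under /etc/alertmanager/config; with this chart they can be supplied via alertmanager.templateFiles. A minimal sketch, where the template name and message format are assumptions:
alertmanager:
  templateFiles:
    telegram.tmpl: |-
      {{ define "telegram.custom.message" }}{{ range .Alerts }}{{ .Labels.alertname }}: {{ .Annotations.summary }}
      {{ end }}{{ end }}
To actually use it, reference the template from the receiver, e.g. message: '{{ template "telegram.custom.message" . }}' in telegram_configs.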
Deploy
kubectl create ns prometheus-grafana-stack
helm install prometheus-grafana-stack \
-f values-prometheus-grafana-stack.yaml kube-prometheus-stack \
-n prometheus-grafana-stack \
--set grafana.persistence.storageClassName="gp2" \
--set grafana.persistence.enabled=true
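After installing, it is worth verifying that the pods are running and the persistent volume claims are bound:
kubectl -n prometheus-grafana-stack get pods
kubectl -n prometheus-grafana-stack get pvc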
Upgrade
helm upgrade prometheus-grafana-stack \
-f values-prometheus-grafana-stack.yaml kube-prometheus-stack \
-n prometheus-grafana-stack \
--set grafana.persistence.storageClassName="gp2" \
--set grafana.persistence.enabled=true
Remove
helm uninstall prometheus-grafana-stack \
-n prometheus-grafana-stack
kubectl delete ns prometheus-grafana-stack
Monitoring
Import dashboard
We search Google for the keyword grafana k8s node dashboard; after reviewing the many recommendations, we choose a dashboard and click Copy ID to clipboard.
Now, back to Grafana:
- Click the '+' button on the left panel and select 'Import'.
- Enter the dashboard ID under Grafana.com Dashboard.
- Click 'Load'.
- Select 'Prometheus' as the endpoint under the Prometheus data sources drop-down.
- Click 'Import'.
The result after importing the dashboard:
Monitoring all pods
The prometheus-community/kube-prometheus-stack chart already includes a dashboard that monitors all pods. Open the Grafana index page and choose the kubernetes-compute-resources-pod dashboard from the list. On that dashboard, choose the Data source, namespace and pod, and it will show the pod's resource consumption.