Health Check

The RapidFort platform is deployed on AWS EKS via the RapidFort Helm chart. Customers can monitor the EKS cluster running RapidFort in various ways, but RapidFort recommends the kube-prometheus stack: a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, that provides easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

Setup

The kube-prometheus stack is also deployed via a Helm Chart.

RapidFort uses an override-values.yaml file to:

- Enable authentication to Grafana, Prometheus, and Alertmanager
- Enable RapidFort-specific monitoring of the platform

Download a copy of the RapidFort override-values.yaml
Update the following variables in override-values.yaml
- <email-smtp-host> - SMTP host address
- <smtp from email> - From email address. Alert emails are sent from this address.
- <smtp username> - SMTP username
- <smtp password> - SMTP password
- <webhook url> - Webhook URL used to integrate with an IM service, e.g. Slack
- <alertmanager-FQDN> - FQDN used to access the Alertmanager UI, e.g. alertmanager.domain.com
- <grafana password> - Grafana password
- <grafana-FQDN> - FQDN used to access the Grafana UI, e.g. grafana.domain.com
- <prometheus-FQDN> - FQDN used to access the Prometheus UI, e.g. prometheus.domain.com
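
The placeholder substitution can be scripted. The following is a minimal sketch, assuming the downloaded override-values.yaml contains the placeholders literally as listed above (for example <email-smtp-host>). fill_placeholder is a hypothetical helper, not part of RapidFort or Helm, and the example values are illustrative only.

```shell
# Hypothetical helper: replace every occurrence of one literal placeholder
# in a file. Assumes the placeholder contains no regex metacharacters and
# no "|" (true for the placeholders listed above).
fill_placeholder() {
  file=$1
  placeholder=$2
  value=$3
  # Escape the characters that are special in a sed replacement: \ | &
  escaped=$(printf '%s' "$value" | sed 's/[\\|&]/\\&/g')
  # Write to a temporary file first, then move it into place.
  sed "s|$placeholder|$escaped|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example usage (illustrative values):
#   fill_placeholder override-values.yaml '<email-smtp-host>' 'smtp.example.com'
#   fill_placeholder override-values.yaml '<grafana-FQDN>' 'grafana.domain.com'
```

Running grep -n '<' override-values.yaml afterwards is a quick way to spot placeholders that were missed, although it may also match unrelated lines.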
Create a K8s Secret
- htpasswd -c auth <username>
- Note: Enter the password when prompted
- kubectl create secret generic basic-auth --from-file=auth
Add the prometheus Helm repo
- helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
- helm repo update
Install Prometheus, Grafana, and Alertmanager using Helm
- export RELEASE_NAME=<RELEASE_NAME>
- helm install ${RELEASE_NAME} prometheus-community/kube-prometheus-stack -f override-values.yaml
Log in to the Grafana UI
- Once all the pods are in the Running state, open Grafana in your browser at the Grafana FQDN and log in with the Grafana password set in override-values.yaml.
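
Whether all pods have reached the Running state can also be checked from the command line. The following is a minimal sketch, assuming kubectl is configured for the cluster and the stack was installed into the current namespace. pods_not_ready is a hypothetical helper that filters the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS column (third field) is neither
# Running nor Completed. With `-A`, a NAMESPACE column shifts the fields by
# one, so this sketch does not apply unchanged.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods --no-headers | pods_not_ready
```

An empty result suggests that every pod is Running or Completed. It does not by itself confirm that each container has passed its readiness checks.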

Monitoring Steps

Open Grafana in your browser
Check Alerting
- Select "Alerting" on the left side of the Grafana UI
- Filter on the "firing" state. This shows alerts whose rules or thresholds are currently breached, e.g.
  - Nginx high HTTP 5xx error rate
  - Kubernetes Pod not healthy
- Review and address any firing alerts
  - Expand an alert for more information to help investigate the issue
Monitor Realtime Dashboards
- Select "Search Dashboards" on the left side of the Grafana UI
  - Note: users can star dashboards so that they appear on the Grafana home page
  - The alerting described above removes the need to watch dashboards constantly
- Review dashboards of interest e.g.
  - "Kubernetes" -> "Compute Resources" -> "Pods" for real-time CPU usage, memory usage, I/O, etc. of the RapidFort Pods.
  - "NGINX Ingress controller" for real-time network I/O, latency, ingress request volume, success rate, etc. of the NGINX controller.
Create Custom Dashboards (optional)
- Users can build their own custom dashboards by selecting "Create" (the + sign) -> "Dashboard" on the left side of the Grafana UI
