Health Check
The RapidFort platform is deployed on AWS EKS via the RapidFort Helm chart. Customers can monitor the EKS cluster running RapidFort in various ways, but RapidFort recommends the kube-prometheus stack. It is a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, that provides easy-to-operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
Setup
The kube-prometheus stack is also deployed via a Helm chart.
RapidFort uses an override-values.yaml file to:
- Enable authentication to Grafana, Prometheus, and Alertmanager
- Allow for the RapidFort-specific monitoring of the platform
- Download a copy of the RapidFort override-values.yaml
- Update variables
- <email-smtp-host>: SMTP host address
- <smtp from email>: From email address. Emails are sent from this address.
- <smtp username>: SMTP username
- <smtp password>: SMTP password
- <webhook url>: Webhook URL for integrating with an IM service, e.g. Slack
- <alertmanager-FQDN>: FQDN for accessing the Alertmanager UI, e.g. alertmanager.domain.com
- <grafana password>: Grafana password
- <grafana-FQDN>: FQDN for accessing the Grafana UI, e.g. grafana.domain.com
- <prometheus-FQDN>: FQDN for accessing the Prometheus UI, e.g. prometheus.domain.com
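The variable updates above can be done by hand in an editor. As an alternative, here is a minimal sketch of scripting them. It assumes the placeholders appear in override-values.yaml literally as listed above. The helper name fill_placeholder and the example values are hypothetical and not part of the RapidFort tooling.

```shell
# Hypothetical helper: replace every occurrence of one literal placeholder
# in a file with the given value.
fill_placeholder() {
  file=$1
  placeholder=$2
  value=$3
  # Escape characters that are special on the replacement side of sed's
  # s command when "|" is used as the delimiter: "&", "|" and "\".
  escaped=$(printf '%s' "$value" | sed 's/[&|\\]/\\&/g')
  # Write to a temporary file and move it into place, which avoids the
  # non-portable "sed -i" option.
  sed "s|${placeholder}|${escaped}|g" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}

# Example values only; substitute your own.
if [ -f override-values.yaml ]; then
  fill_placeholder override-values.yaml '<email-smtp-host>' 'smtp.example.com'
  fill_placeholder override-values.yaml '<grafana-FQDN>' 'grafana.domain.com'
fi
```

Passwords passed on the command line can end up in shell history, so it may be preferable to edit those entries by hand.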
- Create a K8s Secret
htpasswd -c auth <username>
- Note: Follow the prompts to enter the password
kubectl create secret generic basic-auth --from-file=auth
- Add the prometheus-community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
- Install Prometheus, Grafana, and Alertmanager using Helm
export RELEASE_NAME=<RELEASE_NAME>
helm install ${RELEASE_NAME} prometheus-community/kube-prometheus-stack -f override-values.yaml
- Log in to the Grafana UI
- Once all the pods are in the Running state, open Grafana in your browser using the Grafana FQDN and the Grafana password set in override-values.yaml.
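Whether all pods have reached the Running state can also be checked from the command line. The sketch below assumes kubectl is configured for the EKS cluster and that the chart was installed into the current namespace. The helper name count_not_ready is hypothetical.

```shell
# Count pods whose STATUS column is neither Running nor Completed.
# Reads the output of "kubectl get pods --no-headers" on stdin, where the
# columns are NAME, READY, STATUS, RESTARTS and AGE.
count_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Prints 0 when every pod in the current namespace is Running or Completed.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods --no-headers | count_not_ready
fi
```

If the chart was installed into a different namespace, add the matching -n option to the kubectl command.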
Monitoring Steps
- Open Grafana in your browser
- Check Alerting
- Select "Alerting" on the left side of the Grafana UI
- Filter on the "firing" state. This shows alerts whose rules or thresholds have been breached, e.g.
- Nginx high HTTP 5xx error rate
- Kubernetes Pod not healthy
- Review and address any Firing Alerts
- Expand these alerts for more information to help investigate any issues
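The same firing-alert check can be sketched outside the Grafana UI by querying the Prometheus HTTP API endpoint /api/v1/alerts. This is an illustration under assumptions, not a RapidFort-documented procedure. It assumes Prometheus is reachable at its FQDN, that it accepts the basic-auth credentials created during setup, and that curl and jq are installed. PROM_URL, PROM_USER, PROM_PASS and the helper name firing_alerts are hypothetical.

```shell
# Print the alertname label of every alert in the "firing" state.
# Reads the JSON body returned by Prometheus's /api/v1/alerts on stdin.
firing_alerts() {
  jq -r '.data.alerts[] | select(.state == "firing") | .labels.alertname'
}

# Runs only when PROM_URL is set, e.g. PROM_URL=https://prometheus.domain.com
if [ -n "${PROM_URL:-}" ]; then
  curl -fsS -u "${PROM_USER}:${PROM_PASS}" "${PROM_URL}/api/v1/alerts" | firing_alerts
fi
```

An empty result means no alert is currently firing. The Grafana alert view remains the place to expand an alert for details.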
- Monitor Realtime Dashboards
- Select "Search Dashboards" on the left side of the Grafana UI
- Note: users can star dashboards they visit, and starred dashboards appear on the main Grafana home page
- The alerting described above removes the need for an end user to monitor dashboards constantly
- Review dashboards of interest e.g.
- "Kubernetes" -> "Compute Resources" -> "Pods" for realtime CPU usage, memory usage, I/O etc. of the RapidFort Pods.
- "NGINX Ingress controller" for realtime network I/O, latency, ingress request volume, success rate etc. of the NGINX controller.
- Create Custom Dashboards (optional)
- Users can build their own custom dashboards by selecting "Create (+ Sign)" -> "Dashboard" on the left side of the Grafana UI