Compare commits

57 Commits

Author SHA1 Message Date
Ian Fijolek
9072d97bb8 Make linters happy 2022-01-24 10:39:53 -08:00
Ian Fijolek
cdd8a69669 Update go version 2021-12-01 14:47:58 -08:00
Ian Fijolek
3c14a02770 Continue checking all monitors after sending alert
Previously this was mistakenly returning after sending an alert. Now
all alerts will be sent unless there is an exception on one of them.
2021-09-02 10:20:04 -07:00
Ian Fijolek
328ea83c25 Some linting cleanup 2021-09-02 10:19:03 -07:00
Ian Fijolek
ce986e8d1d Roll back to alpine:3.12
Looks like there is a clock issue with raspbian

https://wiki.alpinelinux.org/wiki/Release_Notes_for_Alpine_3.13.0#time64_requirements
2021-05-12 19:06:41 -07:00
Ian Fijolek
31a4b484bf Merge branch 'duration-intervals' 2021-05-12 18:32:12 -07:00
Ian Fijolek
49e3635819 Add backwards compatibility explanation in Readme 2021-05-12 16:37:59 -07:00
Ian Fijolek
444d060736 Remove qemu-user-static from Dockerfile and update alpine
My build machine now has proper qemu support added, so this is not needed
2021-05-12 23:22:24 +00:00
Ian Fijolek
860c2cdf43 Add custom type to parse out seconds as int and durations as strings 2021-05-12 10:33:42 -07:00
Ian Fijolek
befea7375f Add check runtime metric 2021-05-11 10:41:39 -07:00
Ian Fijolek
04395fa693 Add duration parsing tests 2021-05-11 10:41:39 -07:00
Ian Fijolek
bdf7355fa7 Add duration parsing for intervals 2021-05-11 10:41:39 -07:00
Ian Fijolek
30c2c7d6b2 Add Dockerfile linting back in 2021-05-10 21:53:26 -07:00
Ian Fijolek
5f250f17a8 Add more linting and update to pass 2021-05-10 21:53:26 -07:00
Ian Fijolek
fda9e1bfc3 Replace log with slog 2021-05-10 21:53:26 -07:00
Ian Fijolek
f0e179851f Update linting and a test case 2021-01-08 18:31:22 -05:00
Ian Fijolek
9e124803da Add release uploads 2021-01-08 18:13:48 -05:00
Ian Fijolek
2c4543a7bc Update go version to 1.15 2021-01-08 18:13:34 -05:00
Ian Fijolek
a1b906b94a Update for go 1.15 2020-11-16 15:56:31 -08:00
Ian Fijolek
0a5be250b5 Scripts: Add echoing log lines to helper scripts
Rather than only returning the status of whether or not a container is
healthy, the helper scripts will now optionally echo some of the latest
log lines.
2020-11-16 15:52:21 -08:00
Ian Fijolek
88f77aa27c Fix Makefile comment 2020-11-16 15:51:41 -08:00
Ian Fijolek
67c2375bba Remove docker linting for now
Drone check doesn't pass. Need to install docker there
2020-07-14 17:29:54 -07:00
Ian Fijolek
aad9eaa32f Update exported status metric to properly reflect alerting status of a monitor
It was using the result of the individual check and not the monitor as a whole
2020-07-14 17:09:56 -07:00
Ian Fijolek
5dc5ba5257 Add docker linting 2020-07-14 17:08:48 -07:00
Ian Fijolek
4aff294739 Set overridden version in drone config 2020-07-07 12:15:53 -07:00
Ian Fijolek
0684b15a44 Update logic for setting version
I noticed that versions were not being properly derived from the git
tags. This fixes that in a simpler way by allowing git to describe the
current commit with tags, commits, shas, and a dirty marker.
2020-07-07 10:51:13 -07:00
Ian Fijolek
d3826dacde Update drone to use new linux only target 2020-07-06 20:33:02 -07:00
Ian Fijolek
f8e40c643c Move static binaries to dist/ for easier publishing
This will make it easier to publish them to Github or Gitea releases later.

To avoid making the Makefile super complex, this patch also makes use of
variables to simplify the Makefile as well.
2020-07-06 20:15:21 -07:00
Ian Fijolek
cffbbd734a Make default log alert conditional
Allow using the default `log` alert for both up and down alerts using
Go's templating conditionals. Following this example can do away with
the need for an up and down version of every alert.
2020-06-19 09:51:42 -07:00
Ian Fijolek
ad6f3be6ec Update README with more detailed running instructions from prior project 2020-02-19 22:13:30 -08:00
Ian Fijolek
ae30f477f7 Add ability to customize metrics port 2020-02-19 22:13:07 -08:00
Ian Fijolek
9dcd8ebf12 Update README to correct differences between py and go versions 2020-02-19 21:56:01 -08:00
Ian Fijolek
11af700618 Merge branch 'minitor-py-compat-rebase' 2020-02-19 21:21:40 -08:00
Ian Fijolek
00029a6327 Make Python compatibility a flag 2020-02-19 17:38:07 -08:00
Ian Fijolek
9c21880efa Add a default log alert 2020-02-19 17:38:07 -08:00
Ian Fijolek
8b0d3b65cf Try to allow parsing of Minitor-py templates
This will make transition easier for an interim period. Will remove at
version 1.0
2020-02-19 17:38:07 -08:00
Ian Fijolek
25c5179d3d Switch to a single key for command and command shell
This makes the configuration more similar to Minitor-py and
docker-compose. If a string is passed, it will be executed in a shell.
If an array is passed, it will be executed as a command directly.

This breaks compatibility with previous versions of Minitor-go, but
moves closer to compatibility with Minitor-py.
2020-02-19 17:38:06 -08:00
Ian Fijolek
eb7ad0b25e Allow specifying config path as an argument 2020-02-19 17:37:53 -08:00
Ian Fijolek
3b963f420f Remove underscore var name 2020-02-19 17:37:52 -08:00
Ian Fijolek
162e8618cb Revert "Don't copy extra qemu files"
This reverts commit 5b69eacdd5.
2020-01-30 11:41:34 -08:00
Ian Fijolek
c67fe1f4c7 Go back to copying all files because drone doesn't like this 2020-01-30 11:35:59 -08:00
Ian Fijolek
5b69eacdd5 Don't copy extra qemu files 2020-01-30 11:30:13 -08:00
Ian Fijolek
d6b979f06e Update README to correct typo and checklist 2020-01-17 15:53:37 -08:00
Ian Fijolek
f4a972747f Add notify after docker builds 2020-01-10 14:25:02 -08:00
Ian Fijolek
c7c82fabe8 Add qemu binaries 2020-01-10 14:21:48 -08:00
Ian Fijolek
f807caa1ad Add multi-arch builds 2020-01-10 13:58:17 -08:00
Ian Fijolek
3226be69e7 Fix issue with shell commands containing "<>" and unnecessary (and poor) escaping 2020-01-07 10:37:53 -08:00
Ian Fijolek
0269ad3512 Add new test for multi-line YAML strings 2020-01-07 10:28:14 -08:00
Ian Fijolek
f6ccd9a3bd Update Dockerfiles to newer (roughly) pinned versions 2019-11-22 14:44:21 -08:00
Ian Fijolek
f463ef27b7 Update Dockerfiles to make this version runnable
Should now have parity in terms of system utilities and scripts for
checking services
2019-11-22 12:58:26 -08:00
Ian Fijolek
76ae8f3a44 Do build and test in one step
Speed up build time by moving these two tasks to one step so that a new
container doesn't have to be spun up and the cached modules from the
build step are reused in the test step.
2019-11-21 15:40:59 -08:00
Ian Fijolek
9b9f803231 Add pre-commit to the repo
This adds pre-commit which can be used to enforce consistent style
and common errors (like committing large files)
2019-11-21 15:32:57 -08:00
Ian Fijolek
b808df7365 Update README to indicate where to get alert template info 2019-11-20 17:21:48 -08:00
Ian Fijolek
b1422bbec2 Add split out metrics to a new make target
In case trying to run without having an available port to
serve http on
2019-11-15 17:36:10 -08:00
Ian Fijolek
604c27118a Add Docker deploy pipeline 2019-11-15 17:30:29 -08:00
Ian Fijolek
b2d9882c91 Update readme with accurate To do status 2019-11-15 17:17:46 -08:00
Ian Fijolek
457e19af9b Add prometheus metrics exporter
This should add metrics parity to the Python version
2019-11-15 17:14:20 -08:00
30 changed files with 1494 additions and 364 deletions
+129 -6
@@ -1,13 +1,136 @@
---
kind: pipeline kind: pipeline
name: test name: test
steps: steps:
- name: build
image: golang:1.12
commands:
- make build
- name: test - name: test
image: golang:1.12 image: golang:1.17
environment:
VERSION: ${DRONE_TAG:-${DRONE_COMMIT}}
commands: commands:
- make test - make test
- name: check
image: iamthefij/drone-pre-commit:personal
---
kind: pipeline
name: publish
depends_on:
- test
trigger:
event:
- push
- tag
refs:
- refs/heads/master
- refs/tags/v*
steps:
- name: build all binaries
image: golang:1.17
environment:
VERSION: ${DRONE_TAG:-${DRONE_COMMIT}}
commands:
- make all
- name: compress binaries for release
image: ubuntu
commands:
- find ./dist -type f -executable -execdir tar -czvf {}.tar.gz {} \;
when:
event: tag
- name: upload gitea release
image: plugins/gitea-release
settings:
title: ${DRONE_TAG}
files: dist/*.tar.gz
checksum:
- md5
- sha1
- sha256
- sha512
base_url:
from_secret: gitea_base_url
api_key:
from_secret: gitea_token
when:
event: tag
- name: push image - arm
image: plugins/docker
settings:
repo: iamthefij/minitor-go
auto_tag: true
auto_tag_suffix: linux-arm
username:
from_secret: docker_username
password:
from_secret: docker_password
build_args:
- ARCH=arm
- REPO=arm32v7
- name: push image - arm64
image: plugins/docker
settings:
repo: iamthefij/minitor-go
auto_tag: true
auto_tag_suffix: linux-arm64
username:
from_secret: docker_username
password:
from_secret: docker_password
build_args:
- ARCH=arm64
- REPO=arm64v8
- name: push image - amd64
image: plugins/docker
settings:
repo: iamthefij/minitor-go
auto_tag: true
auto_tag_suffix: linux-amd64
username:
from_secret: docker_username
password:
from_secret: docker_password
- name: publish manifest
image: plugins/manifest
settings:
spec: manifest.tmpl
auto_tag: true
ignore_missing: true
username:
from_secret: docker_username
password:
from_secret: docker_password
---
kind: pipeline
name: notify
depends_on:
- test
- publish
trigger:
status:
- failure
steps:
- name: notify
image: drillster/drone-email
settings:
host:
from_secret: SMTP_HOST # pragma: whitelist secret
username:
from_secret: SMTP_USER # pragma: whitelist secret
password:
from_secret: SMTP_PASS # pragma: whitelist secret
from: drone@iamthefij.com
Vendored
+2
@@ -16,4 +16,6 @@
config.yml config.yml
# Output binary # Output binary
minitor
minitor-go minitor-go
dist/
+48
@@ -0,0 +1,48 @@
---
linters:
enable:
- asciicheck
- bodyclose
- dogsled
- dupl
- exhaustive
- gochecknoinits
- gocognit
- gocritic
- gocyclo
- goerr113
- gofumpt
- goimports
- gomnd
- goprintffuncname
# - gosec
# - ifshort
- interfacer
- maligned
- misspell
- nakedret
- nestif
- nlreturn
- noctx
- unparam
- wsl
# - errorlint
disable:
- gochecknoglobals
linters-settings:
gosec:
excludes:
- G204
# gomnd:
# settings:
# mnd:
# ignored-functions: math.*
issues:
exclude-rules:
- path: _test\.go
linters:
- errcheck
- gosec
- maligned
+25
@@ -0,0 +1,25 @@
---
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.4.0
hooks:
- id: check-added-large-files
- id: check-yaml
args:
- --allow-multiple-documents
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-merge-conflict
- repo: https://github.com/golangci/golangci-lint
rev: v1.42.1
hooks:
- id: golangci-lint
- repo: git://github.com/dnephin/pre-commit-golang
rev: v0.4.0
hooks:
- id: go-fmt
- id: go-imports
- repo: https://github.com/hadolint/hadolint
rev: v2.4.0
hooks:
- id: hadolint
+19 -3
@@ -1,8 +1,24 @@
ARG REPO=library ARG REPO=library
FROM ${REPO}/busybox:latest FROM ${REPO}/alpine:3.12
WORKDIR /root/
RUN mkdir /app
WORKDIR /app/
# Add common checking tools
RUN apk --no-cache add bash=~5.0 curl=~7.76 jq=~1.6
# Add minitor user for running as non-root
RUN addgroup -S minitor && adduser -S minitor -G minitor
# Copy scripts
COPY ./scripts /app/scripts
RUN chmod -R 755 /app/scripts
# Copy minitor in
ARG ARCH=amd64 ARG ARCH=amd64
COPY ./minitor-go ./minitor COPY ./dist/minitor-linux-${ARCH} ./minitor
# Drop to non-root user
USER minitor
ENTRYPOINT [ "./minitor" ] ENTRYPOINT [ "./minitor" ]
+21 -5
@@ -1,7 +1,5 @@
ARG REPO=library ARG REPO=library
FROM golang:1.12-alpine AS builder FROM golang:1.17 AS builder
RUN apk add --no-cache git
RUN mkdir /app RUN mkdir /app
WORKDIR /app WORKDIR /app
@@ -16,8 +14,26 @@ ARG VERSION=dev
ENV CGO_ENABLED=0 GOOS=linux GOARCH=${ARCH} ENV CGO_ENABLED=0 GOOS=linux GOARCH=${ARCH}
RUN go build -ldflags "-X main.version=${VERSION}" -a -installsuffix nocgo -o minitor . RUN go build -ldflags "-X main.version=${VERSION}" -a -installsuffix nocgo -o minitor .
FROM ${REPO}/busybox:latest FROM ${REPO}/alpine:3.10
WORKDIR /root/ RUN mkdir /app
WORKDIR /app/
# Copy minitor in
COPY --from=builder /app/minitor . COPY --from=builder /app/minitor .
# Add common checking tools
RUN apk --no-cache add bash=~5.0 curl=~7.66 jq=~1.6
# Add minitor user for running as non-root
RUN addgroup -S minitor && adduser -S minitor -G minitor
# Copy scripts
COPY ./scripts /app/scripts
RUN chmod -R 755 /app/scripts
# Drop to non-root user
USER minitor
ENTRYPOINT [ "./minitor" ] ENTRYPOINT [ "./minitor" ]
# vim: set filetype=dockerfile:
+70 -14
@@ -1,39 +1,95 @@
DOCKER_TAG ?= minitor-go-${USER} DOCKER_TAG ?= minitor-go-${USER}
VERSION ?= $(shell git describe --tags --dirty)
GOFILES = *.go
# Multi-arch targets are generated from this
TARGET_ALIAS = minitor-linux-amd64 minitor-linux-arm minitor-linux-arm64 minitor-darwin-amd64
TARGETS = $(addprefix dist/,$(TARGET_ALIAS))
#
# Default make target will run tests
.DEFAULT_GOAL = test
.PHONY: test # Build all static Minitor binaries
default: test .PHONY: all
all: $(TARGETS)
# Build all static Linux Minitor binaries. Used in Docker images
.PHONY: all-linux
all-linux: $(filter dist/minitor-linux-%,$(TARGETS))
# Build minitor for the current machine
minitor: $(GOFILES)
@echo Version: $(VERSION)
go build -ldflags '-X "main.version=${VERSION}"' -o minitor
.PHONY: build .PHONY: build
build: build: minitor
go build
minitor-go:
go build
# Run minitor for the current machine
.PHONY: run .PHONY: run
run: minitor-go build run: minitor
./minitor-go -debug ./minitor -debug
.PHONY: run-metrics
run-metrics: minitor
./minitor -debug -metrics
# Run all tests
.PHONY: test .PHONY: test
test: test:
go test -coverprofile=coverage.out go test -coverprofile=coverage.out
@echo
go tool cover -func=coverage.out go tool cover -func=coverage.out
@echo
@# Check min coverage percentage
@go tool cover -func=coverage.out | awk -v target=80.0% \ @go tool cover -func=coverage.out | awk -v target=80.0% \
'/^total:/ { print "Total coverage: " $$3 " Minimum coverage: " target; if ($$3+0.0 >= target+0.0) print "ok"; else { print "fail"; exit 1; } }' '/^total:/ { print "Total coverage: " $$3 " Minimum coverage: " target; if ($$3+0.0 >= target+0.0) print "ok"; else { print "fail"; exit 1; } }'
# Installs pre-commit hooks
.PHONY: install-hooks
install-hooks:
pre-commit install --install-hooks
# Runs pre-commit checks on files
.PHONY: check
check:
pre-commit run --all-files
.PHONY: clean .PHONY: clean
clean: clean:
rm -f ./minitor-go rm -f ./minitor
rm -f ./coverage.out rm -f ./coverage.out
rm -fr ./dist
.PHONY: docker-build .PHONY: docker-build
docker-build: docker-build:
docker build -f ./Dockerfile.multi-stage -t $(DOCKER_TAG) . docker build -f ./Dockerfile.multi-stage -t $(DOCKER_TAG)-linux-amd64 .
.PHONY: docker-run .PHONY: docker-run
docker-run: docker-build docker-run: docker-build
docker run --rm -v $(shell pwd)/config.yml:/root/config.yml $(DOCKER_TAG) docker run --rm -v $(shell pwd)/config.yml:/root/config.yml $(DOCKER_TAG)
## Multi-arch targets
$(TARGETS): $(GOFILES)
mkdir -p ./dist
GOOS=$(word 2, $(subst -, ,$(@))) GOARCH=$(word 3, $(subst -, ,$(@))) CGO_ENABLED=0 \
go build -ldflags '-X "main.version=${VERSION}"' -a -installsuffix nocgo \
-o $@
.PHONY: $(TARGET_ALIAS)
$(TARGET_ALIAS):
$(MAKE) $(addprefix dist/,$@)
# Arch specific docker build targets
.PHONY: docker-build-arm
docker-build-arm: dist/minitor-linux-arm
docker build --build-arg REPO=arm32v7 --build-arg ARCH=arm . -t ${DOCKER_TAG}-linux-arm
.PHONY: docker-build-arm64
docker-build-arm64: dist/minitor-linux-arm64
docker build --build-arg REPO=arm64v8 --build-arg ARCH=arm64 . -t ${DOCKER_TAG}-linux-arm64
# Cross run on host architecture
.PHONY: docker-run-arm
docker-run-arm: docker-build-arm
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock --name $(DOCKER_TAG)-run ${DOCKER_TAG}-linux-arm
.PHONY: docker-run-arm64
docker-run-arm64: docker-build-arm64
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock --name $(DOCKER_TAG)-run ${DOCKER_TAG}-linux-arm64
+136 -66
@@ -1,97 +1,167 @@
# minitor-go # [minitor-go](https://git.iamthefij.com/iamthefij/minitor-go)
A reimplementation of [Minitor](https://git.iamthefij/iamthefij/minitor) in Go A minimal monitoring system
Minitor is already a very minimal monitoring tool. Python 3 was a quick way to get something live, but Python itself comes with a very large footprint.Thus Go feels like a better fit for the project, longer term. ## What does it do?
Minitor accepts a YAML configuration file with a set of commands to run and a set of alerts to execute when those commands fail. It is designed to be as simple as possible and relies on other command line tools to do checks and issue alerts.
## But why?
I'm running a few small services and found Sensu, Consul, Nagios, etc. to all be far too complicated for my use case.
## So how do I use it?
### Running
Install and execute with:
```bash
go get github.com/iamthefij/minitor-go
minitor
```
If developing locally, you can use:
```bash
make run
```
It will read the contents of `config.yml` and begin its loop. You could also run it directly and provide a new config file via the `-config` argument.
#### Docker
You can pull this repository directly from Docker:
```bash
docker pull iamthefij/minitor-go:latest
```
The Docker image uses a default `config.yml` that is copied from `sample-config.yml`. This won't really do anything for you, so when you run the Docker image, you should supply your own `config.yml` file:
```bash
docker run -v $PWD/config.yml:/app/config.yml iamthefij/minitor-go:latest
```
Images are provided for `amd64`, `arm`, and `arm64` architectures.
## Configuring
In this repo, you can explore the `sample-config.yml` file for an example, but the general structure is as follows. It should be noted that environment variable interpolation happens on load of the YAML file.
The global configurations are:
|key|value|
|---|---|
|`check_interval`|Maximum frequency at which checks run for each monitor, given as a duration, e.g. `1m2s`.|
|`monitors`|List of all monitors. Detailed description below|
|`alerts`|List of all alerts. Detailed description below|
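A minimal sketch of how an interval value like the one above could be read, assuming the goal is to accept both a bare number of seconds (as the Python version did) and a Go duration string. `parseInterval` is a hypothetical helper written for illustration; it is not taken from the Minitor source.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseInterval is a hypothetical helper, not Minitor's actual code.
// A bare integer is treated as a number of seconds; anything else is
// handed to time.ParseDuration, which understands strings like "1m30s".
func parseInterval(raw string) (time.Duration, error) {
	if seconds, err := strconv.Atoi(raw); err == nil {
		return time.Duration(seconds) * time.Second, nil
	}

	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid interval %q: %w", raw, err)
	}

	return d, nil
}

func main() {
	for _, raw := range []string{"90", "1m30s", "1m2s", "soon"} {
		d, err := parseInterval(raw)
		if err != nil {
			fmt.Println("error:", err)

			continue
		}

		fmt.Printf("%s -> %s\n", raw, d)
	}
}
```

Trying the integer form first keeps older configs working, while new configs can use the more readable duration strings.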
### Monitors
All monitors should be listed under `monitors`.
Each monitor allows the following configuration:
|key|value|
|---|---|
|`name`|Name of the monitor running. This will show up in messages and logs.|
|`command`|Specifies the command that should be executed, either in exec or shell form. This command's exit value will determine whether the check is successful|
|`alert_down`|A list of Alerts to be triggered when the monitor is in a "down" state|
|`alert_up`|A list of Alerts to be triggered when the monitor moves to an "up" state|
|`check_interval`|The interval at which this monitor should be checked. This must be greater than the global `check_interval` value|
|`alert_after`|Allows specifying the number of failed checks before an alert should be triggered|
|`alert_every`|Allows specifying how often an alert should be retriggered. There are a few magic numbers here. Defaults to `-1` for an exponential backoff. Setting to `0` disables re-alerting. Positive values will allow retriggering after the specified number of checks|
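The two forms of `command` can be pictured roughly as follows. This is a sketch under stated assumptions: `commandOrShell` and `buildCommand` are illustrative names, and running the shell form through `sh -c` is an assumption, since the shell that Minitor actually invokes is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// commandOrShell is a simplified, hypothetical stand-in for a config value
// that holds either an argument list (exec form) or a string (shell form).
type commandOrShell struct {
	Args  []string
	Shell string
}

// buildCommand prepares, but does not run, the matching *exec.Cmd.
func buildCommand(c commandOrShell) (*exec.Cmd, error) {
	switch {
	case len(c.Args) > 0:
		// Exec form: the first element is the program, the rest are its
		// arguments. No shell is involved, so nothing is expanded.
		return exec.Command(c.Args[0], c.Args[1:]...), nil
	case c.Shell != "":
		// Shell form: the whole string is handed to a shell (assumed: sh).
		return exec.Command("sh", "-c", c.Shell), nil
	default:
		return nil, errors.New("no command configured")
	}
}

func main() {
	forms := []commandOrShell{
		{Args: []string{"echo", "test"}},
		{Shell: "echo 'test'"},
	}

	for _, form := range forms {
		cmd, err := buildCommand(form)
		if err != nil {
			fmt.Println("error:", err)

			continue
		}

		fmt.Printf("%q\n", cmd.Args)
	}
}
```

The practical difference is that pipes, redirects, and variable expansion only work in the shell form.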
### Alerts
Alerts exist as objects keyed under `alerts`. Their key should be the name of the Alert. This is used in your monitor setup in `alert_down` and `alert_up`.
Each alert allows the following configuration:
|key|value|
|---|---|
|`command`|Specifies the command that should be executed, either in exec or shell form. This is the command that will be run when the alert is executed. This can be templated with environment variables or the variables shown in the table below|
Also, when alerts are executed, their commands are rendered with Go's `text/template` package, with some attributes of the Monitor passed in. The following monitor-specific variables can be referenced using Go template syntax:
|token|value|
|---|---|
|`{{.AlertCount}}`|Number of times this monitor has alerted|
|`{{.FailureCount}}`|The total number of sequential failed checks for this monitor|
|`{{.LastCheckOutput}}`|The last returned value from the check command to either stderr or stdout|
|`{{.LastSuccess}}`|The ISO datetime of the last successful check|
|`{{.MonitorName}}`|The name of the monitor that failed and triggered the alert|
|`{{.IsUp}}`|Indicates if the monitor that is alerting is up or not. Can be used in a conditional message template|
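To illustrate how these tokens behave, here is a small sketch that renders one conditional message for both the "down" and "up" cases. The message text matches the default `log` alert in this diff; the `notice` struct and `render` helper are simplified stand-ins written for this example, not the project's own types.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// notice is a trimmed-down, illustrative version of the data passed to an
// alert template. Only the fields used below are included.
type notice struct {
	MonitorName  string
	FailureCount int16
	IsUp         bool
}

// message uses a template conditional so that one alert can cover both
// the failing and the recovered state.
const message = "{{.MonitorName}} {{if .IsUp}}has recovered{{else}}check has failed {{.FailureCount}} times{{end}}"

// render parses the template text and executes it against the notice.
func render(text string, n notice) (string, error) {
	tmpl, err := template.New("alert").Parse(text)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, n); err != nil {
		return "", err
	}

	return buf.String(), nil
}

func main() {
	cases := []notice{
		{MonitorName: "web", FailureCount: 3},
		{MonitorName: "web", IsUp: true},
	}

	for _, n := range cases {
		out, err := render(message, n)
		if err != nil {
			fmt.Println("error:", err)

			continue
		}

		fmt.Println(out)
	}
}
```

With a template like this, the same alert name can be listed under both `alert_down` and `alert_up`.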
### Metrics
Minitor supports exporting metrics for [Prometheus](https://prometheus.io/). Prometheus is an open source tool for reading and querying metrics from different sources. Combined with another tool, [Grafana](https://grafana.com/), it allows building of charts and dashboards. You could also opt to just use Minitor to log check results, and instead do your alerting with Grafana.
It is also possible to use the metrics endpoint for monitoring Minitor itself! This allows setting up multiple instances of Minitor on different servers and having them monitor each other so that you can detect a Minitor outage.
To run Minitor with metrics, use the `-metrics` flag. The metrics will be served on port `8080` by default, though this can be overridden using `-metrics-port`. They will be accessible on the path `/metrics`, e.g. `localhost:8080/metrics`.
```bash
minitor -metrics
# or
minitor -metrics -metrics-port 3000
```
## Contributing
Whether you're looking to submit a patch or tell me I broke something, you can contribute through the GitHub mirror and I can merge PRs back to the source repository.
Primary Repo: https://git.iamthefij.com/iamthefij/minitor.git
GitHub Mirror: https://github.com/IamTheFij/minitor.git
## Original Minitor
This is a reimplementation of [Minitor](https://git.iamthefij.com/iamthefij/minitor) in Go
Minitor is already a minimal monitoring tool. Python 3 was a quick way to get something live, but Python itself comes with a large footprint. Thus Go feels like a better fit for the project, longer term.
Initial target is meant to be roughly compatible requiring only minor changes to configuration. Future iterations may diverge to take advantage of Go specific features. Initial target is meant to be roughly compatible requiring only minor changes to configuration. Future iterations may diverge to take advantage of Go specific features.
## Differences from Python version ### Differences from Python version
There are a few key differences between the Python version and the v0.x Go version. Templating for Alert messages has been updated. In the Python version, `str.format(...)` was used with certain keys passed in that could be used to format messages. In the Go version, we use a struct, `AlertNotice` defined in `alert.go` and the built in Go templating format. Eg.
First, configuration keys cannot have multiple types in Go, so a different key must be used when specifying a Shell command as a string rather than a list of args. Instead of `command`, you must use `command_shell`. Eg:
minitor-py:
```yaml
monitors:
- name: Exec command
command: ['echo', 'test']
- name: Shell command
command: echo 'test'
```
minitor-go:
```yaml
monitors:
- name: Exec command
command: ['echo', 'test']
- name: Shell command
command_shell: echo 'test'
```
Second, templating for Alert messages has been updated. In the Python version, `str.format(...)` was used with certain keys passed in that could be used to format messages. In the Go version, we use a struct containing Alert info and the built in Go templating format. Eg.
minitor-py: minitor-py:
```yaml ```yaml
alerts: alerts:
log_command: log:
command: ['echo', '{monitor_name}'] command: 'echo {monitor_name}'
log_shell:
command_shell: "echo {monitor_name}"
``` ```
minitor-go: minitor-go:
```yaml ```yaml
alerts: alerts:
log_command: log:
command: ['echo', '{{.MonitorName}}'] command: 'echo {{.MonitorName}}'
log_shell:
command_shell: "echo {{.MonitorName}}"
``` ```
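The mapping between the two placeholder styles can be sketched with `strings.NewReplacer`, which is also what the `alert.go` change in this diff uses when Python compatibility is enabled. The version below covers only the direct one-to-one tokens; `legacyTokens` and `toGoTemplate` are names chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyTokens rewrites Minitor-py style placeholders into Go template
// fields. Only the one-to-one tokens are listed in this sketch.
var legacyTokens = strings.NewReplacer(
	"{alert_count}", "{{.AlertCount}}",
	"{failure_count}", "{{.FailureCount}}",
	"{last_output}", "{{.LastCheckOutput}}",
	"{last_success}", "{{.LastSuccess}}",
	"{monitor_name}", "{{.MonitorName}}",
)

// toGoTemplate converts a legacy command string. Strings that already use
// Go template syntax contain none of the legacy tokens and pass through
// unchanged.
func toGoTemplate(legacy string) string {
	return legacyTokens.Replace(legacy)
}

func main() {
	fmt.Println(toGoTemplate("echo {monitor_name}"))
	fmt.Println(toGoTemplate("echo {monitor_name} failed {failure_count} times"))
	fmt.Println(toGoTemplate("echo {{.MonitorName}}"))
}
```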
Finally, newlines in a shell command don't terminate a particular command. Semicolons must be used and continuations should not. Interval durations have changed from being an integer number of seconds to a duration string supported by Go, for example:
minitor-py: minitor-py:
```yaml ```yaml
alerts: check_interval: 90
log_shell:
command_shell: >
echo "line 1"
echo "line 2"
echo "continued" \
"line"
``` ```
minitor-go: minitor-go:
```yaml ```yaml
alerts: check_interval: 1m30s
log_shell:
command_shell: >
echo "line 1";
echo "line 2";
echo "continued"
"line"
``` ```
## To do For the time being, legacy configs for the Python version of Minitor should be compatible if you apply the `-py-compat` flag when running Minitor. Eventually, this flag will go away when later breaking changes are introduced.
There are two sets of task lists. The first is to get rough parity on key features with the Python version. The second is to make some improvements to the framework.
Pairity: ## Future
- [x] Run monitor commands Future, potentially breaking changes
- [x] Run monitor commands in a shell
- [x] Run alert commands
- [x] Run alert commands in a shell
- [x] Allow templating of alert commands
- [ ] Implement Prometheus client to export metrics
- [ ] Test coverage
Improvement:
- [ ] Implement leveled logging (maybe glog or logrus)
- [ ] Consider switching from YAML to TOML
- [ ] Consider value of templating vs injecting values into Env variables - [ ] Consider value of templating vs injecting values into Env variables
- [ ] Consider dropping `alert_up` and `alert_down` in favor of using Go templates that offer more control of messaging
- [ ] Async checking - [ ] Async checking
- [ ] Use durations rather than seconds checked in event loop - [ ] Revisit metrics and see if they all make sense
- [ ] Consider dropping `alert_up` and `alert_down` in favor of using Go templates that offer more control of messaging (Breaking)
+86 -29
@@ -2,89 +2,126 @@ package main
import ( import (
"bytes" "bytes"
"errors"
"fmt" "fmt"
"log"
"os/exec" "os/exec"
"strings"
"text/template" "text/template"
"time" "time"
"git.iamthefij.com/iamthefij/slog"
)
var (
errNoTemplate = errors.New("no template")
// ErrAlertFailed indicates that an alert failed to send
ErrAlertFailed = errors.New("alert failed")
) )
// Alert is a config driven mechanism for sending a notice // Alert is a config driven mechanism for sending a notice
type Alert struct { type Alert struct {
Name string Name string
Command []string Command CommandOrShell
CommandShell string `yaml:"command_shell"`
commandTemplate []*template.Template commandTemplate []*template.Template
commandShellTemplate *template.Template commandShellTemplate *template.Template
} }
// AlertNotice captures the context for an alert to be sent // AlertNotice captures the context for an alert to be sent
type AlertNotice struct { type AlertNotice struct {
MonitorName string
AlertCount int16 AlertCount int16
FailureCount int16 FailureCount int16
LastCheckOutput string
LastSuccess time.Time
IsUp bool IsUp bool
LastSuccess time.Time
MonitorName string
LastCheckOutput string
} }
// IsValid returns a boolean indicating if the Alert has been correctly // IsValid returns a boolean indicating if the Alert has been correctly
// configured // configured
func (alert Alert) IsValid() bool { func (alert Alert) IsValid() bool {
atLeastOneCommand := (alert.CommandShell != "" || alert.Command != nil) return !alert.Command.Empty()
atMostOneCommand := (alert.CommandShell == "" || alert.Command == nil)
return atLeastOneCommand && atMostOneCommand
} }
// BuildTemplates compiles command templates for the Alert // BuildTemplates compiles command templates for the Alert
func (alert *Alert) BuildTemplates() error { func (alert *Alert) BuildTemplates() error {
if LogDebug { // TODO: Remove legacy template support later after 1.0
log.Printf("DEBUG: Building template for alert %s", alert.Name) legacy := strings.NewReplacer(
} "{alert_count}", "{{.AlertCount}}",
if alert.commandTemplate == nil && alert.Command != nil { "{alert_message}", "{{.MonitorName}} check has failed {{.FailureCount}} times",
"{failure_count}", "{{.FailureCount}}",
"{last_output}", "{{.LastCheckOutput}}",
"{last_success}", "{{.LastSuccess}}",
"{monitor_name}", "{{.MonitorName}}",
)
slog.Debugf("Building template for alert %s", alert.Name)
switch {
case alert.commandTemplate == nil && alert.Command.Command != nil:
alert.commandTemplate = []*template.Template{} alert.commandTemplate = []*template.Template{}
for i, cmdPart := range alert.Command { for i, cmdPart := range alert.Command.Command {
if PyCompat {
cmdPart = legacy.Replace(cmdPart)
}
alert.commandTemplate = append(alert.commandTemplate, template.Must( alert.commandTemplate = append(alert.commandTemplate, template.Must(
template.New(alert.Name+string(i)).Parse(cmdPart), template.New(alert.Name+fmt.Sprint(i)).Parse(cmdPart),
)) ))
} }
} else if alert.commandShellTemplate == nil && alert.CommandShell != "" { case alert.commandShellTemplate == nil && alert.Command.ShellCommand != "":
shellCmd := alert.Command.ShellCommand
if PyCompat {
shellCmd = legacy.Replace(shellCmd)
}
alert.commandShellTemplate = template.Must( alert.commandShellTemplate = template.Must(
template.New(alert.Name).Parse(alert.CommandShell), template.New(alert.Name).Parse(shellCmd),
) )
} else { default:
return fmt.Errorf("No template provided for alert %s", alert.Name) return fmt.Errorf("No template provided for alert %s: %w", alert.Name, errNoTemplate)
} }
return nil return nil
} }
// Send will send an alert notice by executing the command template // Send will send an alert notice by executing the command template
func (alert Alert) Send(notice AlertNotice) (output_str string, err error) { func (alert Alert) Send(notice AlertNotice) (outputStr string, err error) {
log.Printf("INFO: Sending alert %s for %s", alert.Name, notice.MonitorName) slog.Infof("Sending alert %s for %s", alert.Name, notice.MonitorName)
var cmd *exec.Cmd var cmd *exec.Cmd
if alert.commandTemplate != nil {
switch {
case alert.commandTemplate != nil:
command := []string{} command := []string{}
for _, cmdTmp := range alert.commandTemplate { for _, cmdTmp := range alert.commandTemplate {
var commandBuffer bytes.Buffer var commandBuffer bytes.Buffer
err = cmdTmp.Execute(&commandBuffer, notice) err = cmdTmp.Execute(&commandBuffer, notice)
if err != nil { if err != nil {
return return
} }
command = append(command, commandBuffer.String()) command = append(command, commandBuffer.String())
} }
cmd = exec.Command(command[0], command[1:]...) cmd = exec.Command(command[0], command[1:]...)
} else if alert.commandShellTemplate != nil { case alert.commandShellTemplate != nil:
var commandBuffer bytes.Buffer var commandBuffer bytes.Buffer
err = alert.commandShellTemplate.Execute(&commandBuffer, notice) err = alert.commandShellTemplate.Execute(&commandBuffer, notice)
if err != nil { if err != nil {
return return
} }
shellCommand := commandBuffer.String() shellCommand := commandBuffer.String()
cmd = ShellCommand(shellCommand) cmd = ShellCommand(shellCommand)
} else { default:
err = fmt.Errorf("No templates compiled for alert %v", alert.Name) err = fmt.Errorf("No templates compiled for alert %s: %w", alert.Name, errNoTemplate)
return return
} }
@@ -95,10 +132,30 @@ func (alert Alert) Send(notice AlertNotice) (output_str string, err error) {
var output []byte var output []byte
output, err = cmd.CombinedOutput() output, err = cmd.CombinedOutput()
output_str = string(output) outputStr = string(output)
if LogDebug { slog.Debugf("Alert output for: %s\n---\n%s\n---", alert.Name, outputStr)
log.Printf("DEBUG: Alert output for: %s\n---\n%s\n---", alert.Name, output_str)
if err != nil {
err = fmt.Errorf(
"Alert '%s' failed to send. Returned %v: %w",
alert.Name,
err,
ErrAlertFailed,
)
} }
return output_str, err return outputStr, err
}
// NewLogAlert creates an alert that does basic logging using echo
func NewLogAlert() *Alert {
return &Alert{
Name: "log",
Command: CommandOrShell{
Command: []string{
"echo",
"{{.MonitorName}} {{if .IsUp}}has recovered{{else}}check has failed {{.FailureCount}} times{{end}}",
},
},
}
} }
+59 -14
@@ -11,23 +11,20 @@ func TestAlertIsValid(t *testing.T) {
expected bool expected bool
name string name string
}{ }{
{Alert{Command: []string{"echo", "test"}}, true, "Command only"}, {Alert{Command: CommandOrShell{Command: []string{"echo", "test"}}}, true, "Command only"},
{Alert{CommandShell: "echo test"}, true, "CommandShell only"}, {Alert{Command: CommandOrShell{ShellCommand: "echo test"}}, true, "CommandShell only"},
{Alert{}, false, "No commands"}, {Alert{}, false, "No commands"},
{
Alert{Command: []string{"echo", "test"}, CommandShell: "echo test"},
false,
"Both commands",
},
} }
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
actual := c.alert.IsValid() actual := c.alert.IsValid()
if actual != c.expected { if actual != c.expected {
t.Errorf("IsValid(%v), expected=%t actual=%t", c.name, c.expected, actual) t.Errorf("IsValid(%v), expected=%t actual=%t", c.name, c.expected, actual)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -39,50 +36,94 @@ func TestAlertSend(t *testing.T) {
expectedOutput string expectedOutput string
expectErr bool expectErr bool
name string name string
pyCompat bool
}{ }{
{ {
Alert{Command: []string{"echo", "{{.MonitorName}}"}}, Alert{Command: CommandOrShell{Command: []string{"echo", "{{.MonitorName}}"}}},
AlertNotice{MonitorName: "test"}, AlertNotice{MonitorName: "test"},
"test\n", "test\n",
false, false,
"Command with template", "Command with template",
false,
}, },
{ {
Alert{CommandShell: "echo {{.MonitorName}}"}, Alert{Command: CommandOrShell{ShellCommand: "echo {{.MonitorName}}"}},
AlertNotice{MonitorName: "test"}, AlertNotice{MonitorName: "test"},
"test\n", "test\n",
false, false,
"Command shell with template", "Command shell with template",
false,
}, },
{ {
Alert{Command: []string{"echo", "{{.Bad}}"}}, Alert{Command: CommandOrShell{Command: []string{"echo", "{{.Bad}}"}}},
AlertNotice{MonitorName: "test"}, AlertNotice{MonitorName: "test"},
"", "",
true, true,
"Command with bad template", "Command with bad template",
false,
}, },
{ {
Alert{CommandShell: "echo {{.Bad}}"}, Alert{Command: CommandOrShell{ShellCommand: "echo {{.Bad}}"}},
AlertNotice{MonitorName: "test"}, AlertNotice{MonitorName: "test"},
"", "",
true, true,
"Command shell with bad template", "Command shell with bad template",
false,
},
{
Alert{Command: CommandOrShell{ShellCommand: "echo {alert_message}"}},
AlertNotice{MonitorName: "test", FailureCount: 1},
"test check has failed 1 times\n",
false,
"Command shell with legacy template",
true,
},
// Test default log alert down
{
*NewLogAlert(),
AlertNotice{MonitorName: "Test", FailureCount: 1, IsUp: false},
"Test check has failed 1 times\n",
false,
"Default log alert down",
false,
},
// Test default log alert up
{
*NewLogAlert(),
AlertNotice{MonitorName: "Test", IsUp: true},
"Test has recovered\n",
false,
"Default log alert up",
false,
}, },
} }
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
c.alert.BuildTemplates() // Set PyCompat to value of compat flag
PyCompat = c.pyCompat
err := c.alert.BuildTemplates()
if err != nil {
t.Errorf("Send(%v output), error building templates: %v", c.name, err)
}
output, err := c.alert.Send(c.notice) output, err := c.alert.Send(c.notice)
hasErr := (err != nil) hasErr := (err != nil)
if output != c.expectedOutput { if output != c.expectedOutput {
t.Errorf("Send(%v output), expected=%v actual=%v", c.name, c.expectedOutput, output) t.Errorf("Send(%v output), expected=%v actual=%v", c.name, c.expectedOutput, output)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
if hasErr != c.expectErr { if hasErr != c.expectErr {
t.Errorf("Send(%v err), expected=%v actual=%v", c.name, "Err", err) t.Errorf("Send(%v err), expected=%v actual=%v", c.name, "Err", err)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
// Set PyCompat back to default value
PyCompat = false
log.Println("-----") log.Println("-----")
} }
} }
@@ -90,10 +131,12 @@ func TestAlertSend(t *testing.T) {
func TestAlertSendNoTemplates(t *testing.T) { func TestAlertSendNoTemplates(t *testing.T) {
alert := Alert{} alert := Alert{}
notice := AlertNotice{} notice := AlertNotice{}
output, err := alert.Send(notice) output, err := alert.Send(notice)
if err == nil { if err == nil {
t.Errorf("Send(no template), expected=%v actual=%v", "Err", output) t.Errorf("Send(no template), expected=%v actual=%v", "Err", output)
} }
log.Println("-----") log.Println("-----")
} }
@@ -103,8 +146,8 @@ func TestAlertBuildTemplate(t *testing.T) {
expectErr bool expectErr bool
name string name string
}{ }{
{Alert{Command: []string{"echo", "test"}}, false, "Command only"}, {Alert{Command: CommandOrShell{Command: []string{"echo", "test"}}}, false, "Command only"},
{Alert{CommandShell: "echo test"}, false, "CommandShell only"}, {Alert{Command: CommandOrShell{ShellCommand: "echo test"}}, false, "CommandShell only"},
{Alert{}, true, "No commands"}, {Alert{}, true, "No commands"},
} }
@@ -112,10 +155,12 @@ func TestAlertBuildTemplate(t *testing.T) {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
err := c.alert.BuildTemplates() err := c.alert.BuildTemplates()
hasErr := (err != nil) hasErr := (err != nil)
if hasErr != c.expectErr { if hasErr != c.expectErr {
t.Errorf("BuildTemplates(%v), expectErr=%t actual=%v", c.name, c.expectErr, err) t.Errorf("BuildTemplates(%v), expectErr=%t actual=%v", c.name, c.expectErr, err)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
+120 -32
@@ -2,60 +2,135 @@ package main
import ( import (
"errors" "errors"
"gopkg.in/yaml.v2"
"io/ioutil" "io/ioutil"
"log" "time"
"os"
"git.iamthefij.com/iamthefij/slog"
"gopkg.in/yaml.v2"
) )
var errInvalidConfig = errors.New("Invalid configuration")
// Config type contains all provided user configuration // Config type contains all provided user configuration
type Config struct { type Config struct {
CheckInterval int64 `yaml:"check_interval"` CheckInterval SecondsOrDuration `yaml:"check_interval"`
Monitors []*Monitor Monitors []*Monitor
Alerts map[string]*Alert Alerts map[string]*Alert
} }
// CommandOrShell type wraps a string or list of strings
// for executing a command directly or in a shell
type CommandOrShell struct {
ShellCommand string
Command []string
}
// Empty checks if the Command has a value
func (cos CommandOrShell) Empty() bool {
return (cos.ShellCommand == "" && cos.Command == nil)
}
// UnmarshalYAML allows unmarshalling either a string or slice of strings
// and parsing them as either a command or a shell command.
func (cos *CommandOrShell) UnmarshalYAML(unmarshal func(interface{}) error) error {
var cmd []string
err := unmarshal(&cmd)
// Error indicates this is shell command
if err != nil {
var shellCmd string
err := unmarshal(&shellCmd)
if err != nil {
return err
}
cos.ShellCommand = shellCmd
} else {
cos.Command = cmd
}
return nil
}
// SecondsOrDuration wraps a duration value for parsing a duration or seconds from YAML
// NOTE: This should be removed in favor of only parsing durations once compatibility is broken
type SecondsOrDuration struct {
value time.Duration
}
// Value returns a duration value
func (sod SecondsOrDuration) Value() time.Duration {
return sod.value
}
// UnmarshalYAML allows unmarshalling a duration value or seconds if an int was provided
func (sod *SecondsOrDuration) UnmarshalYAML(unmarshal func(interface{}) error) error {
var seconds int64
err := unmarshal(&seconds)
if err == nil {
sod.value = time.Second * time.Duration(seconds)
return nil
}
// Error indicates that we don't have an int
err = unmarshal(&sod.value)
return err
}
// IsValid checks config validity and returns true if valid // IsValid checks config validity and returns true if valid
func (config Config) IsValid() (isValid bool) { func (config Config) IsValid() (isValid bool) {
isValid = true isValid = true
// Validate monitors // Validate alerts
if config.Monitors == nil || len(config.Monitors) == 0 { if config.Alerts == nil || len(config.Alerts) == 0 {
log.Printf("ERROR: Invalid monitor configuration: Must provide at least one monitor") // This should never happen because there is a default alert named 'log' for now
slog.Errorf("Invalid alert configuration: Must provide at least one alert")
isValid = false isValid = false
} }
for _, alert := range config.Alerts {
if !alert.IsValid() {
slog.Errorf("Invalid alert configuration: %+v", alert.Name)
isValid = false
} else {
slog.Debugf("Loaded alert %s", alert.Name)
}
}
// Validate monitors
if config.Monitors == nil || len(config.Monitors) == 0 {
slog.Errorf("Invalid monitor configuration: Must provide at least one monitor")
isValid = false
}
for _, monitor := range config.Monitors { for _, monitor := range config.Monitors {
if !monitor.IsValid() { if !monitor.IsValid() {
log.Printf("ERROR: Invalid monitor configuration: %s", monitor.Name) slog.Errorf("Invalid monitor configuration: %s", monitor.Name)
isValid = false isValid = false
} }
// Check that all Monitor alerts actually exist // Check that all Monitor alerts actually exist
for _, isUp := range []bool{true, false} { for _, isUp := range []bool{true, false} {
for _, alertName := range monitor.GetAlertNames(isUp) { for _, alertName := range monitor.GetAlertNames(isUp) {
if _, ok := config.Alerts[alertName]; !ok { if _, ok := config.Alerts[alertName]; !ok {
log.Printf( slog.Errorf(
"ERROR: Invalid monitor configuration: %s. Unknown alert %s", "Invalid monitor configuration: %s. Unknown alert %s",
monitor.Name, alertName, monitor.Name, alertName,
) )
isValid = false isValid = false
} }
} }
} }
} }
// Validate alerts return isValid
if config.Alerts == nil || len(config.Alerts) == 0 {
log.Printf("ERROR: Invalid alert configuration: Must provide at least one alert")
isValid = false
}
for _, alert := range config.Alerts {
if !alert.IsValid() {
log.Printf("ERROR: Invalid alert configuration: %s", alert.Name)
isValid = false
}
}
return
} }
// Init performs extra initialization on top of loading the config from file // Init performs extra initialization on top of loading the config from file
@@ -77,22 +152,35 @@ func LoadConfig(filePath string) (config Config, err error) {
return return
} }
// TODO: Decide if this is better expanded here, or only when executing err = yaml.Unmarshal(data, &config)
envExpanded := os.ExpandEnv(string(data))
err = yaml.Unmarshal([]byte(envExpanded), &config)
if err != nil { if err != nil {
return return
} }
log.Printf("config:\n%v\n", config) slog.Debugf("Config values:\n%v\n", config)
if !config.IsValid() { // Add log alert if not present
err = errors.New("Invalid configuration") if PyCompat {
return // Initialize alerts list if not present
if config.Alerts == nil {
config.Alerts = map[string]*Alert{}
}
if _, ok := config.Alerts["log"]; !ok {
config.Alerts["log"] = NewLogAlert()
}
} }
// Finish initializing configuration // Finish initializing configuration
err = config.Init() if err = config.Init(); err != nil {
return
}
return if !config.IsValid() {
err = errInvalidConfig
return
}
return config, err
} }
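Review note: the backwards-compatible interval handling in SecondsOrDuration (a bare integer means seconds, anything else is parsed as a duration) can be illustrated without the YAML layer. This is a minimal standard-library sketch, not the code path in the diff: `parseInterval` is a hypothetical helper that works on raw strings, whereas the real type relies on the unmarshal callback supplied by gopkg.in/yaml.v2.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseInterval is a hypothetical helper. A plain integer is treated as a
// number of seconds (the legacy format); any other input is handed to
// time.ParseDuration, so values such as "10s" or "1m" are accepted.
func parseInterval(raw string) (time.Duration, error) {
	if seconds, err := strconv.ParseInt(raw, 10, 64); err == nil {
		return time.Duration(seconds) * time.Second, nil
	}

	// Not an integer, so try the duration syntax instead.
	return time.ParseDuration(raw)
}

func main() {
	for _, raw := range []string{"30", "10s", "1m", "soon"} {
		d, err := parseInterval(raw)
		if err != nil {
			fmt.Printf("%q: invalid interval: %v\n", raw, err)
			continue
		}
		fmt.Printf("%q: %v\n", raw, d)
	}
}
```

Trying the integer form first keeps existing configs working, which is the same ordering UnmarshalYAML uses above.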
+119 -7
@@ -3,6 +3,7 @@ package main
import ( import (
"log" "log"
"testing" "testing"
"time"
) )
func TestLoadConfig(t *testing.T) { func TestLoadConfig(t *testing.T) {
@@ -10,22 +11,133 @@ func TestLoadConfig(t *testing.T) {
configPath string configPath string
expectErr bool expectErr bool
name string name string
pyCompat bool
}{ }{
{"./test/valid-config.yml", false, "Valid config file"}, {"./test/valid-config.yml", false, "Valid config file", false},
{"./test/does-not-exist", true, "Invalid config path"}, {"./test/valid-default-log-alert.yml", false, "Valid config file with default log alert PyCompat", true},
{"./test/invalid-config-type.yml", true, "Invalid config type for key"}, {"./test/valid-default-log-alert.yml", true, "Invalid config file no log alert", false},
{"./test/invalid-config-missing-alerts.yml", true, "Invalid config missing alerts"}, {"./test/does-not-exist", true, "Invalid config path", false},
{"./test/invalid-config-unknown-alert.yml", true, "Invalid config unknown alert"}, {"./test/invalid-config-type.yml", true, "Invalid config type for key", false},
{"./test/invalid-config-missing-alerts.yml", true, "Invalid config missing alerts", false},
{"./test/invalid-config-unknown-alert.yml", true, "Invalid config unknown alert", false},
} }
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
// Set PyCompat based on compatibility mode
PyCompat = c.pyCompat
_, err := LoadConfig(c.configPath) _, err := LoadConfig(c.configPath)
hasErr := (err != nil) hasErr := (err != nil)
if hasErr != c.expectErr { if hasErr != c.expectErr {
t.Errorf("LoadConfig(%v), expected=%v actual=%v", c.name, "Err", err) t.Errorf("LoadConfig(%v), expected_error=%v actual=%v", c.name, c.expectErr, err)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----")
// Set PyCompat to default value
PyCompat = false
}
}
func TestIntervalParsing(t *testing.T) {
log.Printf("Testing case TestIntervalParsing")
config, err := LoadConfig("./test/valid-config.yml")
if err != nil {
t.Errorf("Failed loading config: %v", err)
}
oneSecond := time.Second
tenSeconds := 10 * time.Second
oneMinute := time.Minute
// validate top level interval seconds represented as an int
if config.CheckInterval.Value() != oneSecond {
t.Errorf("Incorrectly parsed int seconds. expected=%v actual=%v", oneSecond, config.CheckInterval.Value())
}
if config.Monitors[0].CheckInterval.Value() != tenSeconds {
t.Errorf("Incorrectly parsed seconds duration. expected=%v actual=%v", tenSeconds, config.Monitors[0].CheckInterval.Value())
}
if config.Monitors[1].CheckInterval.Value() != oneMinute {
t.Errorf("Incorrectly parsed minute duration. expected=%v actual=%v", oneMinute, config.Monitors[1].CheckInterval.Value())
}
log.Println("-----")
}
// TestMultiLineConfig is a more complicated test stepping through the parsing
// and execution of multi-line strings presented in YAML
func TestMultiLineConfig(t *testing.T) {
log.Println("Testing multi-line string config")
config, err := LoadConfig("./test/valid-verify-multi-line.yml")
if err != nil {
t.Fatalf("TestMultiLineConfig(load), expected=no_error actual=%v", err)
}
log.Println("-----")
log.Println("TestMultiLineConfig(parse > string)")
expected := "echo 'Some string with stuff'; echo \"<angle brackets>\"; exit 1\n"
actual := config.Monitors[0].Command.ShellCommand
if expected != actual {
t.Errorf("TestMultiLineConfig(>) failed")
t.Logf("string expected=`%v`", expected)
t.Logf("string actual =`%v`", actual)
t.Logf("bytes expected=%v", []byte(expected))
t.Logf("bytes actual =%v", []byte(actual))
}
log.Println("-----")
log.Println("TestMultiLineConfig(execute > string)")
_, notice := config.Monitors[0].Check()
if notice == nil {
t.Fatalf("Did not receive an alert notice")
}
expected = "Some string with stuff\n<angle brackets>\n"
actual = notice.LastCheckOutput
if expected != actual {
t.Errorf("TestMultiLineConfig(execute > string) check failed")
t.Logf("string expected=`%v`", expected)
t.Logf("string actual =`%v`", actual)
t.Logf("bytes expected=%v", []byte(expected))
t.Logf("bytes actual =%v", []byte(actual))
}
log.Println("-----")
log.Println("TestMultiLineConfig(parse | string)")
expected = "echo 'Some string with stuff'\necho '<angle brackets>'\n"
actual = config.Alerts["log_shell"].Command.ShellCommand
if expected != actual {
t.Errorf("TestMultiLineConfig(|) failed")
t.Logf("string expected=`%v`", expected)
t.Logf("string actual =`%v`", actual)
t.Logf("bytes expected=%v", []byte(expected))
t.Logf("bytes actual =%v", []byte(actual))
}
log.Println("-----")
log.Println("TestMultiLineConfig(execute | string)")
actual, err = config.Alerts["log_shell"].Send(AlertNotice{})
if err != nil {
t.Errorf("Execution of alert failed")
}
expected = "Some string with stuff\n<angle brackets>\n"
if expected != actual {
t.Errorf("TestMultiLineConfig(execute | string) check failed")
t.Logf("string expected=`%v`", expected)
t.Logf("string actual =`%v`", actual)
t.Logf("bytes expected=%v", []byte(expected))
t.Logf("bytes actual =%v", []byte(actual))
} }
} }
+6 -2
@@ -1,5 +1,9 @@
module git.iamthefij.com/iamthefij/minitor-go module git.iamthefij.com/iamthefij/minitor-go
go 1.12 go 1.15
require gopkg.in/yaml.v2 v2.2.4 require (
git.iamthefij.com/iamthefij/slog v1.3.0
github.com/prometheus/client_golang v1.2.1
gopkg.in/yaml.v2 v2.2.4
)
+77
@@ -1,3 +1,80 @@
git.iamthefij.com/iamthefij/slog v1.3.0 h1:4Hu5PQvDrW5e3FrTS3q2iIXW0iPvhNY/9qJsqDR3K3I=
git.iamthefij.com/iamthefij/slog v1.3.0/go.mod h1:1RUj4hcCompZkAxXCRfUX786tb3cM/Zpkn97dGfUfbg=
github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc=
github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc=
github.com/alecthomas/units v0.0.0-20151022065526-2efee857e7cf/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0=
github.com/alecthomas/units v0.0.0-20190717042225-c3de453c63f4/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0=
github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q=
github.com/beorn7/perks v1.0.0/go.mod h1:KWe93zE9D1o94FZ5RNwFwVgaQK1VOXiVxmqh+CedLV8=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/cespare/xxhash/v2 v2.1.0 h1:yTUvW7Vhb89inJ+8irsUqiWjh8iT6sQPZiQzI6ReGkA=
github.com/cespare/xxhash/v2 v2.1.0/go.mod h1:dgIUBU3pDso/gPgZ1osOZ0iQf77oPR28Tjxl5dIMyVM=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-kit/kit v0.8.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2as=
github.com/go-kit/kit v0.9.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2as=
github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9GBnD5lWE=
github.com/go-logfmt/logfmt v0.4.0/go.mod h1:3RMwSq7FuexP4Kalkev3ejPJsZTpXXBr9+V4qmtdjCk=
github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.2 h1:6nsPYzhq5kReh6QImI3k5qWzO4PEbvbIW2cwSfR/6xs=
github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/json-iterator/go v1.1.6/go.mod h1:+SdeFBvtyEkXs7REEP0seUULqWtbJapLOCVDaaPEHmU=
github.com/json-iterator/go v1.1.7/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4=
github.com/julienschmidt/httprouter v1.2.0/go.mod h1:SYymIcj16QtmaHHD7aYtjjsJG7VTCxuUUipMqKk8s4w=
github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515/go.mod h1:+0opPa2QZZtGFBFZlji/RkVcI2GknAs/DXo4wKdlNEc=
github.com/matttproud/golang_protobuf_extensions v1.0.1 h1:4hp9jkHxhMHkqkrB3Ix0jegS5sx/RkqARlsWZ6pIwiU=
github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0=
github.com/modern-go/reflect2 v1.0.1/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0=
github.com/mwitkow/go-conntrack v0.0.0-20161129095857-cc309e4a2223/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U=
github.com/pkg/errors v0.8.0/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/prometheus/client_golang v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw=
github.com/prometheus/client_golang v1.0.0/go.mod h1:db9x61etRT2tGnBNRi70OPL5FsnadC4Ky3P0J6CfImo=
github.com/prometheus/client_golang v1.2.1 h1:JnMpQc6ppsNgw9QPAGF6Dod479itz7lvlsMzzNayLOI=
github.com/prometheus/client_golang v1.2.1/go.mod h1:XMU6Z2MjaRKVu/dC1qupJI9SiNkDYzz3xecMgSW/F+U=
github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910/go.mod h1:MbSGuTsp3dbXC40dX6PRTWyKYBIrTGTE9sqQNg2J8bo=
github.com/prometheus/client_model v0.0.0-20190129233127-fd36f4220a90/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4 h1:gQz4mCbXsO+nc9n1hCxHcGA3Zx3Eo+UHZoInFGUIXNM=
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
github.com/prometheus/common v0.4.1/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y86RQel1bk4=
github.com/prometheus/common v0.7.0 h1:L+1lyG48J1zAQXA3RBX/nG/B3gjlHq0zTt2tlbJLyCY=
github.com/prometheus/common v0.7.0/go.mod h1:DjGbpBbp5NYNiECxcL/VnbXCCaQpKd3tt26CguLLsqA=
github.com/prometheus/procfs v0.0.0-20181005140218-185b4288413d/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk=
github.com/prometheus/procfs v0.0.2/go.mod h1:TjEm7ze935MbeOT/UhFTIMYKhuLP4wbCsTZCD3I8kEA=
github.com/prometheus/procfs v0.0.5 h1:3+auTFlqw+ZaQYJARz6ArODtkaIwtvBTx3N2NehQlL8=
github.com/prometheus/procfs v0.0.5/go.mod h1:4A/X28fw3Fc593LaREMrKMqOKvUAntwMDaekg4FpcdQ=
github.com/sirupsen/logrus v1.2.0/go.mod h1:LxeOpSwHxABJmUn/MG1IvRgCAasNZTLOkJPxbbu5VWo=
github.com/sirupsen/logrus v1.4.2/go.mod h1:tLMulIdttU9McNUspp0xgXVQah82FyeX6MwdIuYE2rE=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
golang.org/x/crypto v0.0.0-20180904163835-0709b304e793/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190613194153-d28f0bde5980/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20181116152217-5ac8a444bdc5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190422165155-953cdadca894/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191010194322-b09406accb47/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
gopkg.in/alecthomas/kingpin.v2 v2.2.6/go.mod h1:FMv+mEhP44yOT+4EoQTLFTRgOQ1FBLkstjWtayDeSgw=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.4 h1:/eiJrUcujPVeJ3xlSWaiNi3uSVmDGBK1pDHUHAnao1I= gopkg.in/yaml.v2 v2.2.4 h1:/eiJrUcujPVeJ3xlSWaiNi3uSVmDGBK1pDHUHAnao1I=
gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
+84 -49
@@ -1,60 +1,88 @@
package main package main
import ( import (
"errors"
"flag" "flag"
"fmt" "fmt"
"log"
"time" "time"
"git.iamthefij.com/iamthefij/slog"
) )
var ( var (
// LogDebug will control whether debug messsages should be logged // ExportMetrics will track whether or not we want to export metrics to prometheus
LogDebug = false ExportMetrics = false
// MetricsPort is the port to expose metrics on
MetricsPort = 8080
// Metrics contains all active metrics
Metrics = NewMetrics()
// PyCompat enables support for legacy Python templates
PyCompat = false
// version of minitor being run // version of minitor being run
version = "dev" version = "dev"
errUnknownAlert = errors.New("unknown alert")
) )
func sendAlerts(config *Config, monitor *Monitor, alertNotice *AlertNotice) error {
slog.Debugf("Received an alert notice from %s", alertNotice.MonitorName)
alertNames := monitor.GetAlertNames(alertNotice.IsUp)
if alertNames == nil {
// This should only happen for a recovery alert. AlertDown is validated not empty
slog.Warningf(
"Received alert, but no alert mechanisms exist. MonitorName=%s IsUp=%t",
alertNotice.MonitorName, alertNotice.IsUp,
)
return nil
}
for _, alertName := range alertNames {
if alert, ok := config.Alerts[alertName]; ok {
output, err := alert.Send(*alertNotice)
if err != nil {
slog.Errorf(
"Alert '%s' failed. result=%v: output=%s",
alert.Name,
err,
output,
)
return err
}
// Count alert metrics
Metrics.CountAlert(monitor.Name, alert.Name)
} else {
// This case should never actually happen since we validate against it
slog.Errorf("Unknown alert for monitor %s: %s", alertNotice.MonitorName, alertName)
return fmt.Errorf("unknown alert for monitor %s: %s: %w", alertNotice.MonitorName, alertName, errUnknownAlert)
}
}
return nil
}
func checkMonitors(config *Config) error { func checkMonitors(config *Config) error {
// TODO: Run this in goroutines and capture exceptions
for _, monitor := range config.Monitors { for _, monitor := range config.Monitors {
if monitor.ShouldCheck() { if monitor.ShouldCheck() {
_, alertNotice := monitor.Check() success, alertNotice := monitor.Check()
hasAlert := alertNotice != nil
// Track status metrics
Metrics.SetMonitorStatus(monitor.Name, monitor.IsUp())
Metrics.CountCheck(monitor.Name, success, monitor.LastCheckMilliseconds(), hasAlert)
// Should probably consider refactoring everything below here
if alertNotice != nil { if alertNotice != nil {
if LogDebug { err := sendAlerts(config, monitor, alertNotice)
log.Printf("DEBUG: Recieved an alert notice from %s", alertNotice.MonitorName) // If there was an error in sending an alert, exit early and bubble it up
} if err != nil {
alertNames := monitor.GetAlertNames(alertNotice.IsUp) return err
if alertNames == nil {
// This should only happen for a recovery alert. AlertDown is validated not empty
log.Printf(
"WARNING: Recieved alert, but no alert mechanisms exist. MonitorName=%s IsUp=%t",
alertNotice.MonitorName, alertNotice.IsUp,
)
}
for _, alertName := range alertNames {
if alert, ok := config.Alerts[alertName]; ok {
output, err := alert.Send(*alertNotice)
if err != nil {
log.Printf(
"ERROR: Alert '%s' failed. result=%v: output=%s",
alert.Name,
err,
output,
)
return fmt.Errorf(
"Unsuccessfully triggered alert '%s'. "+
"Crashing to avoid false negatives: %v",
alert.Name,
err,
)
}
} else {
// This case should never actually happen since we validate against it
log.Printf("ERROR: Unknown alert for monitor %s: %s", alertNotice.MonitorName, alertName)
return fmt.Errorf("Unknown alert for monitor %s: %s", alertNotice.MonitorName, alertName)
}
} }
} }
} }
@@ -64,31 +92,38 @@ func checkMonitors(config *Config) error {
} }
func main() { func main() {
// Get debug flag showVersion := flag.Bool("version", false, "Display the version of minitor and exit")
flag.BoolVar(&LogDebug, "debug", false, "Enables debug logs (default: false)") configPath := flag.String("config", "config.yml", "Alternate configuration path (default: config.yml)")
var showVersion = flag.Bool("version", false, "Display the version of minitor and exit")
flag.BoolVar(&slog.DebugLevel, "debug", false, "Enables debug logs (default: false)")
flag.BoolVar(&ExportMetrics, "metrics", false, "Enables prometheus metrics exporting (default: false)")
flag.BoolVar(&PyCompat, "py-compat", false, "Enables support for legacy Python Minitor config. Will eventually be removed. (default: false)")
flag.IntVar(&MetricsPort, "metrics-port", MetricsPort, "The port that Prometheus metrics should be exported on, if enabled. (default: 8080)")
flag.Parse() flag.Parse()
// Print version if flag is provided // Print version if flag is provided
if *showVersion { if *showVersion {
fmt.Println("Minitor version:", version) fmt.Println("Minitor version:", version)
return return
} }
// Load configuration // Load configuration
config, err := LoadConfig("config.yml") config, err := LoadConfig(*configPath)
if err != nil { slog.OnErrFatalf(err, "Error loading config: %v", err)
log.Fatalf("Error loading config: %v", err)
// Serve metrics exporter, if specified
if ExportMetrics {
slog.Infof("Exporting metrics to Prometheus on port %d", MetricsPort)
go ServeMetrics()
} }
// Start main loop // Start main loop
for { for {
err = checkMonitors(&config) err = checkMonitors(&config)
if err != nil { slog.OnErrPanicf(err, "Error checking monitors")
panic(err)
}
sleepTime := time.Duration(config.CheckInterval) * time.Second time.Sleep(config.CheckInterval.Value())
time.Sleep(sleepTime)
} }
} }
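Review note: the new sentinel errors (errUnknownAlert, errNoTemplate, ErrAlertFailed, errInvalidConfig) are wrapped with the %w verb, which lets callers test for them with errors.Is even after context has been added to the message. The following is a small sketch of that pattern, assuming a simplified lookup: `lookupAlert` and `isUnknownAlert` are hypothetical helpers, and the map of known alerts stands in for config.Alerts.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownAlert mirrors the sentinel declared in main.go.
var errUnknownAlert = errors.New("unknown alert")

// lookupAlert is a hypothetical helper that returns a wrapped sentinel
// error when the named alert is not present in the known set.
func lookupAlert(known map[string]bool, monitorName, alertName string) error {
	if !known[alertName] {
		return fmt.Errorf("unknown alert for monitor %s: %s: %w", monitorName, alertName, errUnknownAlert)
	}

	return nil
}

// isUnknownAlert reports whether err wraps errUnknownAlert anywhere in its chain.
func isUnknownAlert(err error) bool {
	return errors.Is(err, errUnknownAlert)
}

func main() {
	known := map[string]bool{"log": true}

	err := lookupAlert(known, "web", "pager")
	fmt.Println(err)
	fmt.Println(isUnknownAlert(err))
}
```

Matching on the sentinel rather than on the message text means the wording of the error can change without breaking callers or tests.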
+51 -29
@@ -16,9 +16,9 @@ func TestCheckMonitors(t *testing.T) {
{ {
config: Config{ config: Config{
Monitors: []*Monitor{ Monitors: []*Monitor{
&Monitor{ {
Name: "Success", Name: "Success",
Command: []string{"true"}, Command: CommandOrShell{Command: []string{"true"}},
}, },
}, },
}, },
@@ -28,34 +28,22 @@ func TestCheckMonitors(t *testing.T) {
{ {
config: Config{ config: Config{
Monitors: []*Monitor{ Monitors: []*Monitor{
&Monitor{ {
Name: "Failure", Name: "Failure",
Command: []string{"false"}, Command: CommandOrShell{Command: []string{"false"}},
AlertAfter: 1,
},
&Monitor{
Name: "Failure",
Command: []string{"false"},
AlertDown: []string{"unknown"},
AlertAfter: 1, AlertAfter: 1,
}, },
}, },
}, },
expectErr: false, expectErr: false,
name: "Monitor failure, no and unknown alerts", name: "Monitor failure, no alerts",
}, },
{ {
config: Config{ config: Config{
Monitors: []*Monitor{ Monitors: []*Monitor{
&Monitor{ {
Name: "Success", Name: "Success",
Command: []string{"ls"}, Command: CommandOrShell{Command: []string{"ls"}},
alertCount: 1,
},
&Monitor{
Name: "Success",
Command: []string{"true"},
AlertUp: []string{"unknown"},
alertCount: 1, alertCount: 1,
}, },
}, },
@@ -66,16 +54,44 @@ func TestCheckMonitors(t *testing.T) {
{ {
config: Config{ config: Config{
Monitors: []*Monitor{ Monitors: []*Monitor{
&Monitor{ {
Name: "Failure", Name: "Failure",
Command: []string{"false"}, Command: CommandOrShell{Command: []string{"false"}},
AlertDown: []string{"unknown"},
AlertAfter: 1,
},
},
},
expectErr: true,
name: "Monitor failure, unknown alerts",
},
{
config: Config{
Monitors: []*Monitor{
{
Name: "Success",
Command: CommandOrShell{Command: []string{"true"}},
AlertUp: []string{"unknown"},
alertCount: 1,
},
},
},
expectErr: true,
name: "Monitor recovery, unknown alerts",
},
{
config: Config{
Monitors: []*Monitor{
{
Name: "Failure",
Command: CommandOrShell{Command: []string{"false"}},
AlertDown: []string{"good"}, AlertDown: []string{"good"},
AlertAfter: 1, AlertAfter: 1,
}, },
}, },
Alerts: map[string]*Alert{ Alerts: map[string]*Alert{
"good": &Alert{ "good": {
Command: []string{"true"}, Command: CommandOrShell{Command: []string{"true"}},
}, },
}, },
}, },
@@ -85,17 +101,17 @@ func TestCheckMonitors(t *testing.T) {
{ {
config: Config{ config: Config{
Monitors: []*Monitor{ Monitors: []*Monitor{
&Monitor{ {
Name: "Failure", Name: "Failure",
Command: []string{"false"}, Command: CommandOrShell{Command: []string{"false"}},
AlertDown: []string{"bad"}, AlertDown: []string{"bad"},
AlertAfter: 1, AlertAfter: 1,
}, },
}, },
Alerts: map[string]*Alert{ Alerts: map[string]*Alert{
"bad": &Alert{ "bad": {
Name: "bad", Name: "bad",
Command: []string{"false"}, Command: CommandOrShell{Command: []string{"false"}},
}, },
}, },
}, },
@@ -105,10 +121,16 @@ func TestCheckMonitors(t *testing.T) {
} }
for _, c := range cases { for _, c := range cases {
c.config.Init() err := c.config.Init()
err := checkMonitors(&c.config) if err != nil {
t.Errorf("checkMonitors(%s): unexpected error reading config: %v", c.name, err)
}
err = checkMonitors(&c.config)
if err == nil && c.expectErr { if err == nil && c.expectErr {
t.Errorf("checkMonitors(%s): Expected panic, the code did not panic", c.name) t.Errorf("checkMonitors(%s): Expected panic, the code did not panic", c.name)
} else if err != nil && !c.expectErr {
t.Errorf("checkMonitors(%s): Did not expect an error, but we got one anyway: %v", c.name, err)
} }
} }
} }
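The updated test cases wrap every command in `CommandOrShell`, and the monitor code elsewhere in this diff calls `.Empty()` and reads the `.Command` and `.ShellCommand` fields. The sketch below shows one way such a type could behave. Only the field names and `Empty` come from the diff; the method bodies and the `build` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// CommandOrShell mirrors the field names used in the diff; the method
// bodies below are assumptions, not the project's implementation.
type CommandOrShell struct {
	ShellCommand string
	Command      []string
}

// Empty reports whether neither form of the command was provided.
func (c CommandOrShell) Empty() bool {
	return c.ShellCommand == "" && len(c.Command) == 0
}

// build is a hypothetical helper that turns either form into an *exec.Cmd.
func (c CommandOrShell) build() *exec.Cmd {
	if len(c.Command) > 0 {
		return exec.Command(c.Command[0], c.Command[1:]...)
	}
	return exec.Command("sh", "-c", strings.TrimSpace(c.ShellCommand))
}

func main() {
	argv := CommandOrShell{Command: []string{"echo", "success"}}
	shell := CommandOrShell{ShellCommand: "echo success"}
	for _, c := range []CommandOrShell{argv, shell} {
		out, err := c.build().CombinedOutput()
		fmt.Printf("%q %v\n", out, err)
	}
	fmt.Println(CommandOrShell{}.Empty())
}
```

Holding both forms in one value means a single emptiness check can replace separate "at least one" and "at most one" checks on two fields.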
+25
@@ -0,0 +1,25 @@
image: iamthefij/minitor-go:{{#if build.tag}}{{trimPrefix "v" build.tag}}{{else}}latest{{/if}}
{{#if build.tags}}
tags:
{{#each build.tags}}
- {{this}}
{{/each}}
{{/if}}
manifests:
-
image: iamthefij/minitor-go:{{#if build.tag}}{{trimPrefix "v" build.tag}}-{{/if}}linux-amd64
platform:
architecture: amd64
os: linux
-
image: iamthefij/minitor-go:{{#if build.tag}}{{trimPrefix "v" build.tag}}-{{/if}}linux-arm64
platform:
architecture: arm64
os: linux
variant: v8
-
image: iamthefij/minitor-go:{{#if build.tag}}{{trimPrefix "v" build.tag}}-{{/if}}linux-arm
platform:
architecture: arm
os: linux
variant: v7
+117
@@ -0,0 +1,117 @@
package main
import (
"fmt"
"net/http"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
// TODO: Not sure if this is the best way to handle. A global instance for
// metrics isn't bad, but it might be nice to curry versions of the metrics
// for each monitor. Especially since every monitor has its own. Perhaps
// another new function that essentially curries each metric for a given
// monitor name would do. This could be run when validating monitors and
// initializing alert templates.
// MinitorMetrics contains all counters and metrics that Minitor will need to access
type MinitorMetrics struct {
alertCount *prometheus.CounterVec
checkCount *prometheus.CounterVec
checkTime *prometheus.GaugeVec
monitorStatus *prometheus.GaugeVec
}
// NewMetrics creates and initializes all metrics
func NewMetrics() *MinitorMetrics {
// Initialize all metrics
metrics := &MinitorMetrics{
alertCount: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "minitor_alert_total",
Help: "Number of Minitor alerts",
},
[]string{"alert", "monitor"},
),
checkCount: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "minitor_check_total",
Help: "Number of Minitor checks",
},
[]string{"monitor", "status", "is_alert"},
),
checkTime: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "minitor_check_milliseconds",
Help: "Time in milliseconds that a check ran for",
},
[]string{"monitor", "status"},
),
monitorStatus: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "minitor_monitor_up_count",
Help: "Status of currently responsive monitors",
},
[]string{"monitor"},
),
}
// Register newly created metrics
prometheus.MustRegister(metrics.alertCount)
prometheus.MustRegister(metrics.checkCount)
prometheus.MustRegister(metrics.checkTime)
prometheus.MustRegister(metrics.monitorStatus)
return metrics
}
// SetMonitorStatus sets the current status of Monitor
func (metrics *MinitorMetrics) SetMonitorStatus(monitor string, isUp bool) {
val := 0.0
if isUp {
val = 1.0
}
metrics.monitorStatus.With(prometheus.Labels{"monitor": monitor}).Set(val)
}
// CountCheck counts the result of a particular Monitor check
func (metrics *MinitorMetrics) CountCheck(monitor string, isSuccess bool, ms int64, isAlert bool) {
status := "failure"
if isSuccess {
status = "success"
}
alertVal := "false"
if isAlert {
alertVal = "true"
}
metrics.checkCount.With(
prometheus.Labels{"monitor": monitor, "status": status, "is_alert": alertVal},
).Inc()
metrics.checkTime.With(
prometheus.Labels{"monitor": monitor, "status": status},
).Set(float64(ms))
}
// CountAlert counts an alert
func (metrics *MinitorMetrics) CountAlert(monitor string, alert string) {
metrics.alertCount.With(
prometheus.Labels{
"alert": alert,
"monitor": monitor,
},
).Inc()
}
// ServeMetrics starts an http server with a Prometheus metrics handler
func ServeMetrics() {
http.Handle("/metrics", promhttp.Handler())
host := fmt.Sprintf(":%d", MetricsPort)
_ = http.ListenAndServe(host, nil)
}
+55 -51
@@ -1,38 +1,37 @@
package main package main
import ( import (
"log"
"math" "math"
"os/exec" "os/exec"
"time" "time"
"git.iamthefij.com/iamthefij/slog"
) )
// Monitor represents a particular periodic check of a command // Monitor represents a particular periodic check of a command
type Monitor struct { type Monitor struct { //nolint:maligned
// Config values // Config values
AlertAfter int16 `yaml:"alert_after"`
AlertEvery int16 `yaml:"alert_every"`
CheckInterval SecondsOrDuration `yaml:"check_interval"`
Name string Name string
Command []string
CommandShell string `yaml:"command_shell"`
AlertDown []string `yaml:"alert_down"` AlertDown []string `yaml:"alert_down"`
AlertUp []string `yaml:"alert_up"` AlertUp []string `yaml:"alert_up"`
CheckInterval float64 `yaml:"check_interval"` Command CommandOrShell
AlertAfter int16 `yaml:"alert_after"`
AlertEvery int16 `yaml:"alert_every"`
// Other values // Other values
lastCheck time.Time alertCount int16
lastOutput string failureCount int16
alertCount int16 lastCheck time.Time
failureCount int16 lastSuccess time.Time
lastSuccess time.Time lastOutput string
lastCheckDuration time.Duration
} }
// IsValid returns a boolean indicating if the Monitor has been correctly // IsValid returns a boolean indicating if the Monitor has been correctly
// configured // configured
func (monitor Monitor) IsValid() bool { func (monitor Monitor) IsValid() bool {
atLeastOneCommand := (monitor.CommandShell != "" || monitor.Command != nil) return (!monitor.Command.Empty() &&
atMostOneCommand := (monitor.CommandShell == "" || monitor.Command == nil)
return (atLeastOneCommand &&
atMostOneCommand &&
monitor.getAlertAfter() > 0 && monitor.getAlertAfter() > 0 &&
monitor.AlertDown != nil) monitor.AlertDown != nil)
} }
@@ -44,25 +43,29 @@ func (monitor Monitor) ShouldCheck() bool {
return true return true
} }
sinceLastCheck := time.Now().Sub(monitor.lastCheck).Seconds() sinceLastCheck := time.Since(monitor.lastCheck)
return sinceLastCheck >= monitor.CheckInterval
return sinceLastCheck >= monitor.CheckInterval.Value()
} }
// Check will run the command configured by the Monitor and return a status // Check will run the command configured by the Monitor and return a status
// and a possible AlertNotice // and a possible AlertNotice
func (monitor *Monitor) Check() (bool, *AlertNotice) { func (monitor *Monitor) Check() (bool, *AlertNotice) {
var cmd *exec.Cmd var cmd *exec.Cmd
if monitor.Command != nil { if monitor.Command.Command != nil {
cmd = exec.Command(monitor.Command[0], monitor.Command[1:]...) cmd = exec.Command(monitor.Command.Command[0], monitor.Command.Command[1:]...)
} else { } else {
cmd = ShellCommand(monitor.CommandShell) cmd = ShellCommand(monitor.Command.ShellCommand)
} }
checkStartTime := time.Now()
output, err := cmd.CombinedOutput() output, err := cmd.CombinedOutput()
monitor.lastCheck = time.Now() monitor.lastCheck = time.Now()
monitor.lastOutput = string(output) monitor.lastOutput = string(output)
monitor.lastCheckDuration = monitor.lastCheck.Sub(checkStartTime)
var alertNotice *AlertNotice var alertNotice *AlertNotice
isSuccess := (err == nil) isSuccess := (err == nil)
if isSuccess { if isSuccess {
alertNotice = monitor.success() alertNotice = monitor.success()
@@ -70,17 +73,11 @@ func (monitor *Monitor) Check() (bool, *AlertNotice) {
alertNotice = monitor.failure() alertNotice = monitor.failure()
} }
if LogDebug { slog.Debugf("Command output: %s", monitor.lastOutput)
log.Printf("DEBUG: Command output: %s", monitor.lastOutput) slog.OnErrWarnf(err, "Command result: %v", err)
}
if err != nil {
if LogDebug {
log.Printf("DEBUG: Command result: %v", err)
}
}
log.Printf( slog.Infof(
"INFO: %s success=%t, alert=%t", "%s success=%t, alert=%t",
monitor.Name, monitor.Name,
isSuccess, isSuccess,
alertNotice != nil, alertNotice != nil,
@@ -89,15 +86,22 @@ func (monitor *Monitor) Check() (bool, *AlertNotice) {
return isSuccess, alertNotice return isSuccess, alertNotice
} }
func (monitor Monitor) isUp() bool { // IsUp returns the status of the current monitor
func (monitor Monitor) IsUp() bool {
return monitor.alertCount == 0 return monitor.alertCount == 0
} }
// LastCheckMilliseconds gives the number of milliseconds the last check ran for
func (monitor Monitor) LastCheckMilliseconds() int64 {
return monitor.lastCheckDuration.Milliseconds()
}
func (monitor *Monitor) success() (notice *AlertNotice) { func (monitor *Monitor) success() (notice *AlertNotice) {
if !monitor.isUp() { if !monitor.IsUp() {
// Alert that we have recovered // Alert that we have recovered
notice = monitor.createAlertNotice(true) notice = monitor.createAlertNotice(true)
} }
monitor.failureCount = 0 monitor.failureCount = 0
monitor.alertCount = 0 monitor.alertCount = 0
monitor.lastSuccess = time.Now() monitor.lastSuccess = time.Now()
@@ -109,15 +113,14 @@ func (monitor *Monitor) failure() (notice *AlertNotice) {
monitor.failureCount++ monitor.failureCount++
// If we haven't hit the minimum failures, we can exit // If we haven't hit the minimum failures, we can exit
if monitor.failureCount < monitor.getAlertAfter() { if monitor.failureCount < monitor.getAlertAfter() {
if LogDebug { slog.Debugf(
log.Printf( "%s failed but did not hit minimum failures. "+
"DEBUG: %s failed but did not hit minimum failures. "+ "Count: %v alert after: %v",
"Count: %v alert after: %v", monitor.Name,
monitor.Name, monitor.failureCount,
monitor.failureCount, monitor.getAlertAfter(),
monitor.getAlertAfter(), )
)
}
return return
} }
@@ -125,19 +128,20 @@ func (monitor *Monitor) failure() (notice *AlertNotice) {
failureCount := (monitor.failureCount - monitor.getAlertAfter()) failureCount := (monitor.failureCount - monitor.getAlertAfter())
// Use alert cadence to determine if we should alert // Use alert cadence to determine if we should alert
if monitor.AlertEvery > 0 { switch {
case monitor.AlertEvery > 0:
// Handle integer number of failures before alerting // Handle integer number of failures before alerting
if failureCount%monitor.AlertEvery == 0 { if failureCount%monitor.AlertEvery == 0 {
notice = monitor.createAlertNotice(false) notice = monitor.createAlertNotice(false)
} }
} else if monitor.AlertEvery == 0 { case monitor.AlertEvery == 0:
// Handle alerting on first failure only // Handle alerting on first failure only
if failureCount == 0 { if failureCount == 0 {
notice = monitor.createAlertNotice(false) notice = monitor.createAlertNotice(false)
} }
} else { default:
// Handle negative numbers indicating an exponential backoff // Handle negative numbers indicating an exponential backoff
if failureCount >= int16(math.Pow(2, float64(monitor.alertCount))-1) { if failureCount >= int16(math.Pow(2, float64(monitor.alertCount))-1) { //nolint:gomnd
notice = monitor.createAlertNotice(false) notice = monitor.createAlertNotice(false)
} }
} }
@@ -147,7 +151,7 @@ func (monitor *Monitor) failure() (notice *AlertNotice) {
monitor.alertCount++ monitor.alertCount++
} }
return return notice
} }
func (monitor Monitor) getAlertAfter() int16 { func (monitor Monitor) getAlertAfter() int16 {
@@ -155,18 +159,18 @@ func (monitor Monitor) getAlertAfter() int16 {
// Zero is one! // Zero is one!
if monitor.AlertAfter == 0 { if monitor.AlertAfter == 0 {
return 1 return 1
} else {
return monitor.AlertAfter
} }
return monitor.AlertAfter
} }
// GetAlertNames gives a list of alert names for a given monitor status // GetAlertNames gives a list of alert names for a given monitor status
func (monitor Monitor) GetAlertNames(up bool) []string { func (monitor Monitor) GetAlertNames(up bool) []string {
if up { if up {
return monitor.AlertUp return monitor.AlertUp
} else {
return monitor.AlertDown
} }
return monitor.AlertDown
} }
func (monitor Monitor) createAlertNotice(isUp bool) *AlertNotice { func (monitor Monitor) createAlertNotice(isUp bool) *AlertNotice {
+33 -19
@@ -13,25 +13,22 @@ func TestMonitorIsValid(t *testing.T) {
expected bool expected bool
name string name string
}{ }{
{Monitor{Command: []string{"echo", "test"}, AlertDown: []string{"log"}}, true, "Command only"}, {Monitor{Command: CommandOrShell{Command: []string{"echo", "test"}}, AlertDown: []string{"log"}}, true, "Command only"},
{Monitor{CommandShell: "echo test", AlertDown: []string{"log"}}, true, "CommandShell only"}, {Monitor{Command: CommandOrShell{ShellCommand: "echo test"}, AlertDown: []string{"log"}}, true, "CommandShell only"},
{Monitor{Command: []string{"echo", "test"}}, false, "No AlertDown"}, {Monitor{Command: CommandOrShell{Command: []string{"echo", "test"}}}, false, "No AlertDown"},
{Monitor{AlertDown: []string{"log"}}, false, "No commands"}, {Monitor{AlertDown: []string{"log"}}, false, "No commands"},
{ {Monitor{Command: CommandOrShell{Command: []string{"echo", "test"}}, AlertDown: []string{"log"}, AlertAfter: -1}, false, "Invalid alert threshold, -1"},
Monitor{Command: []string{"echo", "test"}, CommandShell: "echo test", AlertDown: []string{"log"}},
false,
"Both commands",
},
{Monitor{Command: []string{"echo", "test"}, AlertDown: []string{"log"}, AlertAfter: -1}, false, "Invalid alert threshold, -1"},
} }
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
actual := c.monitor.IsValid() actual := c.monitor.IsValid()
if actual != c.expected { if actual != c.expected {
t.Errorf("IsValid(%v), expected=%t actual=%t", c.name, c.expected, actual) t.Errorf("IsValid(%v), expected=%t actual=%t", c.name, c.expected, actual)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -48,9 +45,9 @@ func TestMonitorShouldCheck(t *testing.T) {
name string name string
}{ }{
{Monitor{}, true, "Empty"}, {Monitor{}, true, "Empty"},
{Monitor{lastCheck: timeNow, CheckInterval: 15}, false, "Just checked"}, {Monitor{lastCheck: timeNow, CheckInterval: SecondsOrDuration{time.Second * 15}}, false, "Just checked"},
{Monitor{lastCheck: timeTenSecAgo, CheckInterval: 15}, false, "-10s"}, {Monitor{lastCheck: timeTenSecAgo, CheckInterval: SecondsOrDuration{time.Second * 15}}, false, "-10s"},
{Monitor{lastCheck: timeTwentySecAgo, CheckInterval: 15}, true, "-20s"}, {Monitor{lastCheck: timeTwentySecAgo, CheckInterval: SecondsOrDuration{time.Second * 15}}, true, "-20s"},
} }
for _, c := range cases { for _, c := range cases {
@@ -61,7 +58,7 @@ func TestMonitorShouldCheck(t *testing.T) {
} }
} }
// TestMonitorIsUp tests the Monitor.isUp() // TestMonitorIsUp tests the Monitor.IsUp()
func TestMonitorIsUp(t *testing.T) { func TestMonitorIsUp(t *testing.T) {
cases := []struct { cases := []struct {
monitor Monitor monitor Monitor
@@ -76,11 +73,13 @@ func TestMonitorIsUp(t *testing.T) {
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
actual := c.monitor.isUp()
actual := c.monitor.IsUp()
if actual != c.expected { if actual != c.expected {
t.Errorf("isUp(%v), expected=%t actual=%t", c.name, c.expected, actual) t.Errorf("IsUp(%v), expected=%t actual=%t", c.name, c.expected, actual)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -101,11 +100,13 @@ func TestMonitorGetAlertNames(t *testing.T) {
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
actual := c.monitor.GetAlertNames(c.up) actual := c.monitor.GetAlertNames(c.up)
if !EqualSliceString(actual, c.expected) { if !EqualSliceString(actual, c.expected) {
t.Errorf("GetAlertNames(%v), expected=%v actual=%v", c.name, c.expected, actual) t.Errorf("GetAlertNames(%v), expected=%v actual=%v", c.name, c.expected, actual)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -124,12 +125,15 @@ func TestMonitorSuccess(t *testing.T) {
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
notice := c.monitor.success() notice := c.monitor.success()
hasNotice := (notice != nil) hasNotice := (notice != nil)
if hasNotice != c.expectNotice { if hasNotice != c.expectNotice {
t.Errorf("success(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice) t.Errorf("success(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -152,12 +156,15 @@ func TestMonitorFailureAlertAfter(t *testing.T) {
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
notice := c.monitor.failure() notice := c.monitor.failure()
hasNotice := (notice != nil) hasNotice := (notice != nil)
if hasNotice != c.expectNotice { if hasNotice != c.expectNotice {
t.Errorf("failure(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice) t.Errorf("failure(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -200,10 +207,12 @@ func TestMonitorFailureAlertEvery(t *testing.T) {
notice := c.monitor.failure() notice := c.monitor.failure()
hasNotice := (notice != nil) hasNotice := (notice != nil)
if hasNotice != c.expectNotice { if hasNotice != c.expectNotice {
t.Errorf("failure(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice) t.Errorf("failure(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -228,15 +237,18 @@ func TestMonitorFailureExponential(t *testing.T) {
// Unlike previous tests, this one requires a static Monitor with repeated // Unlike previous tests, this one requires a static Monitor with repeated
// calls to the failure method // calls to the failure method
monitor := Monitor{failureCount: 0, AlertAfter: 1, AlertEvery: -1} monitor := Monitor{failureCount: 0, AlertAfter: 1, AlertEvery: -1}
for _, c := range cases { for _, c := range cases {
log.Printf("Testing case %s", c.name) log.Printf("Testing case %s", c.name)
notice := monitor.failure() notice := monitor.failure()
hasNotice := (notice != nil) hasNotice := (notice != nil)
if hasNotice != c.expectNotice { if hasNotice != c.expectNotice {
t.Errorf("failure(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice) t.Errorf("failure(%v), expected=%t actual=%t", c.name, c.expectNotice, hasNotice)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
@@ -248,28 +260,29 @@ func TestMonitorCheck(t *testing.T) {
hasNotice bool hasNotice bool
lastOutput string lastOutput string
} }
cases := []struct { cases := []struct {
monitor Monitor monitor Monitor
expect expected expect expected
name string name string
}{ }{
{ {
Monitor{Command: []string{"echo", "success"}}, Monitor{Command: CommandOrShell{Command: []string{"echo", "success"}}},
expected{isSuccess: true, hasNotice: false, lastOutput: "success\n"}, expected{isSuccess: true, hasNotice: false, lastOutput: "success\n"},
"Test successful command", "Test successful command",
}, },
{ {
Monitor{CommandShell: "echo success"}, Monitor{Command: CommandOrShell{ShellCommand: "echo success"}},
expected{isSuccess: true, hasNotice: false, lastOutput: "success\n"}, expected{isSuccess: true, hasNotice: false, lastOutput: "success\n"},
"Test successful command shell", "Test successful command shell",
}, },
{ {
Monitor{Command: []string{"total", "failure"}}, Monitor{Command: CommandOrShell{Command: []string{"total", "failure"}}},
expected{isSuccess: false, hasNotice: true, lastOutput: ""}, expected{isSuccess: false, hasNotice: true, lastOutput: ""},
"Test failed command", "Test failed command",
}, },
{ {
Monitor{CommandShell: "false"}, Monitor{Command: CommandOrShell{ShellCommand: "false"}},
expected{isSuccess: false, hasNotice: true, lastOutput: ""}, expected{isSuccess: false, hasNotice: true, lastOutput: ""},
"Test failed command shell", "Test failed command shell",
}, },
@@ -295,6 +308,7 @@ func TestMonitorCheck(t *testing.T) {
t.Errorf("Check(%v) (output), expected=%v actual=%v", c.name, c.expect.lastOutput, lastOutput) t.Errorf("Check(%v) (output), expected=%v actual=%v", c.name, c.expect.lastOutput, lastOutput)
log.Printf("Case failed: %s", c.name) log.Printf("Case failed: %s", c.name)
} }
log.Println("-----") log.Println("-----")
} }
} }
+22 -9
@@ -1,21 +1,34 @@
check_interval: 30 ---
check_interval: 5
monitors: monitors:
- name: My Website - name: Fake Website
command: [ 'curl', '-s', '-o', '/dev/null', 'https://minitor.mon' ] command: ["curl", "-s", "-o", "/dev/null", "https://minitor.mon"]
alert_down: [ log, mailgun_down, sms_down ] alert_down: [log_down, mailgun_down, sms_down]
alert_up: [ log, email_up ] alert_up: [log_up, email_up]
check_interval: 30 # Must be at minimum the global `check_interval` check_interval: 10 # Must be at minimum the global `check_interval`
alert_after: 3 alert_after: 3
alert_every: -1 # Defaults to -1 for exponential backoff. 0 to disable repeating alert_every: -1 # Defaults to -1 for exponential backoff. 0 to disable repeating
- name: Real Website
command: ["curl", "-s", "-o", "/dev/null", "https://google.com"]
alert_down: [log_down, mailgun_down, sms_down]
alert_up: [log_up, email_up]
check_interval: 5
alert_after: 3
alert_every: -1
alerts: alerts:
log_down:
command: ["echo", "Minitor failure for {{.MonitorName}}"]
log_up:
command: ["echo", "Minitor recovery for {{.MonitorName}}"]
email_up: email_up:
command: [ sendmail, "me@minitor.mon", "Recovered: {monitor_name}", "We're back!" ] command:
[sendmail, "me@minitor.mon", "Recovered: {monitor_name}", "We're back!"]
mailgun_down: mailgun_down:
command: > command: >
curl -s -X POST curl -s -X POST
-F subject="Alert! {monitor_name} failed" -F subject="Alert! {{.MonitorName}} failed"
-F from="Minitor <minitor@minitor.mon>" -F from="Minitor <minitor@minitor.mon>"
-F to=me@minitor.mon -F to=me@minitor.mon
-F text="Our monitor failed" -F text="Our monitor failed"
@@ -23,7 +36,7 @@ alerts:
-u "api:${MAILGUN_API_KEY}" -u "api:${MAILGUN_API_KEY}"
sms_down: sms_down:
command: > command: >
curl -s -X POST -F "Body=Failure! {monitor_name} has failed" curl -s -X POST -F "Body=Failure! {{.MonitorName}} has failed"
-F "From=${AVAILABLE_NUMBER}" -F "To=${MY_PHONE}" -F "From=${AVAILABLE_NUMBER}" -F "To=${MY_PHONE}"
"https://api.twilio.com/2010-04-01/Accounts/${ACCOUNT_SID}/Messages" "https://api.twilio.com/2010-04-01/Accounts/${ACCOUNT_SID}/Messages"
-u "${ACCOUNT_SID}:${AUTH_TOKEN}" -u "${ACCOUNT_SID}:${AUTH_TOKEN}"
+5
@@ -0,0 +1,5 @@
# Minitor Scripts
A collection of handy scripts to use with Minitor.
These are not included with the Python package, but they are included in the Docker image in `/app/scripts`.
+63
@@ -0,0 +1,63 @@
#! /bin/bash
set -e
#################
# docker_check.sh
#
# Checks the most recent state exit code of a Docker container
#################
# Docker host will default to a socket
# To override, export DOCKER_HOST to a new hostname
DOCKER_HOST="${DOCKER_HOST:=socket}"
container_name="$1"
num_log_lines="$2"
# Curls Docker either using a socket or URL
function curl_docker {
local path="$1"
if [ "$DOCKER_HOST" == "socket" ]; then
curl --unix-socket /var/run/docker.sock "http://localhost/$path" 2>/dev/null
else
curl "http://${DOCKER_HOST}/$path" 2>/dev/null
fi
}
# Returns container ID for a given container name
function get_container_id {
local container_name="$1"
curl_docker 'containers/json?all=1' \
| jq -r ".[] | {Id, Name: .Names[]} | select(.Name == \"/${container_name}\") | .Id"
}
# Returns container JSON
function inspect_container {
local container_id="$1"
curl_docker "containers/$container_id/json"
}
# Gets some lines from docker log
function get_logs {
container_id="$1"
num_lines="$2"
curl_docker "containers/$container_id/logs?stdout=1&stderr=1" | tail -n "$num_lines"
}
if [ -z "$container_name" ]; then
echo "Usage: $0 container_name [num_log_lines]"
echo "Will exit with the last status code of container with provided name"
exit 1
fi
container_id=$(get_container_id "$container_name")
if [ -z "$container_id" ]; then
echo "ERROR: Could not find container with name: $container_name"
exit 1
fi
exit_code=$(inspect_container "$container_id" | jq -r .State.ExitCode)
if [ -n "$num_log_lines" ]; then
get_logs "$container_id" "$num_log_lines"
fi
exit "$exit_code"
+73
@@ -0,0 +1,73 @@
#! /bin/bash
set -e
#################
# docker_healthcheck.sh
#
# Returns the results of a Docker Healthcheck for a container
#################
# Docker host will default to a socket
# To override, export DOCKER_HOST to a new hostname
DOCKER_HOST="${DOCKER_HOST:=socket}"
container_name="$1"
num_log_lines="$2"
# Curls Docker either using a socket or URL
function curl_docker {
local path="$1"
if [ "$DOCKER_HOST" == "socket" ]; then
curl --unix-socket /var/run/docker.sock "http://localhost/$path" 2>/dev/null
else
curl "http://${DOCKER_HOST}/$path" 2>/dev/null
fi
}
# Returns container ID for a given container name
function get_container_id {
local container_name="$1"
curl_docker 'containers/json?all=1' \
| jq -r ".[] | {Id, Name: .Names[]} | select(.Name == \"/${container_name}\") | .Id"
}
# Returns container JSON
function inspect_container {
local container_id="$1"
curl_docker "containers/$container_id/json"
}
# Gets some lines from docker log
function get_logs {
container_id="$1"
num_lines="$2"
curl_docker "containers/$container_id/logs?stdout=1&stderr=1" | tail -n "$num_lines"
}
if [ -z "$container_name" ]; then
echo "Usage: $0 container_name [num_log_lines]"
echo "Will return results of healthcheck for container with provided name"
exit 1
fi
container_id=$(get_container_id "$container_name")
if [ -z "$container_id" ]; then
echo "ERROR: Could not find container with name: $container_name"
exit 1
fi
health=$(inspect_container "$container_id" | jq -r '.State.Health.Status')
if [ -n "$num_log_lines" ]; then
get_logs "$container_id" "$num_log_lines"
fi
case "$health" in
null)
echo "No healthcheck results"
;;
starting|healthy)
echo "Status: '$health'"
;;
*)
echo "Status: '$health'"
exit 1
esac
-1
@@ -6,4 +6,3 @@ monitors:
alert_down: [ 'alert_down', 'log_shell', 'log_command' ] alert_down: [ 'alert_down', 'log_shell', 'log_command' ]
# alert_every: -1 # alert_every: -1
alert_every: 0 alert_every: 0
+9 -6
@@ -1,22 +1,25 @@
---
check_interval: 1 check_interval: 1
monitors: monitors:
- name: Command - name: Command
command: ['echo', '$PATH'] command: ["echo", "$PATH"]
alert_down: [ 'log_command', 'log_shell' ] alert_down: ["log_command", "log_shell"]
alert_every: 0 alert_every: 0
check_interval: 10s
- name: Shell - name: Shell
command_shell: > command: >
echo 'Some string with stuff'; echo 'Some string with stuff';
echo 'another line'; echo 'another line';
echo $PATH; echo $PATH;
exit 1 exit 1
alert_down: [ 'log_command', 'log_shell' ] alert_down: ["log_command", "log_shell"]
alert_after: 5 alert_after: 5
alert_every: 0 alert_every: 0
check_interval: 1m
alerts: alerts:
log_command: log_command:
command: [ 'echo', 'regular', '"command!!!"', "{{.MonitorName}}" ] command: ["echo", "regular", '"command!!!"', "{{.MonitorName}}"]
log_shell: log_shell:
command_shell: echo "Failure on {{.MonitorName}} User is $USER" command: echo "Failure on {{.MonitorName}} User is $USER"
+8
@@ -0,0 +1,8 @@
---
check_interval: 1
monitors:
- name: Command
command: ['echo', '$PATH']
alert_down: ['log']
alert_every: 0
+18
@@ -0,0 +1,18 @@
---
check_interval: 1
monitors:
- name: Shell
command: >
echo 'Some string with stuff';
echo "<angle brackets>";
exit 1
alert_down: ['log_shell']
alert_after: 1
alert_every: 0
alerts:
log_shell:
command: |
echo 'Some string with stuff'
echo '<angle brackets>'
+4 -12
@@ -5,20 +5,10 @@ import (
"strings" "strings"
) )
// escapeCommandShell accepts a command to be executed by a shell and escapes it
func escapeCommandShell(command string) string {
// Remove extra spaces and newlines from ends
command = strings.TrimSpace(command)
// TODO: Not sure if this part is actually needed. Should verify
// Escape double quotes since this will be passed in as an argument
command = strings.Replace(command, `"`, `\"`, -1)
return command
}
// ShellCommand takes a string and executes it as a command using `sh` // ShellCommand takes a string and executes it as a command using `sh`
func ShellCommand(command string) *exec.Cmd { func ShellCommand(command string) *exec.Cmd {
shellCommand := []string{"sh", "-c", escapeCommandShell(command)} shellCommand := []string{"sh", "-c", strings.TrimSpace(command)}
//log.Printf("Shell command: %v", shellCommand)
return exec.Command(shellCommand[0], shellCommand[1:]...) return exec.Command(shellCommand[0], shellCommand[1:]...)
} }
@@ -27,10 +17,12 @@ func EqualSliceString(a, b []string) bool {
if len(a) != len(b) { if len(a) != len(b) {
return false return false
} }
for i, val := range a { for i, val := range a {
if val != b[i] { if val != b[i] {
return false return false
} }
} }
return true return true
} }
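This change drops `escapeCommandShell` and passes the trimmed string straight to `sh -c`. Because `exec.Command` hands each argument to the process directly, with no intermediate shell parsing it, double quotes inside the command string reach `sh` intact. That may be why the escaping step could be removed, though the diff does not state the reasoning. A small sketch, assuming a POSIX `sh` is on the `PATH` and using a lowercase `shellCommand` stand-in for the function in the diff:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// shellCommand follows the shape of ShellCommand in the diff: trim the
// string and pass it to sh as a single argument.
func shellCommand(command string) *exec.Cmd {
	return exec.Command("sh", "-c", strings.TrimSpace(command))
}

func main() {
	// The double quotes are interpreted by sh itself, not by Go.
	out, err := shellCommand(`
		echo "<angle brackets>"
	`).CombinedOutput()
	fmt.Printf("%q %v\n", out, err)
}
```

Backslash-escaping the quotes first would make `sh` treat them as literal characters, so the echoed text would include the quote marks.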