Details
- Task
- Resolution: Done
- P2: Important
- None
- None
Description
Monitor all kinds of health statistics for all our build and test VMs. Requirements:
- Install a monitoring utility on all of our Tier2 images
  - Telegraf? It's the one already in use for the host machines.
  - Must be able to run custom monitoring commands at custom intervals, for example "ioping" on a custom directory, in order to measure I/O latency.
- Send all statistics to a remote database
  - InfluxDB most likely, as it's already used for recording the host machines' metrics.
  - Make sure the VMs don't cache any metrics but send them directly. The build VMs are by definition short-lived: they can be killed the moment something goes wrong, and we definitely don't want to miss those metrics.
  - Data retention on the database is of secondary importance; it's OK to delete logs after a month or even only a week.
- We'll most likely need to assign a unique hostname to each build VM in Coin.
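
To make the "custom monitoring command" requirement concrete, here is a minimal sketch of a probe script that an agent could run at a fixed interval. It does not call ioping; as a stand-in it times a small write plus fsync in the target directory, then prints one line in InfluxDB line protocol tagged with the VM's hostname. The measurement name `io_probe`, the field `write_latency_us`, and the `PROBE_DIR` environment variable are all hypothetical names chosen for illustration, not anything defined in this ticket.

```python
import os
import socket
import tempfile
import time


def escape(value: str) -> str:
    """Escape commas, equals signs and spaces for line-protocol keys and tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as 'measurement,tag=v field=1i timestamp' (integer fields only)."""
    tag_part = "".join(f",{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape(k)}={int(v)}i" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def measure_write_latency_us(directory, size=4096):
    """Time a single write + fsync of `size` bytes in `directory`, in microseconds.

    This is a rough stand-in for an ioping run, not a replacement for it.
    """
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter_ns()
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())
        elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns // 1000


if __name__ == "__main__":
    # PROBE_DIR is a hypothetical variable naming the directory to probe.
    directory = os.environ.get("PROBE_DIR", tempfile.gettempdir())
    print(
        to_line_protocol(
            "io_probe",
            {"host": socket.gethostname(), "path": directory},
            {"write_latency_us": measure_write_latency_us(directory)},
            time.time_ns(),
        )
    )
```

If Telegraf is chosen, a script like this could plausibly be wired in through its exec input with the `influx` data format, so the probe interval is set per command; that wiring is an assumption and is not shown here. The `host` tag relies on each build VM having a unique hostname, as noted in the last requirement.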
Issue Links
- relates to QTQAINFRA-3089 Implement centralised log aggregation for all hosts/VMs in Coin and OpenNebula