DevOps - The USE Method and the RED Method (Monitoring Methodologies)

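This post explains two monitoring methodologies: the USE method and the RED method.

The USE method is treated here as the infrastructure-level approach and the RED method as the service-level approach. The Four Golden Signals described in the Google SRE Book overlap with the USE and RED methods explained below, so they are not covered separately.

The USE method is a methodology for analyzing system performance: for every resource, monitor utilization, saturation, and errors.

Each term is described below, with an example.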

resource: all physical server functional components (CPUs, disks, buses, ...)
utilization: the average time that the resource was busy servicing work.
- Example: measured as a percent over a time interval, e.g., "one disk is running at 90% utilization".
saturation: the degree to which the resource has extra work that it can't service, often queued.
- Example: measured as a queue length, e.g., "the CPUs have an average run queue length of four".
errors: the count of error events.
- Example: measured as scalar counts, e.g., "this network interface has had fifty late collisions".
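
The three definitions above can be turned into numbers with very little code. The following is a minimal sketch, not a reference implementation: `UseSummary`, `summarize_use`, and the sample values are hypothetical names and figures chosen for illustration. It assumes you already collect busy time, queue-length samples, and an error count for one resource over one interval.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UseSummary:
    utilization_pct: float  # share of the interval the resource was busy
    saturation: float       # average queue length (work that had to wait)
    errors: int             # scalar count of error events


def summarize_use(busy_seconds: float,
                  interval_seconds: float,
                  queue_length_samples: Sequence[int],
                  error_events: int) -> UseSummary:
    """Summarize one resource over one observation interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    utilization_pct = 100.0 * busy_seconds / interval_seconds
    # With no samples there is no evidence of queued work, so report 0.
    saturation = (sum(queue_length_samples) / len(queue_length_samples)
                  if queue_length_samples else 0.0)
    return UseSummary(utilization_pct, saturation, error_events)


# Hypothetical disk: busy for 54 s of a 60 s window, four queue samples, 2 errors.
disk = summarize_use(54.0, 60.0, [3, 5, 4, 4], 2)
print(disk)
```

For the sample values, utilization works out to 90% and the average queue length to 4, which matches the style of the quoted examples.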

resource	type	metric
CPU	utilization	CPU utilization (either per-CPU or a system-wide average)
CPU	saturation	run-queue length or scheduler latency
Memory capacity	utilization	available free memory (system-wide)
Memory capacity	saturation	anonymous paging or thread swapping (maybe "page scanning" too)
Network interface	utilization	RX/TX throughput / max bandwidth
Storage device I/O	utilization	device busy percent
Storage device I/O	saturation	wait queue length
Storage device I/O	errors	device errors ("soft", "hard", ...)
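
Because the USE method asks for all three types on every resource, it can help to check which combinations the table does not cover yet. The sketch below copies the table rows into a dictionary and lists the gaps; `USE_METRICS`, `USE_TYPES`, and `missing_checks` are hypothetical names used only for this illustration.

```python
from itertools import product

# (resource, type) -> metric, copied from the table above.
USE_METRICS = {
    ("CPU", "utilization"): "CPU utilization (per-CPU or system-wide average)",
    ("CPU", "saturation"): "run-queue length or scheduler latency",
    ("Memory capacity", "utilization"): "available free memory (system-wide)",
    ("Memory capacity", "saturation"): "anonymous paging or thread swapping",
    ("Network interface", "utilization"): "RX/TX throughput / max bandwidth",
    ("Storage device I/O", "utilization"): "device busy percent",
    ("Storage device I/O", "saturation"): "wait queue length",
    ("Storage device I/O", "errors"): "device errors (soft, hard, ...)",
}

USE_TYPES = ("utilization", "saturation", "errors")


def missing_checks(metrics):
    """Return every (resource, type) pair that has no metric assigned yet."""
    resources = sorted({resource for resource, _ in metrics})
    return [(r, t) for r, t in product(resources, USE_TYPES)
            if (r, t) not in metrics]


for resource, use_type in missing_checks(USE_METRICS):
    print(f"no metric yet: {resource} / {use_type}")
```

A gap in this list is not necessarily a problem, since some combinations are hard to measure, but listing them makes the omission a deliberate choice.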

The table below uses Azure resources as an example and lists the resource, type, metric, and target.

Resources: CPU, Memory, Disk, Network

Azure resource types: AKS, AppGW, Cosmos DB for PostgreSQL, AOAI, ML, ACR, KV, Blob

resource	type	metric	target
CPU	utilization	CPU Utilization Percentage	AKS Cluster, Cosmos DB for PostgreSQL, Azure ML Deployment
Memory capacity	utilization	Available free memory	AKS Nodes (VMSS)
Memory capacity	utilization	Memory Percent	Cosmos DB for PostgreSQL
Memory capacity	utilization	CPU Memory Utilization Percentage	Azure ML Deployment
Memory capacity	saturation	Memory RSS, Memory Working Set	AKS Cluster
Storage device I/O	utilization	Storage Percent, IOPS	Cosmos DB for PostgreSQL
Storage device I/O	utilization	Disk Utilization Percentage	Azure ML Deployment

The RED method is a methodology for analyzing performance in a microservices environment: for every service, monitor rate (throughput), errors (error count), and duration (processing time).

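Each term is described below.

rate: the number of requests the service handles per unit of time.
errors: the number of those requests that fail.
duration: the time each request takes to process.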

The table below uses Azure resources and applications as examples and lists the RED signal type, metric, and target.

Azure resource types: AKS, AppGW, Cosmos DB for PostgreSQL, AOAI, ML, ACR, KV, Blob

type	metric	target
Rate	HTTP Request per minute, Requests per minute per Healthy Host, Current Connections	App GW
Rate	Active Connection	Cosmos DB for PostgreSQL
Rate	Request Per Minute	Azure ML Online Endpoint
Rate	Azure Open AI Request	Azure Open AI
Errors	HTTP Response Errors (HTTP Code > 299)	App GW
Errors	AKS Pod Status Reason (evicted/nodeaffinity/nodelost)	AKS Cluster
Errors	AKS Pod OOMkilled	AKS Cluster
Duration	HTTP Response Latency	App GW
Duration	AOAI Response Time	Azure Open AI
Duration	HTTP Request Latency	Azure ML Online Endpoint
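
As a rough illustration of how the three RED signals relate to raw request data, the sketch below derives them from a list of request records. It reuses the error rule from the table (HTTP code > 299), which also counts redirects as errors. `Request`, `summarize_red`, and the sample records are hypothetical, and the 95th percentile is an addition for illustration rather than something the table prescribes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    status: int         # HTTP status code of the response
    duration_ms: float  # time taken to serve the request


def summarize_red(requests: Sequence[Request], window_minutes: float) -> dict:
    """Compute rate, errors, and duration for one service over one window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "rate_per_min": n / window_minutes,
        # Error rule taken from the table: any HTTP code above 299.
        "errors": sum(1 for r in requests if r.status > 299),
        "mean_ms": sum(durations) / n if n else 0.0,
        "p95_ms": durations[rank - 1] if n else 0.0,
    }


# Hypothetical 30-second window with five requests.
sample = [
    Request(200, 120.0),
    Request(200, 80.0),
    Request(404, 40.0),
    Request(500, 300.0),
    Request(301, 60.0),
]
summary = summarize_red(sample, window_minutes=0.5)
print(summary)
```

In practice these values usually come from the monitoring backend rather than application code; the sketch only shows what each signal measures.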

References

Use Method

RED Method

Monitoring Best Practices


HJ