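DevOps - The USE Method and the RED Method (Monitoring Methodologies)

This post explains two monitoring methodologies: the USE method and the RED method.

The USE method is treated here as the infrastructure-level methodology and the RED method as the service-level one. The Four Golden Signals described in the Google SRE Book overlap with the USE and RED methods explained below, so they are not covered separately.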

 

USE Method

The USE method is a methodology for analyzing system performance: for every resource, monitor utilization, saturation, and errors.

Each term is defined below, with an example.

  • resource: all physical server functional components (CPUs, disks, busses, ...)
  • utilization: the average time that the resource was busy servicing work
    • e.g., as a percent over a time interval: "one disk is running at 90% utilization"
  • saturation: the degree to which the resource has extra work which it can't service, often queued
    • e.g., as a queue length: "the CPUs have an average run queue length of four"
  • errors: the count of error events
    • e.g., as scalar counts: "this network interface has had fifty late collisions"
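
The three definitions above can be sketched in a few lines of Python. This is only a minimal illustration: the `ResourceSample` shape, its field names, and the `use_summary` helper are hypothetical and do not come from any monitoring library. It assumes that each sample already carries the busy time, queue length, and error count for one observation interval.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation interval for a single resource (hypothetical shape)."""
    interval_s: float   # length of the observation interval, in seconds
    busy_s: float       # time the resource spent servicing work
    queue_length: int   # work that was waiting and could not be serviced
    error_events: int   # error events seen during the interval


def use_summary(samples: list[ResourceSample]) -> dict[str, float]:
    """Summarize utilization, saturation, and errors over all samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    total_interval = sum(s.interval_s for s in samples)
    total_busy = sum(s.busy_s for s in samples)
    return {
        # utilization: share of time the resource was busy, as a percent
        "utilization_pct": 100.0 * total_busy / total_interval,
        # saturation: average length of the queue of unserviced work
        "saturation_avg_queue": sum(s.queue_length for s in samples) / len(samples),
        # errors: plain scalar count of error events
        "errors": float(sum(s.error_events for s in samples)),
    }


# Two 10-second samples of a made-up disk
disk = [
    ResourceSample(interval_s=10.0, busy_s=9.0, queue_length=4, error_events=0),
    ResourceSample(interval_s=10.0, busy_s=9.0, queue_length=2, error_events=1),
]
summary = use_summary(disk)
print(summary)
```

With these made-up samples the disk was busy for 18 of 20 seconds, which matches the "90% utilization" phrasing in the example above.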

 

| resource | type | metric |
|---|---|---|
| CPU | utilization | CPU utilization (either per-CPU or a system-wide average) |
| CPU | saturation | run-queue length or scheduler latency |
| Memory capacity | utilization | available free memory (system-wide) |
| Memory capacity | saturation | anonymous paging or thread swapping (maybe "page scanning" too) |
| Network interface | utilization | RX/TX throughput / max bandwidth |
| Storage device I/O | utilization | device busy percent |
| Storage device I/O | saturation | wait queue length |
| Storage device I/O | errors | device errors ("soft", "hard", ...) |
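
Two of the utilization metrics listed above reduce to simple ratios, sketched below. The function names and the sample figures are hypothetical. The sketch also assumes a full-duplex link, so RX and TX are each compared against the full link bandwidth, and it treats memory utilization as the share of total memory that is not available.

```python
def network_utilization_pct(
    rx_bps: float, tx_bps: float, max_bandwidth_bps: float
) -> tuple[float, float]:
    """Return (RX, TX) utilization as percentages of the link bandwidth."""
    if max_bandwidth_bps <= 0:
        raise ValueError("max_bandwidth_bps must be positive")
    return (
        100.0 * rx_bps / max_bandwidth_bps,
        100.0 * tx_bps / max_bandwidth_bps,
    )


def memory_utilization_pct(total_bytes: int, available_bytes: int) -> float:
    """Return memory utilization as the percent of total memory in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - available_bytes) / total_bytes


# Made-up figures: a 1 Gbit/s interface and a 16 GiB host with 4 GiB available
rx_pct, tx_pct = network_utilization_pct(250e6, 500e6, 1e9)
mem_pct = memory_utilization_pct(16 * 1024**3, 4 * 1024**3)
print(rx_pct, tx_pct, mem_pct)
```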

 

The table below uses Azure resources as examples and lists the resource, type, metric, and target for each.

 

Resources: CPU, Memory, Disk, Network

Azure resource types: AKS, AppGW, Cosmos DB for PostgreSQL, AOAI, ML, ACR, KV, Blob

 

 

| resource | type | metric | target |
|---|---|---|---|
| CPU | utilization | CPU Utilization Percentage | AKS Cluster, Cosmos DB for PostgreSQL, Azure ML Deployment |
| Memory capacity | utilization | Available free memory | AKS Nodes (VMSS) |
| Memory capacity | utilization | Memory Percent | Cosmos DB for PostgreSQL |
| Memory capacity | utilization | CPU Memory Utilization Percentage | Azure ML Deployment |
| Memory capacity | saturation | Memory RSS, Memory Working Set | AKS Cluster |
| Storage device I/O | utilization | Storage Percent, IOPS | Cosmos DB for PostgreSQL |
| Storage device I/O | utilization | Disk Utilization Percentage | Azure ML Deployment |

 

RED Method

The RED method is a methodology for analyzing performance in a microservice environment: for every service, monitor rate (throughput), errors (error count), and duration (processing time).

Each term is defined below.

  • Rate : the number of requests per second
  • Errors : the number of those requests that are failing
  • Duration : the amount of time those requests take
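
As a rough illustration of the three signals above, the sketch below derives them from a list of request records collected over a fixed window. The `Request` shape and the `red_summary` helper are hypothetical, and the sketch assumes that each record already says whether the request failed. It reports duration as a mean and a maximum only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One handled request (hypothetical shape)."""
    duration_s: float  # how long the request took, in seconds
    failed: bool       # whether the request counted as a failure


def red_summary(requests: list[Request], window_s: float) -> dict[str, float]:
    """Compute rate, errors, and duration for requests seen in one window."""
    if not requests or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    durations = [r.duration_s for r in requests]
    failures = sum(1 for r in requests if r.failed)
    return {
        # Rate: requests per second over the window
        "rate_rps": len(requests) / window_s,
        # Errors: how many of those requests failed, plus the failing share
        "errors": float(failures),
        "error_ratio": failures / len(requests),
        # Duration: time the requests took (mean and worst case)
        "duration_avg_s": sum(durations) / len(durations),
        "duration_max_s": max(durations),
    }


# Four made-up requests observed over a 2-second window
window = [
    Request(duration_s=0.1, failed=False),
    Request(duration_s=0.2, failed=False),
    Request(duration_s=0.3, failed=False),
    Request(duration_s=1.0, failed=True),
]
red = red_summary(window, window_s=2.0)
print(red)
```

In practice, duration is often tracked as a distribution (for example, percentiles) rather than a single average, since one slow request can hide behind a healthy mean.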

The table below uses Azure resources and applications as examples and lists each RED signal with its metrics and targets.

Azure resource types: AKS, AppGW, Cosmos DB for PostgreSQL, AOAI, ML, ACR, KV, Blob

 
 
| signal | metric | target |
|---|---|---|
| Rate | HTTP Request per minute, Requests per minute per Healthy Host, Current Connections | App GW |
| Rate | Active Connection | Cosmos DB for PostgreSQL |
| Rate | Request Per Minute | Azure ML Online Endpoint |
| Rate | Azure Open AI Request | Azure Open AI |
| Errors | HTTP Response Errors (HTTP Code > 299) | App GW |
| Errors | AKS Pod Status Reason (evicted/nodeaffinity/nodelost) | AKS Cluster |
| Errors | AKS Pod OOMkilled | AKS Cluster |
| Duration | HTTP Response Latency | App GW |
| Duration | AOAI Response Time | Azure Open AI |
| Duration | HTTP Request Latency | Azure ML Online Endpoint |

 

 

References

 

USE Method

https://www.brendangregg.com/usemethod.html

 

RED Method

https://grafana.com/blog/2018/08/02/the-red-method-how-to-instrument-your-services/

 

Monitoring Best Practices

https://grafana.com/docs/grafana/latest/dashboards/build-dashboards/best-practices/#red-method

https://grafana.com/blog/2024/07/03/getting-started-with-grafana-best-practices-to-design-your-first-dashboard/
