How to Monitor and Debug
OpenClaw on a Linux VPS

Tools and techniques to keep OpenClaw running smoothly in production.

± 35 min read SufaNet

Introduction

A system that runs smoothly in production does not get there by accident. Behind the scenes there is solid monitoring, structured logging, and the ability to debug issues quickly when something goes wrong.

This article covers monitoring and debugging OpenClaw end to end: log management, performance monitoring, error tracking, troubleshooting common issues, and setting up a proactive alerting system.

💡 Philosophy: "You can't fix what you can't see." Monitoring is not optional; it is foundational for a production system.

Log Management

Logs are the first line of defense when troubleshooting.

1. Structured Logging

Set up a structured log format:

// config/logging.json
{
  "logging": {
    "level": "info",
    "format": "json",
    "outputs": [
      {
        "type": "file",
        "path": "./logs/openclaw.log",
        "maxSize": "100m",
        "maxFiles": 10
      },
      {
        "type": "console",
        "colorize": true
      }
    ],
    "fields": {
      "timestamp": true,
      "level": true,
      "message": true,
      "taskId": true,
      "userId": true,
      "ip": true,
      "duration": true,
      "error": true
    }
  }
}

2. Log Levels

  • ERROR: critical errors that need immediate action
  • WARN: potential issues, not critical yet
  • INFO: general informational messages
  • DEBUG: detailed information for debugging (disable in production)
  • TRACE: very verbose (development only)

3. View Logs

# Real-time log monitoring
tail -f ~/openclaw/logs/openclaw.log

# Filter by level
grep "ERROR" ~/openclaw/logs/openclaw.log

# Last 100 lines
tail -n 100 ~/openclaw/logs/openclaw.log

# Search for specific task
grep "taskId.*backup" ~/openclaw/logs/openclaw.log

# Count errors
grep -c "ERROR" ~/openclaw/logs/openclaw.log
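A plain grep for "ERROR" also matches lines where the word only appears inside a message. Since the logs are JSON, filtering on the parsed fields is more precise. The following is a minimal Python sketch, assuming one JSON object per line with the `level` and `taskId` fields from the logging config above; the function name `filter_logs` and the sample entries are illustrative, not part of OpenClaw.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_logs(lines: Iterable[str], level: Optional[str] = None,
                task_id: Optional[str] = None) -> Iterator[dict]:
    """Yield parsed log entries that match the given level and/or taskId."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack-trace fragments
        if not isinstance(entry, dict):
            continue
        if level is not None and str(entry.get("level", "")).upper() != level.upper():
            continue
        if task_id is not None and entry.get("taskId") != task_id:
            continue
        yield entry


# Hypothetical sample lines; real entries depend on the OpenClaw log schema.
sample = [
    '{"timestamp": "2026-02-09T10:30:00Z", "level": "error", "message": "backup failed", "taskId": "backup-database"}',
    '{"timestamp": "2026-02-09T10:31:00Z", "level": "info", "message": "the word ERROR in a message", "taskId": "cleanup"}',
    "not json at all",
]
errors = list(filter_logs(sample, level="ERROR"))
print(len(errors), errors[0]["message"])
```

To run it against the real file, pass an open file handle instead of `sample`, for example `filter_logs(open("logs/openclaw.log"), level="ERROR")`.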

4. Log Rotation

sudo nano /etc/logrotate.d/openclaw

# Contents of /etc/logrotate.d/openclaw
/home/user/openclaw/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0640 user user
    postrotate
        pm2 reloadLogs
    endscript
}

Performance Monitoring

1. Built-in Metrics Endpoint

curl http://localhost:3000/metrics

Response:

{
  "uptime": 86400,
  "memory": {
    "rss": "150MB",
    "heapTotal": "80MB",
    "heapUsed": "65MB",
    "external": "2MB"
  },
  "cpu": {
    "user": 12500,
    "system": 3200
  },
  "tasks": {
    "total": 150,
    "completed": 145,
    "failed": 3,
    "running": 2
  },
  "apiCalls": {
    "total": 1250,
    "successful": 1200,
    "failed": 50
  }
}
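The raw counters are easier to act on once they are turned into rates. A short Python sketch using the sample response above; the choice to measure task failures against finished tasks only (completed + failed, ignoring running ones) is an assumption made here, not something OpenClaw defines.

```python
def rate(part: int, whole: int) -> float:
    """Return part / whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


# Values copied from the sample /metrics response above.
metrics = {
    "tasks": {"total": 150, "completed": 145, "failed": 3, "running": 2},
    "apiCalls": {"total": 1250, "successful": 1200, "failed": 50},
}

tasks = metrics["tasks"]
finished = tasks["completed"] + tasks["failed"]  # running tasks have no outcome yet
task_failure_rate = rate(tasks["failed"], finished)

api = metrics["apiCalls"]
api_failure_rate = rate(api["failed"], api["total"])

print(f"task failure rate: {task_failure_rate:.1%}")  # 2.0%
print(f"API failure rate: {api_failure_rate:.1%}")    # 4.0%
```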

2. PM2 Monitoring

# Real-time monitoring
pm2 monit

# Status
pm2 status

# Resource usage
pm2 list
pm2 show openclaw

# Logs
pm2 logs openclaw --lines 50

3. System Resource Monitoring

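# CPU usage (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f openclaw)"

# Memory usage (the [o] keeps grep from matching its own process)
ps aux | grep '[o]penclaw'

# Network connections
netstat -tulpn | grep ':3000'

# Disk I/O (iotop usually needs root; -o picks the oldest matching PID)
sudo iotop -p "$(pgrep -o -f openclaw)"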

4. Custom Metrics Collection

Set up automated metrics collection:

{
  "name": "collect-metrics",
  "schedule": "*/5 * * * *",
  "actions": [
    {
      "type": "http",
      "method": "GET",
      "url": "http://localhost:3000/metrics",
      "output": "metrics"
    },
    {
      "type": "database",
      "driver": "postgresql",
      "query": "INSERT INTO metrics (timestamp, data) VALUES (NOW(), $1)",
      "params": ["{{metrics}}"]
    },
    {
      "type": "condition",
      "if": "metrics.memory.heapUsed > '500MB'",
      "then": [
        {
          "type": "notification",
          "message": "⚠️ High memory usage: {{metrics.memory.heapUsed}}"
        }
      ]
    }
  ]
}
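One thing to watch in the condition above: `heapUsed` is reported as a string such as "65MB", and comparing it against '500MB' may end up as a string comparison, depending on how OpenClaw evaluates conditions (not confirmed here). The Python sketch below shows why that matters and how to convert the values to bytes first. `parse_size` and `heap_exceeds` are hypothetical helpers, and binary units (1 MB = 1024 × 1024 bytes) are an assumption.

```python
import re

# Assumed binary multipliers; adjust if the endpoint reports decimal units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a string such as '65MB' or '1.5 GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


def heap_exceeds(metrics: dict, limit: str) -> bool:
    """Return True when memory.heapUsed is numerically above the limit."""
    return parse_size(metrics["memory"]["heapUsed"]) > parse_size(limit)


metrics = {"memory": {"heapUsed": "65MB"}}
print("65MB" > "500MB")                 # True: strings compare character by character
print(heap_exceeds(metrics, "500MB"))   # False: 65 MB is below 500 MB
```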

Error Tracking

1. Error Log Analysis

# Count errors by type
grep "ERROR" logs/openclaw.log | cut -d'"' -f8 | sort | uniq -c | sort -nr

# Recent errors
grep "ERROR" logs/openclaw.log | tail -20

# Errors from the previous clock hour (matches the hour prefix of ISO 8601 timestamps)
grep "ERROR" logs/openclaw.log | grep "$(date -d '1 hour ago' '+%Y-%m-%dT%H')"
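The hour-prefix grep only matches entries stamped with the previous clock hour, so it misses errors from the current hour. A rolling 60-minute window needs real timestamp comparison. The Python sketch below counts ERROR entries per message within such a window; it assumes JSON lines with `timestamp`, `level`, and `message` fields as in the logging config, and the function names and sample data are illustrative.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' or missing offset means UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def recent_error_counts(lines: Iterable[str], now: datetime,
                        window: timedelta = timedelta(hours=1)) -> Counter:
    """Count ERROR entries per message whose timestamp falls in [now - window, now]."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
            ts = parse_ts(entry["timestamp"])
        except (json.JSONDecodeError, KeyError, ValueError, TypeError, AttributeError):
            continue  # skip lines that are not well-formed log entries
        if str(entry.get("level", "")).upper() != "ERROR":
            continue
        if now - window <= ts <= now:
            counts[entry.get("message", "<no message>")] += 1
    return counts


# Hypothetical sample data.
now = datetime(2026, 2, 9, 10, 30, tzinfo=timezone.utc)
sample = [
    '{"timestamp": "2026-02-09T10:05:00Z", "level": "ERROR", "message": "ECONNREFUSED"}',
    '{"timestamp": "2026-02-09T09:45:00Z", "level": "ERROR", "message": "ECONNREFUSED"}',
    '{"timestamp": "2026-02-09T09:20:00Z", "level": "ERROR", "message": "timeout"}',
    '{"timestamp": "2026-02-09T10:10:00Z", "level": "INFO", "message": "task done"}',
]
counts = recent_error_counts(sample, now)
print(counts.most_common())
```

With this sample, the 09:20 entry falls outside the window and is ignored, while the 10:05 entry, which the hour-prefix grep would miss, is counted.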

2. Integration with Sentry (Optional)

# .env
SENTRY_DSN=https://<public-key>@<sentry-host>/<project-id>
SENTRY_ENVIRONMENT=production

Sentry will then capture errors automatically and send them to its dashboard.

3. AI-Powered Error Analysis

{
  "name": "analyze-errors",
  "schedule": "0 * * * *",
  "actions": [
    {
      "type": "shell",
      "command": "grep 'ERROR' logs/openclaw.log | tail -100",
      "output": "recent_errors"
    },
    {
      "type": "ai-analyze",
      "prompt": "Analyze these error logs and identify: 1) Most common error types, 2) Root causes, 3) Suggested fixes",
      "input": "{{recent_errors}}",
      "output": "analysis"
    },
    {
      "type": "notification",
      "channel": "telegram",
      "message": "📊 Error Analysis:\n{{analysis.summary}}"
    }
  ]
}

Health Checks

1. Basic Health Check

curl http://localhost:3000/health

Response:

{
  "status": "ok",
  "uptime": 86400,
  "timestamp": "2026-02-09T10:30:00Z",
  "version": "1.0.0",
  "checks": {
    "database": "ok",
    "redis": "ok",
    "ai_api": "ok",
    "disk_space": "ok"
  }
}

2. Automated Health Monitoring

{
  "name": "health-check-monitor",
  "schedule": "*/2 * * * *",
  "actions": [
    {
      "type": "http",
      "method": "GET",
      "url": "http://localhost:3000/health",
      "timeout": 5000,
      "output": "health"
    },
    {
      "type": "condition",
      "if": "health.status != 'ok' || health.response_time > 3000",
      "then": [
        {
          "type": "notification",
          "channel": "telegram",
          "message": "🚨 Health check failed!\nStatus: {{health.status}}\nResponse time: {{health.response_time}}ms"
        },
        {
          "type": "shell",
          "command": "pm2 restart openclaw"
        }
      ]
    }
  ]
}
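The decision logic in the condition above can be sketched in Python to make it explicit and testable. This version also inspects the individual entries under `checks`, which the original condition does not do; that extension, the function name `evaluate_health`, and the idea that the caller measures the response time itself are assumptions for illustration.

```python
from typing import Any, Dict, List


def evaluate_health(health: Dict[str, Any], response_ms: float,
                    max_response_ms: float = 3000.0) -> List[str]:
    """Return a list of problems; an empty list means the service looks healthy."""
    problems: List[str] = []
    if health.get("status") != "ok":
        problems.append(f"status is {health.get('status')!r}")
    for name, state in health.get("checks", {}).items():
        if state != "ok":
            problems.append(f"check {name} is {state!r}")
    if response_ms > max_response_ms:
        problems.append(f"response took {response_ms:.0f} ms")
    return problems


# Payloads shaped like the sample /health response above.
healthy = {"status": "ok", "checks": {"database": "ok", "redis": "ok"}}
degraded = {"status": "ok", "checks": {"database": "ok", "redis": "down"}}

print(evaluate_health(healthy, response_ms=120))
print(evaluate_health(degraded, response_ms=4500))
```

Restarting on a single failed probe can cause restart loops when the real problem is a slow dependency. Requiring several consecutive failures before running `pm2 restart openclaw` is a common safeguard worth considering.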

Debugging Techniques

1. Enable Debug Mode

# .env
NODE_ENV=development
LOG_LEVEL=debug
DEBUG=openclaw:*

# Restart so the new settings take effect, then follow the logs
pm2 restart openclaw
pm2 logs openclaw

2. Interactive Debugging

# Stop PM2
pm2 stop openclaw

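# Run directly with the inspector enabled
node --inspect dist/index.js

# Or pause on the first line until a debugger attaches
node --inspect-brk dist/index.js

Connect with Chrome DevTools: chrome://inspect. On a remote VPS the inspector listens on 127.0.0.1:9229 by default, so forward that port over SSH rather than exposing it publicly.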

3. Trace Specific Task

# Enable task tracing
curl -X POST http://localhost:3000/admin/trace \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{"taskId": "backup-database", "level": "verbose"}'

# Run task
curl -X POST http://localhost:3000/api/tasks/backup-database/run

# View trace
grep "taskId.*backup-database" logs/openclaw.log | jq .

4. Memory Leak Detection

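# Take a heap snapshot on demand (Node.js 12+ writes a .heapsnapshot file when it receives the signal)
node --heapsnapshot-signal=SIGUSR2 dist/index.js &
# Once the process is up, trigger the snapshot
kill -USR2 $!

# Analyze with Chrome DevTools or clinic.js
npm install -g clinic
clinic doctor -- node dist/index.js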

Common Issues & Solutions

🔴 Issue: OpenClaw Crashes Randomly

Symptoms: Process exits unexpectedly

Diagnosis:

pm2 logs openclaw --err
grep "FATAL\|SIGTERM\|SIGKILL" logs/openclaw.log

Common Causes:

  • Out of memory (check with dmesg | grep -i kill)
  • Unhandled promise rejection
  • Segmentation fault (native module issue)

Solution: Enable auto-restart, add more RAM, fix unhandled promises

🟡 Issue: High Memory Usage

Diagnosis:

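# Watch per-process memory
pm2 monit

# Raise the V8 old-space heap limit to 4 GB (a mitigation that buys time, not a diagnosis)
node --max-old-space-size=4096 dist/index.js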

Solutions:

  • Increase Node.js memory limit
  • Implement cache eviction
  • Check for memory leaks
  • Optimize large data processing

🔵 Issue: API Timeouts

Diagnosis:

grep "timeout" logs/openclaw.log
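
# Print curl's timing breakdown without needing a separate format file
curl -s -o /dev/null -w 'connect: %{time_connect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n' http://localhost:3000/api/tasks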

Solutions:

  • Increase timeout values
  • Optimize slow queries
  • Add request queuing
  • Check AI API latency

Alerting System

Set up alerts to stay informed about issues:

{
  "alerts": [
    {
      "name": "cpu-high",
      "condition": "cpu_usage > 80",
      "duration": "5m",
      "channels": ["telegram", "email"]
    },
    {
      "name": "memory-high",
      "condition": "memory_usage > 85",
      "duration": "3m",
      "channels": ["telegram"]
    },
    {
      "name": "disk-full",
      "condition": "disk_usage > 90",
      "channels": ["telegram", "email", "slack"]
    },
    {
      "name": "task-failures",
      "condition": "failed_tasks > 5",
      "duration": "1h",
      "channels": ["telegram"]
    }
  ]
}
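The `duration` field implies that an alert should fire only after its condition has held continuously for that long, which filters out short spikes. How OpenClaw implements this is not shown here; the Python sketch below is one straightforward way to do it, with the class name `DurationAlert` and the use of plain seconds for timestamps as assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurationAlert:
    """Fires once a value has stayed above a threshold for a minimum duration."""
    name: str
    threshold: float
    duration_s: float
    breach_started: Optional[float] = None  # when the current breach began

    def update(self, value: float, now: float) -> bool:
        """Record a sample taken at time `now` (seconds); return True if firing."""
        if value <= self.threshold:
            self.breach_started = None  # condition cleared, reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.duration_s


# Mirrors the "cpu-high" rule above: cpu_usage > 80 for 5 minutes.
cpu_high = DurationAlert(name="cpu-high", threshold=80, duration_s=300)
samples = [(0, 90), (200, 95), (250, 70), (300, 85), (600, 85)]
fired = [cpu_high.update(value, now) for now, value in samples]
print(fired)
```

The dip to 70 at t=250 resets the timer, so the alert only fires at t=600, after 300 uninterrupted seconds above the threshold. It keeps returning True on later samples while the breach lasts, so a real notifier would also need de-duplication.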

Monitoring Dashboard

Set up a simple dashboard to visualize metrics:

1. Grafana + Prometheus (Advanced)

# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
./prometheus --config.file=prometheus.yml
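Prometheus scrapes its own plain-text exposition format, while the /metrics response shown earlier is JSON, so some translation layer is likely needed unless OpenClaw offers a Prometheus-compatible endpoint (not confirmed here). The Python sketch below flattens the numeric fields of such a JSON payload into exposition lines. The `openclaw_` prefix and function names are illustrative, every metric is typed as a gauge for simplicity, and string values such as "150MB" are skipped.

```python
import re
from typing import Any, Dict, Iterator, Tuple


def flatten(data: Dict[str, Any], prefix: str = "openclaw") -> Iterator[Tuple[str, float]]:
    """Yield (metric_name, value) pairs for every numeric leaf in a nested dict."""
    for key, value in data.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            yield from flatten(value, name)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            yield name, float(value)
        # Non-numeric leaves (for example "150MB") are skipped in this sketch.


def to_prometheus_text(data: Dict[str, Any]) -> str:
    """Render numeric metrics in the Prometheus text exposition format."""
    lines = []
    for name, value in flatten(data):
        # camelCase to snake_case, then replace any character Prometheus disallows
        metric = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
        metric = re.sub(r"[^a-z0-9_]", "_", metric)
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


payload = {
    "uptime": 86400,
    "memory": {"rss": "150MB"},
    "tasks": {"failed": 3},
    "apiCalls": {"failed": 50},
}
text = to_prometheus_text(payload)
print(text)
```

Serving this text from a small HTTP endpoint and listing that endpoint under `scrape_configs` in prometheus.yml would let Prometheus collect it.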

2. Simple Web Dashboard

OpenClaw has a built-in dashboard at http://localhost:3000/dashboard that shows:

  • Task execution history
  • Success/failure rates
  • Resource usage charts
  • Recent errors
  • API call statistics

FAQ

How long should logs be retained?

It depends on your compliance requirements. A common baseline: 30 days for access logs, 90 days for error logs, 1 year for audit logs.

Which monitoring tool is the lightest?

PM2's built-in monitoring is the lightest. For something more advanced that is still lightweight, use Netdata (overhead < 3% CPU).

How do I debug an issue that has already been resolved?

Check the archived logs, audit trail, and metrics history. With Grafana/Prometheus you can go back to the time window in which the issue occurred.

Conclusion

Monitoring and debugging are not reactive activities; they are a proactive practice that prevents issues before they become critical. With the right setup:

  • Structured logs → easy troubleshooting
  • Performance metrics → catch bottlenecks early
  • Error tracking → fix issues faster
  • Health checks → detect failures instantly
  • Alerting → stay informed 24/7

👉 Next Steps

A reliable system needs solid monitoring.