--- name: sysadmin description: Linux system administration, networking diagnostics, and production hardening workflows. Use when handling SSH/connectivity incidents, DNS/routing/firewall issues, host health checks, systemd/service failures, disk or memory pressure, log triage, baseline security checks, or when the user asks for repeatable Linux ops runbooks. --- # Sysadmin ## Overview Execute Linux and network operations with a diagnose-first approach. Prefer minimal-risk commands, capture evidence before changes, and verify outcome after every fix. ## Workflow 1. Confirm scope and blast radius. 2. Capture current state with `scripts/sysdiag.sh` when possible. 3. Isolate layer: host, service, network path, DNS, or policy. 4. Apply the smallest reversible fix. 5. Re-check service health and user-facing behavior. 6. Summarize root cause, change made, and follow-up hardening actions. ## Triage Decision Map - Connection refused or timeout: Check `ss -tulpn`, service status, local firewall (`nft list ruleset` or `iptables -S`), and routing (`ip route`). - Name resolves incorrectly: Check `/etc/resolv.conf`, `resolvectl status`, `dig`, and local cache behavior. - Service flapping: Check `systemctl status`, `journalctl -u --since "-30m"`, restart policy, and resource pressure. - Packet loss or latency spikes: Check `ping`, `mtr` (if present), interface errors via `ip -s link`, and host saturation. - Host unhealthy: Check CPU, memory, disk inode usage, and top failing units from `systemctl --failed`. ## Command Guardrails - Prefer read-only diagnostics first. - Ask before destructive actions (mass deletes, firewall flush, forced reboot). - For privileged reads, run with `sudo` only when required. - Before config edits, back up file: `cp .bak.`. - After change, validate with targeted checks and logs. ## Resources - Incident runbook and command matrix: `references/runbook.md` - Snapshot collector: `scripts/sysdiag.sh`