---
name: sysadmin
description: Linux system administration, networking diagnostics, and production hardening workflows. Use when handling SSH/connectivity incidents, DNS/routing/firewall issues, host health checks, systemd/service failures, disk or memory pressure, log triage, baseline security checks, or when the user asks for repeatable Linux ops runbooks.
---

# Sysadmin

## Overview

Execute Linux and network operations with a diagnose-first approach. Prefer minimal-risk commands, capture evidence before changes, and verify outcome after every fix.

## Workflow

1. Confirm scope and blast radius.
2. Capture current state with `scripts/sysdiag.sh` when possible.
3. Isolate layer: host, service, network path, DNS, or policy.
4. Apply the smallest reversible fix.
5. Re-check service health and user-facing behavior.
6. Summarize root cause, change made, and follow-up hardening actions.

## Triage Decision Map

- Connection refused or timeout: Check `ss -tulpn`, service status, local firewall (`nft list ruleset` or `iptables -S`), and routing (`ip route`).
- Name resolves incorrectly: Check `/etc/resolv.conf`, `resolvectl status`, `dig`, and local cache behavior.
- Service flapping: Check `systemctl status`, `journalctl -u --since "-30m"`, restart policy, and resource pressure.
- Packet loss or latency spikes: Check `ping`, `mtr` (if present), interface errors via `ip -s link`, and host saturation.
- Host unhealthy: Check CPU, memory, disk inode usage, and top failing units from `systemctl --failed`.

## Command Guardrails

- Prefer read-only diagnostics first.
- Ask before destructive actions (mass deletes, firewall flush, forced reboot).
- For privileged reads, run with `sudo` only when required.
- Before config edits, back up file: `cp .bak.`.
- After change, validate with targeted checks and logs.

## Resources

- Incident runbook and command matrix: `references/runbook.md`
- Snapshot collector: `scripts/sysdiag.sh`
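
An added sketch of the read-only capture step (Workflow step 2), built from the commands in the Triage Decision Map. It is not the project's `scripts/sysdiag.sh`; the function names, output file names and directory layout are assumptions.

```bash
#!/usr/bin/env bash
# Added example: read-only evidence snapshot; NOT the project's scripts/sysdiag.sh.
# Function names, output file names and directory layout are assumptions.

# run_check NAME CMD [ARGS...]
# Save stdout+stderr of one read-only command to $SNAP_DIR/NAME.txt.
# A missing tool or a non-zero exit is logged in INDEX.tsv and never aborts the run.
run_check() {
    local name=$1; shift
    if ! command -v "$1" >/dev/null 2>&1; then
        printf '%s\tSKIPPED\t%s not installed\n' "$name" "$1" >>"$SNAP_DIR/INDEX.tsv"
        return 0
    fi
    local rc=0
    "$@" </dev/null >"$SNAP_DIR/$name.txt" 2>&1 || rc=$?
    printf '%s\trc=%s\t%s\n' "$name" "$rc" "$*" >>"$SNAP_DIR/INDEX.tsv"
}

# collect_snapshot BASE_DIR
# Create a timestamped directory (mktemp -d makes it mode 0700, since listener
# and firewall output is sensitive), run every check, print the directory path.
collect_snapshot() {
    local base=${1:-.}
    SNAP_DIR=$(mktemp -d "$base/snapshot-$(date -u +%Y%m%dT%H%M%SZ)-XXXXXX") || return 1
    : >"$SNAP_DIR/INDEX.tsv"
    run_check listeners   ss -tulpn
    run_check routes      ip route
    run_check link_stats  ip -s link
    run_check nft_rules   nft list ruleset
    run_check ipt_rules   iptables -S
    run_check resolv_conf cat /etc/resolv.conf
    run_check resolvectl  resolvectl status
    run_check failed      systemctl --failed --no-pager
    run_check disk        df -h
    run_check inodes      df -i
    run_check memory      free -m
    run_check uptime      uptime
    # Hash the captured files so later tampering or accidental edits are detectable.
    if command -v sha256sum >/dev/null 2>&1; then
        ( cd "$SNAP_DIR" && sha256sum -- *.txt >SHA256SUMS ) \
            || printf 'hashing failed for %s\n' "$SNAP_DIR" >&2
    fi
    printf '%s\n' "$SNAP_DIR"
}
```

Run `collect_snapshot /var/tmp` before changing anything, then again after the fix, and `diff -r` the two directories to show exactly what changed. Checks that need privileges (`nft`, `iptables`) record their permission error with a non-zero `rc` in `INDEX.tsv` when run unprivileged.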