Autonomous MSP



Mar 24 •

Hey everyone 👋

I just published the full course we've been building toward: Build Your First AI Agent from Scratch. 8 modules. 8 Colab notebooks. Completely free.

This is not a theory course. Every module builds one piece of a real system:

- Module 1: why your Ansible scripts break and agents don't
- Module 2: your first agent in plain Python, no API key needed
- Module 3: SSH tools, REST calls, database queries (the agent's hands)
- Module 4: ChromaDB memory so your agent remembers past incidents
- Module 5: LangGraph state machine (think OSPF FSM, but for reasoning)
- Module 6: MCP, RAG, topology context
- Module 7: production safety, audit logs, observability
- Module 8: capstone. Full OSPF troubleshooting agent, 25-minute diagnosis in 90 seconds

The whole thing is designed for network engineers, not data scientists. Every example uses OSPF, BGP, Netmiko and NOC scenarios you already know.

👉 https://netsec-academy.vexpertai.com/

No login. No paywall. Just go.

If you finish Module 8 and run the capstone notebook, post your run summary here. I want to see your confidence scores.


Eduard Dulharu

Mar 21 •

Build With Me

Module 1 · Lesson 2 - What an AI Agent Actually Is

📌 Read time: ~7 min | Module 1 of 8

---

Let's Be Precise

There is a lot of noise around "AI agents." Vendors call everything an agent. Chatbots get called agents. Automation platforms get called agents. Most of it is marketing.

Here is the working definition we will use in this course:

An AI agent is a software system that perceives its environment, reasons about what to do, takes actions using tools, and loops until it reaches a goal or determines it cannot.

Four words matter: perceives, reasons, acts, loops. Take away any one of them and you have something else.

---

The Four Pillars

Think about how you troubleshoot a network problem you have never seen before. You are not running a script. You are doing something more sophisticated. Let's name it.

Pillar 1: Reasoning Engine

This is the brain. For you, it is 10 years of network experience. For the agent, it is a large language model: Claude, GPT-4, Llama. It understands concepts and decides what step to take next. Not magic. It is a very good pattern-matcher that has read more RFCs, Cisco TAC documents, and BGP Stack Overflow answers than any engineer could.

Pillar 2: Memory

When you walk into a client's NOC for the first time, you are less effective than after six months running that network. The difference is your mental model. The same applies to an agent: without memory, it starts from zero every time.

Pillar 3: Tools

Knowing what to do is not the same as being able to do it. Tools are the agent's hands: SSH execution, API calls, SNMP queries, log fetching, ticket creation. The agent does not run tools blindly. The reasoning engine decides which tool to call, with which parameters, based on what it has found so far.

Pillar 4: Context

Everything the agent knows at the moment it is working: the client's network documentation, previous ticket history, the output of the last five commands it ran. When you give your agent good context, it performs like your best engineer. When you give it no context, it guesses.
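To make perceive, reason, act, loop concrete, here is a minimal sketch of that cycle in plain Python. Everything in it is an assumption made for illustration: `FAKE_DEVICE` stands in for a real router, `reason()` is a hard-coded stand-in for the language model, and the names `AgentState`, `run_show_command` and `run_agent` are hypothetical, not taken from the course notebooks.

```python
from dataclasses import dataclass, field

# Hypothetical canned device output, standing in for a real router.
FAKE_DEVICE = {
    "show ip ospf neighbor": "Neighbor 10.0.0.2 state EXSTART",
    "show ip ospf interface": "MTU mismatch detected on Gi0/1",
}


def run_show_command(command: str) -> str:
    """Tool: return canned output instead of opening an SSH session."""
    return FAKE_DEVICE.get(command, "% Invalid input")


@dataclass
class AgentState:
    goal: str
    observations: list = field(default_factory=list)  # context gathered so far


def reason(state: AgentState) -> dict:
    """Stand-in for the LLM: pick the next step from what has been observed."""
    seen = " ".join(state.observations)
    if not state.observations:
        return {"action": "run", "command": "show ip ospf neighbor"}
    if "EXSTART" in seen and "MTU" not in seen:
        return {"action": "run", "command": "show ip ospf interface"}
    if "MTU mismatch" in seen:
        return {"action": "finish", "diagnosis": "MTU mismatch on Gi0/1"}
    return {"action": "give_up", "diagnosis": "no hypothesis left"}


def run_agent(goal: str, max_steps: int = 5) -> str:
    state = AgentState(goal=goal)
    for _ in range(max_steps):                      # loop
        decision = reason(state)                    # reason
        if decision["action"] in ("finish", "give_up"):
            return decision["diagnosis"]
        output = run_show_command(decision["command"])  # act
        state.observations.append(output)           # perceive
    return "step limit reached"


print(run_agent("Why is the OSPF adjacency to 10.0.0.2 not forming?"))
```

The `max_steps` cap is the "determines it cannot" half of the definition: the loop ends either with a diagnosis or with an explicit admission that it ran out of steps.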


Eduard Dulharu

Mar 21 •

Build With Me

Module 1 · Lesson 1 - The MSP Problem: Why Scripts Will Not Scale You

You have 50 clients. Each one has a runbook or script written by a different engineer at a different time. None of them share knowledge.

When engineer A figures out that a specific vendor drops BGP sessions under high CPU, that knowledge lives in A's head. Next week engineer B hits the same thing on a different client. Starts from zero. Same two hours of investigation.

Scripts do not accumulate knowledge. Agents can. An agent that troubleshot 500 BGP issues across your client base brings all of that context to the 501st. Not because you programmed it in, but because it has memory.

The path to scaling your MSP without scaling headcount is not more scripts. It is agents that investigate, remember and act.

---

👇 What's next: Lesson 2, What an AI Agent Actually Is

Eduard Dulharu

Mar 21 •

Build With Me

Module 3 · Lesson 3 - The Safety Gate

All Tool Calls Go Through One Function. No Exceptions.

You do not call tool.execute() directly from the agent loop. You call execute_with_approval(). Always.

This function is the single chokepoint where the three access levels are enforced and every action is logged. The agent loop does not need to know whether a tool is READ, WRITE, or ADMIN. The gate handles that.

---

The Audit Log

First, the logging function. Every action, approved or rejected, gets a record.

    import json
    import datetime

    AUDIT_LOG_PATH = "audit.jsonl"

    def _log_action(tool_name: str, params: dict, approved: bool, operator: str):
        """Append one JSON record per gated action to the audit log."""
        # Timezone-aware UTC timestamp (utcnow() is deprecated since Python 3.12).
        now = datetime.datetime.now(datetime.timezone.utc)
        entry = {
            "timestamp": now.isoformat().replace("+00:00", "Z"),
            "tool": tool_name,
            "params": params,
            "approved": approved,
            "operator": operator,
        }
        with open(AUDIT_LOG_PATH, "a") as f:
            f.write(json.dumps(entry) + "\n")

JSONL format: one JSON object per line. Easy to ship to a SIEM. Easy to search with grep or jq. Each line is self-contained.

Think of this as your debug ip ospf events equivalent: a timestamped trail of everything the agent did or tried to do.

---

The Full Safety Gate

    def execute_with_approval(tool: BaseTool, params: dict, operator: str = "unknown") -> ToolResult:
        """The safety gate. Every tool call goes through here."""
        # READ tools run automatically and are logged as approved.
        if tool.category == READ:
            result = tool.execute(**params)
            _log_action(tool.name, params, approved=True, operator=operator)
            return result

        # WRITE tools show the operator what is about to change and wait for y/n.
        if tool.category == WRITE:
            print(f"\n{'='*55}")
            print(" WRITE OPERATION REQUESTED")
            print(f"{'='*55}")
            print(f" Tool : {tool.name}")
            print(f" Params: {json.dumps(params, indent=4)}")
            if "diff" in params:
                print(f"\n Config diff:\n{params['diff']}")
            print(f"{'='*55}")
            answer = input(" Approve? (y/n): ").strip().lower()
            approved = (answer == "y")
            _log_action(tool.name, params, approved=approved, operator=operator)
            if not approved:
                return ToolResult(success=False, data={}, error="Operator rejected.")
            # Approved: run the tool and hand its result back to the agent loop.
            return tool.execute(**params)
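The gate described above can be sketched end to end, ADMIN level included, as a self-contained and testable version. This is an illustration under stated assumptions, not the course's implementation: `EchoTool`, `gate`, `log_action`, the `ask` parameter (an injectable replacement for `input()`), and the `log_path` parameter are all hypothetical names introduced here. The `YES I CONFIRM` phrase comes from the access-level description in Lesson 1.

```python
import datetime
import json
from typing import Callable

READ, WRITE, ADMIN = "read", "write", "admin"
ADMIN_PHRASE = "YES I CONFIRM"


class ToolResult:
    def __init__(self, success: bool, data: dict, error: str = ""):
        self.success = success
        self.data = data
        self.error = error


class EchoTool:
    """Hypothetical tool for this sketch: it only echoes its parameters."""

    def __init__(self, name: str, category: str):
        self.name = name
        self.category = category

    def execute(self, **params) -> ToolResult:
        return ToolResult(success=True, data={"echo": params})


def log_action(path: str, tool_name: str, params: dict, approved: bool, operator: str) -> None:
    """Append one JSON object per line (JSONL) to the audit file."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tool": tool_name,
        "params": params,
        "approved": approved,
        "operator": operator,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")


def gate(tool, params: dict, operator: str = "unknown",
         ask: Callable[[str], str] = input, log_path: str = "audit.jsonl") -> ToolResult:
    """Decide, log, then execute. Unknown access levels fail closed."""
    if tool.category == READ:
        approved = True
    elif tool.category == WRITE:
        approved = ask(" Approve? (y/n): ").strip().lower() == "y"
    elif tool.category == ADMIN:
        # The phrase must match exactly, so no lower-casing here.
        approved = ask(f" Type {ADMIN_PHRASE} to proceed: ").strip() == ADMIN_PHRASE
    else:
        approved = False
    log_action(log_path, tool.name, params, approved, operator)
    if not approved:
        return ToolResult(success=False, data={}, error="Rejected by the safety gate.")
    return tool.execute(**params)
```

Passing `ask` as a parameter keeps the interactive prompt for real use while letting a test supply canned answers. Unlike the lesson's READ branch, this sketch writes the audit record before executing, so an attempt is recorded even if the tool raises.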

Eduard Dulharu

Mar 21 •

Build With Me

Module 3 · Lesson 1 - Tool Design Philosophy

The LLM Does Not Touch Your Network

Before anything else, get this right in your head: the language model cannot execute code. It cannot SSH into a router. It cannot run a Python function.

What it can do is produce text, and that text, if you structure it correctly, looks like a tool call request. Your Python code reads that request. Your Python code decides whether to honor it. Your Python code runs the actual SSH session.

That separation is not a design choice you can skip. It is where every safety control lives.

---

Three Access Levels. No Exceptions.

Think of these like privilege levels on a Cisco router. Every tool you build belongs to exactly one of three levels:

- READ: show commands, lookups, pings. Runs automatically. Cannot change anything.
- WRITE: config changes, interface bounces. Agent proposes, you type y to approve.
- ADMIN: high-impact operations. You must type YES I CONFIRM exactly.

You are running this agent against client infrastructure. The access level is not a suggestion.

Here is how you define them in Python. Plain string constants, nothing fancy:

    READ = "read"    # privilege 1: show commands only, auto-execute
    WRITE = "write"  # privilege 7: config changes, needs y/n approval
    ADMIN = "admin"  # privilege 15: destructive ops, needs YES I CONFIRM

---

ToolResult: What Every Tool Returns

Every tool returns exactly the same shape. This is important: the agent loop always knows what it is getting back.

    class ToolResult:
        """
        Every tool returns this. Always the same three fields:
          success: True or False
          data:    what came back, e.g. {"raw_output": "..."}
          error:   error message if success is False, empty string otherwise
        """
        def __init__(self, success: bool, data: dict, error: str = ""):
            self.success = success
            self.data = data
            self.error = error

Usage:

    result = ToolResult(success=True, data={"output": "neighbor FULL"})
    print(result.success)  # True
    print(result.data)     # {'output': 'neighbor FULL'}
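The separation between "the model produces text" and "your Python code decides" can be sketched as follows. The JSON shape `{"tool": ..., "params": ...}` is an assumption chosen for this example; real LLM APIs each define their own tool-call format. The names `TOOLS`, `parse_tool_request`, `handle` and `show_ospf_neighbors` are hypothetical, and the tool returns canned output rather than opening an SSH session.

```python
import json

READ, WRITE, ADMIN = "read", "write", "admin"


def show_ospf_neighbors(host: str) -> dict:
    """Stand-in for a real SSH call; returns canned output."""
    return {"raw_output": f"{host}: neighbor 10.0.0.2 FULL"}


# Hypothetical registry: tool name -> (access level, Python function).
TOOLS = {"show_ospf_neighbors": (READ, show_ospf_neighbors)}


def parse_tool_request(llm_text: str):
    """Return (name, params) if the text is a well-formed request, else None."""
    try:
        request = json.loads(llm_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(request, dict):
        return None
    name = request.get("tool")
    params = request.get("params", {})
    if not isinstance(name, str) or name not in TOOLS or not isinstance(params, dict):
        return None
    return name, params


def handle(llm_text: str) -> dict:
    """Python, not the model, decides whether anything runs."""
    parsed = parse_tool_request(llm_text)
    if parsed is None:
        return {"success": False, "data": {}, "error": "Not a valid tool request."}
    name, params = parsed
    category, func = TOOLS[name]
    if category != READ:
        # WRITE and ADMIN tools would go to an approval step instead.
        return {"success": False, "data": {}, "error": "Needs operator approval."}
    try:
        return {"success": True, "data": func(**params), "error": ""}
    except TypeError as exc:  # wrong or missing parameters from the model
        return {"success": False, "data": {}, "error": str(exc)}


print(handle('{"tool": "show_ospf_neighbors", "params": {"host": "r1"}}'))
print(handle("please ssh into r1 and fix it"))
```

Free-form text, an unknown tool name, or wrong parameters all end in a failure result instead of an exception, so the agent loop always gets the same three-field shape back.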


skool.com/autonomous-msp-2162

AI-powered NOC, SOC and compliance for MSPs and IT consultancies. Built by a 25-year enterprise network practitioner.
