⚡ Vibranium | Network Telemetry Lab
ASUS ROG Strix G16 • AMD Ryzen 9 • 32GB RAM • Broker-Centric Architecture
🖥️ Vibranium — Physical Host (ASUS ROG Strix G16 G614PP-MS96)
AMD Ryzen 9 8940HX
2.4GHz (boost 5.2GHz) • 16 cores / 32 threads
32GB DDR5-5200
Dual-channel • Sufficient for 5+ VMs
NVIDIA RTX 5070 8GB
GPU not used for networking lab
1TB NVMe SSD
Fast I/O for VM snapshots
🧠 How vCPUs relate to physical cores & threads (SMT)
AMD Ryzen 9 8940HX: 16 physical cores, each supports Simultaneous Multi-Threading (SMT) → 32 logical processors (threads).
When you assign a vCPU to a VM in VMware Workstation, it schedules execution time across physical host threads.
Rule of thumb: Total assigned vCPUs should not exceed total host threads (32). Overcommit is possible but may cause latency.
For low-latency telemetry lab: Assign vCPUs carefully — network devices rarely need more than 2 vCPUs each.
⚙️ Virtual CPU Allocation — Best Practices for this Lab
| VM | vCPUs | Why this allocation |
|---|---|---|
| jeannie (Fedora / Tools) | 4 | Runs Kafka, Prometheus, Grafana, Postgres, 2x Telegraf — moderate concurrency |
| leanna (Rocky / Ansible) | 2 | Control node, playbook execution, lightweight |
| sai (Arista vEOS) | 2 | Network OS, control plane overhead minimal |
| emias (Cisco vIOS) | 2 | Classic IOS lightweight |
| milána (Juniper vJunos) | 2 | Junos requires 2 for stable telemetry |
| demiá (Cisco NX-OSv) | 2 | Data center switch simulation, gNMI ready |
📊 Total vCPUs assigned: 14 out of 32 host threads → comfortable headroom (no overcommit)
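The headroom figure can be sanity-checked with plain shell arithmetic (run on any Linux box; `nproc` reports that machine's logical processors, which on vibranium would be 32):

```shell
# Sum the per-VM vCPU allocations from the table above
total_vcpus=$((4 + 2 + 2 + 2 + 2 + 2))   # jeannie + leanna + sai + emias + milána + demiá
echo "assigned vCPUs: ${total_vcpus}"                             # 14
echo "headroom: $((32 - total_vcpus)) of 32 host threads free"    # 18
nproc   # logical processors on whatever machine you run this on
```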
💡 VMware schedules vCPUs across physical cores; SMT allows efficient interleaving. Keep per-VM vCPU ≤ physical cores for latency-sensitive workloads.
🖥️ Virtual Machines — vibranium (VMware Workstation Pro)
jeannie
Fedora 43 | 4vCPU / 8GB
192.168.45.129
📦 Tools VM (Docker stack + broker)
leanna
Rocky Linux 9 | 2vCPU / 2GB
192.168.45.130
⚙️ Ansible Control Node
sai
Arista vEOS | 2vCPU / 4GB
192.168.45.131
🔄 gNMI + SNMP source
emias
Cisco vIOS | 2vCPU / 4GB
192.168.45.132
🔌 SNMP / NETCONF (IOS)
milána
Juniper vJunos | 2vCPU / 4GB
192.168.45.133
🌿 Junos telemetry / SNMP
demiá
Cisco NX-OSv | 2vCPU / 8GB
192.168.45.134
🏢 Data Center (gNMI ready)
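The allocations above map to a handful of keys in each VM's `.vmx` file in VMware Workstation. A sketch for jeannie (the key names are standard Workstation options; the core/socket topology is an illustrative choice, not taken from the lab):

```ini
; jeannie — 4 vCPU / 8 GB, per the VM list above
numvcpus = "4"
cpuid.coresPerSocket = "2"   ; 2 sockets x 2 cores — topology is illustrative
memsize = "8192"
```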
🌐 Management Network — VMnet1 (Host‑only) 192.168.45.0/24
📡 jeannie
Fedora | Docker: Kafka, Prometheus, Grafana, Postgres, Telegraf
⚙️ leanna
Rocky Linux 9 | Ansible Core
🌀 sai
Arista vEOS (gNMI / SNMP)
🔷 emias
Cisco vIOS (SNMP)
🌿 milána
Juniper vJunos
🏛️ demiá
Cisco NX-OSv (gNMI ready)
🔗 SSH from leanna → all VMs (Ansible automation)
📡 SNMP polling from jeannie → all network devices • gNMI future streams → Kafka
📐 Reference Architecture — 5-Layer Model (Traditional + Modern)
| Layer | Name | Traditional (Poll/SNMP) | Modern (gNMI/Streaming) |
|---|---|---|---|
| L5 | Data/Presentation | Grafana, Prometheus, PostgreSQL (jeannie) — long‑term storage | |
| L4 | Broker | ✅ Kafka on jeannie • topics: snmp.metrics, gnmi.metrics | |
| L3 | Tool | Telegraf (producer) • Ansible | gnmic, custom streaming adapters |
| L2 | Protocol | SNMPv2c, NETCONF, Syslog | gNMI, OpenConfig, MDT |
| L1 | Device | sai • emias • milána • demiá | |
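The modern column's gNMI path can be driven by gnmic, which takes a YAML configuration. A minimal sketch against sai (the port, credentials, and OpenConfig path are assumptions for illustration, not lab values):

```yaml
# Hypothetical ~/.gnmic.yml sketch
targets:
  192.168.45.131:57400:
    username: admin
    password: admin
    insecure: true
subscriptions:
  ifcounters:
    paths:
      - /interfaces/interface/state/counters
    stream-mode: sample
    sample-interval: 10s
```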
📈 Data Pipeline — SNMP → Kafka → Prometheus → Grafana
sai / emias / milána / demiá ➡️ SNMP (30s) ➡️ Telegraf (producer) ➡️ topic snmp.metrics ➡️ Kafka :9092 ➡️ Telegraf (consumer) ➡️ Prometheus ➡️ Grafana 📊
📦 Kafka replay ➡️ PostgreSQL (timescale) ➕ future: ML / Splunk
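Both Telegraf roles in the pipeline can be sketched as config fragments. The OIDs, community string, consumer group, and listener port below are illustrative assumptions, not taken from the lab files:

```toml
# Producer side: poll devices over SNMP every 30s, publish to Kafka
[[inputs.snmp]]
  agents = ["udp://192.168.45.131:161", "udp://192.168.45.132:161"]
  version = 2
  community = "public"          # assumption — substitute the lab's community
  interval = "30s"
  [[inputs.snmp.field]]
    oid = "RFC1213-MIB::sysUpTime.0"
    name = "uptime"

[[outputs.kafka]]
  brokers = ["localhost:9092"]
  topic = "snmp.metrics"

# Consumer side (a second Telegraf instance): read from Kafka, expose to Prometheus
[[inputs.kafka_consumer]]
  brokers = ["localhost:9092"]
  topics = ["snmp.metrics"]
  consumer_group = "prometheus-bridge"   # assumption
  data_format = "influx"

[[outputs.prometheus_client]]
  listen = ":9273"
```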
⚙️ Infrastructure as Code — Ansible on leanna (Rocky Linux)
📁 ~/ansible-lab/
├── inventory/hosts.yml
├── playbooks/
│ ├── 01-configure-network.yml
│ ├── 02-validate-telemetry.yml
│ └── 03-copp-impact-test.yml
├── ansible.cfg
└── group_vars/
arista.eos cisco.ios junipernetworks.junos cisco.nxos
✅ Multi‑vendor automation (EOS, IOS, Junos, NX-OS)
✅ CoPP impact test playbook
🚀 ansible-playbook playbooks/01-configure-network.yml – configures SNMP, gNMI, and streaming sensors across all devices.
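That playbook run presumes an inventory grouping the devices by platform. A minimal sketch of `inventory/hosts.yml` (group names and connection settings are assumptions; the `ansible_network_os` values follow the collections installed above):

```yaml
all:
  children:
    eos:
      hosts:
        sai:
          ansible_host: 192.168.45.131
          ansible_network_os: arista.eos.eos
    ios:
      hosts:
        emias:
          ansible_host: 192.168.45.132
          ansible_network_os: cisco.ios.ios
    junos:
      hosts:
        milána:
          ansible_host: 192.168.45.133
          ansible_network_os: junipernetworks.junos.junos
    nxos:
      hosts:
        demiá:
          ansible_host: 192.168.45.134
          ansible_network_os: cisco.nxos.nxos
  vars:
    ansible_connection: ansible.netcommon.network_cli
```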
🛡️ Control Plane Policing (CoPP) — Why Broker Wins
❌ Direct polling (10+ Python scripts)
Each script → separate SSH/show commands → device CPU spikes → CoPP drops packets → lost telemetry
✅ Broker‑mediated (single poll + Kafka fan-out)
One poll → Kafka streams to N consumers. Device CPU stable, zero CoPP drops.
🔌 Access & Services
| Service | URL / Command | Credentials |
|---|---|---|
| Grafana | http://192.168.45.129:3000 | admin/admin |
| Prometheus | http://192.168.45.129:9090 | — |
| SSH jeannie | ssh fedora@192.168.45.129 | user password |
| SSH leanna | ssh rocky@192.168.45.130 | Ansible control |
| SSH sai (Arista) | ssh admin@192.168.45.131 | admin/admin |
| SSH emias (IOS) | ssh cisco@192.168.45.132 | cisco/cisco |
| SSH milána (Junos) | ssh root@192.168.45.133 | (none) |
| SSH demiá (NX-OSv) | ssh admin@192.168.45.134 | admin/admin |
📜 Quickstart — Reproducible Deployment
```shell
# On leanna (Ansible control node)
sudo dnf install -y epel-release ansible-core
ansible-galaxy collection install arista.eos cisco.ios junipernetworks.junos cisco.nxos community.docker
# Copy inventory & playbooks to ~/ansible-lab/
ansible-playbook playbooks/01-configure-network.yml

# On jeannie (Tools VM)
cd ~/telemetry-lab
docker compose up -d

# Verify metrics
docker exec kafka kafka-console-consumer --topic snmp.metrics --bootstrap-server localhost:9092 --max-messages 3
```
