I'm starting a runbook for troubleshooting incorrect VLAN assignment, and I realized I'm not taking advantage of Orion NPM's power to make troubleshooting simpler.
What ideas can you come up with to improve this non-Orion-based troubleshooting flow so that it includes NPM and NCM?
Runbook: Troubleshooting Incorrect VLAN Assignment
Technical overview of the problem
- A network device plugged into a switch port is inaccessible
- Device cannot ping its gateway
- Others cannot ping the device
The cause is operator error: either someone plugged a device with a static IP address into a port on the wrong VLAN, or a network admin configured the port for a VLAN whose subnet does not include the device’s IP address.
Troubleshooting procedures
- Verify the device has the correct IP address, mask, and gateway
- Verify the VLAN assigned to the port is appropriate for the static IP address of the device
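The two checks above can be sketched as a small script. This is a minimal sketch, assuming you already know the subnet that belongs to the VLAN on the port; the function name and the 10.1.x.x addresses are hypothetical and only illustrate the idea.

```python
import ipaddress

def check_static_config(ip, mask, gateway, vlan_subnet):
    """Return a list of problems found; an empty list means the settings agree."""
    problems = []
    vlan_net = ipaddress.ip_network(vlan_subnet)
    # The subnet the device believes it is on, derived from its own IP and mask.
    device_net = ipaddress.ip_interface(f"{ip}/{mask}").network
    if ipaddress.ip_address(ip) not in vlan_net:
        problems.append(f"IP {ip} is outside VLAN subnet {vlan_net}")
    if device_net != vlan_net:
        problems.append(f"mask {mask} gives {device_net}, expected {vlan_net}")
    if ipaddress.ip_address(gateway) not in device_net:
        problems.append(f"gateway {gateway} is not on the device's subnet {device_net}")
    return problems

# Hypothetical case: device addressed for 10.1.20.0/24, port in the VLAN for 10.1.30.0/24.
for problem in check_static_config("10.1.20.50", "255.255.255.0", "10.1.20.1", "10.1.30.0/24"):
    print(problem)
```

An empty result only says the addressing is self-consistent with the VLAN's subnet; it does not prove the port is actually in that VLAN.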
Additional troubleshooting (optional if the cause is VLAN mismatched to IP address of the device)
- On the device
- Verify Link light is present
- Temporarily enable DHCP, open a cmd prompt, issue the “ipconfig /release” and “ipconfig /renew” commands, and then determine what IP address is received from the network
- If the IP address received from DHCP is in a different subnet, then the original static address will not work
- If an address beginning with 169.254.x.x is shown, then DHCP has timed out. Verify PortFast is enabled on the switch port, then release and renew the address again.
- Review interface error statistics
- Use a network discovery tool such as a Fluke LinkSprinter to confirm correct VLAN, speed, duplex, along with the name of the switch, and the blade/port in play.
- If no link light is present:
- Verify L1 connectivity is present and solid all the way through the drop cable, through the wall, through the patch panel, and into the switch
- Verify the switch port is not administratively disabled
- Verify the access device’s NIC is not disabled
- Swap the problem device with a known-good-working unit and retest for link
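The DHCP test above has three possible outcomes, and they can be told apart mechanically. The following is a rough sketch, with a hypothetical function name and example addresses, that classifies the address received from DHCP against the device's original static settings.

```python
import ipaddress

def classify_dhcp_result(leased_ip, static_ip, static_mask):
    """Classify the address seen after 'ipconfig /renew' against the static settings."""
    leased = ipaddress.ip_address(leased_ip)
    static_net = ipaddress.ip_interface(f"{static_ip}/{static_mask}").network
    if leased.is_link_local:
        # 169.254.0.0/16: the client gave up waiting for a DHCP server.
        return "apipa"
    if leased in static_net:
        # The port's VLAN serves the same subnet as the static address.
        return "same-subnet"
    # The port's VLAN serves another subnet, so the static address cannot work here.
    return "different-subnet"

print(classify_dhcp_result("10.1.30.77", "10.1.20.50", "255.255.255.0"))
```

A "different-subnet" result points at the VLAN assignment; an "apipa" result says nothing about the VLAN yet and calls for the PortFast and DHCP checks first.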
Remedy procedures
- On the switch
- Apply the appropriate VLAN setting for the access device’s static IP address
- Ensure the VLAN is built on both switches (Certain switches will allow you to apply a VLAN to a port even if the VLAN is not present on the switch. This will cause failure every time.)
- Ensure the VLAN is spanned from the Access switch to the Distribution Switch
- Ensure the Distribution switch has an SVI built on it for that VLAN, along with an ip-helper for DHCP forwarding
- Ensure the switch ports involved do not have ACLs on them that filter out the needed traffic
- Ensure the switch port shows no errors
- Ensure spanning-tree PortFast is enabled on the access switch port
- On the access device
- Verify link is present
- Verify DHCP works
- Verify the gateway can be pinged
- Verify subnet mask and gateway are correctly configured
- Verify the IP address of the access device is correct
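For the switch-side remedy steps, it can help to generate the intended configuration as text and review it before applying anything. The sketch below assumes Cisco IOS-style syntax, which the runbook does not actually specify; the interface name, VLAN ID, and addresses are hypothetical, and other platforms will use different commands.

```python
def access_port_config(interface, vlan_id):
    """Lines for the access switch: build the VLAN, assign the port, enable PortFast."""
    return [
        f"vlan {vlan_id}",                      # the VLAN must exist on the switch
        f"interface {interface}",
        " switchport mode access",
        f" switchport access vlan {vlan_id}",
        " spanning-tree portfast",
    ]

def svi_config(vlan_id, gateway_ip, mask, dhcp_server):
    """Lines for the distribution switch: the SVI plus a DHCP relay (ip helper)."""
    return [
        f"interface Vlan{vlan_id}",
        f" ip address {gateway_ip} {mask}",
        f" ip helper-address {dhcp_server}",
        " no shutdown",
    ]

# Hypothetical values, printed for review only; nothing is pushed to a device.
for line in access_port_config("GigabitEthernet1/0/10", 20):
    print(line)
for line in svi_config(20, "10.1.20.1", "255.255.255.0", "10.1.1.5"):
    print(line)
```

This only covers the port, VLAN, and SVI items; spanning the VLAN across the uplink trunk and checking ACLs still have to be verified on the devices themselves.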
Test procedures
- On the Distribution Switch or router
- Ping the access device
- If the router or Distribution switch can ping the access device from the SVI for that VLAN, but devices on other subnets cannot ping it and you believe they should be able to, then look closely at the edge device’s subnet mask and gateway. It’s common to fat-finger a subnet mask, and the result is exactly this symptom: local devices on the subnet, and the router on the subnet, can ping the access device, but devices on other subnets cannot.
- Verify an ARP entry is present on the router or L3 switch for that IP address.
- Ping sweep the subnet from the router or Distribution switch if other devices are present and working on the same VLAN (for a Class C network, ping x.x.x.0, or ping x.x.x.255, depending on the L3 device)
- On the edge device
- Verify correct IP addressing
- Verify link
- Verify the default gateway can be pinged
- Verify other devices can ping the device
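The subnet mask symptom described in the test procedures comes from how the device decides where to send its replies. Here is a small sketch of that decision, assuming the mistyped mask is too wide (for example 255.255.0.0 instead of 255.255.255.0); the function name and addresses are hypothetical.

```python
import ipaddress

def next_hop(src_ip, src_mask, gateway, dst_ip):
    """Return 'direct' if the host would ARP for dst itself, else the gateway it would use."""
    local_net = ipaddress.ip_interface(f"{src_ip}/{src_mask}").network
    if ipaddress.ip_address(dst_ip) in local_net:
        return "direct"
    return gateway

# Correct mask: a reply to a host on another subnet goes to the gateway.
print(next_hop("10.1.20.50", "255.255.255.0", "10.1.20.1", "10.1.30.5"))  # 10.1.20.1

# Mask too wide: the device treats the remote host as local and ARPs for it directly.
print(next_hop("10.1.20.50", "255.255.0.0", "10.1.20.1", "10.1.30.5"))    # direct

# The gateway itself is local under either mask, so the SVI ping still succeeds.
print(next_hop("10.1.20.50", "255.255.0.0", "10.1.20.1", "10.1.20.1"))    # direct
```

In the second case the ARP request never leaves the VLAN, so unless something on the segment answers on the remote host's behalf, the reply is never sent. A wrong gateway address produces a similar symptom, which is why both settings are worth checking.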
Post implementation verification procedures
- Confirm the device can ping the gateway
- Confirm the device can ping devices on other subnets
- Confirm devices on other subnets can ping the device
Change control best practices (prevention measures)
- Identify planned VLAN configurations and port assignments
- Coordinate with the support staff and affected users to ensure the appropriate maintenance window is selected
- Provide notification of the planned work
- Complete the work
- Verify successful completion of the tasks
- Provide confirmation of the completed work
-----------------------
Orion NCM/NPM tools to use to quickly prove a VLAN is incorrectly assigned to a port (Here's where your suggestions come in):
- Search NCM's config change reports to find logs of the specific port's configuration change in the last X days
- ???