ARP is not working on Cisco ASR 1001 X

Problem: Cisco ASR router is loosing connectivity to its directly attached Ethernet neighbors. In this situation interface status is still up, packets are going in and out on both ends, even IPv6 was still working. The actual problem was that the Cisco ASR was ignoring all ARP responses from its neighbors and the ARP table to this interface was empty. Later the same happened on a second interface.

A temporary work around was to reboot the router.

Solution: Cisco support suggested a software upgrade, even though the software was only some weeks old. After the software upgrade the error didn’t happen again until now.
The old IOS version was: asr1001x-universalk9.03.16.03.S.155-3.S3-ext.SPA.bin
The new IOS version is: asr1001x-universalk9.03.16.04a.S.155-3.S4a-ext.SPA.bin

The only fix that possibly fits to the problem is:

https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20160804-wedge

“A remote attacker can cause an interface wedge and an eventual denial of service condition”

What’s an “interface wedge”. Cisco bug reports were more precise years ago.

 

ASR Tips’n’Tricks

ASR-1001-X and IOS-XE is sometimes different and sometimes very similar to classic IOS.

Update. You can update, the firmware as usual:

# copy http: bootflash:
# conf t
(config)# boot system flash bootflash:asr1001x-universalk9.03.16.00.S.155-3.S-ext.SPA.bin

Show SFP (transceiver) info:

# show hw-module interface tenGigabitEthernet 0/0/0 transceiver status
# show hw-module interface tenGigabitEthernet 0/0/0 transceiver idprom

.. to be continued

MLPPP over L2TP over Ethernet Channel Groups on Cisco ASR

 

Problem: After upgrading an ethernet port to a channel-group, all MLPPP connections fail on a Cisco ASR 1002-X. The log file looks like this:

Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is AUTHENTICATING, Authenticated User
Jul 31 2015 07:04:44.801 CEST: Vi4 CHAP: O SUCCESS id 143 len 4
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is VIRTUALIZED
Jul 31 2015 07:04:44.802 CEST: Vi6 MLP: Added link Vi4 to bundle xxx
Jul 31 2015 07:04:44.803 CEST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.803 CEST: %LINK-3-UPDOWN: Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.805 CEST: %CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_cp:  cpp_cp encountered an error -Traceback= 1#795bed15105852c19a9ac138912d7358   errmsg:7F13FA6E0000+121D cpp_common_os:7F13FD6F1000+D8D5 cpp_common_os:7F13FD6F1000+D7D4 cpp_common_os:7F13FD6F1000+19A3E cpp_ifm:7F14106F1000+A198 cpp_mlppp_svr_lib:7F1406B63000+C351 cpp_mlppp_svr_lib:7F1406B63000+1CDC8 cpp_mlppp_svr_smc_lib:7F1406DA1000+2D28 cpp_common_os:7F13FD6F1000+11E6E cpp_common_os:7F13FD6F1000+118AA cpp_common_os:7F13FD6F1000+116EB evlib:7F13FC6D10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 13 len 10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP:    Address xxx
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 14 len 10
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP:    Address xxx
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]

The router continues with “O CONFREQ” but never receives the “I CONFACK”.

Discussion: In this case the router is a L2TP server and handles multiple L2TP/PPP connection. Some of them are multilink PPP connections. The ASR software has a bug that leads to these tracebacks when the L2TP connections are going over an ethernet channel group. We opened a case with Cisco support. After one and a half month we received this answer:

---
Apologies for the delay. I was held up on other critical issues and hence was unable to reach out to you earlier. I was able to decode the tracebacks observed during the time of the issue and the issue points to a known software bug as the cause of the problem. Below are more details

CSCua16777 : FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle
Bug toolkit link : https://tools.cisco.com/bugsearch/bug/CSCua16777/?reffering_site=dumpcr

However the above bug is in closed state with the below release-notes

Symptom: FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle 8767, link 8766 download to CPP faile
Conditions: LNS MLPPP sessions don't stay up over port-channel
Workaround: MLPPP over port-channel is not supported on ASR1k. Don't use MLPPP over port-channel.
---

Dear Cisco! This is no solution. If you define an obvious bug as normal behaviour and the only workaround is “Don’t use..”, your customers will soon remember this:
“Cisco ? Don’t use…”

Solution: “Don’t use Cisco ?”

Version: Cisco ASR 1002-X, IOS XE Version: 03.09.02.S

Update: If I use the link of the bug report, I receive this answer:

Insufficient Permissions to View Bug
This bug contains proprietary information and is not yet publicly available.
Cisco Support Community

Unbelievable !! CSCO: STRONG SELL!

Cisco ASR 1002-X and PPTP

Problem: PPTP from any client to an ASR1002-X Cisco does not work. PPTP Connections starts but in PPP LCP phase the connection fails.

Solution: Cisco ASR1002-X with Software IOS-XE 15.3(2)S2 has no PPTP support. You have to take a different Router!

Discussion: The weird thing is, that most of the PPTP stack is still configureable and working, but all packets coming from the client inside the PPTP tunnel are dropped!

Some examples:

.) #show vpdn tunnel
%No active L2TP tunnels
%No active PPTP tunnels

.) in vpdn-group you can set protocol any

.) the router is answering PPTP (TCP 1723)

.) the router starts the PPP layer when a connection is coming in

.) the router even sends LCP O CONFREQ packets to the client

But! The Cisco ASR Router drops every LCP I CONFACK coming from the client.

Cisco was always a reliable piece of hardware for me, but this looks like they removed a feature without removing the code and their QA department worked like this: “it compiles, ship it”.

 

CDP Fun

Problem: You want to know which switch and what port your Linux machine is connected to?

Solution: If the switch does CDP (all Cisco switches do), it tells you a lot of information. Tcpdump can capture and show this information.

# tcpdump -i eth0 -n -v -s 1500 -c 1 'ether[20:2] == 8192' 
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 1500 bytes
16:47:43.099633 CDPv2, ttl: 180s, checksum: 692 (unverified), length 438
         Device-ID (0x01), length: 4 bytes: 'SW10'
         Platform (0x06), length: 20 bytes: 'cisco WS-C3750G-48TS'
         Address (0x02), length: 13 bytes: IPv4 (1) XXX.XXX.XXX.10
         Port-ID (0x03), length: 21 bytes: 'GigabitEthernet3/0/25'
         Capability (0x04), length: 4 bytes: (0x00000029): Router, L2 Switch, IGMP snooping
         Protocol-Hello option (0x08), length: 32 bytes: 
         VTP Management Domain (0x09), length: 7 bytes: 'XXX'
         Native VLAN ID (0x0a), length: 2 bytes: 1
         Duplex (0x0b), length: 1 byte: full
         Management Addresses (0x16), length: 13 bytes: IPv4 (1) XXX.XXX.XXX.10

I highlighted the most relevant information in bold.

Update Cisco Catalyst Software

I had to update the software of a new Cisco Catalyst 4948 yesterday.
As usual I did:

copy tftp://<hostname>/<filename> bootflash:
conf t
boot system flash bootflash:<filename>
exit
reload

But the switch ignored the new software image.
During boot it said:

Booting first image from bootflash

Solution: The config-register was set to 0x2101 right out of the box. I had to change this to 0x2102. With this value the switch honors the bootvar and boots into the configured system image.

conf t
config-register 0x2102
exit
reload

The config-register has different meanings on different cisco plattforms.
For the Catalyst 4948 the config-register contains a bit field explained at:

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/31sga/configuration/guide/supcfg.html#wp1051990

Cisco Routing – Administrative Distance

Cisco routers are capable of different routing protocols and static and connected routes. Every routing protocol engine has its own distance/metric/weight to decide which route is best.

When a routing protocol has chosen its best route, the route is entered into the routing table. The routing engine uses an “administrative distance” per routing protocol to decide which route to use.  The default administrative distance for all the routing protocols is:

Default Administrative Distances
Connected 0
Static 1
eBGP 20
EIGRP (internal) 90
IGRP 100
OSPF 110
IS-IS 115
RIP 120
EIGRP (external) 170
iBGP 200
EIGRP summary route 5

Source: cisco.com “Route Selection in Cisco Routers”

i.g. if you like to set a static route that is only activated, if there is no eBGP route to the same target, you can configure:

ip route 193.0.0.0 255.255.255.0 80.80.80.80 25

WScale and Cisco Content Switch CSS.

Problem: Windows 7 clients wait 5 seconds to send simple HTTP requests to a web server behind a cisco CSS content switch. But there is no delay when the client connects directly to a web server behind the CSS.

Description: I found one difference between CSS and non-CSS connects: the TCP WSCALE option.
When the CSS is in layer 7 HTTP switching mode (e.g. url “/*”), it does the tcp handshake and the http-request parsing on it’s own, and afterwards it decides which service is receiving the request. For this initial connect the CSS uses WSCALE=0. Which means window scaling is welcome, but the css sets it’s inbound scaling to 1 (2^0). After the request is sent from the client to the css, the css connects to the server, which negotiates WSCALE=8 (scale 256, seams to be the Windows 2008 R2 standard value).
Now the client believes WSCALE=0 for outgoing data and the server believes WSCALE=8 for incoming data.
After the first HTTP request (Keep-alive: active) the Windows server sets it’s window to 256, which is 256*256 byte for the server and 256*1 byte for the client. When the client
requests a second url it has to send the new request in 256 byte pieces and waits for the ack from the server for each packet. And both Linux and Window clients do exactly that. This is a performance penalty of course, but still it should work, but windows sends this 256 byte data pieces after a timeout of 5 seconds. I don’t know why the windows client waits, but it does. Perhaps MS thinks the window will raise magically in the meantime.

Solution: We have 2 bugs here. CSS is doing the WSCALE wrong in layer 7 switching mode, and windows waits 5 seconds before it sends  > 256 byte data into a connection with a 256 byte window. For the first bug I have two work arounds:

  1. Use only layer 3/4 switching, if the url or header based switching is not necessary
  2. Apply the following setting on your css: flow tcp-window-scale disabled

The second work around disables wscale at all. The CSS sends no wscale option, which disables wscale in both directions (RFC 1323). Without the TCP WScale option the maximum window size is 64k, which should be enough. (except for huge file downloads, over far network distances perhaps)

Cisco IP Phone and NTP

Problem: A Cisco IP phone (in my case 7970G) does not sync with an NTP server. The phone sends an NTP request, receives an NTP response, but does not set its date and time.

Solution: It’s a known bug of Cisco’s IP phone software releases 8.3(3) or higher according to Cisco bug id: CSCso40588. Cisco says that these phones should sync their clock via a “Date:”-header in SIP packets, and NTP stays broken. Another info is, that 8.5(2)MN1.2 fixes this bug. We will try this.

See: http://www.cisco.com/en/US/docs/voice_ip_comm/cuipph/firmware/8_5_2/english/release/notes/7900_852.html

 

Windows Network Load Balancing NLB and Cisco Routers/Switches

Problem: Windows NLB IPs are are not reachable through and from Cisco routers and switches. NLB services could be IIS arrays, Exchange CAS arrays, etc.

Solution: NLB mode was set to Multicast. In this mode Windows incorrectly uses multicast mac addresses. Set the NLB mode to Unicast and configure static mac address table entries on your switch to prevent broadcast flooding.

Discussion: Windows NLB does only work if all members of an array get all packets for a balanced service. To achieve this, Windows knows 3 modes of load balancing: Unicast, Multicast, and Multicast IGMP. And all of them have problems:

Unicast: Windows uses a normal mac address for his virtual IP address, but never sends any packet from this mac. The switch never learns a mac address entry and has to broadcast all packets for this mac. (Broadcast Flooding)

Multicast (IGMP): Windows uses mutlicast macs for the virtual IP. This seams correct, because this way the switch could learn which ports are part of the array and which are not (IGMP), but the problem is: an arp request for an unicast address (virtual IP) must not resolve to an multicast mac address. Cisco switches simply ignore this arp responses. Multicast mac addresses start with the lsb bit of the first bit set, typically 01:XX:XX:XX:XX:XX or 03:XX:XX:XX:XX:XX.

My solution was to use Unicast mode and don’t use IGMP. The other solution would be to statically set arp and the mac-address-table on the cisco switch, and force it to use the incorrect mac address.

BTW: A Network Load Balancing mechanism where every array member receives all traffic, in no real “Network” load balancing, because you dont’t reduce the traffic per server, it just adds additional computers and no additional network capacity.

More info: http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml