Problem: Cisco ASR router is loosing connectivity to its directly attached Ethernet neighbors. In this situation interface status is still up, packets are going in and out on both ends, even IPv6 was still working. The actual problem was that the Cisco ASR was ignoring all ARP responses from its neighbors and the ARP table to this interface was empty. Later the same happened on a second interface.
A temporary work around was to reboot the router.
Solution: Cisco support suggested a software upgrade, even though the software was only some weeks old. After the software upgrade the error didn’t happen again until now.
The old IOS version was: asr1001x-universalk9.03.16.03.S.155-3.S3-ext.SPA.bin
The new IOS version is: asr1001x-universalk9.03.16.04a.S.155-3.S4a-ext.SPA.bin
The only fix that possibly fits to the problem is:
Problem: After upgrading an ethernet port to a channel-group, all MLPPP connections fail on a Cisco ASR 1002-X. The log file looks like this:
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is AUTHENTICATING, Authenticated User
Jul 31 2015 07:04:44.801 CEST: Vi4 CHAP: O SUCCESS id 143 len 4
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is VIRTUALIZED
Jul 31 2015 07:04:44.802 CEST: Vi6 MLP: Added link Vi4 to bundle xxx
Jul 31 2015 07:04:44.803 CEST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.803 CEST: %LINK-3-UPDOWN: Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.805 CEST: %CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_cp: cpp_cp encountered an error -Traceback= 1#795bed15105852c19a9ac138912d7358 errmsg:7F13FA6E0000+121D cpp_common_os:7F13FD6F1000+D8D5 cpp_common_os:7F13FD6F1000+D7D4 cpp_common_os:7F13FD6F1000+19A3E cpp_ifm:7F14106F1000+A198 cpp_mlppp_svr_lib:7F1406B63000+C351 cpp_mlppp_svr_lib:7F1406B63000+1CDC8 cpp_mlppp_svr_smc_lib:7F1406DA1000+2D28 cpp_common_os:7F13FD6F1000+11E6E cpp_common_os:7F13FD6F1000+118AA cpp_common_os:7F13FD6F1000+116EB evlib:7F13FC6D10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 13 len 10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Address xxx
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 14 len 10
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Address xxx
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
The router continues with “O CONFREQ” but never receives the “I CONFACK”.
Discussion: In this case the router is a L2TP server and handles multiple L2TP/PPP connection. Some of them are multilink PPP connections. The ASR software has a bug that leads to these tracebacks when the L2TP connections are going over an ethernet channel group. We opened a case with Cisco support. After one and a half month we received this answer:
Apologies for the delay. I was held up on other critical issues and hence was unable to reach out to you earlier. I was able to decode the tracebacks observed during the time of the issue and the issue points to a known software bug as the cause of the problem. Below are more details
CSCua16777 : FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle
Bug toolkit link : https://tools.cisco.com/bugsearch/bug/CSCua16777/?reffering_site=dumpcr
However the above bug is in closed state with the below release-notes
Symptom: FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle 8767, link 8766 download to CPP faile
Conditions: LNS MLPPP sessions don't stay up over port-channel
Workaround: MLPPP over port-channel is not supported on ASR1k. Don't use MLPPP over port-channel.
Dear Cisco! This is no solution. If you define an obvious bug as normal behaviour and the only workaround is “Don’t use..”, your customers will soon remember this:
“Cisco ? Don’t use…”
Solution: “Don’t use Cisco ?”
Version: Cisco ASR 1002-X, IOS XE Version: 03.09.02.S
Update: If I use the link of the bug report, I receive this answer:
Insufficient Permissions to View Bug
This bug contains proprietary information and is not yet publicly available.
Cisco routers are capable of different routing protocols and static and connected routes. Every routing protocol engine has its own distance/metric/weight to decide which route is best.
When a routing protocol has chosen its best route, the route is entered into the routing table. The routing engine uses an “administrative distance” per routing protocol to decide which route to use. The default administrative distance for all the routing protocols is:
Default Administrative Distances
EIGRP summary route
Source: cisco.com “Route Selection in Cisco Routers”
i.g. if you like to set a static route that is only activated, if there is no eBGP route to the same target, you can configure:
Problem: Windows 7 clients wait 5 seconds to send simple HTTP requests to a web server behind a cisco CSS content switch. But there is no delay when the client connects directly to a web server behind the CSS.
Description: I found one difference between CSS and non-CSS connects: the TCP WSCALE option.
When the CSS is in layer 7 HTTP switching mode (e.g. url “/*”), it does the tcp handshake and the http-request parsing on it’s own, and afterwards it decides which service is receiving the request. For this initial connect the CSS uses WSCALE=0. Which means window scaling is welcome, but the css sets it’s inbound scaling to 1 (2^0). After the request is sent from the client to the css, the css connects to the server, which negotiates WSCALE=8 (scale 256, seams to be the Windows 2008 R2 standard value).
Now the client believes WSCALE=0 for outgoing data and the server believes WSCALE=8 for incoming data.
After the first HTTP request (Keep-alive: active) the Windows server sets it’s window to 256, which is 256*256 byte for the server and 256*1 byte for the client. When the client
requests a second url it has to send the new request in 256 byte pieces and waits for the ack from the server for each packet. And both Linux and Window clients do exactly that. This is a performance penalty of course, but still it should work, but windows sends this 256 byte data pieces after a timeout of 5 seconds. I don’t know why the windows client waits, but it does. Perhaps MS thinks the window will raise magically in the meantime.
Solution: We have 2 bugs here. CSS is doing the WSCALE wrong in layer 7 switching mode, and windows waits 5 seconds before it sends > 256 byte data into a connection with a 256 byte window. For the first bug I have two work arounds:
Use only layer 3/4 switching, if the url or header based switching is not necessary
Apply the following setting on your css: flow tcp-window-scale disabled
The second work around disables wscale at all. The CSS sends no wscale option, which disables wscale in both directions (RFC 1323). Without the TCP WScale option the maximum window size is 64k, which should be enough. (except for huge file downloads, over far network distances perhaps)
Problem: A Cisco IP phone (in my case 7970G) does not sync with an NTP server. The phone sends an NTP request, receives an NTP response, but does not set its date and time.
Solution: It’s a known bug of Cisco’s IP phone software releases 8.3(3) or higher according to Cisco bug id: CSCso40588. Cisco says that these phones should sync their clock via a “Date:”-header in SIP packets, and NTP stays broken. Another info is, that 8.5(2)MN1.2 fixes this bug. We will try this.
Problem: Windows NLB IPs are are not reachable through and from Cisco routers and switches. NLB services could be IIS arrays, Exchange CAS arrays, etc.
Solution: NLB mode was set to Multicast. In this mode Windows incorrectly uses multicast mac addresses. Set the NLB mode to Unicast and configure static mac address table entries on your switch to prevent broadcast flooding.
Discussion: Windows NLB does only work if all members of an array get all packets for a balanced service. To achieve this, Windows knows 3 modes of load balancing: Unicast, Multicast, and Multicast IGMP. And all of them have problems:
Unicast: Windows uses a normal mac address for his virtual IP address, but never sends any packet from this mac. The switch never learns a mac address entry and has to broadcast all packets for this mac. (Broadcast Flooding)
Multicast (IGMP): Windows uses mutlicast macs for the virtual IP. This seams correct, because this way the switch could learn which ports are part of the array and which are not (IGMP), but the problem is: an arp request for an unicast address (virtual IP) must not resolve to an multicast mac address. Cisco switches simply ignore this arp responses. Multicast mac addresses start with the lsb bit of the first bit set, typically 01:XX:XX:XX:XX:XX or 03:XX:XX:XX:XX:XX.
My solution was to use Unicast mode and don’t use IGMP. The other solution would be to statically set arp and the mac-address-table on the cisco switch, and force it to use the incorrect mac address.
BTW: A Network Load Balancing mechanism where every array member receives all traffic, in no real “Network” load balancing, because you dont’t reduce the traffic per server, it just adds additional computers and no additional network capacity.