I had to upgrade several F5 load balancers from 11.5 to 12.1 over the last few weeks. Updating an F5 is usually quite easy, but there are some bugs and annoyances you should know about:
Sometimes the F5 asks for licence re-activation after the first boot into the new version. It seems you have to install the new version in a specific order to prevent this: BIG-IP firmware (base image), licence re-activation, BIG-IP hotfix, restart.
Remember that the appliance has to be in standby mode when you re-activate the licence.
If the F5 asks for licence re-activation after the reboot, re-activating it is easy. But even after the licence is activated, the F5 does not work correctly: the LTM part of the SNMP MIB is incomplete. You have to reboot once more to bring the LTM SNMP MIB tree back.
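One way to notice the incomplete tree is to walk the LTM subtree after the licence re-activation and again after the extra reboot, and compare how many objects come back. A minimal sketch, assuming the Net-SNMP tools, SNMP v2c and that the LTM subtree (bigipLocalTM in F5-BIGIP-LOCAL-MIB) sits at .1.3.6.1.4.1.3375.2.2; verify that OID against your MIB file.

```shell
#!/bin/sh
# Assumed OID of the F5 LTM subtree (bigipLocalTM); check it against F5-BIGIP-LOCAL-MIB
LTM_OID=.1.3.6.1.4.1.3375.2.2

# Reads `snmpwalk -On` output on stdin and prints how many returned
# objects belong to the LTM subtree (prints 0 if there are none).
count_ltm_oids() {
  awk -v p="$LTM_OID." 'index($1, p) == 1 { n++ } END { print n + 0 }'
}
```

Usage, with a placeholder host name and community: `snmpwalk -v2c -c public -On bigip.example.net "$LTM_OID" | count_ltm_oids`. If the first count is noticeably lower than the one after the second reboot, you are probably looking at the incomplete tree.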
When switching from 11.5 to 12.1, the SNMP MIB changed. Serious manufacturers take special care to keep their SNMP MIBs stable and compatible. F5 doesn’t: they changed data types from 11.5 to 12.1, which means you have to update your MIB database. But if you do, you can no longer query those OIDs correctly on older machines. That’s why other manufacturers never change a data type. The correct way is to add new OIDs, wait some years and then deprecate the old OIDs. F5 doesn’t. Here’s part of the diff of mibs_f5/F5-BIGIP-LOCAL-MIB.txt:
Besides this breaking incompatibility between 11.5 and 12.1, they also changed some value names, which breaks software that relies on these names. This is not a bug, but it is still annoying. Remember: an API has to be stable and backward compatible.
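To find such type changes yourself, you can compare the SYNTAX clause of every OBJECT-TYPE in the old and the new MIB file. A rough sketch in awk: it only keeps the first line of each SYNTAX clause, so it catches changed data types but not renamed enumeration values, which still need a plain diff of the two files.

```shell
#!/bin/sh
# Prints "objectName<TAB>syntax" for every OBJECT-TYPE in a MIB file, sorted by name.
# Only the first line of each SYNTAX clause is kept (e.g. "INTEGER {" for enumerations).
mib_syntax() {
  awk '
    $2 == "OBJECT-TYPE" { name = $1; next }
    $1 == "SYNTAX" && name != "" {
      $1 = ""            # drop the SYNTAX keyword
      sub(/^ +/, "")     # and the separator it leaves behind
      print name "\t" $0
      name = ""
    }
  ' "$1" | LC_ALL=C sort
}

# Lists objects defined in both MIB files whose SYNTAX differs, as "name: old -> new"
mib_syntax_diff() {
  tab=$(printf '\t')
  old_tmp=$(mktemp) && new_tmp=$(mktemp) || return 1
  mib_syntax "$1" > "$old_tmp"
  mib_syntax "$2" > "$new_tmp"
  LC_ALL=C join -t "$tab" "$old_tmp" "$new_tmp" |
    awk -F "$tab" '$2 != $3 { print $1 ": " $2 " -> " $3 }'
  rm -f "$old_tmp" "$new_tmp"
}
```

Usage, with hypothetical file names for the two MIB versions: `mib_syntax_diff F5-BIGIP-LOCAL-MIB-11.5.txt F5-BIGIP-LOCAL-MIB-12.1.txt`.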
Problem: after upgrading from Debian 6 to Debian 8, some of the machines occasionally lose their Ethernet connection for a few seconds under heavy load. You find lines like these in syslog:
[2333099.217735] NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
[2333099.217966] tg3 0000:03:04.1 eth1: transmit timed out, resetting
[2333099.384391] tg3 0000:03:04.1 eth1: 0: Host status block [00000001:0000003c:(0000:0018:0000):(0018:01e9)]
[2333099.386091] tg3 0000:03:04.1 eth1: 0: NAPI info [00000022:00000022:(0016:01e9:01ff):019a:(0062:0000:0000:0000)]
[2333099.610954] tg3 0000:03:04.1 eth1: Link is down
[2333102.731813] tg3 0000:03:04.1 eth1: Link is up at 1000 Mbps, full duplex
[2333102.731822] tg3 0000:03:04.1 eth1: Flow control is off for TX and off for RX
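To see how often this happens, and later whether a workaround helps, you can count the watchdog messages per interface. A small sketch; the pattern is taken from the log lines above, and /var/log/syslog is the usual Debian location but may differ on your system.

```shell
#!/bin/sh
# Reads kernel log lines on stdin and prints "<interface> <count>" for every
# interface that hit a tg3 transmit queue timeout.
count_tx_timeouts() {
  sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (tg3): transmit queue [0-9]* timed out.*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `count_tx_timeouts < /var/log/syslog`.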
The Debian upgrade brings a new kernel, and the new kernel seems to be less stable than the old one, which ran for years without any problem. One of the differences I found in the drivers is the Ethernet offload (acceleration) handling for tg3 cards.
Workaround: after disabling some Ethernet offload features I had no more link resets. The machine has now been running for about 9 weeks with these settings:
/sbin/ethtool -K eth1 tso off
/sbin/ethtool -K eth1 gso off
/sbin/ethtool -K eth1 gro off
These commands disable TCP segmentation offload (TSO), generic segmentation offload (GSO) and generic receive offload (GRO) on eth1.
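Settings made with ethtool -K are lost on reboot, so they have to be re-applied at boot, for example from a post-up line in /etc/network/interfaces. To verify that they are active, ethtool -k (lower case) lists the current offload state. A small sketch that checks all three features; it assumes the feature names printed by the ethtool version shipped with Debian 8.

```shell
#!/bin/sh
# Reads `ethtool -k IFACE` output on stdin; succeeds only if TSO, GSO and GRO are all off.
offloads_off() {
  awk '
    /^(tcp-segmentation|generic-segmentation|generic-receive)-offload:/ {
      seen++
      if ($2 != "off") bad++
    }
    END { if (seen == 3 && bad == 0) exit 0; exit 1 }
  '
}
```

Usage: `/sbin/ethtool -k eth1 | offloads_off || echo "offloads still enabled on eth1"`.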
Problem: after compiling sendmail on Debian 7 with “./Build”, sendmail does not recognize hash .db files. You see the following error message:
readcf: map access: class hash not available
Discussion: ./Build should detect Berkeley DB automatically. When devtools/bin/configure.sh finds libdb.so, it adds -DNEWDB as a compile option. On Debian 7, libdb.so has moved to /usr/lib/x86_64-linux-gnu/, so configure.sh fails to detect libdb.
Workaround: symlink libdb.so and libdb.a into /usr/lib with these commands (as root):
cd /usr/lib
ln -s x86_64-linux-gnu/libdb-5.1.a libdb.a
ln -s x86_64-linux-gnu/libdb-5.1.so libdb.so
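After rebuilding, you can check whether the new binary really supports hash maps: sendmail prints its compile options when run with the debug flag -d0.1, and NEWDB has to be among them. A minimal sketch; the obj.*/sendmail/sendmail path is an assumption about where ./Build places the binary.

```shell
#!/bin/sh
# Succeeds if sendmail's "-d0.1" output, read on stdin, lists NEWDB among its compile options
has_newdb() {
  grep -qw NEWDB
}

# Checks every sendmail binary ./Build produced in the current source tree.
# Assumption: the binary lives under obj.<os>.<release>.<arch>/sendmail/
check_built_sendmail() {
  for bin in obj.*/sendmail/sendmail; do
    [ -x "$bin" ] || continue
    if "$bin" -d0.1 -bv root 2>/dev/null | has_newdb; then
      echo "$bin: NEWDB enabled"
    else
      echo "$bin: NEWDB missing, hash maps will not work"
    fi
  done
}
```

Usage: run `check_built_sendmail` from the top of the sendmail source tree.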
Problem: you have a network with two upstream routers and an F5 LTM load balancer. Even though the default gateway points to router R1, the F5 LTM sends packets to the MAC address of R2.
Discussion: “This is a feature, not a bug.” This “feature” is called “Auto Last Hop”: the F5 always sends its replies to the MAC address from which the request was received. This may be useful in some cases, but from the point of view of standards compliance and security this behavior is a bug. In my case R2 sent some traffic to the F5 because of BGP multihoming and received the answers, although R1 should have received them. Unfortunately this setting is “Enabled” by default on F5.
If a hacker manages to inject a request with a forged source IP address, he will receive the answer even if the route to this IP points in a different direction.
Solution: this “feature” can (and should) be disabled if you don’t explicitly need it. It can be disabled globally, per VLAN, per virtual server or per SNAT policy.
You can find the global setting in the web interface under: System -> Configuration -> Local Traffic -> General
For a VLAN it can be disabled under: Network -> VLANs -> Configuration [Advanced]
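The same can be done from the BIG-IP command line with tmsh. A sketch assuming the tmsh property names auto-last-hop (global settings) and auto-lasthop (VLAN); check them against the tmsh reference for your TMOS version before use.

```shell
#!/bin/sh
# Disables Auto Last Hop globally and explicitly on every VLAN given as an argument,
# then saves the configuration. Meant to run on the BIG-IP itself, where tmsh exists.
# Assumption: property names as used by TMOS 11.x/12.x tmsh.
disable_auto_last_hop() {
  tmsh modify sys global-settings auto-last-hop disabled || return 1
  for vlan in "$@"; do
    tmsh modify net vlan "$vlan" auto-lasthop disabled || return 1
  done
  tmsh save sys config
}
```

Usage, with placeholder VLAN names: `disable_auto_last_hop internal external`. Setting the VLANs explicitly is redundant once the global setting is off, but it keeps them disabled if someone re-enables the global default later.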
Problem: after upgrading an Ethernet port to a channel-group, all MLPPP connections fail on a Cisco ASR 1002-X. The log file looks like this:
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is AUTHENTICATING, Authenticated User
Jul 31 2015 07:04:44.801 CEST: Vi4 CHAP: O SUCCESS id 143 len 4
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is VIRTUALIZED
Jul 31 2015 07:04:44.802 CEST: Vi6 MLP: Added link Vi4 to bundle xxx
Jul 31 2015 07:04:44.803 CEST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.803 CEST: %LINK-3-UPDOWN: Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.805 CEST: %CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_cp: cpp_cp encountered an error -Traceback= 1#795bed15105852c19a9ac138912d7358 errmsg:7F13FA6E0000+121D cpp_common_os:7F13FD6F1000+D8D5 cpp_common_os:7F13FD6F1000+D7D4 cpp_common_os:7F13FD6F1000+19A3E cpp_ifm:7F14106F1000+A198 cpp_mlppp_svr_lib:7F1406B63000+C351 cpp_mlppp_svr_lib:7F1406B63000+1CDC8 cpp_mlppp_svr_smc_lib:7F1406DA1000+2D28 cpp_common_os:7F13FD6F1000+11E6E cpp_common_os:7F13FD6F1000+118AA cpp_common_os:7F13FD6F1000+116EB evlib:7F13FC6D10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 13 len 10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Address xxx
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 14 len 10
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Address xxx
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
The router continues with “O CONFREQ” but never receives the “I CONFACK”.
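To find all affected sessions in a longer log, you can list the interfaces that keep sending IPCP CONFREQs without ever getting a CONFACK back. A small sketch; the pattern is taken from the log lines above.

```shell
#!/bin/sh
# Reads router log lines like the ones above on stdin and prints
# "<interface> <CONFREQs sent>" for every interface whose outgoing IPCP
# CONFREQ was never answered by an incoming CONFACK.
stuck_ipcp() {
  awk '
    # Returns the field just before "IPCP:", e.g. "Vi6"
    function iface(   i) {
      for (i = 1; i < NF; i++) if ($(i + 1) == "IPCP:") return $i
      return ""
    }
    / IPCP: O CONFREQ / { sent[iface()]++ }
    / IPCP: I CONFACK / { acked[iface()] = 1 }
    END { for (vi in sent) if (!(vi in acked)) print vi, sent[vi] }
  ' | sort
}
```

Usage, with a hypothetical log file name: `stuck_ipcp < asr-debug.log`.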
Discussion: in this case the router is an L2TP server and handles multiple L2TP/PPP connections. Some of them are multilink PPP connections. The ASR software has a bug that leads to these tracebacks when the L2TP connections run over an Ethernet channel group. We opened a case with Cisco support. After one and a half months we received this answer:
Apologies for the delay. I was held up on other critical issues and hence was unable to reach out to you earlier. I was able to decode the tracebacks observed during the time of the issue and the issue points to a known software bug as the cause of the problem. Below are more details
CSCua16777 : FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle
Bug toolkit link : https://tools.cisco.com/bugsearch/bug/CSCua16777/?reffering_site=dumpcr
However the above bug is in closed state with the below release-notes
Symptom: FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle 8767, link 8766 download to CPP faile
Conditions: LNS MLPPP sessions don't stay up over port-channel
Workaround: MLPPP over port-channel is not supported on ASR1k. Don't use MLPPP over port-channel.
Dear Cisco! This is no solution. If you define an obvious bug as normal behaviour and the only workaround is “Don’t use…”, your customers will soon remember this:
“Cisco ? Don’t use…”
Solution: “Don’t use Cisco ?”
Version: Cisco ASR 1002-X, IOS XE Version: 03.09.02.S
Update: when I open the link to the bug report, I get this answer:
Insufficient Permissions to View Bug
This bug contains proprietary information and is not yet publicly available.