Problem: after upgrading from Debian 6 to Debian 8, some machines occasionally lose their Ethernet connection for a few seconds under heavy load. You find lines like these in syslog:
[2333099.217735] NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
[2333099.217966] tg3 0000:03:04.1 eth1: transmit timed out, resetting
[2333099.384391] tg3 0000:03:04.1 eth1: 0: Host status block [00000001:0000003c:(0000:0018:0000):(0018:01e9)]
[2333099.386091] tg3 0000:03:04.1 eth1: 0: NAPI info [00000022:00000022:(0016:01e9:01ff):019a:(0062:0000:0000:0000)]
[2333099.610954] tg3 0000:03:04.1 eth1: Link is down
[2333102.731813] tg3 0000:03:04.1 eth1: Link is up at 1000 Mbps, full duplex
[2333102.731822] tg3 0000:03:04.1 eth1: Flow control is off for TX and off for RX
The Debian upgrade also replaces the kernel, and the new kernel seems less stable than the old one, which ran for years without any problem. One difference I found in the drivers is the Ethernet offload (acceleration) mode for tg3 cards.
Workaround: after disabling some Ethernet offload features I saw no further link resets. The machine has now been running for about 9 weeks with these settings:
/sbin/ethtool -K eth1 tso off
/sbin/ethtool -K eth1 gso off
/sbin/ethtool -K eth1 gro off
These commands disable TCP segmentation offload (TSO), generic segmentation offload (GSO) and generic receive offload (GRO) on eth1.
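To confirm that the settings took effect, `ethtool -k` (lowercase k) lists the current offload state. The following is only a minimal sketch: it assumes output lines of the form `tcp-segmentation-offload: on`, and the helper name `offloads_still_on` is made up for this example.

```shell
#!/bin/sh
# Read `ethtool -k` output on stdin and print every one of the three
# offload features from the workaround that is still reported as "on".
offloads_still_on() {
    awk -F': ' '
        /^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):/ {
            # The value may carry a suffix such as "[fixed]"; keep the first word.
            split($2, v, " ")
            if (v[1] == "on") print $1
        }'
}

# Usage (needs ethtool and the real interface):
#   /sbin/ethtool -k eth1 | offloads_still_on
```

Empty output means all three features are off. Settings made with `ethtool -K` do not survive a reboot, so they have to be reapplied at boot, for example from a `post-up` line in /etc/network/interfaces.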
Problem: after compiling sendmail on Debian 7 with “./Build”, sendmail does not recognize hash .db files. You see the following error message:
readcf: map access: class hash not available
Discussion: ./Build should detect Berkeley DB automatically. When devtools/bin/configure.sh finds libdb.so, it adds -DNEWDB as a compile option. On Debian 7 the libdb.so file moved to /usr/lib/x86_64-linux-gnu/, so configure.sh fails to detect libdb.
Workaround: Link the libdb.so and libdb.a files into /usr/lib. The link targets are relative, so run these commands as root from within /usr/lib:
cd /usr/lib
ln -s x86_64-linux-gnu/libdb-5.1.a libdb.a
ln -s x86_64-linux-gnu/libdb-5.1.so libdb.so
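Before creating the links it helps to check where the library actually lives, since the version suffix (5.1 here) differs between releases. A minimal sketch follows; `find_libdb` is a hypothetical helper, and it does not reproduce the search logic of configure.sh, it only looks for the file names mentioned above.

```shell
#!/bin/sh
# Print the first Berkeley DB shared library found in the given directories.
# Returns 1 if none of the directories contains one.
find_libdb() {
    for dir in "$@"; do
        for lib in "$dir"/libdb.so "$dir"/libdb-*.so; do
            # An unmatched glob stays literal and fails the -e test.
            if [ -e "$lib" ]; then
                printf '%s\n' "$lib"
                return 0
            fi
        done
    done
    return 1
}

# Usage:
#   find_libdb /usr/lib || find_libdb /usr/lib/x86_64-linux-gnu
```

If the first call prints nothing and the second prints a path, the library is only in the multiarch directory and the symlinks are needed.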
Problem: You have a network with two upstream routers and an F5 LTM load balancer. Even though the default gateway points to router R1, the F5 LTM sends packets to the MAC address of R2.
Discussion: “This is a feature, not a bug”. This “feature” is called “Auto Last Hop”. It means the F5 always sends its replies to the MAC address the request packet came from, regardless of the routing table. This may be useful in some cases, but from the point of view of standards, compliance and security this behavior is a bug. In my case R2 sent some traffic to the F5 because of BGP multihoming and received the replies, although R1 should have received them. Unfortunately this setting is “Enabled” by default on F5.
If an attacker manages to inject a request with a forged source IP address, the reply goes back to the attacker even if the route to this IP points in a different direction.
Solution: This “feature” can (and should) be disabled if you don’t explicitly need it. It can be disabled globally, per VLAN, per Virtual Server or per SNAT policy.
You can find the global setting in the web interface at: System -> Configuration -> Local Traffic -> General
Per VLAN it can be disabled at: Network -> VLANs -> Configuration [Advanced]
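The same change can probably be made from the command line with tmsh. The sketch below is written from memory of F5's documentation, so treat the database key `connection.autolasthop` and the VLAN property `auto-lasthop` as assumptions and verify them against your TMOS version first. The function name and the VLAN name in the usage line are made up.

```shell
#!/bin/sh
# Sketch: disable Auto Last Hop globally and, optionally, on one VLAN.
# Meant to be run as an administrator in the bash shell of the BIG-IP.
# ASSUMPTION: the tmsh syntax below is recalled from F5 documentation.
disable_auto_last_hop() {
    vlan="$1"   # optional VLAN name (hypothetical example: external)

    # Global setting, same as System -> Configuration -> Local Traffic -> General.
    tmsh modify sys db connection.autolasthop value disable || return 1

    # Per-VLAN setting, only when a VLAN name was given.
    if [ -n "$vlan" ]; then
        tmsh modify net vlan "$vlan" auto-lasthop disabled || return 1
    fi

    # Write the running configuration to disk.
    tmsh save sys config
}

# Usage:
#   disable_auto_last_hop            # global only
#   disable_auto_last_hop external   # global plus VLAN "external"
```

Test this in a maintenance window: once Auto Last Hop is off, replies follow the routing table, so asymmetric paths that happened to work before may stop working.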
Problem: After upgrading an ethernet port to a channel-group, all MLPPP connections fail on a Cisco ASR 1002-X. The log file looks like this:
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is AUTHENTICATING, Authenticated User
Jul 31 2015 07:04:44.801 CEST: Vi4 CHAP: O SUCCESS id 143 len 4
Jul 31 2015 07:04:44.801 CEST: Vi4 PPP: Phase is VIRTUALIZED
Jul 31 2015 07:04:44.802 CEST: Vi6 MLP: Added link Vi4 to bundle xxx
Jul 31 2015 07:04:44.803 CEST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.803 CEST: %LINK-3-UPDOWN: Interface Virtual-Access4, changed state to up
Jul 31 2015 07:04:44.805 CEST: %CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_cp: cpp_cp encountered an error -Traceback= 1#795bed15105852c19a9ac138912d7358 errmsg:7F13FA6E0000+121D cpp_common_os:7F13FD6F1000+D8D5 cpp_common_os:7F13FD6F1000+D7D4 cpp_common_os:7F13FD6F1000+19A3E cpp_ifm:7F14106F1000+A198 cpp_mlppp_svr_lib:7F1406B63000+C351 cpp_mlppp_svr_lib:7F1406B63000+1CDC8 cpp_mlppp_svr_smc_lib:7F1406DA1000+2D28 cpp_common_os:7F13FD6F1000+11E6E cpp_common_os:7F13FD6F1000+118AA cpp_common_os:7F13FD6F1000+116EB evlib:7F13FC6D10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 13 len 10
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Address xxx
Jul 31 2015 07:04:45.152 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: O CONFREQ [REQsent] id 14 len 10
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Address xxx
Jul 31 2015 07:04:47.168 CEST: Vi6 IPCP: Event[Timeout+] State[REQsent to REQsent]
The router continues with “O CONFREQ” but never receives the “I CONFACK”.
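With many sessions in one log it is easy to miss which bundles are stuck. A small sketch that lists interfaces which sent IPCP CONFREQs but never logged an incoming CONFACK; it assumes the debug line layout shown above (interface name directly before “IPCP:”), and the function name is made up.

```shell
#!/bin/sh
# Print "<interface> <number of CONFREQs sent>" for every interface that
# sent IPCP CONFREQs but has no "I CONFACK" line in the log.
# Reads the log from the given files, or from stdin if none are given.
ipcp_stuck_interfaces() {
    awk '
        / IPCP: O CONFREQ/ {
            for (i = 2; i <= NF; i++) if ($i == "IPCP:") sent[$(i - 1)]++
        }
        / IPCP: I CONFACK/ {
            for (i = 2; i <= NF; i++) if ($i == "IPCP:") acked[$(i - 1)] = 1
        }
        END {
            for (ifc in sent) if (!(ifc in acked)) print ifc, sent[ifc]
        }
    ' "$@"
}

# Usage:
#   ipcp_stuck_interfaces router.log
```

For the excerpt above this would report Vi6, the MLP bundle interface, which matches the traceback pointing at the MLPPP code.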
Discussion: In this case the router is an L2TP server and handles multiple L2TP/PPP connections, some of which are multilink PPP connections. The ASR software has a bug that leads to these tracebacks when the L2TP connections run over an Ethernet channel group. We opened a case with Cisco support. After one and a half months we received this answer:
Apologies for the delay. I was held up on other critical issues and hence was unable to reach out to you earlier. I was able to decode the tracebacks observed during the time of the issue and the issue points to a known software bug as the cause of the problem. Below are more details
CSCua16777 : FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle
Bug toolkit link : https://tools.cisco.com/bugsearch/bug/CSCua16777/?reffering_site=dumpcr
However the above bug is in closed state with the below release-notes
Symptom: FMFP-3-OBJ_DWNLD_TO_CPP_FAILED: SIP0: fman_fp_image: MLP bundle 8767, link 8766 download to CPP faile
Conditions: LNS MLPPP sessions don't stay up over port-channel
Workaround: MLPPP over port-channel is not supported on ASR1k. Don't use MLPPP over port-channel.
Dear Cisco! This is not a solution. If you define an obvious bug as normal behaviour and the only workaround is “Don’t use...”, your customers will soon remember this:
“Cisco ? Don’t use…”
Solution: “Don’t use Cisco ?”
Version: Cisco ASR 1002-X, IOS XE Version: 03.09.02.S
Update: When I follow the link to the bug report, I get this response:
Insufficient Permissions to View Bug
This bug contains proprietary information and is not yet publicly available.
Problem: A freshly installed Debian 8 (Jessie) 32-bit 686 system uses only one core of a 6-core Xeon CPU. amd64 kernels don’t have this problem.
Solution: The HP ProLiant DL360 Gen9 has a BIOS option called “Processor x2APIC Support”. When you set this option to “Disabled”, the Linux kernel uses all 6 cores. x2APIC is a newer interrupt controller mode for multi-core CPUs that works only with 64-bit kernels (it seems).
Versions: CPU: Intel(R) Xeon(R) CPU E5-2620 v3, tested with Debian Linux Kernel: 3.16.7-ckt9-3~deb8u1 and self compiled kernel: linux-3.18.13 from kernel.org
Stats: compiling a Linux kernel on one core: 56 min
compiling on 6 cores (12 threads): 7 min
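A quick way to check for this problem is to count the CPUs the kernel brought up and see whether the CPU advertises x2APIC. A minimal sketch, assuming the usual x86 /proc/cpuinfo layout; both function names are made up.

```shell
#!/bin/sh
# Count the logical CPUs listed in a cpuinfo-style file
# (defaults to /proc/cpuinfo).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Succeed if the CPU flags list x2apic. This only shows that the CPU
# supports x2APIC, not that the BIOS or the kernel enabled it.
has_x2apic_flag() {
    grep -Eq '^flags.*[[:space:]]x2apic([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}

# Usage:
#   echo "CPUs online: $(count_cpus)"
#   has_x2apic_flag && echo "CPU advertises x2apic"
#   dmesg | grep -i x2apic    # kernel messages about the APIC mode
```

If the count is far below what the hardware has and the x2apic flag is present on a 32-bit kernel, the BIOS option above is worth trying.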