I noticed that roughly every 2 minutes my packets get dropped for ~3s, sometimes for as long as 15s. During that time si% (software interrupts) on one CPU goes above 60% and then stays at 30% for 30s. This affects both WAN<->LAN traffic and LAN<->LAN traffic through the router.
Any ideas what could cause this and how to fix it? It’s very annoying that all the clients sometimes lose connectivity.
I tried to monitor local processes but failed to see any correlation.
Version: TurrisOS 5.1.9, Turris Omnia, but I think that started even before 5.1.8.
It happens over all connected ports, eth2 is SFP.
Drops:
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 532
RX: bytes packets errors dropped overrun mcast
17199186 61323 0 355 0 0
TX: bytes packets errors dropped carrier collsns
31419324 69177 0 0 0 0
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 532
RX: bytes packets errors dropped overrun mcast
337254821 360870 0 2409 0 0
TX: bytes packets errors dropped carrier collsns
16888669 72436 0 0 0 0
5: lan0@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP group default qlen 1000
RX: bytes packets errors dropped overrun mcast
1251559 11348 0 5 0 0
TX: bytes packets errors dropped carrier collsns
1925363 12598 0 0 0 0
6: lan1@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-lan state UP group default qlen 1000
RX: bytes packets errors dropped overrun mcast
11695058 36241 0 151 0 0
TX: bytes packets errors dropped carrier collsns
25397109 41635 0 0 0 0
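To put the counters above in proportion, RX drops can be expressed as a share of received packets. The following is a minimal Python sketch and not part of the original post: the function name rx_drop_percent is made up for illustration, and it assumes the layout shown above (interface header, an "RX:" column header, then the counters on the next line). For the figures posted it gives roughly 0.58% on eth1 and 0.67% on eth2.

```python
import re

# Counters copied from the eth1/eth2 output posted above.
SAMPLE = """\
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 532
RX: bytes packets errors dropped overrun mcast
17199186 61323 0 355 0 0
TX: bytes packets errors dropped carrier collsns
31419324 69177 0 0 0 0
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 532
RX: bytes packets errors dropped overrun mcast
337254821 360870 0 2409 0 0
TX: bytes packets errors dropped carrier collsns
16888669 72436 0 0 0 0
"""


def rx_drop_percent(text):
    """Return {interface: RX dropped as a percentage of RX packets}."""
    result = {}
    iface = None
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Interface header, e.g. "3: eth1: <...>" or "5: lan0@eth1: <...>".
        header = re.match(r"\s*\d+:\s+([^:@\s]+)", line)
        if header:
            iface = header.group(1)
        elif line.strip().startswith("RX:") and iface and i + 1 < len(lines):
            # The line after the "RX:" header holds the counters:
            # bytes packets errors dropped ...
            fields = lines[i + 1].split()
            packets, dropped = int(fields[1]), int(fields[3])
            result[iface] = 100.0 * dropped / packets if packets else 0.0
    return result


for name, pct in rx_drop_percent(SAMPLE).items():
    print("%s: %.2f%% of RX packets dropped" % (name, pct))
```

A lifetime percentage like this hides bursts, so it says little about whether the drops line up with the outages described above.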
Just a guess: I had a similar problem with CPU load when my Apple TV 4K was on the network. As soon as I removed it, all was fine again. It may be worth a try. I actually thought it was about the Apple TV’s IPv6 … but in the end I just don’t use it anymore (Shield TV now).
I don’t have an Apple TV on my network, and there wasn’t a huge load on the network during the problem.
Also, that isn’t an excuse: I think a border router should be stable whatever external or internal clients do.
Now I upgraded my Omnia back to TurrisOS 5.1.10 and got the same problem, but when I switched to the alternative wifi driver it looks like it went away. But that driver isn’t stable and I’m losing wifi sometimes.
I also removed all the not strictly necessary services (netdata, all data collection, etc). I’m going to add them back one by one when I have time.
Hi, I didn’t write “network load”, just “CPU load”!!! So the load on the CPU was high while there was almost no network traffic. CPU went up to 60%, sometimes 90%, and all other clients got slow while surfing or had issues with streaming.
As you know, I am not part of Turris and just wanted to help, so directing any anger about the router at me is a waste of time.
I am using 5.1.9 with the standard drivers without any issues now. And if I need to use my Apple TV I use ethernet, as it seems to happen less often than with wifi.
If you can reproduce your dropped packets, try it with only one client active … then go on … but as I said, I am not a professional router developer, nor do I want to become one.
So you see it is similar for me … but I do not have any connectivity issues (I play CS … so I would notice quite fast). I also have an SFP module installed (eth2).
Btw, most packets are dropped on the RX side from the internet … so it may also be a routing problem at your provider.
And maybe the firewall is responsible for dropping them.
Chris, I’m sorry if it sounded like an angry attack. I didn’t mean it. Yeah, I’m full of anger at the router and at myself for buying it. It almost works, but that “almost” costs me a lot of time.
Your number of dropped packets on eth1 is huge. I’d investigate that. I have only 1257 after 6 days of uptime (5.1.10 with the alternative wifi driver).
I’m reading https://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/ to understand which packets could be counted as dropped. Unfortunately the driver and/or the ethernet hardware don’t support enhanced features like big NIC queue sizes (the maximum is only 128 for RX) or control of network flows. I hope to get a better understanding of what’s going on, or maybe I’ll give up and start using one of my servers as the border router instead.
I don’t think that firewall drops are counted as eth dropped packets. Try using top and check software interrupts (si): it should be quite low (<3%) and only go to ~50% when you are pushing a full gigabit through the Omnia’s CPU.
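For anyone who wants to log the si figure over time instead of watching top, here is a rough Python sketch that derives a per-CPU softirq percentage from two reads of /proc/stat. It is an illustration only: the function names and the 30% threshold are assumptions, and the field order (user nice system idle iowait irq softirq steal) is the standard Linux /proc/stat layout.

```python
import time

SOFTIRQ = 6  # index of softirq among: user nice system idle iowait irq softirq steal


def parse_proc_stat(text):
    """Map 'cpuN' -> list of jiffy counters taken from /proc/stat content."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep per-CPU lines (cpu0, cpu1, ...) and skip the aggregate "cpu" line.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            cpus[parts[0]] = [int(v) for v in parts[1:]]
    return cpus


def softirq_percent(before, after):
    """Share of time each CPU spent in softirq between two samples, in percent."""
    out = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        # Only the first 8 fields are summed: guest time is already part of user/nice.
        total = sum(new[:8]) - sum(old[:8])
        si = new[SOFTIRQ] - old[SOFTIRQ]
        out[cpu] = 100.0 * si / total if total > 0 else 0.0
    return out


def watch(interval=1.0, threshold=30.0):
    """Print a timestamped line whenever a CPU exceeds `threshold` percent softirq."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/stat") as f:
            after = parse_proc_stat(f.read())
        for cpu, pct in softirq_percent(before, after).items():
            if pct > threshold:
                print(time.strftime("%H:%M:%S"), cpu, "si %.1f%%" % pct)
        before = after
```

Calling watch() on the router and leaving it running should give timestamps that can be compared with the moments clients lose connectivity.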
0.2% of packets dropped on the external interface is okay … and uptime was 16d.
And I think it is the firewall, or at least for a good reason, because my wireguard device has this:
So no traffic but many drops and 2 errors.
Also note that drops are not errors.
https://www.cyberciti.biz/faq/linux-show-dropped-packets-per-interface-command/ says that the drop count includes the firewall. In netdata I only see a drop every 15 min on lan0, which is my Mac in standby…
You may also check the SUSE support article “ifconfig shows dropped rx packets”.
CPU is low at 0.5 … and 40% on a full 1 Gbit/s download … so all fine, even with that drop rate. So I have no issues, and dropped packets are normal and working as designed. So maybe your connection issues are not related to the drops, or the drops are only a symptom as well. In 4.0.5 many newer features are also missing, so it may also be that one of the clients is not compatible with one of the new features. So again … test with only one client on the network and post the result.
PS: since my first post the drops did not increase on eth2 … so they depend on the time … which would fit the firewall having stopped an attack.
I’ve checked, and the firewall isn’t related to interface drops. I added a rule with -j DROP to different tables and sent packets, and that didn’t change any of the dropped counters (ip -s l or ethtool -S eth1). Also, different network card drivers report drops and errors differently. I think the truth is only in the source code.
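A before/after check like the one described can be scripted. The sketch below is only an illustration in Python: it reads the kernel's per-interface rx_dropped counter from /sys/class/net/<iface>/statistics/, which is the same interface statistic that ip -s l reports, and the function names are made up.

```python
import os


def read_rx_dropped(base="/sys/class/net"):
    """Return {interface: rx_dropped} for every interface found under `base`."""
    counters = {}
    for iface in sorted(os.listdir(base)):
        path = os.path.join(base, iface, "statistics", "rx_dropped")
        try:
            with open(path) as f:
                counters[iface] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry is not a network interface or is unreadable
    return counters


def changed(before, after):
    """Return {interface: increase} for counters that differ between two snapshots."""
    return {k: after[k] - before[k]
            for k in after if k in before and after[k] != before[k]}
```

Usage would be: take before = read_rx_dropped(), send the test packets that the DROP rule should match, then print changed(before, read_rx_dropped()). An empty result means no interface counter moved during the test.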
I’m on 5.1.10 right now. The drop rate is very small and infrequent, but still noticeable. I hope there will be some time to do proper tests during upcoming holidays.
I’ve had trouble with dropped packets on eth2 (WAN) just in the last couple of weeks, on an Omnia running 5.3.1 HBT. Unpredictably, my connection changes from stable to “yo-yo mode” or back again. When my uplink is usable, I cannot count on its continued availability; when it is not usable, I have no idea when it will work again.
ISP has been as helpful as possible. They have detected and repaired a fault in the copper connection to the FTTC cabinet (some 400m away) and have also given me a new CPE unit (DSL modem/router), leaving the old one with me as well. I use it in bridge mode, as I don’t want an extra router in cascade with my Turris.
I have seen no difference in behaviour after I do any of the following:
swap CPE units,
use a different RJ45 outlet on the downstream side of the CPE unit,
simply re-seat the RJ45 cable at the WAN port of the Turris,
use a fresh RJ45 cable between CPE unit and Turris,
reboot the Turris.
Besides, when I connect a laptop directly downstream of the CPE unit, in parallel with the Turris, and while the Turris has unusable connectivity, the laptop has a beautifully clean connection.
If this all isn’t enough to localize the trouble to the Turris, what am I missing?
Thanks for these suggestions; I’ve just installed the packages.
As the Turris is operating normally at the moment, I plan to save a reference copy of the output from ethtool -S and wait until the trouble recurs to take another one, and then post both (when possible). Does that seem reasonable?
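Comparing the two copies by eye gets tedious when the driver exposes many counters, so a small script can list only the ones that moved. This is a hedged Python sketch, assuming the usual "name: value" lines that ethtool -S prints; the counter names in the example are invented, since the real names depend on the driver.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # e.g. the "NIC statistics:" header line
    return stats


def diff_stats(reference, current):
    """Return {counter: increase} for counters that changed since the reference."""
    return {k: current[k] - reference[k]
            for k in current if k in reference and current[k] != reference[k]}


# Example with invented counter names and values:
reference = parse_ethtool_stats("NIC statistics:\n     rx_packets: 1000\n     rx_dropped: 2\n     tx_packets: 900\n")
current = parse_ethtool_stats("NIC statistics:\n     rx_packets: 1050\n     rx_dropped: 5\n     tx_packets: 900\n")
print(diff_stats(reference, current))
```

With the two saved files read in as text, the same two calls would show which counters grew between the good state and the bad one.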