[PATCH] New SFP findings about HALNy HL-GSFP WIP

Hi Community,

TL;DR This module was not working right so I tried to improve it. Here is a patch:

cat 788-fix-quirks-for-HALNy-SFP-module.patch
From: "AreYouLoco?" <areyouloco@localhost.tld>
Date: Wed, 27 Dec 2023 17:10:12 +0100
Subject: [PATCH 1/1] net: sfp: add more quirks for HALNy GPON SFP

It seems that RX_LOS signal is indeed inverted. But TX_FAULT purpose
is simply unknown. Add more quirks to fully support that broken module.
And possibly fix 2,5Gbps as the module is capable of it. 

Signed-off-by: AreYouLoco? <areyouloco@localhost.tld>
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -335,11 +335,11 @@
 
 static void sfp_fixup_halny_gsfp(struct sfp *sfp)
 {
-	/* Ignore the TX_FAULT and LOS signals on this module.
-	 * these are possibly used for other purposes on this
-	 * module, e.g. a serial port.
-	 */
-	sfp->state_hw_mask &= ~(SFP_F_TX_FAULT | SFP_F_LOS);
+	/* Ignore the TX_FAULT, invert LOS on this module.
+	 * and fix long startup	 */
+	sfp_fixup_long_startup(sfp);
+	sfp_fixup_ignore_tx_fault(sfp);
+	sfp->state_hw_mask &= ~SFP_F_LOS;
 }
 
 static void sfp_quirk_2500basex(const struct sfp_eeprom_id *id,
@@ -379,6 +379,7 @@
 	}, {
 		.vendor = "HALNy",
 		.part = "HL-GSFP",
+		.modes = sfp_quirk_2500basex,
 		.fixup = sfp_fixup_halny_gsfp,
 	}, {
 		// Huawei MA5671A can operate at 2500base-X, but report 1.2GBd

Longer story: From the beginning I experienced occasional disconnections with this module. I blamed netifd for them, but netifd bringing the interface down and back up was in fact just an aftereffect of the module asserting TX_FAULT on pin 2. From time to time the link also failed to come up, but until a few days ago I didn’t have time to look into it, so I just rebooted the router, which helped every time.

root@router:~# dmesg -T | grep sfp
[Sat Dec 23 19:13:24 2023] sfp sfp: Host maximum power 3.0W
[Sat Dec 23 19:13:24 2023] sfp sfp: module HALNy HL-GSFP          rev V1.0 sn HALN1010493c     dc 20150525
[Sun Dec 24 02:17:35 2023] sfp sfp: module transmit fault indicated
[Sun Dec 24 02:17:37 2023] sfp sfp: module transmit fault recovered
[Sun Dec 24 17:46:55 2023] sfp sfp: module transmit fault indicated
[Sun Dec 24 17:46:56 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 01:04:57 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 01:04:58 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 01:13:53 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 01:14:09 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 01:14:10 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 03:40:52 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 03:40:53 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 06:22:02 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 06:22:03 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 06:22:05 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 08:40:51 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 08:40:52 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 12:23:41 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 12:23:42 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 14:10:49 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 14:10:51 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 16:20:48 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 17:38:10 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 17:38:12 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 21:17:47 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 21:17:49 2023] sfp sfp: module transmit fault recovered
[Mon Dec 25 22:17:10 2023] sfp sfp: module transmit fault indicated
[Mon Dec 25 22:17:11 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 00:18:47 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 01:00:15 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 01:00:16 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 01:32:47 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 01:32:48 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 07:41:24 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 07:41:26 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 10:25:04 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 11:01:09 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 11:01:11 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 11:04:44 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 11:04:45 2023] sfp sfp: module transmit fault recovered
[Tue Dec 26 12:42:48 2023] sfp sfp: module transmit fault indicated
[Tue Dec 26 12:42:50 2023] sfp sfp: module transmit fault recovered
[Wed Dec 27 00:06:19 2023] sfp sfp: module transmit fault indicated
[Wed Dec 27 02:07:07 2023] sfp sfp: module transmit fault indicated
[Wed Dec 27 02:07:08 2023] sfp sfp: module transmit fault recovered
[Wed Dec 27 09:24:38 2023] sfp sfp: module transmit fault indicated
[Wed Dec 27 09:24:39 2023] sfp sfp: module transmit fault recovered

So this is what the log looks like now, with my script running in the background. I don’t have to reboot anymore and I get more or less constant connectivity. But we can do better than that, so here is how this patch came to life.
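
For anyone who wants to compare their own log, the fault and recovery messages can be tallied with a small shell helper. This is only a sketch: the function name count_sfp_faults is made up for illustration, and it assumes the kernel messages read exactly "module transmit fault indicated" and "module transmit fault recovered", as they do in the dmesg output above.

```sh
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the original script).
# Reads kernel log lines on stdin and prints "<indicated> <recovered>".
count_sfp_faults() {
    awk '
        /module transmit fault indicated/ { ind++ }
        /module transmit fault recovered/ { rec++ }
        END { printf "%d %d\n", ind, rec }
    '
}
```

It would be used as "dmesg | count_sfp_faults". A large gap between the two numbers would suggest faults that never cleared on their own.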

Continuing. I used to reboot the router quite often, so I didn’t notice the problem that much before, but now it is more or less configured and runs constantly, power outages aside. During the holidays I looked at the logs, and clearly after 5 tx_fault indications the router sends tx_disable to the module, and that is when I lost the link before. So I wrote a script that monitors /sys/kernel/debug/sfp/state and, when the countdown has 1 left, takes the interface down, then the link down, and then brings them back up in reverse order. That way I don’t have to reboot anymore, because the counter resets back to 5.
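
A watchdog along those lines could look roughly like the sketch below. The state file path is the one mentioned above. Everything else is an assumption for illustration: the line format "Fault recovery remaining retries: N" in the state file, the interface name eth2, the 5-second polling interval, the function names, and the plain "ip link" bounce, which stands in for whatever down/up sequence the real script uses.

```sh
#!/bin/sh
# Sketch of a watchdog for the SFP fault-retry counter (not the original script).
STATE_FILE="${STATE_FILE:-/sys/kernel/debug/sfp/state}"  # path from the post
IFACE="${IFACE:-eth2}"                                   # assumption: SFP netdev

# Print the remaining fault-recovery retries found in the given state file.
# Assumes a line of the form "Fault recovery remaining retries: N".
remaining_retries() {
    sed -n 's/^Fault recovery remaining retries:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed when the counter is readable and at or below the threshold ($2).
needs_bounce() {
    retries=$(remaining_retries "$1")
    [ -n "$retries" ] && [ "$retries" -le "$2" ]
}

# Take the link down and bring it back up; the post reports that this
# resets the counter to 5.
bounce_link() {
    ip link set dev "$1" down
    sleep 2
    ip link set dev "$1" up
}

# Poll loop; nothing runs until watch_sfp is called explicitly.
watch_sfp() {
    while :; do
        if needs_bounce "$STATE_FILE" 1; then
            logger -t sfp-watch "fault retries low, bouncing $IFACE"
            bounce_link "$IFACE"
        fi
        sleep 5
    done
}
```

Calling watch_sfp at the end of the script (or from an init script) starts the loop. Reading debugfs requires root and a kernel with debugfs mounted.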

What @backon figured out here is that when we add a 45-second delay to U-Boot, we also don’t have to soft reboot after a power outage and wait for the SFP to boot correctly. But there is already code for this in sfp.c, namely what @rmk wrote for BAD_GPON, so I just used that.

Anyway, to keep it short: @rmk, could you take a look at the patch, which is based on your work, and tell me whether it is correct C? Also @mbehun, if @rmk gives the green light, could you push it within the Turris team? The easiest way would be to make an experimental branch, let’s call it crashlab-sfp, based on hbs but with the above patch, so I can test it. I tried to build my own medkit to test it in advance, but I simply failed, and the Turris docs are not very friendly.

I also don’t rule out a hardware failure of this SFP module; maybe the laser is indeed broken. But I don’t think so, as it has been like this from the beginning. To rule that out, I contacted my ISP, and after New Year’s they are going to deliver a brand new unit. If it behaves the same as the old one, then this module definitely has a broken tx_fault implementation: it is not inverted, it is simply something else that we don’t know yet. The above patch should fix as much as possible. Until then, @mbehun, could you please already make a test branch so it can build in the meantime?

Cheers!
AreYouLoco?

I never found out how to build a whole medkit, but it doesn’t seem to be needed here.
You can just build the kernel and simply install the ipk to test your changes:

opkg install --force-reinstall kernel_(...).ipk

@xDSx How do I build a kernel with this patch? I saw in other topics related to SFP that you had some experience with that.

Edit: OK, so I managed to build just the kernel package. Now wish me luck that it boots. Schnapps first, then we’ll see.

So it actually booted, and the SFP now runs at 2.5 Gbps, so the patch is applied. We’ll see how it goes with the other fixes…

root@router:~# ethtool eth2
Settings for eth2:
        Supported ports: [ FIBRE ]
        Supported link modes:   2500baseX/Full
                                1000baseX/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  2500baseX/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 2500Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes

Edit: And I do have an internet connection.

So I just tried, and it is still necessary to set:

fw_setenv bootdelay 45

So the bad-GPON fix that waits 60 s for the module doesn’t work, or I did something wrong in the code. Even bootdelay 40 was too short for a cold boot.

Edit: I will try again with:

sfp->module_t_wait = msecs_to_jiffies(60 * 1000);

instead of calling the sfp_fixup_long_startup function.

Edit 2: It didn’t work :(