pppd appears to become stuck when reconnecting after a DSL outage

I noticed that when the ISP did maintenance one night, my router didn’t reconnect. I had a closer look and simulated outages by turning off the DSL modem or removing the telephone wire.

When the DSL modem regains its connection to the ISP (the link is PPPoE), I see some pppd activity in /var/log/messages, but it stops at this line:

2017-12-10T15:56:00+01:00 notice pppd[22919]: Connect: pppoe-wan <--> eth1

After that, the pppd process is still running but doesn’t appear to be doing anything: there is no route and nothing new appears in the log. Here is the ps output:

# ps w | grep ppp
22919 root      1068 S    /usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan +ipv6 set AUTOIPV6=1 nodefaultroute usepeerdns maxfa

If I use ifdown wan and ifup wan to restart pppd, it reconnects successfully. When it connects successfully, the log message right after the pppoe-wan <--> eth1 line is about CHAP, for example:

2017-12-10T16:14:15+01:00 notice pppd[27075]: Connect: pppoe-wan <--> eth1
2017-12-10T16:14:15+01:00 info pppd[27075]: CHAP authentication succeeded
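
To tell the two cases apart automatically, something like the following might work. It is only a sketch: the function name pppd_looks_stuck is made up for illustration, and it assumes the log lines look like the ones quoted in this post. It treats pppd as stuck when the most recent pppd message in the log is a "Connect:" line with nothing from pppd after it.

```shell
#!/bin/sh
# Sketch: decide from log text on stdin whether pppd stopped right after
# its "Connect:" line. Returns 0 if it looks stuck, 1 otherwise.
# (pppd_looks_stuck is a hypothetical helper, not part of pppd or OpenWrt.)
pppd_looks_stuck() {
  awk '
    # The link-level connect message: nothing has followed it yet.
    /pppd\[[0-9]+\]: Connect: / { stuck = 1; next }
    # Any later pppd message (for example CHAP) clears the flag.
    /pppd\[[0-9]+\]: /          { stuck = 0 }
    END { exit(stuck ? 0 : 1) }
  '
}
```

It could be called as pppd_looks_stuck < /var/log/messages. A check made in the instant between the "Connect:" line and the CHAP line would give a false positive, so it probably makes sense to check twice a few seconds apart.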

I see that other people have had problems with pppd reconnecting, such as in this discussion, but I don’t think that is the same bug.

Have other people had problems like this?

Is there a way to add the keepalive option to the pppd config through LuCI or anywhere else? I don’t think it is relevant to this issue, but it would be good to have too.
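
For reference, as far as I can tell the OpenWrt PPP/PPPoE protocol handler accepts a keepalive option in the network config, given as a failure count and an interval in seconds, which it passes to pppd as lcp-echo-failure and lcp-echo-interval. A sketch of setting it with uci follows. The helper name set_ppp_keepalive, the section name wan and the default values 5 and 10 are my assumptions, and I have not verified that this changes the behaviour described here.

```shell
#!/bin/sh
# Sketch: set the PPP keepalive for an interface section via UCI.
# $1 = interface section (assumed "wan")
# $2 = unanswered LCP echo requests before the peer counts as dead (assumed 5)
# $3 = seconds between LCP echo requests (assumed 10)
set_ppp_keepalive() {
  iface="${1:-wan}"
  failures="${2:-5}"
  interval="${3:-10}"
  # Stored as "<failures> <interval>" in /etc/config/network.
  uci set "network.${iface}.keepalive=${failures} ${interval}" &&
    uci commit network
}
```

After calling set_ppp_keepalive wan 5 10, the interface would need to be restarted (ifdown wan, then ifup wan) for pppd to pick up the new options.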

As a workaround, I have created a script that cron calls every 5 minutes. It also tries to make sure strongSwan reconnects the IPsec VPN tunnel successfully, because I’ve observed some reconnection issues there too.

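#!/bin/bash

# If a default route exists, the WAN link is up: only check the IPsec tunnel.
if /usr/sbin/ip route | grep -q '^default'
then
  /root/ipsec-check
  exit 0
fi

# No default route: restart the WAN interface (and with it pppd).
/sbin/ifdown wan
/bin/sleep 10
/sbin/ifup wan

# Give pppd time to reconnect, then restart strongSwan.
/bin/sleep 30
/usr/sbin/ipsec restart >/dev/null 2>&1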