mirror of
https://github.com/debauchee/barrier.git
synced 2026-05-15 14:16:02 -06:00
[GH-ISSUE #522] WARNING: error in socket multiplexer: Unknown error causes Barrier server to hang #406
Originally created by @stiggy87 on GitHub (Dec 9, 2019).
Original GitHub issue: https://github.com/debauchee/barrier/issues/522
Operating Systems
Server: Win10 Ver 1803 (OS Build 17134.407)
Client: Win10 Ver 1803 (OS Build 17134.1130)
Barrier Version
2.3.2
Steps to reproduce bug
While running the server, at random times I get a flood of this message:
WARNING: error in socket multiplexer: Unknown error
The message is logged more than ten times per second, and it causes the server to hang, which forces me to stop the server for a few seconds and restart it to make it work again.
Other info
I've had to configure McAfee Firewall for my system; Barrier is in the exception list.
I am currently running with DEBUG2 logging to try to pinpoint exactly when the hang happens and what the last thing to happen was.
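The failure mode described above (the multiplexer logging the same error many times per second and pegging the server) is consistent with a poll loop that keeps a dead socket in its fd set. This is a minimal sketch, not Barrier's actual code: `poll()` on a stale (closed) descriptor returns immediately with an error flag instead of blocking, so a loop that never evicts the descriptor spins.

```cpp
#include <poll.h>
#include <unistd.h>

// Count how many consecutive poll() calls report the same stale fd.
// A closed/invalid fd makes poll() return immediately with POLLNVAL,
// so a multiplexer that never evicts it spins instead of blocking.
int staleFdSpins(int iterations) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                       // simulate a socket that died
    struct pollfd pfd = {fds[0], POLLIN, 0};
    int spins = 0;
    for (int i = 0; i < iterations; ++i) {
        int n = poll(&pfd, 1, 100);      // would block up to 100 ms if fd were healthy
        if (n > 0 && (pfd.revents & POLLNVAL))
            ++spins;                     // error reported again; fd was never removed
    }
    close(fds[1]);
    return spins;
}
```

Every iteration reports the error again because nothing removes the descriptor from the set, which matches the "over ten messages per second" symptom.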
@stiggy87 commented on GitHub (Dec 9, 2019):
Just ran into this and saw this before it hung the server:
I noticed that the client repeatedly went unresponsive and came back (possibly because of the network), but it looks like the server was trying to open multiple simultaneous sockets to the client.
@ackstorm23 commented on GitHub (Dec 21, 2019):
I've been running into this too. The server works for about 60 seconds, then the server side (Windows 10) hangs and connectivity is lost.
It recovers temporarily after I manually kill and restart the Barrier server on Windows 10, but it hangs again about 60 seconds later.
@IPWright83 commented on GitHub (Apr 21, 2020):
I'm seeing this regularly now too with a Windows 10 -> Ubuntu 18 setup. The errors are always on the Windows (server) side, while the client reports that the connection failed or timed out.
@github-actions[bot] commented on GitHub (Oct 21, 2020):
Is this issue still an issue for you? Please do comment and let us know! Alternatively, you may close the issue yourself if it is no longer a problem.
@drysart commented on GitHub (Dec 20, 2020):
Commenting on this because the issue occurred for me today, and I isolated a scenario that could help track down a cause. I have a network with two clients that (attempt to) connect to the Barrier server. I've had this setup for a while; it was originally configured with the default of HTTPS encryption enabled, but while diagnosing a performance issue some time back I disabled HTTPS encryption and never got around to re-enabling it.
The server runs on Windows 10 2004.20277.1; and is configured not to use HTTPS encryption.
The first client is macOS Big Sur 11.1, also configured not to use HTTPS encryption. This client connected to the server successfully -- until the server got stuck in a multiplexer error loop within a few minutes of starting and stopped accepting new connections.
The second client is a Windows 10 machine that, having been off the network for an extended period, was still configured to use HTTPS encryption. When it was re-added to the network it didn't need to rejoin the Barrier group, so the stale configuration was left in place even though it was non-functional. As a result, it was still configured to try to connect to the server, but because of the HTTPS setting mismatch every attempt failed, and it sat in a retry loop attempting to reconnect several times a minute.
This misconfigured client's stream of failed connection attempts is what was causing the server to fail. As soon as the machine was taken off the network again, the server was once again reliable, with no other configuration changes. It would seem there is a defect in the server's socket handling such that a failed connection, and the subsequently dead socket, has a reasonably high chance of not being cleaned up fully. The multiplexer then gets stuck in an infinite error loop because that poisoned socket remains in the set of sockets being polled, causing every poll to fail until the server is shut down.
Ideally, the proper fix is to identify whatever defect allows those poisoned sockets from failed connection attempts to remain in the socket collection. Short of that, I'd imagine this could be worked around by enhancing the error handling of the pollSocket call to iterate through the individual sockets in the array, try to determine which one is poisoned, and dump that socket. That might be tricky to do, so a secondary alternative would be to add circuit-breaker logic to the multiplexer error handler itself: if the error frequency rises above a certain threshold, have the server dump all of its connections.
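The circuit-breaker fallback suggested above could be sketched roughly like this. All names here are hypothetical; this is not Barrier's real multiplexer API, just an illustration of rate-limiting repeated poll errors within a time window:

```cpp
#include <chrono>

// Hypothetical circuit breaker for a socket multiplexer: if the poll
// error handler fires more than `threshold` times within `window`,
// signal the caller to drop all connections instead of spinning.
class MultiplexerBreaker {
public:
    MultiplexerBreaker(int threshold, std::chrono::milliseconds window)
        : threshold_(threshold), window_(window) {}

    // Call on each multiplexer error; returns true when the error rate
    // exceeds the threshold and the server should dump its connections.
    bool onPollError(std::chrono::steady_clock::time_point now) {
        // Start a fresh window on the first error, or when the old one expired.
        if (count_ == 0 || now - windowStart_ > window_) {
            windowStart_ = now;
            count_ = 0;
        }
        return ++count_ > threshold_;
    }

private:
    int threshold_;
    std::chrono::milliseconds window_;
    int count_ = 0;
    std::chrono::steady_clock::time_point windowStart_;
};
```

The server-side loop would call `onPollError()` whenever pollSocket fails; tripping the breaker is a blunt but safe recovery, since dropping all connections lets healthy clients reconnect while evicting the poisoned socket.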
@p12tic commented on GitHub (Jan 10, 2021):
@drysart That's a great investigation, thanks a lot. This will make fixing this issue much easier.
@drysart commented on GitHub (Jan 10, 2021):
I'll also add that on further investigation I determined it's not specifically due to the HTTPS/HTTP mismatch. I updated the second client so its HTTPS setting matched the server's, and then tested whether simply having no entry for the client's name in the server configuration was enough to keep the client out of the group. It showed the same behavior: the server rejected the client because of the name mismatch, the client kept retrying the connection in a loop, and eventually the server's multiplexer hung in the same error loop. The only way I found to avoid hanging the server was to not have the unwelcome client trying to connect at all.
@mckernanin commented on GitHub (Oct 13, 2021):
I'm experiencing this as well. I also have a Synergy license, and I get the same error from Synergy (though there the desktop app hard crashes, lol).
Server: latest windows 10
Client: latest macOS
I have a ticket in to synergy support, I'll post the outcome here.