mirror of
https://github.com/fatedier/frp.git
synced 2026-05-15 08:05:49 -06:00
[GH-ISSUE #1367] Health check takes a node offline after a failure, but never brings it back after the node recovers #1086
Originally created by @luyaotang on GitHub (Aug 9, 2019).
Original GitHub issue: https://github.com/fatedier/frp/issues/1367
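Description:
I am running the frpc_386 build on Windows 7 with health checks configured, using the tcp check type. When the backend node has a problem, frpc detects it and takes the proxy offline as expected. A few hours later the backend node recovered, but frpc never detected the new status and never brought the proxy back online.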
Environment:
frp version: 0.22.0
go : go1.11.5 linux/amd64
frpc configuration:
[common]
server_addr = 192.168.21.254
server_port = 7100
log_level = info
log_max_days = 7
protocol = tcp
token = xxxxx
login_fail_exit = false
[zzz]
type = tcp
local_ip = 127.0.0.1
local_port = 3389
health_check_type = tcp
health_check_interval_s = 10
health_check_max_failed = 2
health_check_timeout_s = 3
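To make the interaction of these settings concrete, here is a minimal sketch of the bookkeeping that health_check_max_failed implies. The checkState type and its observe method are hypothetical names for illustration, not frp's actual code. The sketch assumes the proxy goes offline once the number of consecutive failures reaches the limit, and comes back on the first success.

```go
package main

import "fmt"

// checkState is a hypothetical model of the health check bookkeeping;
// it is not frp's actual type.
type checkState struct {
	maxFailed int  // corresponds to health_check_max_failed
	failed    int  // consecutive failed checks so far
	online    bool // whether the proxy is currently considered healthy
}

// observe records one check result and reports whether the status flipped.
func (s *checkState) observe(ok bool) (changed bool) {
	if ok {
		s.failed = 0
		if !s.online {
			s.online = true
			return true
		}
		return false
	}
	s.failed++
	if s.online && s.failed >= s.maxFailed {
		s.online = false
		return true
	}
	return false
}

func main() {
	s := &checkState{maxFailed: 2, online: true}
	for i, ok := range []bool{true, false, false, false, true} {
		changed := s.observe(ok)
		fmt.Printf("check %d ok=%v online=%v changed=%v\n", i, ok, s.online, changed)
	}
}
```

Under these assumptions, with health_check_interval_s = 10 and health_check_max_failed = 2, a dead backend would be taken offline after roughly two check intervals, and a recovered backend would come back on the next successful check, provided the check loop is still running.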
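Analysis:
1) At 04:30 the backend node was detected as down. From then on, the following check status was logged every few seconds, in line with the check interval.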
2019/08/09 04:30:47 [W] [011f4ad2ffa5591f] [9001E855D9DE93B5DA100230BDEE1F61.zzz] do one health check failed: dial tcp xx.xx.xxx.xxx:xxxx: connectex: No connection could be made because the target machine actively refused it.
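2) After 08:54:59, however, no further health check output appears at all, as if the check thread had exited. Normally it should detect that the backend service has recovered and change the status.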
2019/08/09 08:54:59 [W] [011f4ad2ffa5591f] [9001E855D9DE93B5DA100230BDEE1F61.zzz] do one health check failed: dial tcp xx.xx.xxx.xxx:xxxx: connectex: No connection could be made because the target machine actively refused it.
@fatedier commented on GitHub (Aug 9, 2019):
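Is it reproducible?
Do you have information about the network connections at the time? And the continuous logs from the period when it was in the abnormal state?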
@luyaotang commented on GitHub (Aug 9, 2019):
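It is probably not reproducible. This situation has come up before, and when the backend node recovered, frpc logged output like the following: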
2019/08/08 12:57:23 [I] [011f4ad2ffa5591f] [9001E855D9DE93B5DA100230BDEE1F61.zzz] health check status change to success
2019/08/08 12:57:23 [I] [011f4ad2ffa5591f] [9001E855D9DE93B5DA100230BDEE1F61.zzz] health check success
2019/08/08 12:57:24 [I] [9001E855D9DE93B5DA100230BDEE1F61.zzz] start proxy success.
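This time there was nothing of the kind. It looks like the check thread is no longer running, so there is no way the proxy could be brought back online.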
@luyaotang commented on GitHub (Aug 9, 2019):
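1) The program is definitely still running: the system information output I added, which is written every 30 minutes, still appears in the log.
2) After restarting the service, everything was normal again.
3) I suspect this part exited for some unknown reason: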
func (monitor *HealthCheckMonitor) checkWorker() {
	for {
		ctx, cancel := context.WithDeadline(monitor.ctx, time.Now().Add(monitor.timeout))
		err := monitor.doCheck(ctx)
@luyaotang commented on GitHub (Aug 9, 2019):
The log is too large to post here, so I have shared it at https://c-t.work/s/71d3c3b652d948
@fatedier commented on GitHub (Aug 9, 2019):
That is indeed a possibility. I will fix it.