[GH-ISSUE #38] Split Brain when logged in user CWDed into ZFS volume #37

Open
opened 2026-05-05 03:32:29 -06:00 by gitea-mirror · 5 comments

Originally created by @rbicelli on GitHub (Apr 17, 2021).
Original GitHub issue: https://github.com/ewwhite/zfs-ha/issues/38

Hi,
Consider this scenario:

  • a user (let's say an admin user) is logged in and working (CWDed into a ZFS volume served by the cluster)
  • a failover event is triggered (i.e. the secondary node goes out of standby mode and tries to take over its resources)

I observed that a fence action is triggered.

The worst part is that the fence action doesn't work as expected: the volume stays mounted on both nodes, causing ZFS errors (and file corruption). I assume SCSI reservations are somehow not being honored.
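
If SCSI reservations are the suspect, the reservation state on the shared disks can be inspected directly with sg_persist from sg3_utils; a minimal sketch (the device path is only an example):

```bash
# Show which keys are registered and which node currently holds the reservation
sg_persist --in --read-keys /dev/sdb           # registered keys (one per node)
sg_persist --in --read-reservation /dev/sdb    # active reservation holder, if any
```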

I triple-checked the configuration and it looks OK.

Since I'm planning to add sanoid/syncoid for snapshot/replica sends, I would like to avoid a split brain in case a failover happens in the middle of a process using the filesystem on a node.

I think this behaviour is easily reproducible.


@rbicelli commented on GitHub (Apr 17, 2021):

The relevant portion of the log is below (sorry for the truncation, I was in a split-screen shell):

```
Apr 17 16:51:25 zsan02 crmd[3513]:  notice: Result of stop operation for vol1-ip on zsan02: 0 (ok)
Apr 17 16:51:25 zsan02 lrmd[3510]:  notice: vol1_stop_0:34973:stderr [ /usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected ]
Apr 17 16:51:25 zsan02 lrmd[3510]:  notice: vol1_stop_0:34973:stderr [ umount: /vol1: target is busy. ]
Apr 17 16:51:25 zsan02 lrmd[3510]:  notice: vol1_stop_0:34973:stderr [         (In some cases useful info about processes that use ]
Apr 17 16:51:25 zsan02 lrmd[3510]:  notice: vol1_stop_0:34973:stderr [          the device is found by lsof(8) or fuser(1)) ]
Apr 17 16:51:25 zsan02 lrmd[3510]:  notice: vol1_stop_0:34973:stderr [ cannot unmount '/vol1': umount failed ]
Apr 17 16:51:25 zsan02 lrmd[3510]:  notice: vol1_stop_0:34973:stderr [ /usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected ]
Apr 17 16:51:25 zsan02 crmd[3513]:  notice: Result of stop operation for vol1 on zsan02: 1 (unknown error)
Apr 17 16:51:25 zsan02 crmd[3513]:  notice: zsan02-vol1_stop_0:63 [ /usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected\numount: /vol1: target is busy.\n        (In some cases useful info about processes that use\n         the device is found by lsof(8) or fuser(1))\ncannot unmount '/vol1': umount failed\n/usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected\n ]
Apr 17 16:51:25 zsan02 stonith-ng[3509]:  notice: fence-vol1 can fence (reboot) zsan02: static-list
Apr 17 16:51:25 zsan02 stonith-ng[3509]:  notice: fence-vol2 can fence (reboot) zsan02: static-list
Apr 17 16:51:25 zsan02 stonith-ng[3509]:  notice: fence-vol3 can fence (reboot) zsan02: static-list
Apr 17 16:51:26 zsan02 stonith-ng[3509]:  notice: Operation 'reboot' targeting zsan02 on zsan01 for crmd.3980@zsan01.36ba1933: OK
```

It looks like when something is using the filesystem locally, the resource agent is unable to stop the filesystem; it then fails and triggers a fence event. The fencing itself doesn't happen (I've configured iDRAC, but it doesn't power-cycle the node when I trigger a fence), but that's another story.

The same behaviour occurs with a zfs send in progress.

In order to mitigate this issue I wrote a helper script, which I put in /usr/lib/ocf/lib/heartbeat/helpers/zfs-helper:

```bash
#!/bin/bash
# Pre-export helper script for a ZFS pool.
# Checks for processes using files in the zpool and kills them.
# Requires lsof, ps, awk, sed.

zpool_pre_export () {

        # Forcibly terminate all PIDs using the zpool
        ZPOOL=$1
        # Exits gracefully anyway, for now
        RET=0

        # Kill anything holding files open under the pool's mountpoints
        # (the sed "1d" drops the lsof header line)
        lsof /$ZPOOL{*,/*} 2>/dev/null | awk '{print $2}' | sed -e "1d" | \
        while read PID
        do
                echo "Terminating PID $PID"
                kill -9 "$PID"
        done

        # Check if some blocking ZFS operations are running, such as
        # zfs send ...; exclude the grep and this script itself
        ps aux | grep "$ZPOOL" | grep -v -e grep -e "$0" | awk '{print $2}' | \
        while read PID
        do
                echo "Terminating PID $PID"
                kill -9 "$PID"
        done

        exit $RET
}

case $1 in
        pre-export)
                zpool_pre_export "$2"
                ;;
esac
```
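
For reference, the helper is intended to be invoked with an action and the pool name; a hypothetical manual run (the pool name is illustrative):

```bash
# Kill anything using pool "vol1" before exporting it
/usr/lib/ocf/lib/heartbeat/helpers/zfs-helper pre-export vol1
```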

@intentions commented on GitHub (Apr 17, 2021):

Wouldn't using the multihost protection prevent the second host from mounting the pool?


@rbicelli commented on GitHub (Apr 17, 2021):

> Wouldn't using the multihost protection prevent the second host from mounting the pool?

I wasn't aware of this feature. I've enabled it and am testing it.
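
For anyone else trying this, a minimal sketch of enabling multihost (MMP) protection, assuming OpenZFS 0.8+ and a pool named vol1:

```bash
zgenhostid                      # generate a unique /etc/hostid on each node (run once per node)
zpool set multihost=on vol1     # enable multi-modifier protection on the pool
zpool get multihost vol1        # verify the property is active
```

With multihost enabled, importing the pool on the second node while it is still active elsewhere should fail unless forced.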


@Nooby1 commented on GitHub (Nov 1, 2021):

I have put it in /usr/lib/ocf/lib/heartbeat/zfs-helper.sh, as there is no helpers directory on RHEL 8 and there are other scripts in this directory.

Does anything else have to be done for this on RHEL 8?


@rbicelli commented on GitHub (Nov 10, 2021):

I don't remember since months have passed, but it's possible that I needed to create the required directory.
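
A minimal sketch of what that might look like, assuming the helper path used earlier in this thread:

```bash
# Create the helpers directory if the distribution doesn't ship one (e.g. RHEL 8)
mkdir -p /usr/lib/ocf/lib/heartbeat/helpers
install -m 0755 zfs-helper /usr/lib/ocf/lib/heartbeat/helpers/zfs-helper
```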
