[GH-ISSUE #16] PCS cannot unmount file system during failover event #13

Closed
opened 2026-05-05 03:28:42 -06:00 by gitea-mirror · 9 comments

Originally created by @intentions on GitHub (Oct 19, 2017).
Original GitHub issue: https://github.com/ewwhite/zfs-ha/issues/16

While migrating data onto my new ZFS system I attempted a failover to do some work on one of the heads. The process failed, with pcs being unable to unmount the ZFS file system. I tried unmounting by hand and was told:

```
root@scifs1701:~] zpool export -f expphyvol
umount: /expphyvol/hallc: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
cannot unmount '/expphyvol/hallc': umount failed
root@scifs1701:~] umount -f /expphyvol/hallc/
umount: /expphyvol/hallc: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
```

Looking at pcs, this was after the IP had already been shut down, so I don't know how new writes could be coming to the device.


@ewwhite commented on GitHub (Oct 31, 2017):

Did you check the output of `lsof /expphyvol/hallc`?


@intentions commented on GitHub (Oct 31, 2017):

The lsof returns nothing.

I asked about this on the ZFS mailing list and got one response of "yeah, it happens sometimes", so I'm guessing it isn't a problem with the Pacemaker setup.
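When `lsof` comes back empty for a busy mount, the reference is often not a userspace process at all: the kernel NFS server (nfsd) holds exports from kernel space, which lsof cannot see. A hypothetical diagnostic sequence for that situation (paths taken from this thread, adapt to your system):

```shell
# fuser -m reports any process with an open file, cwd, or mmap on the
# mount; -v shows what kind of reference each one holds.
fuser -vm /expphyvol/hallc

# lsof only sees userspace. Check whether kernel nfsd is still running
# and whether the path is still exported:
cat /proc/fs/nfsd/threads        # non-zero => nfsd threads are active
exportfs -v                      # is /expphyvol/hallc still exported?

# A nested or bind mount under the path also keeps it busy:
findmnt -R /expphyvol/hallc

# Last resort: lazy unmount detaches the mount point immediately and
# finishes cleanup when the final reference is released.
umount -l /expphyvol/hallc
```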


@colttt commented on GitHub (Nov 13, 2017):

Hello,

I have the same issue. Stop the nfs-server before you export the ZFS pool, and start it before you import it. I was wondering why this doesn't happen in this how-to.
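The ordering described above can be sketched as the following manual failover sequence. This is an illustration under the thread's pool name, not the how-to's actual Pacemaker configuration; in a real cluster these steps would be driven by the resource agents rather than run by hand:

```shell
# On the node giving up the pool: stop kernel nfsd first so it drops
# its references to the exported datasets, then export the pool.
systemctl stop nfs-server
zpool export expphyvol

# On the node taking over: per the comment, start NFS before the
# import so datasets with the sharenfs property re-export as the
# pool comes in.
systemctl start nfs-server
zpool import expphyvol
```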


@intentions commented on GitHub (Nov 13, 2017):

Thanks, though the last time I restarted NFS all the clients yelled about stale file handles and I had to reboot the head anyway.

I'm closing this because it now seems to be more of an issue with ZFS than with what PCS is doing.


@colttt commented on GitHub (Nov 13, 2017):

That's not an issue with ZFS! It's an issue with NFS: the TIME_WAIT connections don't get torn down (they can't if the interface is already down), and it waits roughly 2-4 minutes before releasing them. You can decrease those parameters, but I don't remember which ones, sorry.
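A way to check whether lingering TCP state from NFS clients is what's holding things up during a failover (a sketch, assuming NFS on its standard port 2049; TIME_WAIT duration itself is fixed in the Linux TCP stack, and the commonly adjusted neighbouring knob is the FIN_WAIT_2 timeout):

```shell
# List NFS connections stuck in TIME_WAIT on the server side.
ss -tan state time-wait '( sport = :2049 )'

# The FIN_WAIT_2 timeout (default 60s) is the tunable most often
# lowered to speed up teardown of half-closed connections.
sysctl net.ipv4.tcp_fin_timeout
```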


@ewwhite commented on GitHub (Nov 13, 2017):

Are you using NFSv3 or NFSv4? For NFSv3, I find that it's good enough to keep the NFS daemon enabled and running on both hosts. The `zpool export` handles client notification, unexporting of the NFS share, and the re-export all in one action.
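The single-action behaviour described here relies on the shares being managed by ZFS itself (the `sharenfs` dataset property) rather than a static /etc/exports. A hypothetical setup, with a client subnet made up for illustration:

```shell
# Share the dataset via ZFS-managed NFS; the share definition then
# travels with the pool. (192.168.1.0/24 is an example subnet.)
zfs set sharenfs='rw=@192.168.1.0/24' expphyvol/hallc
zfs get sharenfs expphyvol/hallc

# 'zpool export' unshares every sharenfs dataset before unmounting;
# importing on the other head (with nfs-server already running)
# brings the shares back automatically.
zpool export expphyvol
```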


@intentions commented on GitHub (Nov 13, 2017):

NFS v3.

During my initial testing (10-odd clients) I didn't see any problems, but once the system entered production use (~900 clients) I started seeing this problem.


@colttt commented on GitHub (Nov 14, 2017):

We use NFS v4.2 (TCP). If you use NFS v3 over 10G you have a high risk of data loss (because it's UDP).


@intentions commented on GitHub (Nov 14, 2017):

Data is going out over 56G FDR (but the clients are all on 40G QDR); we are using TCP.
