[GH-ISSUE #16] PCS cannot unmount file system during failover event #13

Closed
opened 2026-05-05 03:28:42 -06:00 by gitea-mirror · 9 comments

Originally created by @intentions on GitHub (Oct 19, 2017).
Original GitHub issue: https://github.com/ewwhite/zfs-ha/issues/16

While migrating data onto my new ZFS system I attempted a failover to do some work on one of the heads. The process failed, with pcs being unable to unmount the ZFS file system. I tried unmounting by hand and was told:

```
root@scifs1701:~] zpool export -f expphyvol
umount: /expphyvol/hallc: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
cannot unmount '/expphyvol/hallc': umount failed
root@scifs1701:~] umount -f /expphyvol/hallc/
umount: /expphyvol/hallc: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
```

Looking at pcs, this was after the IP had already been shut down, so I don't know how new writes could be coming to the device.


@ewwhite commented on GitHub (Oct 31, 2017):

Did you check the output of `lsof /expphyvol/hallc`?


@intentions commented on GitHub (Oct 31, 2017):

The lsof returns nothing.

I asked about this on the ZFS mailing list and got one response of "yeah, it happens sometimes", so I'm guessing it isn't a problem with the Pacemaker setup.
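When `lsof` comes back empty for a busy mount, the reference is often not a userspace process at all: the kernel NFS server (nfsd) holds exports from kernel space, which lsof cannot see. A hypothetical diagnostic sequence for that situation (paths taken from this thread, adapt to your system):

```shell
# fuser -m reports any process with an open file, cwd, or mmap on the
# mount; -v shows what kind of reference each one holds.
fuser -vm /expphyvol/hallc

# lsof only sees userspace. Check whether kernel nfsd is still running
# and whether the path is still exported:
cat /proc/fs/nfsd/threads        # non-zero => nfsd threads are active
exportfs -v                      # is /expphyvol/hallc still exported?

# A nested or bind mount under the path also keeps it busy:
findmnt -R /expphyvol/hallc

# Last resort: lazy unmount detaches the mount point immediately and
# finishes cleanup when the final reference is released.
umount -l /expphyvol/hallc
```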


@colttt commented on GitHub (Nov 13, 2017):

Hello,

I have the same issue. Stop the nfs-server before you export the ZFS pool, and start it before you import it. I was wondering why this doesn't happen in this how-to.
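The ordering described above can be sketched as the following manual failover sequence. This is an illustration under the thread's pool name, not the how-to's actual Pacemaker configuration; in a real cluster these steps would be driven by the resource agents rather than run by hand:

```shell
# On the node giving up the pool: stop kernel nfsd first so it drops
# its references to the exported datasets, then export the pool.
systemctl stop nfs-server
zpool export expphyvol

# On the node taking over: per the comment, start NFS before the
# import so datasets with the sharenfs property re-export as the
# pool comes in.
systemctl start nfs-server
zpool import expphyvol
```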


@intentions commented on GitHub (Nov 13, 2017):

Thanks, though the last time I restarted NFS all the clients yelled about stale file handles and I had to reboot the head anyway.

I'm closing this because it now seems to be more of an issue with ZFS than with what PCS is doing.


@colttt commented on GitHub (Nov 13, 2017):

That's not an issue with ZFS! It's an issue with NFS: the TIME_WAIT connections don't get torn down (they can't if the interface is already down), and it waits roughly 2-4 minutes before releasing them. You can decrease those parameters, but I don't remember which ones, sorry.
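A way to check whether lingering TCP state from NFS clients is what's holding things up during a failover (a sketch, assuming NFS on its standard port 2049; TIME_WAIT duration itself is fixed in the Linux TCP stack, and the commonly adjusted neighbouring knob is the FIN_WAIT_2 timeout):

```shell
# List NFS connections stuck in TIME_WAIT on the server side.
ss -tan state time-wait '( sport = :2049 )'

# The FIN_WAIT_2 timeout (default 60s) is the tunable most often
# lowered to speed up teardown of half-closed connections.
sysctl net.ipv4.tcp_fin_timeout
```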


@ewwhite commented on GitHub (Nov 13, 2017):

Are you using NFSv3 or NFSv4? For NFSv3, I find that it's good enough to keep the NFS daemon enabled and running on both hosts. The `zpool export` handles client notification, unexporting of the NFS share, and the re-export all in one action.
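The single-action behaviour described here relies on the shares being managed by ZFS itself (the `sharenfs` dataset property) rather than a static /etc/exports. A hypothetical setup, with a client subnet made up for illustration:

```shell
# Share the dataset via ZFS-managed NFS; the share definition then
# travels with the pool. (192.168.1.0/24 is an example subnet.)
zfs set sharenfs='rw=@192.168.1.0/24' expphyvol/hallc
zfs get sharenfs expphyvol/hallc

# 'zpool export' unshares every sharenfs dataset before unmounting;
# importing on the other head (with nfs-server already running)
# brings the shares back automatically.
zpool export expphyvol
```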


@intentions commented on GitHub (Nov 13, 2017):

NFS v3.

During my initial testing (10-odd clients) I didn't see any problems, but once the system entered production use (~900 clients) I started seeing this problem.


@colttt commented on GitHub (Nov 14, 2017):

We use NFS v4.2 (TCP). If you use NFS v3 over 10G you have a high risk of data loss (because it's UDP).


@intentions commented on GitHub (Nov 14, 2017):

Data is going out over 56G FDR (but the clients are all on 40G QDR); we are using TCP.
