[GH-ISSUE #40] Not an issue, just a question. #39

Open
opened 2026-05-05 03:32:29 -06:00 by gitea-mirror · 7 comments

Originally created by @JaredT6694 on GitHub (Jun 23, 2024).
Original GitHub issue: https://github.com/ewwhite/zfs-ha/issues/40

I'm curious whether you know of anyone (yourself included) using this method on a Debian-based system. I'm thinking of trying this with Proxmox and would like to know if there are any tips, tricks, or changes that might be needed.

-jt


@mpeterson commented on GitHub (Jan 27, 2025):

@JaredT6694 can you post your findings if you go this route? Thanks!


@mikesoule commented on GitHub (Jan 27, 2025):

I haven't, but even though I prefer RHEL-based distros, I don't see anything in the OS/software stack that isn't equally well supported on Debian. In fact, RHEL 9 has drifted enough from what's used in the docs (RHEL 7) that the gaps you'd have to fill in on your own with Debian may be no larger than the ones you'd encounter with something like Rocky 9. If you're much more comfortable with Debian and have the tools and systems in place to support it, I think you're better off using that.


@ewwhite commented on GitHub (Jan 27, 2025):

I may take some time to update this to reflect modern Rocky Linux (EL9), as well as some commercial options that make more sense now.


@mpeterson commented on GitHub (Feb 4, 2025):

@ewwhite I'm currently doing a build based on AlmaLinux 9.5 (EL9); I'll provide a gist with it so you can update your docs. Or, if you tell me how you'd like it formatted, I can submit a PR.

I originally wanted to set this up on Proxmox directly, but later decided against it: one of the things I value most about Proxmox is that I can provision a server with tofu and configure it with ansible in a blink and be up and running really fast. As such, I treat my Proxmox hosts as cattle and can replace my nodes very easily whenever I need to.

For my NAS I want a bit more stability than that, so what I'm doing is running AlmaLinux VMs on Proxmox with passthrough of my SAS controllers.
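
For anyone who wants to reproduce that layout, a rough sketch of what the SAS HBA passthrough looks like from the Proxmox CLI is below; the VM ID (101) and PCI address (0000:03:00.0) are placeholders, and IOMMU needs to be enabled on the host first:

```
# Find the PCI address of the SAS controller on the Proxmox host
lspci -nn | grep -i -e sas -e lsi

# Pass the whole controller through to the AlmaLinux VM (placeholder VM ID 101)
qm set 101 --hostpci0 0000:03:00.0

# Confirm the VM configuration picked it up
qm config 101 | grep hostpci
```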


@ewwhite commented on GitHub (Feb 4, 2025):

Very interesting. I'd love to see what you develop!


@mpeterson commented on GitHub (Feb 15, 2025):

@ewwhite here it is, raw from my notes; I hope you can use it to update your docs!

```
##
## pre-requisites for all hosts:
##
# - Attach two network interfaces: data plane and cluster mgmt plane
# - Recommended in two different networks/VLANs, with the cluster mgmt plane isolated as much as possible
#    In this case: data plane (10.5.0.0/24) and cluster mgmt plane (10.5.25.0/24)
# - Disable secure boot in the BIOS, otherwise the ZFS module won't work

# In one of the hosts: prepare SAS disks in 4kn
sudo sg_format --size=4096 --format /dev/sd[b-f]
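
# (Optional check, not in the original notes) sg_format on large disks can take
# hours; once it finishes, confirm the 4K logical sector size took effect:
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sd[b-f]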

##
## configuration:
##

# Turn off firewall 
systemctl stop firewalld
systemctl disable firewalld

# OR allow HA in firewall
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# In all hosts 
dnf install -y epel-release
dnf install -y htop nvim wget


dnf install -y kernel-devel

# to make sure we are running the latest kernel so dkms will be built correctly
reboot  

# install zfs repo for el9
dnf install -y https://github.com/zfsonlinux/zfsonlinux.github.com/raw/refs/heads/master/epel/zfs-release-2-3.el9.noarch.rpm

# in particular I was interested in zfs 2.3 and not zfs 2.1
dnf config-manager --enable zfs-testing
dnf install -y zfs
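
# (Optional check, not in the original notes) make sure the DKMS build succeeded
# and the module loads before touching any disks:
modprobe zfs
zfs version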

dnf config-manager --set-enabled highavailability

dnf install -y device-mapper-multipath

# I want more stability around identifiers so I'd rather use WWID,
# thus the `--user_friendly_names n`
mpathconf --enable --user_friendly_names n
systemctl start multipathd
systemctl enable multipathd
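
# (Optional check, not in the original notes) each shared SAS disk should now show
# up once under its WWID, with all of its paths listed:
multipath -ll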

# From here it is JUST once in one of the nodes, to create the ZFS pool
zpool create tank -o ashift=12 -o autoexpand=on -o autoreplace=on -o cachefile=none raidz2 35000cca28405aaa4 35000cca2a948ddd0 35000cca2ab79f580 35000cca2c2dd2454 35000cca2c2eed534

# protect ZFS with multihost
zgenhostid    # this is the only command that needs to be run on each node before continuing
zpool set multihost=on tank

zfs set atime=off tank
zfs set relatime=off tank
zfs set acltype=posixacl tank
zfs set xattr=sa tank

zfs create tank/media
zfs create tank/media/videos
zfs create tank/media/photos
zfs set recordsize=1M tank/media
zfs create tank/downloads
zfs create tank/downloads/torrents
zfs set recordsize=16KB tank/downloads/torrents
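
# (Optional check, not in the original notes) confirm the dataset properties:
zfs get -r recordsize,atime,xattr,acltype tank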

# Until here it is JUST once in one of the nodes, to create the ZFS pool

## Starting to configure the cluster 

# BEGIN on all nodes
dnf install -y pcs fence-agents-all
systemctl enable pcsd
systemctl enable corosync
systemctl enable pacemaker
systemctl start pcsd

cd /usr/lib/ocf/resource.d/heartbeat/
wget https://github.com/skiselkov/stmf-ha/raw/master/heartbeat/ZFS
chmod +x ZFS
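
# (Optional check, not in the original notes) verify that pcs/pacemaker can see
# the newly installed ZFS resource agent:
pcs resource describe ocf:heartbeat:ZFS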

passwd hacluster
# END on all nodes

# Choose one node and continue
pcs host auth nas-01 nas-02
pcs cluster setup zfs-cluster --start --enable nas-01 addr=10.5.25.11 nas-02 addr=10.5.25.12
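
# (Optional check, not in the original notes) both nodes should show as online:
pcs cluster status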

pcs property set no-quorum-policy=ignore

pcs stonith create fence-tank fence_scsi pcmk_monitor_action="metadata" pcmk_host_list="nas-01,nas-02" devices="/dev/mapper/35000cca28405aaa4,/dev/mapper/35000cca2a948ddd0,/dev/mapper/35000cca2ab79f580,/dev/mapper/35000cca2c2dd2454,/dev/mapper/35000cca2c2eed534" meta provides=unfencing
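
# (Optional check, not in the original notes) once the stonith resource is started,
# SCSI-3 registrations from fence_scsi should be visible on the shared disks:
sg_persist --in --read-keys /dev/mapper/35000cca28405aaa4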

# `--future` enables the new `group` option instead of the old `--group`
# (it can be dropped once newer pcs versions support this by default)
pcs resource create --future tank 'ocf:heartbeat:ZFS' pool="tank" importargs="-d /dev/mapper/" op start timeout="90" op stop timeout="90" group group-nas

pcs resource create --future nas-ip 'ocf:heartbeat:IPaddr2' ip=10.5.0.10 cidr_netmask=24 group group-nas

pcs resource defaults update resource-stickiness=100
```

And the result:

```
# pcs status
Cluster name: zfs-cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: nas-01 (version 2.1.8-3.el9-3980678f0) - partition with quorum
  * Last updated: Sat Feb 15 20:38:31 2025 on nas-01
  * Last change:  Sat Feb 15 20:23:22 2025 by root via root on nas-01
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ nas-01 nas-02 ]

Full List of Resources:
  * fence-tank       (stonith:fence_scsi):    Started nas-01
  * Resource Group: group-nas:
    * tank   (ocf:heartbeat:ZFS):     Started nas-01
    * nas-ip    (ocf:heartbeat:IPaddr2):         Started nas-01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

and checking the IP:

```
# ping nas
PING nas (10.5.0.10) 56(84) bytes of data.
64 bytes from 10.5.0.10: icmp_seq=1 ttl=64 time=0.350 ms
64 bytes from 10.5.0.10: icmp_seq=2 ttl=64 time=0.386 ms
64 bytes from 10.5.0.10: icmp_seq=3 ttl=64 time=0.460 ms
^C
--- nas ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2007ms
rtt min/avg/max/mdev = 0.350/0.398/0.460/0.045 ms
```
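
A quick failover smoke test (not part of the original notes, just a sketch using standard pcs commands) is to put the active node into standby and watch the group move to the other node:

```
# Move everything off nas-01; the tank pool and nas-ip should start on nas-02
pcs node standby nas-01
pcs status

# Bring nas-01 back; with resource-stickiness=100 the group stays where it is
pcs node unstandby nas-01
```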

@mpeterson commented on GitHub (Feb 16, 2025):

After playing a bit with `zfs multihost` I ended up disabling it: it was adding way too much delay (~10 s), and `fence_scsi` should be more than enough to protect the pool.
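
For reference, turning it back off is a single property change on the pool from the notes above (a sketch, assuming the pool is still named `tank`):

```
# Disable MMP/multihost on the pool and confirm the setting
zpool set multihost=off tank
zpool get multihost tank

# (Assumption) the delay comes from MMP checks at import time; it can also be
# reduced by lowering the zfs_multihost_interval module parameter instead of
# disabling multihost entirely
```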
