Is it storage system time?
So today's post will be a short one about creating a ZFS pool on CentOS 7. This is a logical follow-up to the previous post, where I covered the build-out of the new server. What I decided on is software RAID-1 for the OS, using LVM.
Now, for the data disks I have 3x 4TB drives. After looking around, I decided to use ZFS. Why ZFS? It's reliable (I have worked with systems based on it before), and it's really fast if you do a deep dive and configure it to your needs. As I would like to avoid duplicating posts, you can find install guidelines here on the ZFS wiki.
For some people (like me 🙂) it's handy to keep an eye on the documentation so you know what you are dealing with. This can be a good entry point before we continue, and I will most probably refer you to RT*M 🙂 a couple of times along the way. Documentation for administering ZFS is here.
Which drives do we use?
So let's start by checking our available disks:
```
[[email protected] ~]# fdisk -l /dev/sd?

### OS DISKS REMOVED FOR VISIBILITY ###

Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes, 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/sdd: 4000.8 GB, 4000787030016 bytes, 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/sde: 4000.8 GB, 4000787030016 bytes, 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/sdf: 240.1 GB, 240057409536 bytes, 468862128 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xdb1d2969

   Device Boot      Start         End      Blocks   Id  System
[[email protected] ~]#
```
Here it might be worth looking into assigning human-readable aliases to your drives. In a single-host scenario it might not be so useful, but when you work with enterprise systems in production, where for obvious reasons 🙂 you have more than one server, it becomes really handy.
But before actually doing this on the operating system, I did the prep work on the server itself.
So off we go to create /etc/zfs/vdev_id.conf:
```
#
# Custom by-path mapping for large JBOD configurations
#
#<ID>       <by-path name>
alias BAY1_DISK1 pci-0000:00:17.0-ata-1.0
alias BAY1_DISK2 pci-0000:00:17.0-ata-2.0
alias BAY0_DISK2 pci-0000:00:17.0-ata-3.0
alias BAY0_DISK1 pci-0000:00:17.0-ata-4.0
alias BAY0_DISK0 pci-0000:00:17.0-ata-5.0
# alias xxx pci-0000:00:17.0-ata-6.0
```
Once this is done, we need to trigger a udev update using the udevadm command.
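The original post doesn't show the exact invocation; on CentOS 7 it would typically look something like this:

```shell
# Reload the udev rules and re-trigger device events so the
# aliases from /etc/zfs/vdev_id.conf get picked up
udevadm control --reload-rules
udevadm trigger
```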
Now, after doing the above, we will be able to list the disks using our aliases.
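For example (the exact listing will of course vary with your hardware; the alias symlinks are created under /dev/disk/by-vdev/):

```shell
# Aliases defined in vdev_id.conf show up as symlinks
# pointing at the underlying /dev/sdX devices
ls -l /dev/disk/by-vdev/
```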
Now all that's left to do is to create the ZFS pool. However, just to be on the safe side, we can execute a dry run first.
```
zpool create -f -n data raidz BAY0_DISK0 BAY0_DISK1 BAY0_DISK2
```
In the command above the following happens:
- we request a pool to be created using zpool create
- we indicate we would like a dry run using the -n switch
- data is our pool name
- raidz is the ZFS RAID type, which I chose since I have 3 disks (it would be cool to have 4 and use raidz2)
The result shows what would be done with our drives.
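I haven't reproduced the exact output here, but a `zpool create -n` dry run typically just prints the layout it would build, along these lines:

```
would create 'data' with the following layout:

        data
          raidz1
            BAY0_DISK0
            BAY0_DISK1
            BAY0_DISK2
```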
For me this looks promising – let's go ahead and get our pool created for real.
```
zpool create -f -o ashift=12 -O atime=off -m /pools/data data raidz BAY0_DISK0 BAY0_DISK1 BAY0_DISK2
```
- -f: forces creation, as ZFS suspects we have partitions on those drives – but trust me, we don't
- ashift=12: follows the recommendation for drives with 4K block sizes (Advanced Format drives, which I recommend getting familiar with)
- atime=off: disables access time updates, which in return gives us a performance boost. You need to decide for yourself whether to use it
- -m: our mount point for the pool. The directory needs to exist already
- raidz: of course, the type of RAIDZ we will be using
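If you want to double-check the ashift that actually got applied to the pool, one way (assuming the pool is present in the zpool cache) is:

```shell
# zdb dumps the cached pool configuration;
# ashift: 12 means 2^12 = 4096-byte sectors
zdb -C data | grep ashift
```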
The reason I'm mentioning 4K Advanced Format drives here is performance. Below is a snippet from a forum thread that explains what we are looking at:
Furthermore, some ZFS pool configurations are much better suited towards 4K advanced format drives.
The following ZFS pool configurations are optimal for modern 4K sector harddrives:
RAID-Z: 3, 5, 9, 17, 33 drives
RAID-Z2: 4, 6, 10, 18, 34 drives
RAID-Z3: 5, 7, 11, 19, 35 drives
The trick is simple: subtract the number of parity drives and you get:
2, 4, 8, 16, 32 …
This has to do with the recordsize of 128KiB that gets divided over the number of disks. Example for a 3-disk RAID-Z writing 128KiB to the pool:
disk1: 64KiB data (part1)
disk2: 64KiB data (part2)
disk3: 64KiB parity
Each disk now gets 64KiB which is an exact multiple of 4KiB. This means it is efficient and fast. Now compare this with a non-optimal configuration of 4 disks in RAID-Z:
disk1: 42.66KiB data (part1)
disk2: 42.66KiB data (part2)
disk3: 42.66KiB data (part3)
disk4: 42.66KiB parity
Now this is ugly! It will either be padded down to 42.5KiB or padded up to 43.00KiB, which can vary per disk. Both of these are non-optimal for 4KiB-sector hard drives, because neither 42.5K nor 43K is a whole multiple of 4K. It needs to be a multiple of 4K to be optimal.
So after running the command above, we have our pool running.
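A quick sanity check to confirm the pool is healthy and mounted where we asked for (the property names are standard; data and /pools/data are the pool name and mount point used above):

```shell
# Pool health and vdev layout
zpool status data
# Capacity and mount point of the pool's root dataset
zfs list data
# Confirm the properties we set at creation time
zfs get atime,mountpoint data
```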
And that's more or less it for now 🙂 – we got our pool running and mounted as it should be.
Extra resources? Something for the future?
In later posts we will look into performance considerations for different configurations, which will enable us to make configuration decisions based on facts.
Also, I came across a really useful post about ZFS, which you can find below: