Native ZFS on Linux
FAQ - Table of Contents
1.1 What is ZFS?

ZFS is a highly scalable, future-proof file system and logical volume manager designed around a few key concepts:

  • Data integrity is paramount.
  • Storage administration should be simple.
  • Everything can be done online.

For a full overview and description of the available features see this detailed Wikipedia article. In addition to this functionality, the OpenZFS development community is regularly adding new features and performance improvements.

1.2 How do I install it?

ZFS on Linux is available for numerous distributions and the installation process largely depends on the package manager. The following distributions all have support for ZFS and documentation on how it can be installed. If your distribution isn't among them, you can build ZFS using the officially released tarballs.
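
For example, a minimal tarball build. This sketch assumes the 0.6.3 release archives and that your distribution's kernel headers and build tools are already installed; the SPL package must be built and installed before ZFS:

$ tar -xzf spl-0.6.3.tar.gz && cd spl-0.6.3
$ ./configure && make && sudo make install
$ cd .. && tar -xzf zfs-0.6.3.tar.gz && cd zfs-0.6.3
$ ./configure && make && sudo make install
$ sudo modprobe zfs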

1.3 Why doesn’t it build?

Building a kernel module against an arbitrary kernel version is a complicated thing to do. Every Linux distribution has its own idea of how this should be done. It depends on the base kernel version, any distribution-specific patches, and exactly how the kernel was configured. If you run into problems, here are a few things to check. If none of these explain your problem, then please open a new issue which fully describes it.

  • The kernel API changes frequently; version 0.6.3 supports 2.6.26 - 3.15 kernels.
  • There are lots of Linux distributions; only the releases listed in section 1.6 are part of the automated build/regression testing.
  • This may be a known issue; check the SPL and ZFS issue trackers.
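
If the build fails because configure cannot locate your kernel headers, you can point it at them explicitly. A minimal sketch, assuming a Red Hat-style kernel-devel tree (the path is illustrative; both the spl and zfs source trees accept these options):

$ ./configure --with-linux=/usr/src/kernels/$(uname -r) \
              --with-linux-obj=/usr/src/kernels/$(uname -r)
$ make -s -j$(nproc)
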
1.4 How do I mount the file system?

A mountable dataset will be created and automatically mounted when you first create the pool with zpool create. Additional datasets can be created with zfs create and they will be automatically mounted.
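
For example, using two hypothetical whole-disk devices:

$ sudo zpool create tank sda sdb     # pool created and mounted at /tank
$ sudo zfs create tank/home          # dataset created and mounted at /tank/home
$ zfs list                           # lists the datasets and their mount points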

1.5 Why should I use a 64-bit system?

You are strongly encouraged to use a 64-bit kernel. At the moment zfs will build in a 32-bit environment but will not run stably.

In the Solaris kernel it is common practice to make heavy use of the virtual address space because it is designed to work well. However, in the Linux kernel most memory is addressed with a physical address, and use of the virtual address space is strongly discouraged. This is particularly true on 32-bit architectures, where the virtual address space is limited to roughly 100MiB by default. Using the virtual address space on 64-bit Linux kernels is also discouraged, but in this case the address space is so much larger than physical memory that it is not as much of an issue.

If you are bumping up against the virtual memory limit, you will see the following message in your system logs. You can increase the virtual address space size with the boot option vmalloc=512M.

vmap allocation for size 4198400 failed: use vmalloc=<size> to increase size.
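
One way to add this boot option, as a sketch assuming a GRUB 2 based distribution (adapt it to your boot loader):

# In /etc/default/grub, append vmalloc=512M to the existing kernel command line, e.g.:
GRUB_CMDLINE_LINUX="quiet vmalloc=512M"

$ sudo update-grub    # on Fedora/RHEL use: grub2-mkconfig -o /boot/grub2/grub.cfg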

However, even after making this change your system will likely not be entirely stable. Proper support for 32-bit systems is contingent upon the zfs code being weaned off its dependence on virtual memory. This will take some time to do correctly, but it is planned for the Linux port. This change is also expected to improve how efficiently zfs uses the system's memory, and it can be further leveraged to allow tighter integration with the standard Linux VM mechanisms.
1.6 What kernel versions are supported?

The current spl/zfs-0.6.3 release supports Linux 2.6.26 - 3.15 kernels. This covers most of the kernels used in the major Linux distributions. The following distributions are regularly tested using a buildbot-based continuous integration development model. If you need support for a newer kernel you may find it in the latest GitHub sources.

  • RHEL 7 - x86_64
  • RHEL 6 - x86_64
  • CentOS 6 - x86_64
  • TOSS 2 - x86_64
  • Ubuntu 10.04 (Lucid) - x86_64
  • Ubuntu 12.04 (Precise) - x86_64
  • Ubuntu 12.04 (Precise) - i386
  • Ubuntu 12.10 (Quantal) - x86_64
  • Ubuntu 13.04 (Raring) - x86_64
  • Ubuntu 13.10 (Saucy) - x86_64
  • Ubuntu 14.04 (Trusty) - x86_64
  • Fedora 18 (Spherical Cow) - x86_64
  • Fedora 19 (Schrödinger's Cat) - x86_64
  • Fedora 20 (Heisenbug) - x86_64
  • Debian 6.0 (Squeeze) - x86_64
  • Debian 7.0 (Wheezy) - x86_64
  • Proxmox 2.0 - x86_64
  • ArchLinux (Current) - x86_64
  • Gentoo (Current) - x86_64
1.7 What /dev/ names should I use when creating my pool?

There are different /dev/ names that can be used when creating a ZFS pool. Each option has advantages and drawbacks, and the right choice for your ZFS pool really depends on your requirements. For development and testing, /dev/sdX naming is quick and easy. A typical home server might prefer /dev/disk/by-id/ naming for simplicity and readability, while very large configurations with multiple controllers, enclosures, and switches will likely prefer /dev/disk/by-vdev naming for maximum control. In the end, how you choose to identify your disks is up to you.

  • /dev/sdX, /dev/hdX: Best for development/test pools
    • Summary: The top level /dev/ names are the default for consistency with other ZFS implementations. They are available under all Linux distributions and are commonly used. However, because they are not persistent they should only be used with ZFS for development/test pools.
    • Benefits: This method is easy for a quick test, the names are short, and they will be available on all Linux distributions.
    • Drawbacks: The names are not persistent and will change depending on the order in which the disks are detected. Adding or removing hardware from your system can easily cause the names to change. You would then need to remove the zpool.cache file and re-import the pool using the new names.
    • Example:
      $ sudo zpool create tank sda sdb

  • /dev/disk/by-id/: Best for small pools (less than 10 disks)
    • Summary: This directory contains disk identifiers with more human readable names. The disk identifier usually consists of the interface type, vendor name, model number, device serial number, and partition number. This approach is more user friendly because it simplifies identifying a specific disk.
    • Benefits: Nice for small systems with a single disk controller. Because the names are persistent and guaranteed not to change, it doesn't matter how the disks are attached to the system. You can take them all out, randomly mix them up on your desk, put them back anywhere in the system, and your pool will still be automatically imported correctly.
    • Drawbacks: Configuring redundancy groups based on physical location becomes difficult and error prone.
    • Example:
      $ sudo zpool create tank scsi-SATA_Hitachi_HTS7220071201DP1D10DGG6HMRP

  • /dev/disk/by-path/: Good for large pools (greater than 10 disks)
    • Summary: This approach uses device names which encode the physical cable layout in the system, which means that a particular name is tied to a specific physical location. The name describes the PCI bus number, as well as enclosure names and port numbers. This allows the most control when configuring a large pool.
    • Benefits: Encoding the storage topology in the name is not only helpful for locating a disk in large installations, it also allows you to explicitly lay out your redundancy groups over multiple adapters or enclosures.
    • Drawbacks: These names are long, cumbersome, and difficult for a human to manage.
    • Example:
      $ sudo zpool create tank pci-0000:00:1f.2-scsi-0:0:0:0 pci-0000:00:1f.2-scsi-1:0:0:0

  • /dev/disk/by-vdev/: Best for large pools (greater than 10 disks)
    • Summary: This approach provides administrative control over device naming using the configuration file /etc/zfs/vdev_id.conf. Names for disks in JBODs can be generated automatically to reflect their physical location by enclosure IDs and slot numbers. The names can also be manually assigned based on existing udev device links, including those in /dev/disk/by-path or /dev/disk/by-id. This allows you to pick your own unique, meaningful names for the disks. These names will be displayed by all the zfs utilities, so they can be used to clarify the administration of a large, complex pool. See the vdev_id and vdev_id.conf man pages for further details.
    • Benefits: The main benefit of this approach is that it allows you to choose meaningful human-readable names. Beyond that, the benefits depend on the naming method employed. If the names are derived from the physical path the benefits of /dev/disk/by-path are realized. On the other hand, aliasing the names based on drive identifiers or WWNs has the same benefits as using /dev/disk/by-id.
    • Drawbacks: This method relies on having a /etc/zfs/vdev_id.conf file properly configured for your system. To configure this file please refer to section 1.9 How do I setup the /etc/zfs/vdev_id.conf file? As with benefits, the drawbacks of /dev/disk/by-id or /dev/disk/by-path may apply depending on the naming method employed.
    • Example:
      $ sudo zpool create tank mirror A1 B1 mirror A2 B2
1.8 How do I change the /dev/ names on an existing pool?

Changing the /dev/ names on an existing pool can be done by simply exporting the pool and re-importing it with the -d option to specify which new names should be used. For example, to use the custom names in /dev/disk/by-vdev:

$ sudo zpool export tank
$ sudo zpool import -d /dev/disk/by-vdev tank
1.9 How do I setup the /etc/zfs/vdev_id.conf file?

In order to use /dev/disk/by-vdev/ naming, the /etc/zfs/vdev_id.conf file must be configured. The format of this file is described in the vdev_id.conf man page. Several examples follow.

  • A non-multipath configuration with direct-attached SAS enclosures and an arbitrary slot re-mapping.
    
                multipath     no
                topology      sas_direct
                phys_per_port 4
    
                #       PCI_SLOT HBA PORT  CHANNEL NAME
                channel 85:00.0  1         A
                channel 85:00.0  0         B
    
                #    Linux      Mapped
                #    Slot       Slot
                slot 0          2
                slot 1          6
                slot 2          0
                slot 3          3
                slot 4          5
                slot 5          7
                slot 6          4
                slot 7          1
    
  • A SAS-switch topology. Note that the channel keyword takes only two arguments in this example.
                topology      sas_switch
    
                #       SWITCH PORT  CHANNEL NAME
                channel 1            A
                channel 2            B
                channel 3            C
                channel 4            D
    
  • A multipath configuration. Note that channel names have multiple definitions - one per physical path.
                multipath yes
    
                #       PCI_SLOT HBA PORT  CHANNEL NAME
                channel 85:00.0  1         A
                channel 85:00.0  0         B
                channel 86:00.0  1         A
                channel 86:00.0  0         B
    
  • A configuration using device link aliases.
                #     by-vdev
                #     name     fully qualified or base name of device link
                alias d1       /dev/disk/by-id/wwn-0x5000c5002de3b9ca
                alias d2       wwn-0x5000c5002def789e
    

After defining the new disk names, run udevadm trigger to prompt udev to parse the configuration file. This will result in a new /dev/disk/by-vdev directory which is populated with symlinks to the /dev/sdX names.
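
You can verify the new names before building the pool; a quick check, assuming the first example above with eight disks attached to each of channels A and B:

$ sudo udevadm trigger
$ ls /dev/disk/by-vdev
A0  A1  A2  A3  A4  A5  A6  A7  B0  B1  B2  B3  B4  B5  B6  B7

Following that example, you could then create the new pool of mirrors with the following command: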

$ sudo zpool create tank \
	mirror A0 B0 mirror A1 B1 mirror A2 B2 mirror A3 B3 \
	mirror A4 B4 mirror A5 B5 mirror A6 B6 mirror A7 B7

$ sudo zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    A0      ONLINE       0     0     0
	    B0      ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    A1      ONLINE       0     0     0
	    B1      ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    A2      ONLINE       0     0     0
	    B2      ONLINE       0     0     0
	  mirror-3  ONLINE       0     0     0
	    A3      ONLINE       0     0     0
	    B3      ONLINE       0     0     0
	  mirror-4  ONLINE       0     0     0
	    A4      ONLINE       0     0     0
	    B4      ONLINE       0     0     0
	  mirror-5  ONLINE       0     0     0
	    A5      ONLINE       0     0     0
	    B5      ONLINE       0     0     0
	  mirror-6  ONLINE       0     0     0
	    A6      ONLINE       0     0     0
	    B6      ONLINE       0     0     0
	  mirror-7  ONLINE       0     0     0
	    A7      ONLINE       0     0     0
	    B7      ONLINE       0     0     0

errors: No known data errors
1.10 What’s going on with performance?

To achieve good performance with your pool there are some easy best practices you should follow. Additionally, it should be made clear that the ZFS on Linux implementation has not yet been optimized for performance. As the project matures we can expect performance to improve.

  • Evenly balance your disks across controllers: Often the limiting factor for performance is not the disk but the controller. By balancing your disks evenly across controllers you can often improve throughput.
  • Create your pool using whole disks: When running zpool create use whole disk names. This will allow ZFS to automatically partition the disk to ensure correct alignment. It will also improve interoperability with other ZFS implementations which honor the wholedisk property.
  • Have enough memory: A minimum of 2GB of memory is recommended for ZFS. Additional memory is strongly recommended when the compression and deduplication features are enabled.
  • Improve performance by setting ashift=12: You may be able to improve performance for some workloads by setting ashift=12. This tuning can only be set when the pool is first created and it will result in a decrease of capacity. For additional detail on why you should set this option when using Advanced Format drives see section 1.15: How does ZFS on Linux handle Advanced Format disks? A combined example follows this list.
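
As a sketch combining these recommendations, the following creates a pool of whole disks spread across two controllers using 4,096 byte sectors. The /dev/disk/by-id names are hypothetical placeholders; substitute the identifiers of your own drives:

$ sudo zpool create -o ashift=12 tank \
      mirror ata-MODEL1_SERIALA ata-MODEL2_SERIALC \
      mirror ata-MODEL1_SERIALB ata-MODEL2_SERIALD

Here each mirror pairs one disk from each controller, so a controller failure never takes out both halves of a mirror.
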
1.11 What does the /etc/zfs/zpool.cache file do?

Whenever a pool is imported in the system it will be added to the /etc/zfs/zpool.cache file. This file stores pool configuration information such as the vdev device names and the active pool state. If this file exists when the ZFS modules are loaded then any pool listed in the cache file will be automatically imported. When a pool is not listed in the cache file it will need to be explicitly imported.
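
For example, to locate and explicitly import a pool named tank that is not listed in the cache file:

$ sudo zpool import          # scans the default device directories and lists importable pools
$ sudo zpool import tank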

1.12 How do I set up NFS or SMB shares?

ZFS has been integrated with the Linux NFS and SMB servers. You can share a ZFS file system by setting the sharenfs or sharesmb file system property. For example, to share the file system tank/home via NFS and SMB with the default options:

$ sudo zfs set sharenfs=on tank/home
$ sudo zfs set sharesmb=on tank/home

Note that you must still manually configure your network to allow NFS or SMB traffic. You will also need to make sure that the NFS and SMB packages for your distribution are installed.
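
To confirm that the shares are active you can check the properties and the export list; a quick sketch (exportfs and smbclient come from your distribution's NFS and Samba packages):

$ sudo zfs get sharenfs,sharesmb tank/home
$ sudo exportfs -v
$ smbclient -N -L localhost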

1.13 Can I boot from ZFS?
Yes, numerous people have had success with this. However, because it still requires the latest versions of grub and is distribution specific, we don't recommend it. Instead we suggest keeping /boot on a conventional file system and using ZFS as your root file system. There are excellent walkthroughs available for both Ubuntu and Gentoo.
1.14 How do I automatically mount ZFS file systems during startup?
  • Ubuntu PPA: Auto mounting is provided by the enhanced mountall package from the ZFS PPA. Install the ubuntu-zfs package to get this feature (a sketch of the PPA setup follows this list).
  • Fedora, RHEL, Arch, Gentoo, Lunar: Init scripts for these distributions have been provided. If your distribution of choice isn't represented please submit an init script modeled on one of these so we can include it.
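
A sketch of the Ubuntu PPA setup mentioned above; the ppa:zfs-native/stable name is an assumption, so check the project site for the current PPA:

$ sudo add-apt-repository ppa:zfs-native/stable
$ sudo apt-get update
$ sudo apt-get install ubuntu-zfs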

Note that the SELinux policy for ZFS on Linux is not yet implemented. This can lead to issues such as the init script failing to auto-mount the file systems when SELinux is set to enforcing. The long term solution is to add ZFS as a known file system type which supports xattrs to the default SELinux policy. This is something which must be done by the upstream Linux distribution. In the meantime, you can work around this by setting SELinux to permissive or disabled.

$ cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
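
To switch to permissive mode immediately, without rebooting, you can also run:

$ sudo setenforce 0
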
1.15 How does ZFS on Linux handle Advanced Format disks?

Advanced Format (AF) is a new disk format which natively uses a 4,096 byte sector size instead of the traditional 512 byte sector size. To maintain compatibility with legacy systems, AF disks emulate a sector size of 512 bytes. By default, ZFS will automatically detect the sector size of the drive, and because AF drives report the emulated 512 byte size, ZFS will use that value. This combination results in poorly aligned disk access which will greatly degrade pool performance.

Therefore the ability to set the ashift property has been added to the zpool command. This allows users to explicitly assign the sector size at pool creation time. The ashift values range from 9 to 16 with the default value 0 meaning auto-detect the sector size. This value is actually a bit shift value, so an ashift value for 512 bytes is 9 (2^9 = 512) while the ashift value for 4,096 bytes is 12 (2^12 = 4,096). To force the pool to use 4,096 byte sectors we must specify this at pool creation time:

$ sudo zpool create -o ashift=12 tank mirror sda sdb
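
You can verify the value that was actually used by inspecting the cached pool configuration, which for this pool should report an ashift of 12:

$ sudo zdb | grep ashift
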
1.16 Do I have to use ECC memory for ZFS?

Using ECC memory for ZFS is strongly recommended for enterprise environments where the strongest data integrity guarantees are required. Without ECC memory, rare random bit flips caused by cosmic rays or by faulty memory can go undetected. If this occurs, ZFS (or any other file system) will write the damaged data to disk and be unable to automatically detect the corruption.

Unfortunately, ECC memory is not always supported by consumer grade hardware, and even when it is, ECC memory will be more expensive. For home users the additional safety brought by ECC memory might not justify the cost. It's up to you to determine what level of protection your data requires.

1.17 Can I use a ZVOL for swap?

Yes. Just make sure you set the ZVOL block size to match your system's page size; for x86_64 systems that is 4k. This tuning prevents ZFS from having to perform read-modify-write operations on a larger block while the system is already low on memory.

$ sudo zfs create tank/swap -V 2G -b 4K
$ sudo mkswap -f /dev/tank/swap
$ sudo swapon /dev/tank/swap
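
If you want the swap device activated automatically at boot, an /etc/fstab entry along these lines can be used; this is a sketch and assumes the ZFS module and pool are available before swap is enabled:

/dev/tank/swap  none  swap  defaults  0  0
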
1.18 How do I generate the /etc/zfs/zpool.cache file?

The /etc/zfs/zpool.cache file will be automatically updated when your pool configuration is changed. However, if for some reason it becomes stale you can force the generation of a new /etc/zfs/zpool.cache file by setting the cachefile property on the pool.

$ sudo zpool set cachefile=/etc/zfs/zpool.cache tank
1.19 Can I run ZFS on Xen Hypervisor or Xen Dom0?

Sure, but it is usually recommended to keep virtual machine storage and hypervisor pools quite separate. A few people have nevertheless managed to successfully deploy and run ZFS on the same machine configured as Dom0. There are a few caveats:

  • Set a fair amount of memory in grub.conf, dedicated to Dom0. e.g.
    dom0_mem=16384M,max:16384M
  • Allocate no more than 30-40% of Dom0's memory to ZFS in /etc/modprobe.d/zfs.conf. e.g.
    options zfs zfs_arc_max=6442450944
  • Disable Xen's auto-ballooning in /etc/xen/xl.conf (see the sketch below).
  • Watch out for any Xen bugs, such as this one related to ballooning
For details, please see issue #1067 on GitHub.
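
As a sketch, the three settings above might look like this; the autoballoon option name is an assumption and may differ between Xen versions:

# Appended to the Xen hypervisor line in grub.conf:
dom0_mem=16384M,max:16384M

# /etc/modprobe.d/zfs.conf (cap the ARC at roughly 30-40% of Dom0 memory):
options zfs zfs_arc_max=6442450944

# /etc/xen/xl.conf (disable auto-ballooning):
autoballoon=0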

2.1 How can I help?

The most helpful thing you can do is to try ZFS on your Linux system and report any issues. If you like what you see and would like to contribute to the project, please send me an email. There are quite a few open issues on the issue tracker which need attention, or if you have an idea of your own, that is fine too.

2.2 What is the licensing concern?

ZFS is licensed under the Common Development and Distribution License (CDDL), and the Linux kernel is licensed under the GNU General Public License Version 2 (GPLv2). While both are free open source licenses, they are restrictive licenses. The combination of them causes problems because it prevents using pieces of code exclusively available under one license with pieces of code exclusively available under the other in the same binary. In the case of the kernel, this prevents us from distributing ZFS as part of the kernel binary. However, there is nothing in either license that prevents distributing it in the form of a binary module or in the form of source code.

For further reading on this issue see the following excellent article regarding non-GPL licensed kernel modules.

3.1 How do I report problems?

You can open a new issue and search existing issues using the public issue tracker. The issue tracker is used to organize outstanding bug reports, feature requests, and other development tasks. Anyone may post comments after signing up for a GitHub account.

When a new issue is opened it's not uncommon for a developer to request additional information about the problem. In general, the more detail you share about a problem the quicker a developer can resolve it. For example, providing a simple test case is always exceptionally helpful. At a minimum you should provide a description of the problem and the following:

  • Your pool configuration from the output of `zdb` if possible, or `zpool status` otherwise.
  • Your hardware configuration, such as
    • Number of CPUs
    • Amount of memory
    • Whether it is running under a VMM/Hypervisor
    • Whether your system has ECC memory
  • System configuration, such as
    • Linux distribution name and version
    • Kernel version as displayed by `uname -a`
    • If you have a custom built kernel then the configuration file for your kernel is also useful. eg: `zcat /proc/config.gz`
    • SPL and ZFS version and where it came from (a package from the ZoL package repository or built from the GIT repository - be sure to include the commit revision).
      To find out what exact version is actually loaded, run
      # dmesg | grep -E 'SPL:|ZFS:'
      If nothing is shown, it might have been lost (the kernel only keeps a certain amount). In that case, you will have to look for it in the logfiles:
      # cat /var/log/dmesg | grep -E 'SPL:|ZFS:'
      If it isn't there either, try some of the other dmesg.* files.
    • Values of the ZFS/SPL module parameters.
      To get the full list, run
      for param in /sys/module/{spl,zfs}/parameters/*; do printf "%-50s" `basename $param`; cat $param; done
    • A short description of what the system does (file server, iSCSI, mail or user home server, etc.)
  • A description of what the system was doing at the time of the crash. For example, clients were communicating with a Samba server configured to use AIO.
  • If you include a stack trace or large blocks of text, make sure to use the GitHub markup language appropriately (using the 'triple backticks' example).

You can use this template example (notice the three backticks):

```
CPUs:			1
Memory:			1GB
VM/Hypervisor:		yes
ECC mem:		no
Distribution:		Debian GNU/Linux
Kernel version:		Linux DebianZFS-Wheezy-SCST2 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
SPL/ZFS source:		Packages built from GIT repository
SPL/ZFS version:	DebianZFS-Wheezy-SCST2:/usr/src/zfs# dmesg | grep -E 'SPL:|ZFS:'
			[    4.937840] SPL: Loaded module v0.6.2-23_g4c99541 (DEBUG mode)
			[    5.384665] ZFS: Loaded module v0.6.2-296_g21b446a, ZFS pool version 5000, ZFS filesystem version 5
			[    5.875207] SPL: using hostid 0xa8c03245
System services:	Development and testing of ZoL
Short  description:	Removing large number of files caused the system to hang
```

The following information is also helpful, especially when dealing with hung processes:

  • Stack traces for all threads. Because ZFS is asynchronous all threads should be reported where possible. These can be found in /proc/$pid/task/$tid/stack for all processes. An example script to retrieve this information would be:
    # cd /proc
    # for pid in [1-9]*
    do
      echo $pid:
      cat /proc/$pid/cmdline
      echo
      for task in /proc/$pid/task/*
      do
         cat $task/stack
         echo ===
      done
    done > /tmp/stacktraces.log
    
    • If these files do not exist then you can run:
      # echo t > /proc/sysrq-trigger
      This will dump the stacks of all processes to the output of `dmesg` and your system logs. However, this method is not as reliable because the amount of information may overflow the log buffer.
    • Your kernel may automatically dump stack traces of hung tasks in the D state every 2 minutes.
  • The contents of the files /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab

Of course include any other information you feel is relevant.

As some of this can be a large amount of information, GitHub has a service similar to a pastebin at https://gist.github.com/ which you can use to hold the bulk of the data.