This document explains how ZFS calculates the amount of used space within a zpool and provides examples of how various ZFS features affect that calculation.
DETAILS
A ZFS storage pool is a logical collection of devices that provide space for datasets such as filesystems, snapshots and volumes. A ZFS pool can be used to create filesystems, which are mounted and unmounted like any other filesystem; to take snapshots, which provide read-only point-in-time copies of a filesystem (clones are writable copies of snapshots); to create volumes, which can be accessed as raw or block devices; and to set properties on datasets.
The interaction of all these datasets and their associated properties, such as quotas, reservations, compression and deduplication (Solaris 11 only), plays a role in how ZFS calculates space usage. Neither the du nor the df command has been updated to account for ZFS file system space usage. When calculating ZFS space usage, always use the following command:
# zfs list -r -o space <pool_name>
Alternatively, if the system is running Solaris 10 Update 8 or later, one can use:
# zfs list -t all -o name,avail,used,usedsnap,usedds,usedrefreserv,usedchild -r pool_name
ZFS supports three types of dataset: filesystem, volume and snapshot.
Snapshot: A snapshot is a read-only copy of a filesystem taken at some point in the past (a clone is a writable copy of a snapshot). When a snapshot is created, its space is initially shared between the snapshot and the filesystem, and possibly with previous snapshots. As the filesystem changes, space that was previously shared becomes unique to the snapshot and is then counted in the snapshot's space usage. Deleting a snapshot can increase the amount of space unique to (and thus used by) other snapshots.
Volume: A logical volume exported as a raw or block device.
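For example (a minimal sketch; mypool/vol1 is a hypothetical name), a volume is created with a fixed size, and its block and raw devices then appear under /dev/zvol:
# zfs create -V 1g mypool/vol1
# ls -l /dev/zvol/dsk/mypool/vol1 /dev/zvol/rdsk/mypool/vol1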
Properties like quota, reservation and deduplication (dedup) also influence the space reported by ZFS:
Quota: Limits the amount of space a dataset and its descendents can consume. Once the quota is reached, attempts to create files or allocate space fail as if the pool were out of space, even if free space remains in the zpool.
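As a hedged illustration (mypool/qtest is a hypothetical dataset, and the exact error text depends on the application), a write that would push the dataset past its quota fails even though the pool still has free space:
# zfs create -o quota=100m mypool/qtest
# mkfile 200m /mypool/qtest/bigfile <-- fails with a quota/out-of-space error once 100MB has been written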
Reservation: The minimum amount of space guaranteed to a dataset and its descendents. When the amount of space used is below this value, the dataset is treated as if it were taking up the amount of space specified by its reservation. Reservations are accounted for in the parent datasets' space usage and count against the parent datasets' quotas and reservations.
Deduplication:
Deduplication is the process of eliminating duplicate copies of data. ZFS provides inline block-level deduplication. Enabling the ZFS dedup property saves space and can also increase performance: disk writes are eliminated when duplicate data is stored, and the memory footprint is reduced because many applications share the same pages of memory. Deduplicated space accounting is reported at the pool level, so you must use the zpool list command rather than the zfs list command to identify disk space consumption when dedup is enabled. If you use zfs list to review deduplicated space, the filesystem size may appear to be increasing, simply because more data can be stored on the same physical device. zpool list shows how much physical space is actually being consumed, along with the dedup ratio.
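As a hedged illustration (assuming a Solaris 11 pool named tank; the figures and dedup ratio shown are hypothetical), dedup is enabled per dataset and the overall saving is visible in the pool's DEDUP column:
# zfs set dedup=on tank
# zpool list tank
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 136G 10.2G 126G 7% 2.00x ONLINE -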
Note: The df and du commands are not dedup or compression aware and will not provide accurate space accounting. Always use the zfs and zpool commands for accurate space accounting information.
The following examples show common scenarios which explain why zfs and zpool may show different values from df and du.
Example #1: Creating snapshots and calculating space used.
Let's create a non-redundant pool of size 1G:
# mkfile 1G /dev/dsk/mydisk
# zpool create mypool mydisk
# zpool list mypool
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
mypool 1016M 88K 1016M 0% ONLINE -
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 85K 984M 24.5K /mypool
Now let's create some files and snapshots:
# mkfile 100M /mypool/file1
# zfs snapshot mypool@snap1
# mkfile 100M /mypool/file2
# zfs snapshot mypool@snap2
List the amount of data referenced by the snapshots and clones. Initially, snapshots and clones reference the same amount of space as the filesystem since their contents are identical.
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 200M 784M 200M /mypool
As shown, 200MB of disk space has been used, none of which is used by the snapshots. snap1 refers to 100MB of data (file1) and snap2 refers to 200MB of data (file1 and file2).
# zfs list -t snapshot -r mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool@snap1 23.5K - 100M -
mypool@snap2 0 - 200M -
Now let's remove file1 to see how it affects space usage. As you can see below, nothing changes except the REFER value, because the data is still referenced by the snapshots.
# rm /mypool/file1
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 200M 784M 100M /mypool
# zfs list -t snapshot -r mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool@snap1 23.5K - 100M -
mypool@snap2 23.5K - 200M -
So why don't the snapshots reflect this in their USED column? You might expect snap1 to show 100MB used; however, that would be misleading, because deleting snap1 would have no effect on the amount of space used by the mypool filesystem. Deleting snap1 would only free up about 17KB of disk space. You can, however, use the following zfs option to find the space used by snapshots.
# zfs list -t all -o space -r mypool
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
mypool 784M 200M 100M 100M 0 154K
mypool@snap1 - 17K - - - -
mypool@snap2 - 17K - - - -
As you can see, 200MB is used in total: 100MB is used by snapshots (file1) and 100MB is used by the dataset itself (file2).
Now let's delete snapshot snap1 considering it is not using much space.
# zfs destroy mypool@snap1
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 200M 784M 100M /mypool
# zfs list -t snapshot -r mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool@snap2 100M - 200M -
We can see that snap2 now shows 100MB used. If I were to delete snap2, I would be deleting 100MB of data (or reclaiming 100MB of space).
# zfs destroy mypool@snap2
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 100M 884M 100M /mypool
# zfs list -t snapshot -r mypool
no datasets available
Example #2: Using Quotas
A quota sets a limit on the space used by a dataset; it does not reserve any ZFS filesystem space.
# zfs create -o quota=100m mypool/amer
# zfs get quota mypool/amer
NAME PROPERTY VALUE SOURCE
mypool/amer quota 100M local
As you can see, there is no change in the amount of space used:
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 100M 884M 100M /mypool
Let's take a snapshot of the amer dataset.
# zfs snapshot mypool/amer@snap1
# zfs list mypool/amer
NAME USED AVAIL REFER MOUNTPOINT
mypool/amer 24.5K 100M 24.5K /mypool/amer
# zfs list -t snapshot -r mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool/amer@snap1 0 - 24.5K -
Now clone the snapshot and set a 100MB quota on the clone. Remember, clones can only be created from a snapshot. When a snapshot is cloned, an implicit dependency is created between the clone and the snapshot. Even though the clone is created elsewhere in the dataset hierarchy, the original snapshot cannot be destroyed as long as the clone exists. Because a clone initially shares all its disk space with the original snapshot, its used property is initially zero. As changes are made to the clone, it uses more space. The used property of the original snapshot does not include the disk space consumed by the clone.
# zfs clone mypool/amer@snap1 mypool/clone1
# zfs set quota=100m mypool/clone1
# zfs get quota mypool/clone1
NAME PROPERTY VALUE SOURCE
mypool/clone1 quota 100M local
# zfs list mypool/amer
NAME USED AVAIL REFER MOUNTPOINT
mypool/amer 24.5K 100M 24.5K /mypool/amer
# zfs list mypool/clone1
NAME USED AVAIL REFER MOUNTPOINT
mypool/clone1 0 100M 24.5K /mypool/clone1
To get a breakdown of space used by snapshot and clone use:
# zfs list -r -o space mypool
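Illustrative output is shown below (the exact values will differ on your system); note that the clone initially uses almost no space of its own because it shares its blocks with the snapshot:
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
mypool 884M 100M 0 100M 0 148K
mypool/amer 100M 24.5K 0 24.5K 0 0
mypool/amer@snap1 - 0 - - - -
mypool/clone1 100M 0 0 0 0 0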
Example #3: Setting Reservations
Reservations deduct space from the zpool and reserve it for the dataset.
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 100M 884M 100M /mypool
# zfs create -o reservation=100m mypool/ather
# zfs get reservation mypool/ather
NAME PROPERTY VALUE SOURCE
mypool/ather reservation 100M local
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 200M 784M 100M /mypool
Example #4: ZFS Volumes (ZVOL)
Similar to the reservation property, creating a zvol results in space being deducted from the zpool.
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 100M 884M 100M /mypool
Let's create a 100MB zvol.
# zfs create -V 100m mypool/vol1
# zfs list -t volume -r mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool/vol1 22.5K 884M 22.5K -
# zfs list mypool
NAME USED AVAIL REFER MOUNTPOINT
mypool 200M 784M 100M /mypool
Example #5: Overlaid Mounts
An overlaid mountpoint occurs when a filesystem is mounted over a directory that already contains data. Normally a filesystem mountpoint should be empty to prevent space accounting anomalies. In situations like this, zfs and zpool will show more space used than df and du, because df and du cannot see underneath the mountpoint whereas zpool and zfs can.
In this example a new pool called 'tank' is created. Within the root filesystem (/tank) a 2GB file is created, and a 500MB file is created within a new directory /tank/challenger2.
# zpool create tank c0d1
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
rpool 19.9G 5.45G 14.4G 27% ONLINE -
tank 4.97G 78.5K 4.97G 0% ONLINE -
# mkfile 2g /tank/2GBFile
# mkdir /tank/challenger2
# mkfile 500m /tank/challenger2/500MBFILE
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
rpool 19.9G 5.45G 14.4G 27% ONLINE -
tank 4.97G 2.49G 2.48G 50% ONLINE -
# zfs list -r
NAME USED AVAIL REFER MOUNTPOINT
rpool 5.86G 13.7G 106K /rpool
rpool/ROOT 4.34G 13.7G 31K legacy
rpool/ROOT/s10s_u10wos_17b 4.34G 13.7G 4.34G /
rpool/dump 1.00G 13.7G 1.00G -
rpool/export 63K 13.7G 32K /export
rpool/export/home 31K 13.7G 31K /export/home
rpool/swap 528M 14.1G 114M -
tank 2.49G 2.40G 2.49G /tank
# df -h
Filesystem size used avail capacity Mounted on
rpool/ROOT/s10s_u10wos_17b
20G 4.3G 14G 25% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 400M 416K 399M 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
sharefs 0K 0K 0K 0% /etc/dfs/sharetab
/platform/SUNW,T5240/lib/libc_psr/libc_psr_hwcap2.so.1
18G 4.3G 14G 25% /platform/sun4v/lib/libc_psr.so.1
/platform/SUNW,T5240/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
18G 4.3G 14G 25% /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd 0K 0K 0K 0% /dev/fd
swap 399M 32K 399M 1% /tmp
swap 399M 48K 399M 1% /var/run
rpool/export 20G 32K 14G 1% /export
rpool/export/home 20G 31K 14G 1% /export/home
rpool 20G 106K 14G 1% /rpool
tank 4.9G 2.5G 2.4G 51% /tank
Now mount an NFS filesystem on /tank/challenger2. The filesystem type is not important.
# mount -F nfs nfssrv:/export/iso /tank/challenger2
# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
rpool 19.9G 5.45G 14.4G 27% ONLINE -
tank 4.97G 2.49G 2.48G 50% ONLINE -
# zfs list -r
NAME USED AVAIL REFER MOUNTPOINT
rpool 5.86G 13.7G 106K /rpool
rpool/ROOT 4.34G 13.7G 31K legacy
rpool/ROOT/s10s_u10wos_17b 4.34G 13.7G 4.34G /
rpool/dump 1.00G 13.7G 1.00G -
rpool/export 63K 13.7G 32K /export
rpool/export/home 31K 13.7G 31K /export/home
rpool/swap 528M 14.1G 114M -
tank 2.49G 2.40G 2.49G /tank
# df -h
Filesystem size used avail capacity Mounted on
rpool/ROOT/s10s_u10wos_17b
20G 4.3G 14G 25% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 399M 416K 399M 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
sharefs 0K 0K 0K 0% /etc/dfs/sharetab
/platform/SUNW,T5240/lib/libc_psr/libc_psr_hwcap2.so.1
18G 4.3G 14G 25% /platform/sun4v/lib/libc_psr.so.1
/platform/SUNW,T5240/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
18G 4.3G 14G 25% /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd 0K 0K 0K 0% /dev/fd
swap 399M 32K 399M 1% /tmp
swap 399M 48K 399M 1% /var/run
rpool/export 20G 32K 14G 1% /export
rpool/export/home 20G 31K 14G 1% /export/home
rpool 20G 106K 14G 1% /rpool
tank 4.9G 2.5G 2.4G 51% /tank
nfssrv:/export/iso
12T 1.8T 11T 15% /tank/challenger2
# du -sh /tank <-- Reports space used by spanning filesystem boundaries
1.8T /tank
# du -dsh /tank <-- Reports space used without spanning filesystem boundaries (local)
2.0G /tank
Example #6: Using zdb(1M) to look for large files and directories
A method to look for files or objects consuming space that cannot be seen from df, du, zfs, or zpool would be to use zdb(1M) with verbose options to dump out all the objects on a given pool or dataset. The size for each object can then be summed to find out where the space is being consumed.
Usage: zdb -ddddd pool|dataset > outputfile
Using the same 'tank' pool as in example #5, there is only one dataset, so dump out all the objects. Note: for a pool with lots of files, this can take a long time and will result in a large output file.
# zdb -ddddd tank > zdb-dddd_tank.out
The output file can be processed and the size of each object summed to see the total. This number should match the ALLOC value reported by 'zpool list'.
# awk '{if($1 == "Object") {getline;print $5}}' zdb-dddd_tank.out |sed -e 's/T/ 1099511627776/g' -e 's/G/ 1073741824/g' -e 's/M/ 1048576/g' -e 's/K/ 1024/g'|nawk '{if(NF == 1) { s+=$1 } else {s+=($1*$2)} } END {print s/1073741824"Gb"}'
2.48853Gb
Scan the same output file, but look at 'ZFS plain file' objects only. A large difference between the two totals would indicate that ZFS is using space for internal objects, which is unlikely unless the pool has millions of files.
# grep "ZFS plain file" zdb-dddd_tank.out |awk '{print $5}'|sed -e 's/T/ 1099511627776/g' -e 's/G/ 1073741824/g' -e 's/M/ 1048576/g' -e 's/K/ 1024/g'|nawk '{if(NF == 1) { s+=$1 } else {s+=($1*$2)} } END {print s/1073741824"Gb"}'
2.48828Gb
An example of a file object and a directory object from the zdb-dddd_tank.out file is shown below. From this information the filename can be extracted and reviewed manually, or scripts can be written to parse the data further.
=== zdb-dddd_tank.out ===
[...snip...]
Object lvl iblk dblk dsize lsize %full type
7 1 16K 512 1K 512 100.00 ZFS directory
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /
uid 0
gid 0
atime Thu Jan 10 18:58:19 2013
mtime Thu Jan 10 18:58:19 2013
ctime Thu Jan 10 18:58:19 2013
crtime Thu Jan 10 18:58:19 2013
gen 4
mode 40555
size 2
parent 7
links 2
pflags 144
microzap: 512 bytes, 0 entries
[...snip...]
[...snip...]
Object lvl iblk dblk dsize lsize %full type
8 3 16K 128K 2.00G 2G 100.00 ZFS plain file
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 16383
path /2GBFile
uid 0
gid 0
atime Thu Jan 10 18:58:49 2013
mtime Thu Jan 10 19:00:09 2013
ctime Thu Jan 10 19:00:09 2013
crtime Thu Jan 10 18:58:49 2013
gen 5
mode 101600
size 2147483648
parent 4
links 1
pflags 4
[...snip...]
=== END ===
A nawk script to process the 'zdb -dddd' output can be downloaded here.
$ ./zdb_parser.nawk ROOT_s10x_u9wos_14a > ROOT_s10x_u9wos_14a.parsed
The script creates a file with two tab-separated columns, "FILE/OBJECT" and "SIZE (bytes)", e.g.:
$ head -5 ROOT_s10x_u9wos_14a.parsed
FILE SIZE
/var/sadm/install/.lockfile 128
/var/sadm/install/.pkg.lock.client 0
/var/sadm/install/.pkg.lock 0
/var/sadm/install/.door 0
/var/sadm/pkg/SUNWkvm/pkginfo 3446
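In case the linked script is not available, the following minimal nawk sketch (not the original script) is written against the zdb output format shown above; it pairs each object's 'path' attribute with its 'size' attribute. Note that directory objects report an entry count rather than bytes, and paths containing spaces are not handled.
#!/usr/bin/nawk -f
# Minimal sketch: print "<path><TAB><size>" for each object that has both a
# path and a size attribute in 'zdb -dddd' output.
BEGIN { printf("FILE\tSIZE\n") }
$1 == "path" { objpath = $2 }
$1 == "size" { if (objpath != "") { printf("%s\t%s\n", objpath, $2); objpath = "" } }
It would be run in the same way, e.g. nawk -f zdb_parser_sketch.nawk ROOT_s10x_u9wos_14a > ROOT_s10x_u9wos_14a.parsed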
Sort the parsed file to get the files in size order, largest files at the bottom:
$ sort -nk 2,2 ROOT_s10x_u9wos_14a.parsed > ROOT_s10x_u9wos_14a.parsed.sorted
In this example a 35GB file/object is shown. The reason we see the object number instead of a filename is that the file has been deleted from the filesystem but is still known to ZFS.
$ tail ROOT_s10x_u9wos_14a.parsed.sorted
/var/adm/sa/sa24 111347584
/var/adm/sa/sa16 192024576
/var/adm/sa/sa17 192024576
/var/adm/sa/sa18 192024576
/var/adm/sa/sa19 192024576
/var/adm/sa/sa20 192024576
/var/adm/sa/sa21 192024576
/var/adm/sa/sa22 192024576
/var/adm/sa/sa23 192024576
???<object#420528> 34969691083 <-- ~35GB
The object can be found in the original 'ROOT_s10x_u9wos_14a' file:
Object lvl iblk dblk dsize lsize %full type
420528 4 16K 128K 32.6G 32.6G 100.00 ZFS plain file
264 bonus ZFS znode
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 266797
path ???<object#420528>
uid 0
gid 0
atime Wed Mar 18 21:21:17 2015
mtime Fri Apr 24 13:50:14 2015
ctime Fri Apr 24 13:50:14 2015
crtime Wed Mar 18 21:21:17 2015
gen 4325862
mode 100600
size 34969691083
parent 1966 <-------- [2]
links 0
xattr 0
rdev 0x0000000000000000
Indirect blocks:
0 L3 0:c0030bc00:600 4000L/600P F=266798 B=4432145/4432145
0 L2 0:a63c72e00:1e00 4000L/1e00P F=16384 B=4334434/4334434
0 L1 0:e2c7d0800:1a00 4000L/1a00P F=128 B=4326086/4326086
0 L0 0:e2b2d1200:20000 20000L/20000P F=1 B=4325864/4325864
20000 L0 0:e2bcf3c00:20000 20000L/20000P F=1 B=4325865/4325865
40000 L0 0:e2c6a4c00:20000 20000L/20000P F=1 B=4325867/4325867
60000 L0 0:e2d8ae000:20000 20000L/20000P F=1 B=4325870/4325870
[...snip...]
This object is indeed found in the 'ZFS delete queue':
Object lvl iblk dblk dsize lsize %full type
2 1 16K 26.0K 3.00K 26.0K 100.00 ZFS delete queue
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
microzap: 26624 bytes, 13 entries
5ed80 = 388480
5edc0 = 388544
5edbc = 388540
5ed83 = 388483
5ed84 = 388484
61878 = 399480
5ed7f = 388479
5ed81 = 388481
5ed85 = 388485
69c2c = 433196
5ed7e = 388478
5ed82 = 388482
66ab0 = 420528 <-- This one
The delete queue will be processed automatically in the background, though this can take an undetermined amount of time.
To force the delete queue to be processed, the filesystem containing the objects to be deleted needs to be unmounted and remounted.
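For example, if the affected dataset were tank/testfs (a hypothetical name here), remounting it would trigger processing of its delete queue:
# zfs unmount tank/testfs
# zfs mount tank/testfs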
As the full path to this file is unknown, we turn to the object dump itself. The parent field marked [2] above identifies the parent of the 35GB file, which is this object:
Object lvl iblk dblk dsize lsize %full type
1966 1 16K 512 1K 512 100.00 ZFS directory
264 bonus ZFS znode
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /var/adm/exacct
[...snip...]
This is a directory on the root (/) filesystem, which cannot be unmounted, so a system reboot is the only way to force the processing of this dataset's delete queue. Alternatively, wait for ZFS to reclaim the space in its own time.
Frequently Asked Questions - FAQ
Why don't du and df report correct values for ZFS space usage?
On UFS, the du command reports the size of the data blocks within the file. On ZFS, du reports the actual size of the file as stored on disk; this size includes metadata overhead and reflects any compression. This reporting really helps answer the question "how much more space will I get if I remove this file?" So, even when compression is off, you will still see different results between ZFS and UFS.
The GNU version of du has an option to omit this overhead from the calculation, --apparent-size. This can be useful if the files will be copied to a non-ZFS filesystem that does not have the same overhead, e.g. copying to HSFS on a CD, so that the total size of the files within a directory structure can be determined. Below is an example for a directory containing a number of files on ZFS - standard du with the overhead included and GNU du without. GNU du is included with Solaris 11.
# du -sh /var/tmp
20M /var/tmp
# /usr/gnu/bin/du -sh --apparent-size /var/tmp
19M /var/tmp
When you compare the space consumption that is reported by the df command with the zfs list command, consider that df is reporting the pool size and not just filesystem sizes. In addition, df doesn't understand descendent datasets or whether snapshots exist. If any ZFS properties, such as compression and quotas are set on filesystems, reconciling the space consumption that is reported by df might be difficult.
Consider the following scenarios that might also impact reported space consumption:
For files that are larger than recordsize, the last block of the file is generally about 1/2 full. With the default recordsize set to 128KB, approximately 64KB is wasted per file, which might be a large impact. You can work around this by enabling compression. Even if your data is already compressed, the unused portion of the last block will be zero-filled and thus compresses very well.
On a RAIDZ-2 pool, every block consumes at least 2 sectors (512-byte chunks) of parity information. The space consumed by the parity information is not reported and because it can vary and be a much larger percentage for small blocks, an impact to space reporting might be seen. The impact is more extreme for a recordsize set to 512 bytes, where each 512-byte logical block consumes 1.5KB (3 times the space). Regardless of the data being stored, if space efficiency is your primary concern, you should leave the recordsize at the default (128 KB), and enable compression.
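For example (a minimal sketch; tank/data is a hypothetical dataset and the ratio shown is illustrative), compression is enabled per dataset and its effect can be checked via the compressratio property:
# zfs set compression=on tank/data
# zfs get compression,compressratio tank/data
NAME PROPERTY VALUE SOURCE
tank/data compression on local
tank/data compressratio 1.53x -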
Why doesn't the used space reported by zpool list and zfs list match?
The SIZE value that is reported by the zpool list command is generally the amount of physical disk space in the pool, but varies depending on the pool's redundancy level. The zfs list command lists the usable space that is available to filesystems, which is the disk space minus ZFS pool redundancy metadata overhead, if any. A non-redundant storage pool created with one 136GB disk reports SIZE and initial FREE values as 136GB. The initial AVAIL space reported by the zfs list command is 134GB, due to a small amount of pool metadata overhead.
# zpool create tank c0t6d0
# zpool list tank
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 136G 95.5K 136G 0% 1.00x ONLINE -
# zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 72K 134G 21K /tank
A mirrored storage pool created with two 136GB disks reports SIZE as 136GB and initial FREE values as 136GB. This reporting is referred to as the deflated space value. The initial AVAIL space reported by the zfs list command is 134GB, due to a small amount of pool metadata overhead.
# zpool create tank mirror c0t6d0 c0t7d0
# zpool list tank
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 136G 95.5K 136G 0% 1.00x ONLINE -
# zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 72K 134G 21K /tank
A RAIDZ-2 storage pool created with three 136GB disks reports SIZE as 408GB and initial FREE values as 408GB. This reporting is referred to as the inflated disk space value, which includes redundancy overhead, such as parity information. The initial AVAIL space reported by the zfs list command is 133GB, due to the pool redundancy overhead.
# zpool create tank raidz2 c0t6d0 c0t7d0 c0t8d0
# zpool list tank
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 408G 286K 408G 0% 1.00x ONLINE -
# zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 73.2K 133G 20.9K /tank
Another major reason for a difference between the used space reported by zpool list and zfs list is the refreservation created for a zvol. Once a zvol is created, zfs list immediately reports the zvol size (plus metadata) as USED, but zpool list does not report that space as allocated until the zvol is actually written to.
# zpool create tank c0t0d0
# zpool list tank
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 278G 98.5K 278G 0% 1.00x ONLINE -
# zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 85K 274G 31K /tank
# zfs create -V 128G tank/vol
# zpool list tank
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 278G 174K 278G 0% 1.00x ONLINE -
# zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 132G 142G 31K /tank
NOTE: In some rare cases, the space reported may not add up. In such situations, space may be hidden in the ZFS delete queue or there may be an unlinked file still consuming space. Normally, rebooting the server should clear the problem. You can find out whether any objects are listed in the ZFS delete queue by running:
# zdb -dddd <zpool>/<dataset> 1 | grep DELETE
Example:
# zdb -dddd rpool/ROOT/zfsroot/var 1 | grep DELETE
Why does creating a snapshot immediately reduce the available pool size?
If a snapshot is taken of a dataset (filesystem or zvol) on which refreservation is set, the pool usage immediately increases by an amount that depends on the usedbydataset value, even though the snapshot itself does not allocate any space.
For example, here is a dataset created with refreservation=10G, of which 3G is actually used.
# zfs list -o space -t all -r tank
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
tank 264G 10.0G 0 32K 0 10.0G
tank/testfs 271G 10G 0 3.00G 7.00G 0
Since refreservation is set to 10G on this dataset, this dataset has 10G-3G=7G usedbyrefreservation.
If a snapshot is taken, the used space increases by 3G, as shown below.
# zfs snapshot tank/testfs@snap1
# zfs list -o space -t all -r tank
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
tank 261G 13.0G 0 32K 0 13.0G
tank/testfs 271G 13.0G 0 3.00G 10G 0
tank/testfs@snap1 - 0 - - - -
The dataset uses 3G, and those 3G of contents are now preserved by the snapshot, so they cannot be freed while the snapshot exists.
At the same time, because the dataset has a 10G refreservation, it is guaranteed that the dataset can use 10G at any time.
Hence the zpool must add another 3G of refreservation to the dataset so that the dataset can completely change the 3G of contents in addition to the 7G of already reserved space.
Therefore, the USED value for tank/testfs immediately increases from 10G to 13G when a snapshot is taken, even though the snapshot itself does not allocate any space.
In addition, swap and dump devices have a special feature called preallocation.
When a swap or dump device is configured with the swap(1M) or dumpadm(1M) command, all of its blocks are allocated in advance.
Because the full size of the zvol is then used by the dataset, taking a snapshot of a swap or dump device immediately increases the used space by the size of the zvol.
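As a hedged illustration (the sizes shown are hypothetical; rpool/dump is the usual dump volume on a ZFS root system), preallocation means usedbydataset already equals the volume size even before any crash dump is written, so a snapshot immediately adds that amount to the used space:
# zfs get volsize,usedbydataset rpool/dump
NAME PROPERTY VALUE SOURCE
rpool/dump volsize 1G local
rpool/dump usedbydataset 1.00G -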