This article explains the following.

* How the output of “btrfs fi df”, “btrfs fi show” and “df” differs, and how the three compare to each other
* Explanation of btrfs fi df & allocation
* btrfs fi usage (new command in version 3.18 of the btrfs tools)
* Explanation of btrfs balancing and how it fixes bad allocation
* Example of BTRFS running out of free space due to bad allocation (example of the enospc bug)
* Explanation of profiles (a.k.a. raid types)
* Free space requirements for COW filesystems (at the bottom of the article)



BTRFS FI DF and BALANCE and DF

Note: after reading this, check out this visual – btrfs space readings

How do you read the output of BTRFS FI DF? What are its TOTAL and USED readings? And what are the Single and DUP entries? All of those answers, plus more.

The best way is through an example:

Imagine you have just constructed a 16 TB volume.

# df
Would show a total (df calls it SIZE) of 16 TB and used close to 0 KiB.

UPDATE 2015-06-11: When I say “SIZE” I really mean “Total Capacity”.

# btrfs fi df
Data, single: total=8.00MiB, used=0.00
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=96.00KiB
Metadata, single: total=8.00MiB, used=0.00

NOTE: The TOTAL in “btrfs fi df” always confused me because it never added up to 16 TB. Then I realized (with some help from very smart people) that it's not the same total as in “df”, and that the total I'm used to from “df” is actually called SIZE. So SIZE is the total amount of space on the device (free + used), TOTAL is allocated space (as you will soon learn), and USED is the used part of the allocated space, which is the same as plain used space, since nothing can be used outside of allocated space.
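To make the readings easier to play with, here is a minimal Python sketch that turns the “btrfs fi df” lines shown above into numbers. The function names (`parse_fi_df`, `to_bytes`) are my own and not part of any btrfs tool, and the parser only assumes the `Type, profile: total=..., used=...` line format from this example.

```python
import re

# Binary unit multipliers as printed in the example output above.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

LINE_RE = re.compile(
    r"^(?P<type>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[A-Za-z]*), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[A-Za-z]*)$"
)


def to_bytes(number, unit):
    """Convert e.g. ('8.00', 'MiB') to bytes; a bare '0.00' carries no unit."""
    return int(float(number) * UNITS[unit or "B"])


def parse_fi_df(text):
    """Return one dict per chunk-type line; TOTAL is stored as 'allocated'."""
    rows = []
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a chunk-type line
        rows.append({
            "type": match["type"],
            "profile": match["profile"],
            "allocated": to_bytes(match["total"], match["tunit"]),
            "used": to_bytes(match["used"], match["uunit"]),
        })
    return rows


SAMPLE = """\
Data, single: total=8.00MiB, used=0.00
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=96.00KiB
Metadata, single: total=8.00MiB, used=0.00
"""

rows = parse_fi_df(SAMPLE)
for row in rows:
    print(row["type"], row["profile"], row["allocated"], row["used"])
```

Storing TOTAL under the key `allocated` is deliberate: it keeps the "TOTAL means ALLOCATED" idea visible in the data itself.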

What is DATA & METADATA & SYSTEM?

BTRFS keeps track of 3 datatypes, and any byte or block on the filesystem can belong to only 1 of the 3. If it doesn't belong to one of those types it is marked as UNALLOCATED. So technically each byte or block belongs to 1 of 4 types (DATA, METADATA, SYSTEM or UNALLOCATED).

1) DATA: Your main data (this should be the largest once you start filling it up)
2) METADATA: Data about your data: inodes, snapshot info, attributes, all that happy stuff
3) SYSTEM: What makes BTRFS run (the chunk tree, which maps the filesystem's logical addresses onto the physical devices)

4) Also there is another type called UNALLOCATED: this space can be taken for DATA, METADATA or SYSTEM when needed. Nothing is ever written to UNALLOCATED space itself; once something is written there, that space has become part of one of the other three types.


What are BTRFS TOTAL, ALLOCATED space & USED space?

TOTAL (as seen in btrfs fi df) is actually the same thing as ALLOCATED space (so when you see TOTAL, think ALLOCATED). USED is just space used by data (data held by snapshots is counted towards this USED value). ALLOCATED space is simply space set aside by BTRFS for future writes, but obviously it also includes current data, which is the USED space. So USED space is a subset of ALLOCATED space, meaning USED is less than or equal to ALLOCATED.

BTRFS can allocate space for 1 of the 3 chunk types: DATA, METADATA or SYSTEM.

BTRFS automatically tries to ALLOCATE space to its 3 data types, and sometimes it gets it wrong, so you might run out of space even though technically you have plenty of space left (plenty of free space sitting in the wrong data type). Unfortunately BTRFS can't un-allocate that space automatically (well, maybe future versions can); currently, to fix space that's allocated incorrectly, one must “balance” the filesystem. I recommend scheduled light balances of the system to prevent issues from happening.

I just covered lots of the basic ground, now to go into more detail.

Wait, what? What's Allocated Space?

Think of “ALLOCATED space” as “free space for a data type” (DATA, SYSTEM or METADATA; set aside for future writes) + “used space for the same data type” (current data). So technically each datatype has as much free space as ALLOCATED minus USED (but that doesn't matter much, as it grows and shrinks with the ALLOCATED space). In the above case DATA currently has only 8 MiB allocated that it can fill with USED data, but it can actually hold more: as you start nearing 8 MiB of use, more ALLOCATED/TOTAL space gets taken from UNALLOCATED space (but not from other data types; if another data type has more ALLOCATED space than it needs, you can fix that with a BALANCE).

So currently BTRFS has ALLOCATED 8 MiB for DATA and 1 GiB for METADATA. As more writes happen, more ALLOCATED space gets assigned from UNALLOCATED space.

BTRFS cannot borrow/take/steal space from other data types for ALLOCATION. So if a datatype is FULL (USED is close to TOTAL/ALLOCATED and there is no more space to take from UNALLOCATED space), then writes of that type fail with “No space left on device” (ENOSPC), no matter how much free space the other types hold.
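The "no borrowing between types" rule is easier to see in a toy model. The sketch below is not how the real BTRFS allocator works: the class `ToyBtrfs`, the fixed 1 GiB chunk size and the 10 GiB volume are all assumptions made up for illustration. It only shows the bookkeeping: each type grows out of unallocated space, and free space inside one type's allocation is useless to the other.

```python
import errno


class ToyBtrfs:
    """Toy bookkeeping model; all sizes are whole GiB."""

    CHUNK_GIB = 1  # assumed chunk size, purely illustrative

    def __init__(self, size_gib):
        self.unallocated = size_gib
        self.allocated = {"data": 0, "metadata": 0}
        self.used = {"data": 0, "metadata": 0}

    def write(self, kind, amount_gib):
        """Store data of `kind`, growing its allocation from unallocated space only."""
        while self.used[kind] + amount_gib > self.allocated[kind]:
            if self.unallocated < self.CHUNK_GIB:
                # Free space inside the OTHER type's chunks cannot be borrowed.
                raise OSError(errno.ENOSPC, "No space left on device")
            self.unallocated -= self.CHUNK_GIB
            self.allocated[kind] += self.CHUNK_GIB
        self.used[kind] += amount_gib

    def delete(self, kind, amount_gib):
        """Deleting lowers USED, but the chunks stay allocated to that type."""
        self.used[kind] -= amount_gib


fs = ToyBtrfs(size_gib=10)
fs.write("metadata", 4)   # metadata grabs 4 GiB of chunks
fs.delete("metadata", 3)  # ...and later needs only 1 GiB of them
fs.write("data", 6)       # data takes the remaining 6 GiB of unallocated space
try:
    fs.write("data", 1)   # nothing unallocated is left for data
    failed = False
except OSError as err:
    failed = err.errno == errno.ENOSPC

print("unallocated:", fs.unallocated)
print("metadata free inside its allocation:",
      fs.allocated["metadata"] - fs.used["metadata"])
print("data write failed with ENOSPC:", failed)
```

The last write fails even though 3 GiB sit idle in metadata chunks, which is exactly the situation a balance is meant to repair.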

I notice that the DATA type is more carefree about taking space from UNALLOCATED space, and its ALLOCATED/TOTAL is usually 2x that of USED. For example, if I were to write 1 TB of DATA, DATA USED would go up to 1 TB and the ALLOCATION would prepare for more writes by going up to 2 TB. This is just a pattern I noticed; I might be wrong, and it might be different in the future (it all depends on the BTRFS ALLOCATOR).

NOTE: ALLOCATED SPACE IN DATA TYPE minus USED SPACE IN DATA TYPE is ALLOCATED FREESPACE IN DATA TYPE. I talk about this later on.

Still don't get it?

Think of it this way: when BTRFS makes the entire 16 TB filesystem, it DOES NOT say “oh hey, this full 16 TB is for DATA”, because then where would METADATA and SYSTEM go? Instead it RESERVES (allocates) a certain amount of space for DATA, for METADATA and for SYSTEM, and it only takes more from the UNALLOCATED area when it needs to. When there is no UNALLOCATED space left, each type has to live with what it already has allocated.

How much UNALLOCATED SPACE?
You can get the UNALLOCATED SPACE by totalling up all of the TOTALS and SUBTRACTING that from the TOTAL VOLUME SIZE (which you get from DF).
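As a worked version of that subtraction, here is a small Python sketch using the numbers from the 16 TB example. It treats the volume as exactly 16 TiB (an assumption for round numbers) and counts DUP entries twice, since each DUP chunk is stored two times on disk (this is explained in the profiles section and in the summary). The names `COPIES`, `raw_allocated` and `unallocated` are mine.

```python
MIB = 1024**2
GIB = 1024**3
TIB = 1024**4

# How many on-disk copies each profile in the example keeps.
COPIES = {"single": 1, "DUP": 2}

# (chunk type, profile, TOTAL as printed by "btrfs fi df") from the example.
ROWS = [
    ("Data", "single", 8 * MIB),
    ("System", "DUP", 8 * MIB),
    ("System", "single", 4 * MIB),
    ("Metadata", "DUP", 1 * GIB),
    ("Metadata", "single", 8 * MIB),
]


def raw_allocated(rows):
    """Sum of all TOTALs, with DUP entries counted twice."""
    return sum(total * COPIES[profile] for _type, profile, total in rows)


def unallocated(volume_size, rows):
    """Volume size (SIZE in df) minus everything that is already allocated."""
    return volume_size - raw_allocated(rows)


size = 16 * TIB  # assumed exact size of the example volume
print("allocated MiB:", raw_allocated(ROWS) // MIB)
print("unallocated MiB:", unallocated(size, ROWS) // MIB)
```

On a freshly made filesystem nearly everything is still unallocated, which is why the TOTALs look so tiny next to the 16 TB that df reports.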

 

New command: Btrfs Filesystem Usage

Version 3.18 of the btrfs tools (btrfs-progs, whose version numbers track the kernel's) comes with many btrfs changes, including new output from the “btrfs” command, specifically “btrfs filesystem usage” (which I will call “btrfs fi usage”; you also get another one called “btrfs device usage”). The new “fi usage” combines “fi show” and “fi df” into 1 command, which is awesome because it was annoying to type both. It also shows a lot more meaningful data, and it actually shows the “Unallocated” space that always existed next to the “system”, “metadata” and “data” chunks. You will also notice a “Global Reserve” entry: this is space btrfs holds back for its own operations, to help against the enospc issue (no space due to bad allocation). You can read more about it here.

For more information & the look & feel of btrfs filesystem usage, check out this link:

https://plus.google.com/106656581624455712024/posts/d4T4AXuXWHw


What does a balance do?
It takes in blocks of data, runs them through the allocator and puts them somewhere else. It can also run a conversion, converting from 1 raid type to another. But even if you're not using the BTRFS raid ability it can do other things, like rearranging data so that UNALLOCATED and ALLOCATED space are laid out better.

BTRFS can't steal space allocated for one purpose and use it for another purpose. The reason a balance is helpful is that it may join multiple mostly-empty chunks into a single one, and then free the newly-emptied chunks so btrfs can allocate them for another purpose.
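The "join mostly-empty chunks and free the emptied ones" idea can be sketched as a toy simulation. This is only an illustration under my own assumptions (a fixed 1024 MiB chunk size, perfect repacking, and a function I named `toy_balance`); the real balance relocates extents through the allocator and will not pack this neatly. The `usage_limit` parameter mimics a usage filter: only chunks filled up to that percentage are touched.

```python
CHUNK_MIB = 1024  # assumed data chunk size for this toy model


def toy_balance(chunks_used_mib, usage_limit=100):
    """Repack chunks that are at most `usage_limit` percent full.

    chunks_used_mib: MiB used inside each chunk.
    Returns (chunk list after the balance, number of chunks freed).
    """
    untouched = []
    to_move = 0
    for used in chunks_used_mib:
        if used * 100 <= usage_limit * CHUNK_MIB:
            to_move += used          # this chunk's contents get relocated
        else:
            untouched.append(used)   # too full for the filter, left alone
    # Pack the relocated data into as few chunks as possible.
    full, remainder = divmod(to_move, CHUNK_MIB)
    repacked = [CHUNK_MIB] * full + ([remainder] if remainder else [])
    after = untouched + repacked
    return after, len(chunks_used_mib) - len(after)


before = [1000, 30, 20, 0, 50, 900]
after, freed = toy_balance(before, usage_limit=5)
print(after, freed)  # [1000, 900, 100] 3
```

Four nearly empty chunks collapse into one, and the three chunks that become free go back to unallocated space, where any type can claim them.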

Another thing a balance does is even out data between devices (if one device holds more data than another, it balances that out). However some filesystems have only 1 device, so what does a balance do there? It can still help with the ALLOCATION situation, when there is no unallocated space left and one datatype is close to fully used. Example: you can't write a file because the FS thinks it's full, because the remaining free space is allocated to the wrong datatype.

Best way to run a balance?
If you just run
# btrfs balance start /data
and you have a lot of data, the process might take forever (days or weeks).
Instead run it only on the mostly empty chunks of data, with the filter option “-dusage=5”. It will only balance (i.e. run through the btrfs allocator) data chunks that are empty or close to empty (0% to 5% full):
# btrfs balance start -dusage=5 /data
The command blocks until it's done.
NOTE: if the operation finishes with 0 chunks moved, raise the -dusage percentage a notch until you get movement.
In another shell you can run
# btrfs balance status /data
Or for more info
# btrfs balance status -v /data
You can also cancel the operation
# btrfs balance cancel /data
Note: canceling a simple balance like the one above (where you're not converting from one raid type to another) is okay. I'm not sure whether it's okay while converting a raid type.
Balances are performance heavy, so you can also pause one to save it for later
# btrfs balance pause /data
And resume it later
# btrfs balance resume /data

Rerunning the operation, you will see fewer and fewer chunks, as BTRFS is combining them.
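The "raise -dusage a notch until you get movement" advice can be written down as a list of commands to try in order. The sketch below only builds and prints the command lines; it does not run them. The threshold steps (5, 10, 20, 40) and the function name `balance_commands` are my own assumptions, so pick steps that suit your filesystem.

```python
import shlex


def balance_commands(mount_point, thresholds=(5, 10, 20, 40)):
    """Build one 'btrfs balance start -dusage=N' command per threshold."""
    return [
        ["btrfs", "balance", "start", f"-dusage={pct}", mount_point]
        for pct in thresholds
    ]


commands = balance_commands("/data")
for argv in commands:
    # Printed for review only; run them by hand (as root), one at a time,
    # and stop as soon as a run reports that chunks were relocated.
    print("#", shlex.join(argv))
```

Starting low keeps each run short, because only the emptiest chunks are rewritten first.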


Whats DUP, SINGLE, RAID5, and RAID6?

These DUP, SINGLE, RAID5 words come from the PROFILE selection for the datatype:

When you create a BTRFS filesystem with mkfs.btrfs you choose the profile/raid type for the DATA (with the -d option) and for the METADATA (with the -m option). The METADATA profile selection also applies to SYSTEM.

-d, --data type: Specify how the data must be spanned across the devices specified. Valid values are raid0, raid1, raid5, raid6, raid10 or single.

-m, --metadata profile: Specify how metadata must be spanned across the devices specified. Valid values are raid0, raid1, raid5, raid6, raid10, single or dup. Single device will have dup set by default except in the case of SSDs which will default to single. This is because SSDs can remap blocks internally so duplicate blocks could end up in the same erase block which negates the benefits of doing metadata duplication.

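What are these profiles (in quick definitions):
RAID5: Striping with parity; 1 drive's worth of space goes to parity, so 1 drive can fail
RAID6: Striping with double parity; 2 drives' worth of space go to parity, so 2 drives can fail
RAID10: Mirroring and striping; every block is stored twice, so half of the raw space is usable (most IO)
RAID1: Mirroring; every block is stored on 2 different devices, so half of the raw space is usable and 1 drive can fail (BTRFS RAID1 keeps exactly 2 copies no matter how many devices are in the filesystem)
RAID0: Kind of like SINGLE, but data is striped across the devices, so every device should be the same size. No redundancy.
SINGLE: Data is not striped; the devices are simply concatenated into one pool. (This is the default for DATA as it gives the most space)
DUP: Keeps 2 copies of everything on the same device for the sake of redundancy (most filesystems store copies of the superblock, so BTRFS wanted to stick to that tradition by giving this option). When I wrote this, mkfs offered it only for METADATA (and thus SYSTEM as well); a reader reports in the comments at the end that DATA can be converted to DUP too.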
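To compare the profiles by how much of the raw disk space ends up usable, here is a rough Python sketch. It assumes every device has the same size and ignores minimum device counts and chunk-level details, so treat the numbers as a rule of thumb. The function name `usable_fraction` is my own.

```python
from fractions import Fraction


def usable_fraction(profile, devices):
    """Fraction of raw space that holds unique data, for equal-sized devices."""
    profile = profile.lower()
    if profile in ("single", "raid0"):
        return Fraction(1)       # no redundancy, all raw space is usable
    if profile in ("dup", "raid1", "raid10"):
        return Fraction(1, 2)    # everything is stored twice
    parity = {"raid5": 1, "raid6": 2}.get(profile)
    if parity is None:
        raise ValueError(f"unknown profile: {profile}")
    if devices <= parity:
        raise ValueError(f"{profile} needs more than {parity} device(s)")
    return Fraction(devices - parity, devices)


for name in ("single", "raid0", "dup", "raid1", "raid10", "raid5", "raid6"):
    print(name, usable_fraction(name, devices=4))
```

With 4 equal devices, RAID5 leaves three quarters of the raw space usable and RAID6 leaves half, the same as the mirroring profiles.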

Lets look at some examples:

EX1: Here the DATA was made with the SINGLE profile (so just a linear filesystem without RAID), and METADATA (thus also SYSTEM) was made with DUP. So the single entry for DATA matters, but the single entries for METADATA and SYSTEM can be ignored and balanced out (optionally):

# btrfs fi df /DATA
Data, single: total=8.00MiB, used=0.00
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00 <-- these extra single entries can be ignored
Metadata, DUP: total=1.00GiB, used=96.00KiB
Metadata, single: total=8.00MiB, used=0.00 <-- these extra single entries can be ignored

In every EXAMPLE the extra single entries can be ignored, as they are a normal by-product of making the FS. You can balance them out with a special balance option/filter (I just saw that in a forum).

The single entries only make sense if you selected the SINGLE profile for that datatype. So for EX1 and for EX3 the SINGLE entry for DATA makes sense, while the SINGLE entry for DATA in EX2 can be ignored. In each of these examples the SINGLE entries for SYSTEM and METADATA can be ignored, as those were not constructed with the SINGLE profile.

EX2: Here DATA was made with the RAID5 profile, and METADATA was also made with the RAID5 profile. So all of the single entries can be ignored and balanced out with the proper command (but that's optional):

# btrfs fi df /DATA
Data, single: total=8.00MiB, used=0.00
Data, RAID5: total=63.96GiB, used=33.29GiB
System, single: total=4.00MiB, used=0.00
System, RAID5: total=16.00MiB, used=16.00KiB
Metadata, single: total=8.00MiB, used=0.00
Metadata, RAID5: total=2.00GiB, used=1010.09MiB

EX3: Here the DATA was made with SINGLE profile, and METADATA was made with the default DUP (just like EX1):

# btrfs fi df /data
Data, single: total=8.00MiB, used=0.00
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.00GiB, used=96.00KiB
Metadata, single: total=8.00MiB, used=0.00

IN EX3: The DUP entries mean that System and Metadata are actually stored 2 times for redundancy, so you multiply by 2. In reality System has allocated 16 MiB and used 32 KiB, and Metadata has allocated 2 GiB and used 192 KiB. DUP (duplicate) means that the datatype is written twice to the device/filesystem.

So To summarize:
There are 3 data types (DATA, SYSTEM & METADATA). Each datatype can be made with a certain profile (like a certain raid type), except that METADATA and SYSTEM share the same profile (mkfs gives both the same one). Each datatype is allocated space that it can use, and data is written into the areas allocated to it. As a datatype needs more space it takes more from unallocated space. When unallocated space goes to 0 and a datatype fills up, a Balance can reallocate the space correctly using the BTRFS ALLOCATOR.


Where is my free space? Example from: http://askubuntu.com/questions/170044/btrfs-and-missing-free-space

This is also called the ENOSPC issue, and it's documented pretty well here.

DF shows SIZE as the size of the volume. It shows USED as all of the USED values from “btrfs fi df” properly added up together (remember to multiply DUP by 2). Notice that the available space is only 852 MB.

$ df -ha
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        75G   60G  852M  99% /

Notice that the space used is 60 G out of 75 G (from the df output), yet df reports 99% use when 60/75 is really 80%. So where is that missing 15 G (19% to 20%)? It's allocated to METADATA, where it sits unused.

$ btrfs fi df /
Data: total=50.00GB, used=49.17GB (the difference here is 849 MB)
System: total=32.00MB, used=4.00KB
Metadata: total=24.50GB, used=9.86GB

To solve the issue we would normally just grab more UNALLOCATED SPACE and add it to the ALLOCATED/TOTAL space for DATA (thus giving it more free space). HOWEVER it's not that simple in this case, because there is no UNALLOCATED SPACE left.

Adding up the totals of DATA, SYSTEM and METADATA gives the total amount of ALLOCATION:
50 GB + 32 MB + 24.50 GB = close to 75 GB.
So we have allocated all 75 GB of the volume. Thus there is no UNALLOCATED space left.

Since we can't grab any UNALLOCATED SPACE, how do we properly give more space to DATA?
Notice that METADATA is using 9.86 GB but has allocated 24.50 GB (that's 24.50 - 9.86 = about 14.6 GB of slack). BTRFS could use some of that unused space for DATA (but not all of it, as writes also need METADATA, not just DATA).
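The arithmetic for this example is short enough to check in a few lines of Python. The numbers are copied from the “btrfs fi df /” output above (in GB, with System's 32 MB and 4 KB converted by hand); the variable names are my own.

```python
# (allocated TOTAL, USED) in GB, from the "btrfs fi df /" output above.
CHUNKS = {
    "Data": (50.00, 49.17),
    "System": (0.032, 0.000004),
    "Metadata": (24.50, 9.86),
}

total_allocated = sum(total for total, _used in CHUNKS.values())
slack = {name: total - used for name, (total, used) in CHUNKS.items()}

print("allocated in total: %.2f GB" % total_allocated)
for name, free in slack.items():
    print("%-8s allocated but unused: %.2f GB" % (name, free))
```

DATA has well under 1 GB of room left inside its allocation while METADATA sits on more than 14 GB, and with about 74.5 GB of the 75 GB volume allocated there is nothing unallocated to hand out.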

Running a balance here would be ideal:
# btrfs balance start /
That would fix this situation (since it's only 75 G, a balance without -dusage should take only a few minutes).

More info/links:
https://community.oracle.com/thread/2459838?tstart=0
http://comments.gmane.org/gmane.comp.file-systems.btrfs/34581
http://askubuntu.com/questions/170044/btrfs-and-missing-free-space
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space
https://btrfs.wiki.kernel.org/index.php/Balance_Filters
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
http://www.spinics.net/lists/linux-btrfs/msg32066.html
http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/

SUMMARY BETWEEN DF and BTRFS FI DF

I kept calling DATA, SYSTEM and METADATA (and UNALLOCATED, which I'm not sure falls in the same category, but it doesn't hurt to include it) the datatypes of btrfs, but in reality they are called chunk types.

# df
* SIZE: Total size of the volume (Unallocated free space + all Allocated space)
* USED: All of the used space (USED_PER_DATATYPE) from every datatype added up (count DUP twice: btrfs fi df shows only the single value, it's up to you to multiply by 2)
* AVAIL: AVAIL = SIZE - USED - FREE_ALLO_METADATA_AND_SYSTEM
-- Note: don't forget to multiply by 2 for DUP
-- Note: don't forget to count all of the single entries, as they do take up some space (small, but not insignificant when the percentages get close)
-- MiniExample: the formula for FREE_ALLO_METADATA_AND_SYSTEM:
FREE_ALLO_METADATA_AND_SYSTEM = (DUP_META_ALLOC-DUP_META_USE)*2 + (DUP_SYS_ALLOC-DUP_SYS_USE)*2 + (SINGLE_META_ALLOC-SINGLE_META_USE) + (SINGLE_SYS_ALLOC-SINGLE_SYS_USE)
* USE%: (USED+FREE_ALLOCATED_SPACE_FROM_DATA)/SIZE <-- Didn't work on 1 calculation so not too sure on this one
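Here is the AVAIL formula above applied to the fresh 16 TB example from the start of the article, as a Python sketch. It implements my formula as written, not whatever the kernel actually computes for df, and it assumes the volume is exactly 16 TiB. All values are in KiB so the math stays in whole numbers; the names are my own.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024**2
KIB_PER_TIB = 1024**3

COPIES = {"single": 1, "DUP": 2}

# (chunk type, profile, allocated KiB, used KiB) from the fresh 16 TB example.
ROWS = [
    ("Data", "single", 8 * KIB_PER_MIB, 0),
    ("System", "DUP", 8 * KIB_PER_MIB, 16),
    ("System", "single", 4 * KIB_PER_MIB, 0),
    ("Metadata", "DUP", 1 * KIB_PER_GIB, 96),
    ("Metadata", "single", 8 * KIB_PER_MIB, 0),
]


def used_total(rows):
    """USED: every used value added up, DUP counted twice."""
    return sum(used * COPIES[profile] for _t, profile, _alloc, used in rows)


def free_allo_metadata_and_system(rows):
    """Allocated-but-unused space in METADATA and SYSTEM, DUP counted twice."""
    return sum(
        (alloc - used) * COPIES[profile]
        for chunk_type, profile, alloc, used in rows
        if chunk_type in ("Metadata", "System")
    )


def avail(size, rows):
    """AVAIL = SIZE - USED - FREE_ALLO_METADATA_AND_SYSTEM."""
    return size - used_total(rows) - free_allo_metadata_and_system(rows)


size = 16 * KIB_PER_TIB  # assumed exact size of the example volume
print("USED KiB:", used_total(ROWS))
print("FREE_ALLO_METADATA_AND_SYSTEM KiB:", free_allo_metadata_and_system(ROWS))
print("AVAIL KiB:", avail(size, ROWS))
```

Put differently, the formula counts unallocated space plus the free space inside DATA as available, and treats everything allocated to METADATA and SYSTEM as gone.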

# btrfs fi df
* TOTAL_PER_DATATYPE: The Allocated space for the datatype (the free space it currently has reserved + the Used space). If it needs more free space it can grab it from Unallocated space
* USED_PER_DATATYPE: The amount of data used from that allocated space (or, equivalently, the amount of data used from all the space)
* FREE_ALLOCATED_SPACE_PER_DATATYPE: not shown here, but it's TOTAL - USED
* UNALLOCATED_SPACE: not shown here in btrfs fi df, but you can get it by ADDING up all of the TOTALS and SUBTRACTING that value from the SIZE in DF (remember to count DUPs 2 times)

NOTE: the way “df” sees the data is not really important once you take the mentality of BTRFS and other next-gen filesystems into account.

NOTE: all this math is probably wrong for RAID5/RAID6/RAID10 profile types for DATA or METADATA. It's easier to figure out when DATA, METADATA & SYSTEM are either SINGLE or DUP (note: when I wrote this, mkfs would not make DATA as DUP, only SINGLE or one of the RAIDs, while METADATA & SYSTEM can be SINGLE, DUP or any of the RAIDs; see the reader comment at the end about converting DATA to DUP).

 

Free Space Needed for BTRFS & ZFS – COW FILESYSTEMS

The reason you're not given precise numbers by btrfs fi df is that free space shouldn't matter as much (but don't worry, there are ways to get more precise numbers). You should always have about 20% free space at minimum (at most 80% used). COW filesystems live off free space, so give it to them. If you need a LUN or SHARE that's 5 TB, then don't put it on a 5.1 TB volume, put it on a 6.2 TB volume. Always give BTRFS & ZFS free space (and for that matter, any COW filesystem).

Give Next Generation (Copy on Write, also known as COW) filesystems 20% of the volume as breathing space; in other words always keep at least 20% free space. It's a good rule-of-thumb round number that works! You can check out this article on why that is needed: http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance#freespace . Even though it's a ZFS article, both ZFS and BTRFS are COW filesystems, so many of the “ways” mentioned there apply to BTRFS as well, and the free space rule is one of them.

Depending on what kind of write pattern your storage system receives, the recommended free space can be anywhere from 25% down to 10% or even 5%. Like I said, and as the linked article says, 20% free space is just a good round rule-of-thumb number that happens to work.
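The rule of thumb turns into a one-line sizing formula: volume size = space needed / (1 - free fraction). A small Python sketch (the function name `min_volume_size` is my own; exact fractions are used to avoid rounding noise):

```python
from fractions import Fraction


def min_volume_size(needed, free_fraction=Fraction(1, 5)):
    """Smallest volume that holds `needed` while keeping `free_fraction` free."""
    if not 0 <= free_fraction < 1:
        raise ValueError("free_fraction must be in [0, 1)")
    return Fraction(needed) / (1 - Fraction(free_fraction))


# A 5 TB share with the free-space percentages mentioned above.
for percent in (5, 10, 20, 25):
    size = min_volume_size(5, Fraction(percent, 100))
    print(f"{percent}% free -> {float(size):.2f} TB volume")
```

For the 5 TB share at 20% free this gives 6.25 TB, so the 6.2 TB volume suggested earlier lands just under the 20% mark, which is close enough for a rule of thumb.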

My opinion: I'd like to note that no filesystem behaves well once it is full or close to it; COW filesystems are just more sensitive to it because of the nature of COW. You never want to run a filesystem at 99% or 100%. Some can run at that, but you will be missing writes etc. It's the same with COW filesystems when you approach 80%: you are in reality approaching 100%, because other writes on the disk are not accounted for in the percentage you are looking at.

4 thoughts on “btrfs allocation & freespace & enospc bug & commands”

  1. Hi!

    this is one of the most enlightening article on how BTRFS handles the space on disk that I’ve ever read!

    Thanks a lot for sharing such great stuff!

    1. Thank you.

      FYI as I understand newer BTRFS version ** do some tricks to chunk allocations which make it so that balances dont have to run as often. I’m assuming in the future, perhaps, the need for manual balancing will be gone and it will all be automatic and behind the scenes. Also bigger companies are starting to use BTRFS in their data centers and they are adding in many of their fixes and efforts into making BTRFS amazing.

      ** To those that are new to BTRFS: to get newer version of BTRFS one must update their kernel. BTRFS is updated into the kernel. The BTRFS-PROGS program which can be downloaded (such as via apt-get) is not the main “BTRFS” code but its the userspace tools that work on the BTRFS side of things (userspace tools such as asking to make the BTRFS filesystem on a block device, or mounting the fileystem via mount.btrfs – but then the kernel side actually does all of the writing and reading – the userspace tools can do some extra analyzing/modifications such as fixing the filesystem with btrfsck or getting more info via btrfs-show-super)

  2. I have used BTRFS with one disk and with DATA, METADATA & SYSTEM, all in DUP mode.

    Then I saved a file with a ‘specific’ pattern to do one safety test.
    With BTRFS not mounted I did:
    1. Search for that pattern on the disk and find it twice; take note of the physical sectors.
    2. Overwrite only one of those.
    3. Repeat the search and obviously find it once.
    4. Mount the BTRFS and read the file with that pattern; it is correct.
    5. Unmount BTRFS.
    6. Repeat the search and find it twice, so BTRFS had recovered the data.

    Then I forced the error: with BTRFS unmounted I overwrote both sectors, so now the file is damaged on both copies. I remounted BTRFS and tried to read the file, and BTRFS reported that the file is corrupted and cannot be recovered.

    So the DUP profile can also be used on DATA… but your very good and excellent guide says it can’t… maybe because of bad ‘info’ in the ‘help’.

    If you use convert -d DUP it works, and afterwards it will show DATA as DUP, not single, etc.
