Getting the actual size of a sparse file

Hi,

How do you get the true size of a sparse file? Using /var/log/lastlog
on Ubuntu as an example I see this with “ls -lh”

287K lastlog

With “ls -sh” I see this:

40K lastlog

A File.stat call reveals this:

#<File::Stat
dev=0x801,
ino=5249695,
mode=0100664 (file rw-rw-r--),
nlink=1,
uid=0 (root),
gid=43 (utmp),
rdev=0x0 (0, 0),
size=292876,
blksize=4096,
blocks=80,
atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>

Multiplying blocks * blksize doesn’t seem to match up, either.

How do I arrive at 40k?

Also, how would one go about detecting a sparse file?

Regards,

Dan

I would go find the source for Ubuntu’s ls and see what it does for
the -s option.

Note that -s is output in blocks.

On Jan 5, 2011, at 08:37 , Daniel B. wrote:

Hi,

How do you get the true size of a sparse file? Using /var/log/lastlog
on Ubuntu as an example I see this with “ls -lh”

If you’re on Ubuntu you should be able to provide extra options to du:

On Jan 5, 10:04am, Ryan D. wrote:

If you’re on ubuntu you should be able to provide extra options to du:

Sparse file - Wikipedia

True, but I’d like to use pure Ruby (not system calls) if possible, at
least for *nix systems. Or will this require an extension?

Regards,

Dan

On 05/01/11 17:27, Daniel B. wrote:

If you’re on ubuntu you should be able to provide extra options to du:

Sparse file - Wikipedia

True, but I’d like to use pure Ruby (not system calls) if possible, at
least for *nix systems. Or will this require an extension?

dd if=/dev/zero bs=1 seek=100M count=0 of=out2

irb(main):011:0> stat=File.stat("out2")
=> #<File::Stat dev=0xfc0c, ino=160, mode=0100644, nlink=1, uid=1006,
gid=1006, rdev=0x0, size=104857600, blksize=4096, blocks=0, atime=Wed
Jan 05 18:02:08 +0000 2011, mtime=Wed Jan 05 18:02:08 +0000 2011,
ctime=Wed Jan 05 18:02:08 +0000 2011>

irb(main):013:0> [stat.blocks*stat.blksize, stat.size]
=> [0, 104857600]

Gives you allocated size & filesystem size.
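For a pure-Ruby take on the dd command above, here is a minimal sketch. It assumes File#truncate is allowed to grow the file and that the filesystem supports holes. The helper name make_sparse is made up for illustration.

```ruby
# Hypothetical helper (not from the thread): extend a file to `bytes`
# without writing any data. File#truncate grows the file when the new
# length is larger than the current one.
def make_sparse(path, bytes)
  File.open(path, "wb") { |f| f.truncate(bytes) }
  File.stat(path)
end
```

Calling make_sparse("out2", 100 * 1024 * 1024) should report a size of 104857600. How many blocks end up allocated depends on the filesystem.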

On Wed, Jan 5, 2011 at 5:37 PM, Daniel B. wrote:

size=292876,
blksize=4096,
blocks=80,
atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>

Multiplying blocks * blksize doesn’t seem to match up, either.

See stat(2):

   The st_blocks field indicates the number of blocks allocated to the
   file, 512-byte units. (This may be smaller than st_size/512 when the
   file has holes.)

   The st_blksize field gives the "preferred" blocksize for efficient file
   system I/O. (Writing to a file in smaller chunks may cause an
   inefficient read-modify-rewrite.)

So “blksize” has nothing to do with the size of the “blocks”. They are
always counted in 512-byte units.
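Applied to the lastlog numbers from the original post, the arithmetic works out as in this small sketch. The constant and method names are made up, and the 512-byte unit is the Linux stat(2) convention quoted above, not something Ruby guarantees.

```ruby
# Assumption: st_blocks is in 512-byte units, as Linux stat(2) documents.
STAT_BLOCK_BYTES = 512

# Bytes actually allocated on disk for a given st_blocks value.
def allocated_bytes(blocks)
  blocks * STAT_BLOCK_BYTES
end

lastlog_blocks = 80 # blocks= from the File.stat output in the original post

puts allocated_bytes(lastlog_blocks)        # 40960
puts allocated_bytes(lastlog_blocks) / 1024 # 40, matching "ls -sh" (40K)
```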

/Johan H.

On Jan 5, 9:58am, Perry S. wrote:

I would go find the source for Ubuntu’s ls and see what it does for
the -s option.

Note that -s is output in blocks.

Yeah, looks like ls -s defaults to a block size of 1K (1024 bytes).

Hm, how does this look?

class File
  def self.sparse?(file)
    stats = File.stat(file)
    stats.size > stats.blocks * stats.blksize
  end
end
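One catch with comparing against blocks * blksize: for the lastlog example it gives 80 * 4096 = 327680, which is larger than the size of 292876, so the check returns false for a file that does have holes. A sketch of a variant that uses the 512-byte unit from stat(2) might look like this. The module and method names are hypothetical, the 512-byte unit is an assumption that holds on Linux, and the result is only a heuristic, since some filesystems may report fewer blocks for other reasons (compression, for instance).

```ruby
# Hypothetical module, not part of Ruby's File API.
module SparseCheck
  # Assumption: st_blocks is counted in 512-byte units (Linux stat(2)).
  STAT_BLOCK_BYTES = 512

  # Pure check on numbers taken from a File::Stat.
  def self.sparse_stat?(size, blocks)
    return false if blocks.nil? # File::Stat#blocks is nil where unsupported
    size > blocks * STAT_BLOCK_BYTES
  end

  # Convenience wrapper that stats the file itself.
  def self.sparse?(path)
    stat = File.stat(path)
    sparse_stat?(stat.size, stat.blocks)
  end
end
```

Keeping the comparison in a method that takes plain numbers makes it easy to try against the stat output posted in this thread without needing the files themselves.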

On Jan 5, 12:39pm, Johan H. wrote:

The st_blocks field indicates the number of blocks allocated to the
file, 512-byte units. (This may be smaller than st_size/512 when the
file has holes.)

The st_blksize field gives the “preferred” blocksize for efficient file
system I/O. (Writing to a file in smaller chunks may cause an inefficient
read-modify-rewrite.)

So “blksize” has nothing to do with the size of the “blocks”. They are
always counted in 512-byte units.

Oh, wow, I don’t think I knew that. It strikes me as particularly
bizarre that they would return some notion of a “preferred block size”
instead of the actual block size. Seriously, what’s the use of that?

Now I need to check other platforms (Solaris, HP-UX) to see if they
use the 512-byte convention.

Is this something that’s universal? Or is it something I can get via a
C call somewhere?

Regards,

Dan

On Jan 6, 2011, at 3:44 PM, Daniel B. wrote:

Oh, wow, I don’t think I knew that. It strikes me as particularly
bizarre that they would return some notion of a “preferred block size”
instead of the actual block size. Seriously, what’s the use of that?

Now I need to check other platforms (Solaris, HP-UX) to see if they
use the 512-byte convention.

Is this something that’s universal? Or is it something I can get via a
C call somewhere?

I think you’ll want to read up on the stat() system call. The POSIX
standard leaves a bit of wiggle room, though: it specifies that
st_blocks must be returned, but it doesn’t specify the size of the
blocks.

I’m not sure I understand your concern about ‘actual’ vs. ‘preferred’.
I’m guessing they would be the same in almost any rational implementation,
but the main reason for having the information is to perform I/O in
efficiently sized chunks. In that case, the ‘preferred’ block size would
seem to be what you want even if the ‘actual’ block size was different.
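As a rough illustration of that use, here is a sketch that reads a file in chunks of the preferred size. The method name and the 4096-byte fallback are made up for this example. The fallback only matters where File::Stat#blksize returns nil.

```ruby
# Hypothetical helper: yield the file's contents in chunks of the
# filesystem's preferred I/O size (st_blksize).
def each_preferred_chunk(path)
  # Assumption: fall back to 4096 bytes where blksize is unavailable (nil).
  chunk_size = File.stat(path).blksize || 4096
  File.open(path, "rb") do |io|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end
```

Note that nothing here depends on how st_blocks is counted. The two fields answer different questions.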

Gary W.

On Thu, Jan 6, 2011 at 9:44 PM, Daniel B. wrote:

So “blksize” has nothing to do with the size of the “blocks”. They are
always counted in 512-byte units.

Oh, wow, I don’t think I knew that. It strikes me as particularly
bizarre that they would return some notion of a “preferred block size”
instead of the actual block size. Seriously, what’s the use of that?

I think the two fields “st_blocks” and “st_blksize” just happen to
use the same word (“block”) in two slightly different meanings.
Counting “st_blocks” in 512-byte units seems to be an arbitrary
convention, unrelated to the “physical block size” used for files.

Now I need to check other platforms (Solaris, HP-UX) to see if they
use the 512-byte convention.

Is this something that’s universal? Or is it something I can get via a
C call somewhere?

I looked in “Advanced UNIX Programming, 2nd ed” by Rochkind, and there
the “st_blocks” field is described as the number of 512-byte blocks
allocated for a file. So I guess this is a universal thing for UN*X
(Linux, Mac OS X, Solaris, etc.).

The Rochkind book also mentions that “st_blksize is in the stat
structure so that an implementation can vary it by file if it chooses
to do so”.

Regards,
/Johan H.