OpenAFS Master Repository branch, master, updated. BP-openafs-stable-1_8_x-647-g4498bd8
Gerrit Code Review
gerrit@openafs.org
Fri, 24 Jul 2020 12:06:38 -0400
The following commit has been merged in the master branch:
commit 4498bd8179e5e93a33468be3c8e7a30e569d560a
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jun 22 22:54:52 2020 -0500
volser: Don't NUL-pad failed pread()s in dumps
Currently, the volserver SAFSVolDump RPC and the 'voldump' utility
handle short reads from pread() for vnode payloads by padding the
missing data with NUL bytes. That is, if we request 4k of data for our
pread() call, and we only get back 1k of data, we'll write 1k of data
to the volume dump stream followed by 3k of NUL bytes, and log
messages like this:
1 Volser: DumpFile: Error reading inode 1234 for vnode 5678
1 Volser: DumpFile: Null padding file: 3072 bytes at offset 40960
This can happen if we hit EOF on the underlying file sooner than
expected, or if the OS just responds with fewer bytes than requested
for any reason.
The same code path tries to do the same NUL-padding if pread() returns
an error (for example, EIO), padding the entire e.g. 4k block with
NULs. However, in this case, the "padding" code often doesn't work as
intended, because we compare 'n' (set to -1) with 'howMany' (set to 4k
in this example), like so:
if (n < howMany)
Here, 'n' is signed (ssize_t), and 'howMany' is unsigned (size_t), and
so compilers will promote 'n' to the unsigned type, causing this
conditional to fail when n is -1. As a result, all of the relevant log
messages are skipped, and the data in the dumpstream gets corrupted
(we skip a block of data, and our 'howFar' offset goes back by 1). So
this can result in rare silent data corruption in volume dumps, which
can occur during volume releases, moves, etc.
To fix all of this, remove this bizarre NUL-padding behavior in the
volserver. Instead:
- For actual errors from pread(), return an error, like we do for I/O
errors in most other code paths.
- For short reads, just write out the amount of data we actually read,
and keep going.
- For premature EOF, treat it like a pread() error, but log a slightly
different message.
For the 'voldump' utility, the padding behavior can make sense if a
user is trying to recover volume data offline in a disaster recovery
scenario. So for voldump, add a new switch (-pad-errors) to enable the
padding behavior, but change the default behavior to bail out on
errors.
Change-Id: Ibd6e76c5ea0dea95e3354d9b34536296f81b4f67
Reviewed-on: https://gerrit.openafs.org/14255
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
doc/man-pages/pod8/voldump.pod | 14 +++++++-
src/volser/dumpstuff.c | 60 +++++++++----------------------
src/volser/vol-dump.c | 78 ++++++++++++++++++++++-----------------
3 files changed, 74 insertions(+), 78 deletions(-)
--
OpenAFS Master Repository