Page last updated:
2003 May 28 11:32

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] Linux Block Layer is Lame (it retries too much)

  Last week I was having problems with the IDE layer... basically it retries
way too many times.  I was trying to read 512-byte blocks from a dying
/dev/hda (using dd_rescue, which calls pread).  For each bad sector the
kernel would try 8 times; on attempts 4 and 8 it would reset the IDE bus
(turning off things like DMA mode), and on every second failed attempt it
would seek to track 0...
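
For readers who have not used pread this way, here is a minimal sketch of a sector-at-a-time read loop in the spirit of what is described above. It is not dd_rescue's actual code; read_sector() and count_unreadable() are hypothetical helpers made up for illustration, and a short read is simply treated as a failure.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/*
 * Read one 512-byte sector at the given sector number.
 * Returns 0 on success, -1 on an I/O error or a short read.
 * read_sector() is a hypothetical helper, not part of dd_rescue.
 */
static int read_sector(int fd, uint64_t sector, unsigned char *buf)
{
    off_t offset = (off_t)sector * SECTOR_SIZE;
    ssize_t got = pread(fd, buf, SECTOR_SIZE, offset);

    return got == SECTOR_SIZE ? 0 : -1;
}

/*
 * Walk sectors [first, first + count) and return how many could not be read.
 * A real rescue tool would also write the good sectors to an image file.
 */
static uint64_t count_unreadable(int fd, uint64_t first, uint64_t count)
{
    unsigned char buf[SECTOR_SIZE];
    uint64_t bad = 0;

    for (uint64_t s = first; s < first + count; s++) {
        if (read_sector(fd, s, buf) != 0)
            bad++;
    }
    return bad;
}
```

Each failing pread here only returns after the kernel has finished its own retries, which is why the retry policy below matters so much for total run time.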

  If that wasn't bad enough, for some reason the kernel was often trying
to fetch 8 sectors' worth of information for a single-sector read.  The
8 sector "chunk" being fetched was somehow related to the modulus of
the actual sector being requested.
  So if you had an 8 sector bad region and requested any one of the 8 bad
sectors from that chunk, each of the 8 would have 8 read attempts made...
64 read attempts, all of which fail, before you can even move to the next
sector.  When you request the next bad sector the process begins again...
even if you wanted the last sector of the 8 sector chunk.
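
The arithmetic above can be sketched as follows. This assumes the chunk is simply the 8-sector-aligned group containing the requested sector, which is a guess based on the "modulus" observation and not something confirmed from the kernel source. The names chunk_start() and attempts_for_chunk() are made up for illustration.

```c
#include <stdint.h>

#define SECTORS_PER_CHUNK   8 /* chunk size observed in the post */
#define ATTEMPTS_PER_SECTOR 8 /* retry count observed in the post */

/* First sector of the aligned chunk containing `sector` (assumed alignment). */
static uint64_t chunk_start(uint64_t sector)
{
    return sector - (sector % SECTORS_PER_CHUNK);
}

/* Failed read attempts triggered by one request into a chunk
 * that contains `bad_sectors` unreadable sectors. */
static unsigned attempts_for_chunk(unsigned bad_sectors)
{
    return bad_sectors * ATTEMPTS_PER_SECTOR;
}
```

Under that assumption a request for sector 13 would pull in sectors 8 through 15, and a fully bad chunk costs 8 * 8 = 64 failed attempts per request.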

  Normally that is good... it's a best effort attempt to read a disk.
However I had hundreds of bad sectors on this drive and just wanted as
much of the filesystem as possible and didn't have days to wait.

  One bizarre thing is that this 8 sector chunk read didn't always happen,
it appears that if one of the 8 sectors was good the other 7 would only
be tried one batch of times.

  I found that linux/include/linux/ide.h has the following three defines:
/* Probably not wise to fiddle with these */
#define ERROR_MAX       8       /* Max read/write errors per sector */
#define ERROR_RESET     3       /* Reset controller every 4th retry */
#define ERROR_RECAL     1       /* Recalibrate every 2nd retry */
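
A sketch of how these three values could produce the pattern described earlier (reset on the 4th and 8th failure, recalibrate on the 2nd and 6th). It assumes ERROR_RESET and ERROR_RECAL are used as bit masks against a zero-based count of earlier failures, which is what the comments suggest; the real driver logic may differ, and action_after_failure() is a made-up name.

```c
#define ERROR_MAX   8 /* Max read/write errors per sector */
#define ERROR_RESET 3 /* Reset controller every 4th retry */
#define ERROR_RECAL 1 /* Recalibrate every 2nd retry */

enum retry_action { RETRY_PLAIN, RETRY_RECAL, RETRY_RESET };

/*
 * errors_so_far is the number of failures recorded before the current one,
 * so 0 means "this is the first failure".  Assumed mask-based policy.
 */
static enum retry_action action_after_failure(unsigned errors_so_far)
{
    if ((errors_so_far & ERROR_RESET) == ERROR_RESET)
        return RETRY_RESET;  /* errors_so_far = 3, 7: 4th and 8th failure */
    if ((errors_so_far & ERROR_RECAL) == ERROR_RECAL)
        return RETRY_RECAL;  /* errors_so_far = 1, 5: 2nd and 6th failure */
    return RETRY_PLAIN;      /* otherwise just try the read again */
}
```

With the masks set so that neither test can ever match a count below ERROR_MAX, every failure would fall through to a plain retry, which is roughly the effect of the change described next.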

  These settings are not configurable via /proc or sysctl... so
I changed them and recompiled such that only 3 attempts would be made
on any given block and no resets or recalibrations were done.  Still,
reading *each* sector in a bad 8 sector chunk was taking 100 seconds
(about 14 minutes to move to the next chunk of 8 sectors).
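
To put those numbers in perspective, here is the same arithmetic as a small sketch. It assumes every bad sector costs the full 100 seconds, and it borrows the figure of roughly 500 bad sectors mentioned at the end of this post; both helper names are made up for illustration.

```c
/* Seconds needed to get past one chunk when every sector in it is bad. */
static double seconds_per_chunk(double seconds_per_sector,
                                unsigned sectors_per_chunk)
{
    return seconds_per_sector * sectors_per_chunk;
}

/* Hours spent waiting on `bad_sectors` failed reads at a fixed cost each. */
static double hours_for_bad_sectors(unsigned bad_sectors,
                                    double seconds_per_sector)
{
    return bad_sectors * seconds_per_sector / 3600.0;
}
```

seconds_per_chunk(100.0, 8) gives 800 seconds, a bit over 13 minutes, and hours_for_bad_sectors(500, 100.0) comes to just under 14 hours spent on failed reads alone.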

  Even better, because the process is inside a system call, it is not
killable, so there is no practical way to speed things up.

  I still do not know what is causing the 8 sector "chunk" to be read.
It seems that sys_pread calls mm/filemap.c: generic_file_read ->
do_generic_file_read, which seems like it might be expanding the request
size based on some read ahead parameters, it figures out what
"max_readahead" is by calling get_max_readahead on the inode.

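One possible explanation, offered as a guess and not a diagnosis: 8 sectors of 512 bytes is 4096 bytes, which is the usual page size on x86. If the read is serviced through the page cache, a 1 sector request might be rounded up to a whole page regardless of the read-ahead settings. The sketch below only shows that arithmetic; the helper names are made up, and it says nothing about what do_generic_file_read actually does.

```c
#include <unistd.h>

#define SECTOR_SIZE 512

/* Number of sectors covered by one page of the given size. */
static long sectors_per_page(long page_size, long sector_size)
{
    return page_size / sector_size;
}

/* Same figure for the machine this runs on, using the runtime page size. */
static long sectors_per_page_here(void)
{
    return sectors_per_page(sysconf(_SC_PAGESIZE), SECTOR_SIZE);
}
```

On a machine with 4 KiB pages, sectors_per_page(4096, 512) is 8, which matches the chunk size seen here.
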
  I tried most everything with hdparm and fiddled with sysctl
(vm/max-readahead and vm/min-readahead)... but there was no change in
behavior.  I tried obvious things like "hdparm -m 0 -a 0 -A 0 -m 0 -P 0",
I also tried 1's, all with no noticeable effect.

I want to be more ready next time...

- How is a 1 sector read expanded to an 8 sector chunk?

- How can this chunk reading behavior be turned off
  (via command line or custom kernel patch)?

- Any other ideas on how to pull the disk blocks?

      Mike Simons

  Basically I spent 20 hours trying to read from a failing drive, and I
got about halfway through the drive before time was up.  Of the 30 million
sectors I read, only about 500 were bad.  It is likely that the NTFS
filesystem on the drive would have been recoverable if the pull had
finished... because I could mount the filesystem read-only from the
drive itself, but what I have of the image has an error at mount time.

  I was using a custom knoppix boot floppy and a standard knoppix CD to
boot a laptop with the bad drive, NFS mounting a local machine, where
dd_rescue was sending the blocks that could be read.

GPG key: http://simons-clan.com/~msimons/gpg/msimons.asc
Fingerprint: 524D A726 77CB 62C9 4D56  8109 E10C 249F B7FA ACBE
