File: ddrescue.info,  Node: Algorithm,  Next: Output,  Prev: Important advice,  Up: Top

4 Algorithm
***********

GNU ddrescue is not a derivative of dd, nor is it related to dd in any way
except in that both can be used for copying data from one device to another.
The key difference is that ddrescue uses a sophisticated algorithm to copy
data from failing drives, causing them as little additional damage as
possible.

   Versions of ddrescue prior to 1.19 used a divide-and-conquer strategy to
rescue the difficult parts of the drive. But that caused a lot of head
movement, which is bad for the drive. Therefore, newer versions try to
minimize head movement to minimize drive damage.

   Ddrescue efficiently manages the status of the rescue in progress and
tries to rescue the good parts first, scheduling reads inside bad (or slow)
areas for later. This maximizes the amount of data that can ultimately be
recovered from a failing drive.

   The standard dd utility can be used to save data from a failing drive,
but it reads the data sequentially, which may wear out the drive without
rescuing anything if the errors are at the beginning of the drive.

   Other programs read the data sequentially but switch to small-size reads
when they find errors. This is a bad idea because it means spending more
time at error areas, damaging the surface, the heads, and the drive
mechanics, instead of getting out of them as fast as possible. This behavior
reduces the chances of rescuing the remaining good data.

   The algorithm of ddrescue is divided into four phases: copying, trimming,
scraping, and retrying. Each phase is described below. Disc sectors are
marked successively as non-tried, non-trimmed, non-scraped, and bad-sector
until they are successfully read and marked as finished. The user may
interrupt the process at any point, but a bad drive can block ddrescue for a
long time until the kernel gives up.

   The amount of work remaining for a given phase can be calculated by
comparing the current size of the corresponding areas with their size at the
end of the previous pass. Namely the size of non-tried while copying, the
size of non-trimmed while trimming, the size of non-scraped while scraping,
and the size of bad-sector while retrying. *Note Output::.
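
   As an illustration, the per-status sizes can be summed from a mapfile with
a few lines of Python. This is a minimal sketch, not part of ddrescue. It
assumes the mapfile layout described in the manual's chapter on mapfile
structure (comment lines starting with '#', one status line, then
'pos size status' lines using the characters '?', '*', '/', '-' and '+').
The names 'sizes_by_status' and 'remaining_fraction' are hypothetical.

```python
# Status characters as assumed from the mapfile format; not defined in this section.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "finished",
}


def sizes_by_status(mapfile_text):
    """Sum the sizes of the mapfile data blocks, grouped by status name."""
    totals = {name: 0 for name in STATUS_NAMES.values()}
    seen_status_line = False
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if not seen_status_line:
            seen_status_line = True  # first data line holds the current position
            continue
        _pos, size, status = line.split()[:3]
        # int(..., 0) accepts decimal and 0x-prefixed hexadecimal numbers.
        totals[STATUS_NAMES[status]] += int(size, 0)
    return totals


def remaining_fraction(size_at_end_of_previous_pass, size_now):
    """Fraction of the phase's area still pending (0.0 if there was none)."""
    if size_at_end_of_previous_pass == 0:
        return 0.0
    return size_now / size_at_end_of_previous_pass
```

   For example, while trimming one would compare the current 'non-trimmed'
total with the value recorded when the last copying pass ended.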

   The steps of the algorithm are:

   1) Optionally read a mapfile describing the status of a multi-part or
previously interrupted rescue. If no mapfile is specified, or if it is empty
or does not exist, mark the whole rescue domain as non-tried.

   2) (First phase; Copying) Copying is done in up to five passes. The first
pass reads the non-tried parts of the input file, marking the failed blocks
as non-trimmed and skipping beyond them. The second pass runs in the
opposite direction to the first pass and delimits the blocks skipped by the
first pass. The first two passes also skip beyond slow areas. The areas
skipped are tried later in one or three additional passes (before trimming).
The copying direction is reversed after each pass until all the rescue
domain is tried.

   The third and fourth passes read the blocks skipped due to slow areas
(if any) by the first two passes, in the same direction that each block was
skipped. For each block, passes 2 to 4 skip the rest of the block after
finding the first error in the block. The last pass is a sweeping pass, with
skipping disabled. The purpose of the multiple passes is to delimit large
bad areas fast, recover the most promising areas first, keep the mapfile
small, and produce good starting points for trimming.

   Only non-tried areas are read in large blocks. Trimming, scraping, and
retrying are done sector by sector. Each sector is tried at most two times:
the first in the copying phase as part of a large block read, the second in
one of the phases below as a single sector read.

   3) (Second phase; Trimming) Trimming retries, sector by sector, the edges
of the large block reads that failed during the copying phase. Trimming is done
in one pass as follows. For each non-trimmed block, read forwards one
sector at a time from the leading edge of the block until a bad sector is
found. Then read backwards one sector at a time from the trailing edge of
the block until a bad sector is found. Then mark the bad sectors found (if
any) as bad-sector, and mark the rest of the block as non-scraped without
trying to read it. If any edge is already adjacent to a bad sector, it is
considered already trimmed and is not trimmed again.

   4) (Third phase; Scraping) Scrape together, sector by sector, the data
not recovered by the copying or trimming phases. Scraping is done in one
pass. Each non-scraped block is read forwards, one sector at a time. Any bad
sectors found are marked as bad-sector.

   5) (Fourth phase; Retrying) Optionally try to read again the bad sectors
until the number of retry passes specified is reached. The direction is
reversed after each pass. Every bad sector is tried only once in each pass.
Ddrescue can't know if a bad sector is unrecoverable or if it will be
eventually read after some retries.

   6) Optionally write a mapfile for later use.
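
   The way sectors move through the statuses in steps 2 to 5 can be sketched
with a toy model in Python. This is an illustration under simplifying
assumptions, not ddrescue's implementation: the "drive" is a list of sector
statuses, 'reader' is a hypothetical callable that returns true when a sector
reads correctly, the copying phase is a single pass without skipping or slow
area handling, the rule for edges already adjacent to a bad sector is
omitted, and the starting direction of the first retry pass is assumed.

```python
# Toy model only: statuses use the mapfile characters, sectors are list slots.
NON_TRIED, NON_TRIMMED, NON_SCRAPED, BAD, FINISHED = "?", "*", "/", "-", "+"


def runs(status, wanted):
    """Return (first, last) for every maximal run of sectors in one status."""
    found, start = [], None
    for i, st in enumerate(status + [None]):  # sentinel closes the last run
        if st == wanted and start is None:
            start = i
        elif st != wanted and start is not None:
            found.append((start, i - 1))
            start = None
    return found


def copy_phase(status, reader, block=4):
    """Large-block reads: one bad sector fails the whole block (no skipping)."""
    for first in range(0, len(status), block):
        sectors = range(first, min(first + block, len(status)))
        ok = all(reader(s) for s in sectors)
        for s in sectors:
            status[s] = FINISHED if ok else NON_TRIMMED


def trim_phase(status, reader):
    """Read inwards from both edges of each failed block until a bad sector."""
    for lo, hi in runs(status, NON_TRIMMED):
        while lo <= hi:  # forwards from the leading edge
            ok = reader(lo)
            status[lo] = FINISHED if ok else BAD
            lo += 1
            if not ok:
                break
        while lo <= hi:  # backwards from the trailing edge
            ok = reader(hi)
            status[hi] = FINISHED if ok else BAD
            hi -= 1
            if not ok:
                break
        for s in range(lo, hi + 1):  # the middle is left unread
            status[s] = NON_SCRAPED


def scrape_phase(status, reader):
    """Read every non-scraped sector forwards, one at a time."""
    for lo, hi in runs(status, NON_SCRAPED):
        for s in range(lo, hi + 1):
            status[s] = FINISHED if reader(s) else BAD


def retry_phase(status, reader, passes):
    """Try each bad sector once per pass, reversing direction between passes."""
    forward = True  # assumption: the first retry pass runs forwards
    for _ in range(passes):
        bad = [s for s, st in enumerate(status) if st == BAD]
        for s in (bad if forward else reversed(bad)):
            if reader(s):
                status[s] = FINISHED
        forward = not forward
```

   With 16 sectors, a block size of 4, and sectors 5, 6 and 10 unreadable,
copying leaves sectors 4 to 11 as non-trimmed, trimming marks sectors 5 and
10 as bad-sector and leaves 6 to 9 as non-scraped, and scraping then finds
sector 6 bad and recovers 7 to 9.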


   When ddrescue finishes the steps above, any areas marked as bad-sector
will remain untouched in the output file. If the output file is a regular
file created by ddrescue, the areas marked as bad-sector will contain
zeros. If it is a device or a previously existing file, the areas marked as
bad-sector will still contain the data previously present there.
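
   The difference between the two cases can be reproduced with a short Python
sketch. It assumes that rescued data is written at its own offset and that
unrescued offsets are never written, which is what "remain untouched" implies.
The function name 'write_rescued' and the 512-byte sector size are
hypothetical choices for the illustration.

```python
import os


def write_rescued(path, sectors, sector_size=512):
    """Write each rescued sector at its own offset, leaving other offsets alone.

    'sectors' maps a sector index to its data. A missing index models a
    sector that ended up marked as bad-sector.
    """
    # Create the file if needed; otherwise open it without truncating it.
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as out:
        for index, data in sorted(sectors.items()):
            out.seek(index * sector_size)
            out.write(data)
```

   Writing sectors 0 and 2 to a new file leaves a gap for sector 1 that reads
back as zeros, while doing the same on an existing file leaves whatever
sector 1 held before.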

   The mapfile is periodically saved to disc, as well as when ddrescue
finishes or is interrupted. A backup copy of the mapfile with the extension
'.bak' is also periodically created (if possible). So in case of a crash
you can resume the rescue with little recopying. The default interval
between saves varies from 30 seconds to 5 minutes depending on mapfile size
(larger mapfiles are saved at longer intervals), but may be overridden.
*Note --mapfile-interval::.

   The same mapfile can be used for multiple commands that copy different
areas of the input file, and for multiple recovery attempts over different
subsets. See this example:

Rescue the most important part of the disc first.
     ddrescue -i0 -s50MiB /dev/sdc hdimage mapfile
     ddrescue -i0 -s1MiB -d -r3 /dev/sdc hdimage mapfile

Then rescue some key disc areas.
     ddrescue -i30GiB -s10GiB /dev/sdc hdimage mapfile
     ddrescue -i230GiB -s5GiB /dev/sdc hdimage mapfile

Now rescue the rest (does not recopy what is already done).
     ddrescue /dev/sdc hdimage mapfile
     ddrescue -d -r3 /dev/sdc hdimage mapfile