CRASH RECOVERY | ARIES, CONDITIONS, PROCEDURES

Crash recovery is the process by which the database is moved back to a consistent and usable state. This is done by rolling back incomplete transactions and completing committed transactions whose changes were still in memory, not yet written to disk, when the crash occurred. When the database is in a consistent and usable state, it has attained what is known as a point of consistency. Following a transaction failure, the database must be recovered.

Conditions that can result in transaction failure

A power failure on the machine causing the database manager and the database partitions on it to go down.
A hardware failure such as memory corruption, or disk, CPU, or network failure.
A serious operating system error that causes the database manager to go down.

Introduction to ARIES (Algorithms for Recovery and Isolation Exploiting Semantics)

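ARIES is a recovery algorithm designed to work with a no-force, steal database approach. "Steal" means the buffer manager may write pages modified by uncommitted transactions to disk; "no-force" means the pages modified by a transaction do not have to be written to disk when it commits. It is used by IBM DB2, Microsoft SQL Server and many other database systems.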

The three main principles behind the ARIES recovery algorithm are:

Write-Ahead Logging: Any change to an object is first recorded in the log, and the log record must be written to stable storage before the changed object is written to disk.
Repeating History during Redo: On restart after a crash, ARIES retraces the actions of the database before the crash and brings the system back to the exact state it was in at the time of the crash. Then it undoes the transactions that were still active at crash time.
Logging Changes during Undo: Changes made to the database while undoing transactions are logged (as compensation log records, or CLRs) to ensure such an action isn’t repeated in the event of repeated restarts.

Recovery Procedure after Crash

The recovery works in three phases.

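Analysis Phase: The first phase, analysis, scans the log forward from the most recent checkpoint and computes the information the other two phases need: which transactions were still active at the crash (the transaction table), which pages may have been dirty in the buffer pool (the dirty page table), and the log position from which redo must start.
REDO Phase: The redo phase repeats history. Starting from that log position, it reapplies logged changes that had not yet reached disk, restoring the database to the exact state at the crash, including all the changes of uncommitted transactions that were running at that point in time.
UNDO Phase: After the redo phase, the database reflects the exact state at the crash, so the changes of uncommitted transactions still have to be undone to restore a consistent state. The undo phase rolls these changes back, working backward through the log and logging each undone change, which leaves the database in a consistent state.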
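The three phases, together with the repeating-history and CLR principles listed earlier, can be sketched as a toy in-memory model in Python. This is a simplified illustration, not ARIES as implemented in any product. It assumes the log is a Python list whose index is the LSN, the "disk" is a dictionary mapping page to (value, pageLSN), there are no checkpoints (so analysis scans the whole log), a commit record also ends the transaction, and undo writes directly to disk instead of going through a buffer pool. All names (LogRecord, analysis, redo, undo, recover) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LogRecord:
    lsn: int                         # in this toy, LSN == position in the log list
    txn: str
    kind: str                        # "update", "clr", "commit" or "end"
    page: Optional[str] = None
    before: Optional[int] = None     # undo information (old value)
    after: Optional[int] = None      # redo information (new value)
    prev_lsn: Optional[int] = None   # previous record written by the same transaction
    undo_next: Optional[int] = None  # CLRs only: next record of this txn still to undo


Disk = Dict[str, Tuple[int, int]]    # page -> (value, pageLSN) as stored on disk


def analysis(log: List[LogRecord]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Rebuild the transaction table and dirty page table (no checkpoint: scan the whole log)."""
    txn_table: Dict[str, int] = {}   # unfinished txn -> its last LSN
    dirty: Dict[str, int] = {}       # page -> recLSN (first record that may have dirtied it)
    for rec in log:
        if rec.kind in ("commit", "end"):
            txn_table.pop(rec.txn, None)  # simplification: commit also ends the txn
        else:
            txn_table[rec.txn] = rec.lsn
            dirty.setdefault(rec.page, rec.lsn)
    return txn_table, dirty


def redo(log: List[LogRecord], disk: Disk, dirty: Dict[str, int]) -> None:
    """Repeat history: reapply every update and CLR (winners and losers) not yet on disk."""
    if not dirty:
        return
    for rec in log[min(dirty.values()):]:
        if rec.kind not in ("update", "clr") or rec.lsn < dirty[rec.page]:
            continue
        _, page_lsn = disk.get(rec.page, (0, -1))
        if page_lsn < rec.lsn:          # this change is missing from the disk copy
            disk[rec.page] = (rec.after, rec.lsn)


def undo(log: List[LogRecord], disk: Disk, txn_table: Dict[str, int]) -> None:
    """Roll back losers in reverse LSN order, logging a CLR for every undone update."""
    to_undo = set(txn_table.values())
    last = dict(txn_table)              # txn -> latest LSN, used to chain new records
    while to_undo:
        rec = log[max(to_undo)]
        to_undo.remove(rec.lsn)
        if rec.kind == "clr":           # already compensated: jump past it
            nxt = rec.undo_next
        else:                           # "update": restore the before-image and log it
            clr = LogRecord(len(log), rec.txn, "clr", rec.page, after=rec.before,
                            prev_lsn=last[rec.txn], undo_next=rec.prev_lsn)
            log.append(clr)
            last[rec.txn] = clr.lsn
            disk[rec.page] = (rec.before, clr.lsn)  # simplification: write straight to disk
            nxt = rec.prev_lsn
        if nxt is None:                 # reached the txn's first record: fully undone
            log.append(LogRecord(len(log), rec.txn, "end", prev_lsn=last[rec.txn]))
        else:
            to_undo.add(nxt)


def recover(log: List[LogRecord], disk: Disk) -> None:
    txn_table, dirty = analysis(log)
    redo(log, disk, dirty)
    undo(log, disk, txn_table)


log = [
    LogRecord(0, "T1", "update", "A", before=0, after=10),
    LogRecord(1, "T2", "update", "B", before=0, after=20),
    LogRecord(2, "T1", "commit", prev_lsn=0),
    LogRecord(3, "T2", "update", "A", before=10, after=30, prev_lsn=1),
]
disk: Disk = {"A": (0, -1), "B": (20, 1)}  # steal: T2's uncommitted B already reached disk
recover(log, disk)
print(disk)  # {'A': (10, 4), 'B': (0, 5)}
```

Committed T1's value for A is redone from the log, while loser T2's changes are undone. If the system crashed again partway through undo, the CLRs already in the log would make the next restart follow undo_next past the updates they compensate instead of undoing them twice.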

Other Recovery-Related Concepts and Data Structures

The Write-Ahead Log Protocol: Write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log. WAL allows updates of a database to be done in place, because the log records how each change can be redone or undone.
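The write-ahead rule, combined with the steal and no-force policies that ARIES assumes, can be illustrated with a toy Python buffer manager. It is a sketch under simplifying assumptions: a single thread, in-memory lists and dictionaries standing in for the log file and the disk, and a hypothetical class WALBufferManager whose method names are illustrative only.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# (lsn, txn, kind, page, before, after)
Record = Tuple[int, str, str, Optional[str], Optional[int], Optional[int]]


class WALBufferManager:
    """Hypothetical single-threaded buffer manager that enforces the write-ahead rule."""

    def __init__(self) -> None:
        self.log_tail: Deque[Record] = deque()        # log records still in volatile memory
        self.stable_log: List[Record] = []            # log records on stable storage
        self.buffer: Dict[str, Tuple[int, int]] = {}  # page -> (value, pageLSN) in memory
        self.disk: Dict[str, Tuple[int, int]] = {}    # page -> (value, pageLSN) on disk
        self.next_lsn = 0

    def _append(self, txn: str, kind: str, page: Optional[str] = None,
                before: Optional[int] = None, after: Optional[int] = None) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_tail.append((lsn, txn, kind, page, before, after))
        return lsn

    @property
    def flushed_lsn(self) -> int:
        return self.stable_log[-1][0] if self.stable_log else -1

    def flush_log(self, up_to: int) -> None:
        """Force log records with LSN <= up_to to stable storage, in LSN order."""
        while self.log_tail and self.log_tail[0][0] <= up_to:
            self.stable_log.append(self.log_tail.popleft())

    def update(self, txn: str, page: str, value: int) -> None:
        before = self.buffer.get(page, self.disk.get(page, (0, -1)))[0]
        lsn = self._append(txn, "update", page, before, value)  # 1. log undo + redo info
        self.buffer[page] = (value, lsn)                        # 2. update in place, in memory

    def write_page(self, page: str) -> None:
        """Steal: this may run while the page holds uncommitted changes."""
        value, page_lsn = self.buffer[page]
        if page_lsn > self.flushed_lsn:
            self.flush_log(page_lsn)      # write-ahead rule: log first, then data
        self.disk[page] = (value, page_lsn)

    def commit(self, txn: str) -> None:
        """No-force: only the log is forced at commit; data pages may stay in memory."""
        self.flush_log(self._append(txn, "commit"))


bm = WALBufferManager()
bm.update("T1", "A", 10)
bm.update("T2", "B", 20)
bm.write_page("B")               # forces LSNs 0 and 1 to stable storage first
bm.commit("T1")                  # forces the commit record (LSN 2); page A stays in memory
print(bm.flushed_lsn, bm.disk)   # 2 {'B': (20, 1)}
```

Page B, which holds an uncommitted change, reaches disk only after the log records describing it are forced, while committed page A is never written at all; after a crash it would be redone from the log.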

Atomicity: This is the property of transaction processing whereby either all the operations of a transaction are executed or none of them are (all-or-nothing).

Durability: This is the ACID property which guarantees that once a transaction has committed, its effects survive permanently, even if the system crashes afterward.

Log: A transaction log (also transaction journal, database log, binary log or audit trail) is a history of actions executed by a database management system to guarantee the ACID properties across crashes or hardware failures. Physically, a log is a file of updates made to the database, stored in stable storage.
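Since a log is physically a file in stable storage, a minimal Python sketch of appending to it might write one JSON line per record and call os.fsync before treating the record as durable. The JSON-lines format and the helper names append_log_record and read_log are assumptions for illustration; real DBMS logs use binary records, and whether fsync actually reaches non-volatile media depends on the operating system and hardware.

```python
import json
import os
import tempfile
from typing import List


def append_log_record(path: str, record: dict) -> None:
    """Append one record as a JSON line and force it to stable storage before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()              # hand Python's buffered bytes to the OS
        os.fsync(f.fileno())   # ask the OS to write them through to the device


def read_log(path: str) -> List[dict]:
    """Read records back in order, stopping at a torn (partially written) last record."""
    records: List[dict] = []
    if not os.path.exists(path):
        return records
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                break          # incomplete record left by a crash mid-write
    return records


path = os.path.join(tempfile.mkdtemp(), "txn.log")
append_log_record(path, {"lsn": 0, "txn": "T1", "kind": "update", "page": "A", "before": 0, "after": 10})
append_log_record(path, {"lsn": 1, "txn": "T1", "kind": "commit"})
print([r["kind"] for r in read_log(path)])  # ['update', 'commit']
```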

Checkpointing: Checkpointing consists of storing a snapshot of the current state and later using it to restart execution in case of failure. A checkpoint record is written into the log periodically, at the point when the system writes all modified DBMS buffers out to the database on disk. Because recovery can then begin from the most recent checkpoint instead of the start of the log, this periodic operation reduces the time needed to recover from a crash.

Checkpoints are used to make recovery more efficient and to control the reuse of primary and secondary log files. In the case of a crash, recovery starts from the most recent checkpoint and uses the log files to recover the database to the point of the crash.
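How a checkpoint shortens recovery and frees log space can be sketched in Python for the simple checkpoint described above, where all modified buffers are written to disk when the checkpoint record is logged (ARIES itself uses fuzzy checkpoints, which record a dirty page table instead). The Rec type and the helper functions are hypothetical, and LSNs are assumed to equal log positions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rec:
    lsn: int                        # in this toy, LSN == position in the log
    kind: str                       # "update", "commit" or "checkpoint"
    txn: Optional[str] = None
    active: List[str] = field(default_factory=list)  # checkpoint only: txns active at that moment


def last_checkpoint(log: List[Rec]) -> Optional[Rec]:
    return next((r for r in reversed(log) if r.kind == "checkpoint"), None)


def redo_start_lsn(log: List[Rec]) -> int:
    """Every modified buffer was on disk at the checkpoint, so redo can start right after it."""
    ckpt = last_checkpoint(log)
    return 0 if ckpt is None else ckpt.lsn + 1


def oldest_needed_lsn(log: List[Rec]) -> int:
    """Records before this LSN are not needed for crash recovery, so their log space can be reused.

    Undo may still need the earlier records of transactions active at the checkpoint,
    so this conservatively keeps everything from the oldest such transaction onward.
    """
    ckpt = last_checkpoint(log)
    if ckpt is None:
        return 0
    firsts = [min((r.lsn for r in log if r.txn == t), default=ckpt.lsn) for t in ckpt.active]
    return min([ckpt.lsn] + firsts)


log = [
    Rec(0, "update", "T1"),
    Rec(1, "update", "T2"),
    Rec(2, "commit", "T1"),
    Rec(3, "checkpoint", active=["T2"]),
    Rec(4, "update", "T3"),
]
print(redo_start_lsn(log), oldest_needed_lsn(log))  # 4 1
```

Redo skips everything up to the checkpoint, but LSN 1 must be kept because T2 was still active at the checkpoint and may need to be undone.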

Media Recovery: Media recovery deals with failure of the storage media holding the permanent database, in particular disk failures. The traditional database approach for media recovery uses archive copies (dumps) of the database as well as archive logs. Archive copies represent snapshots of the database and are periodically taken.

The archive log contains the log records for all committed changes which are not yet reflected in the archive copy. In the event of a media failure, the current database can be reconstructed by using the latest archive copy and redoing all changes in chronological order from the archive log.
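The media recovery procedure just described, restoring the latest archive copy and then redoing archived changes in chronological order, can be sketched in Python. The sketch assumes the archive copy is a dictionary of page values tagged with the LSN at which it was taken, and that archive-log entries are hypothetical (lsn, page, new_value) tuples for committed changes; entries already reflected in the copy are skipped.

```python
from typing import Dict, List, Tuple


def media_recover(archive_copy: Dict[str, int], archive_copy_lsn: int,
                  archive_log: List[Tuple[int, str, int]]) -> Dict[str, int]:
    """Rebuild the database from the latest archive copy plus the archive log."""
    db = dict(archive_copy)                        # 1. restore the dump (copy left untouched)
    for lsn, page, value in sorted(archive_log):   # 2. redo in chronological (LSN) order
        if lsn > archive_copy_lsn:                 # skip changes the dump already contains
            db[page] = value
    return db


dump = {"A": 10, "B": 0}                 # archive copy taken when the log was at LSN 5
alog = [(6, "B", 20), (4, "A", 10), (7, "A", 15)]
print(media_recover(dump, 5, alog))      # {'A': 15, 'B': 20}
```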

Faster recovery from disk failures is supported by disk organizations such as RAID (redundant arrays of independent disks), which store data redundantly on several disks. However, RAID does not eliminate the need for archive-based media recovery, since it cannot completely rule out the possibility of data loss, e.g., when multiple disks fail.
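To make the multiple-failure caveat concrete, the following Python sketch uses XOR parity as in RAID 5 (one RAID level chosen for illustration; the text refers to RAID generally). A single parity block per stripe is enough to rebuild any one lost block, but not two. The helpers xor_blocks and reconstruct are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (this is how the parity block is computed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild a stripe in which at most one data block is lost (marked None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one disk failed: restore from the archive copy and log")
    if missing:
        survivors = [block for block in stripe if block is not None]
        stripe = list(stripe)
        stripe[missing[0]] = xor_blocks(survivors + [parity])
    return stripe


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(disks)
print(reconstruct([disks[0], None, disks[2]], parity) == disks)  # True
```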

THEORY

Discuss the concept of ARIES in crash recovery
Explain the difference between media recovery and checkpointing.
Explain the difference between a system crash and a media failure.
