8.4. Red Hat Enterprise Linux-Specific Information

There is little about the general topic of disasters and disaster recovery that has a direct bearing on any specific operating system. After all, the computers in a flooded data center will be inoperative whether they run Red Hat Enterprise Linux or some other operating system. However, there are parts of Red Hat Enterprise Linux that relate to certain specific aspects of disaster recovery; these are discussed in this section.

8.4.1. Software Support

As a software vendor, Red Hat has a number of support offerings for its products, including Red Hat Enterprise Linux. You are using the most basic support tool right now by reading this manual. Documentation for Red Hat Enterprise Linux is available on the Red Hat Enterprise Linux Documentation CD (which can also be installed on your system for fast access), in printed form, and on the Red Hat website at http://www.redhat.com/docs/.

Self-support options are available via the many mailing lists hosted by Red Hat (available at https://www.redhat.com/mailman/listinfo). These mailing lists take advantage of the combined knowledge of Red Hat's user community; in addition, many lists are monitored by Red Hat personnel, who contribute as time permits. Other resources are available from Red Hat's main support page at http://www.redhat.com/apps/support/.

More comprehensive support options exist; information on them can be found on the Red Hat website.

8.4.2. Backup Technologies

Red Hat Enterprise Linux comes with several different programs for backing up and restoring data. By themselves, these utility programs do not constitute a complete backup solution. However, they can be used as the nucleus of such a solution.

Note

As noted in Section 8.2.6.1 Restoring From Bare Metal, most computers based on the standard PC architecture do not possess the necessary functionality to boot directly from a backup tape. Consequently, Red Hat Enterprise Linux is not capable of performing a tape boot when running on such hardware.

However, it is also possible to use your Red Hat Enterprise Linux CD-ROM as a system recovery environment; for more information see the chapter on basic system recovery in the Red Hat Enterprise Linux System Administration Guide.
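For example, on most releases the rescue environment can be entered by booting from the first Red Hat Enterprise Linux CD-ROM and typing the following at the boot: prompt (the exact procedure for your release is described in the System Administration Guide):

linux rescue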

8.4.2.1. tar

The tar utility is well known among UNIX system administrators. It is the archiving method of choice for sharing ad-hoc bits of source code and files between systems. The tar implementation included with Red Hat Enterprise Linux is GNU tar, one of the more feature-rich tar implementations.

Using tar, backing up the contents of a directory can be as simple as issuing a command similar to the following:

tar cf /mnt/backup/home-backup.tar /home/

This command creates an archive file called home-backup.tar in /mnt/backup/. The archive contains the contents of the /home/ directory.
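The reverse operation is just as straightforward. The following sketch first lists the archive's contents and then restores them; because GNU tar strips the leading / from member names as it archives, extraction takes place relative to the directory named by the -C option:

tar tf /mnt/backup/home-backup.tar
tar xf /mnt/backup/home-backup.tar -C /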

The resulting archive file will be nearly as large as the data being backed up. Depending on the type of data being backed up, compressing the archive file can result in significant size reductions. The archive file can be compressed by adding a single option to the previous command:

tar czf /mnt/backup/home-backup.tar.gz /home/

The resulting home-backup.tar.gz archive file is now gzip compressed[1].
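The z option is also used when reading the archive back. For example, the following command verifies the compressed archive by listing its contents without extracting anything:

tar tzf /mnt/backup/home-backup.tar.gz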

There are many other options to tar; to learn more about them, read the tar(1) man page.

8.4.2.2. cpio

The cpio utility is another traditional UNIX program. It is an excellent general-purpose program for moving data from one place to another and, as such, can serve well as a backup program.

The behavior of cpio is a bit different from that of tar. Unlike tar, cpio reads the names of the files it is to process via standard input. A common method of generating a list of files for cpio is to use programs such as find, whose output is then piped to cpio:

find /home/ | cpio -o > /mnt/backup/home-backup.cpio

This command creates a cpio archive file called home-backup.cpio (containing everything in /home/) in the /mnt/backup/ directory.
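Restoring is done by running cpio in copy-in mode. The following sketch assumes GNU cpio: -i extracts, -d creates leading directories as needed, and --no-absolute-filenames strips the leading / so that files are restored relative to the current directory instead of on top of the live /home/:

cpio -id --no-absolute-filenames < /mnt/backup/home-backup.cpio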

Tip

Because find has a rich set of file selection tests, sophisticated backups can easily be created. For example, the following command performs a backup of only those files that have not been accessed within the past year:

find /home/ -atime +365 | cpio -o > /mnt/backup/home-backup.cpio

There are many other options to cpio (and find); to learn more about them read the cpio(1) and find(1) man pages.

8.4.2.3. dump/restore: Not Recommended for Mounted File Systems!

The dump and restore programs are Linux equivalents to the UNIX programs of the same name. As such, many system administrators with UNIX experience may feel that dump and restore are viable candidates for a good backup program under Red Hat Enterprise Linux. However, one method of using dump can cause problems. Here is Linus Torvalds' comment on the subject:

From:	 Linus Torvalds
To:	 Neil Conway
Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.
Date:	 Fri, 27 Apr 2001 09:59:46 -0700 (PDT)
Cc:	 Kernel Mailing List <linux-kernel@vger.kernel.org>

[ linux-kernel added back as a cc ]

On Fri, 27 Apr 2001, Neil Conway wrote:
> I'm surprised that dump is deprecated (by you at least ;-)).  What to
> use instead for backups on machines that can't umount disks regularly? 


Note that dump simply won't work reliably at all even in 2.4.x: the buffer
cache and the page cache (where all the actual data is) are not
coherent. This is only going to get even worse in 2.5.x, when the
directories are moved into the page cache as well.

So anybody who depends on "dump" getting backups right is already playing
Russian roulette with their backups.  It's not at all guaranteed to get the
right results - you may end up having stale data in the buffer cache that
ends up being "backed up".

Dump was a stupid program in the first place. Leave it behind.

> I've always thought "tar" was a bit undesirable (updates atimes or
> ctimes for example).

Right now, the cpio/tar/xxx solutions are definitely the best ones, and
will work on multiple filesystems (another limitation of "dump"). Whatever
problems they have, they are still better than the _guaranteed_(*)  data
corruptions of "dump".

However, it may be that in the long run it would be advantageous to have a
"filesystem maintenance interface" for doing things like backups and
defragmentation..

		Linus

(*) Dump may work fine for you a thousand times. But it _will_ fail under
the right circumstances. And there is nothing you can do about it.

Given this problem, the use of dump/restore on mounted file systems is strongly discouraged. However, dump was originally designed to back up unmounted file systems; therefore, in situations where it is possible to take a file system offline with umount, dump remains a viable backup technology.
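As a sketch of that approach (the device name /dev/hda5 is hypothetical; substitute the block device actually holding the file system), a level 0 dump of an unmounted /home/ might look like this:

umount /home
dump -0uf /mnt/backup/home-backup.dump /dev/hda5

To read the data back, restore -rf /mnt/backup/home-backup.dump would then be run from within the root of the file system being restored.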

8.4.2.4. The Advanced Maryland Automatic Network Disk Archiver (AMANDA)

AMANDA is a client/server-based backup application produced by the University of Maryland. Because of its client/server architecture, a single backup server (normally a fairly powerful system with a great deal of free space on fast disks and configured with the desired backup device) can back up many client systems, which need nothing more than the AMANDA client software.

This approach to backups makes a great deal of sense, as it concentrates those resources needed for backups in one system, instead of requiring additional hardware for every system requiring backup services. AMANDA's design also serves to centralize the administration of backups, making the system administrator's life that much easier.

The AMANDA server manages a pool of backup media and rotates usage through the pool in order to ensure that all backups are retained for the administrator-dictated retention period. All media is pre-formatted with data that allows AMANDA to detect whether the proper media is available or not. In addition, AMANDA can be interfaced with robotic media changing units, making it possible to completely automate backups.
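The pre-formatting is performed with AMANDA's amlabel utility. As a sketch (the configuration name DailySet1 and the label are hypothetical; your site's names will differ):

amlabel DailySet1 DailySet1-01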

AMANDA can use either tar or dump to do the actual backups (although under Red Hat Enterprise Linux using tar is preferable, due to the issues with dump raised in Section 8.4.2.3 dump/restore: Not Recommended for Mounted File Systems!). As such, AMANDA backups do not require AMANDA in order to restore files — a decided plus.

In operation, AMANDA is normally scheduled to run once a day during the data center's backup window. The AMANDA server connects to the client systems and directs the clients to produce estimated sizes of the backups to be done. Once all the estimates are available, the server constructs a schedule, automatically determining the order in which systems are to be backed up.
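The program that performs this nightly run is amdump, normally started by cron as the AMANDA user. A sketch of an /etc/crontab entry (the start time, user name, and configuration name DailySet1 are hypothetical):

45 0 * * * amanda amdump DailySet1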

Once the backups actually start, the data is sent over the network from the client to the server, where it is stored on a holding disk. Once a backup is complete, the server starts writing it out from the holding disk to the backup media. At the same time, other clients are sending their backups to the server for storage on the holding disk. This results in a continuous stream of data available for writing to the backup media. As backups are written to the backup media, they are deleted from the server's holding disk.

Once all backups have been completed, the system administrator is emailed a report outlining the status of the backups, making review easy and fast.

Should it be necessary to restore data, AMANDA contains a utility program that allows the operator to identify the file system, date, and file name(s). Once this is done, AMANDA identifies the correct backup media and then locates and restores the desired data. As stated earlier, AMANDA's design also makes it possible to restore data even without AMANDA's assistance, although identification of the correct media would be a slower, manual process.
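That utility is amrecover, which presents an interactive session for browsing the backup index. A sketch of restoring a single file (the host, disk, and file names are hypothetical):

amrecover> sethost client.example.com
amrecover> setdisk /home
amrecover> add jsmith/resume.txt
amrecover> extract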

This section has only touched upon the most basic AMANDA concepts. To do more research on AMANDA, start with the amanda(8) man page.

Notes

[1] The .gz extension is traditionally used to signify that the file has been compressed with gzip. Sometimes .tar.gz is shortened to .tgz to keep file names reasonably sized.