An introduction to Unix archiving with tar and cpio

One of the principles of backup and recovery is that you should back up your files in as many formats as possible. That way, if one format is discontinued, or a new version of a tool cannot read archives made by the old one, you don’t lose the ability to recover files from your backups. In this post I will share my knowledge of two Unix archiving utilities: tar and cpio. Both are non-interactive programs that run from the command line, and many more complex backup utilities (including graphical ones) are actually frontends for these and other command-line utilities.

First of all, what is an archive? An archive is the result of taking a directory or a hierarchy of directories and merging it into a single file. That’s all it is, apart from a small header in front of each file that records its name, size, permissions and timestamps, and a marker at the end of the archive. If you page through an archive file with a program like less, you will see the contents of your files concatenated together. An archive is just a bunch of files mushed together into one big file, with no compression or encryption involved. Archiving is useful when you want to compress a directory tree, or encrypt a bunch of files all at the same time: tools such as gzip work on one file at a time, so the archive gives them a single file to work on (these steps are, of course, separate from archiving).
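To see this for yourself, here is a minimal sketch. The directory demo and its two text files are hypothetical stand-ins created just for the demonstration. It archives the directory with tar, confirms that the file contents sit inside the archive unchanged, and reports the size in 512-byte blocks, the unit tar uses for its headers and padded file data.

```shell
# Create a small throwaway directory tree (hypothetical names)
mkdir -p demo/sub
printf 'hello from file one\n' > demo/one.txt
printf 'hello from file two\n' > demo/sub/two.txt

# Archive the tree: no compression, no encryption
tar -cf demo.tar demo

# The original text is stored verbatim inside the archive
if grep -q 'hello from file two' demo.tar; then
  echo "file contents found unchanged inside demo.tar"
fi

# Headers and file data are padded out to 512-byte blocks
size=$(wc -c < demo.tar)
echo "demo.tar is $((size)) bytes, $((size / 512)) blocks of 512 bytes"
```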

Part 1: tar

First I will go over tar. tar stores files in the tar format; the name stands for Tape ARchive, a reminder of its origins as a tape backup tool. Tar files are usually compressed with gzip (GNU zip), giving .tar.gz or .tgz files. This format is widely used for distributing software in source form, and it is also the basis of the package format in some Linux distributions, including Slackware.

You create an archive with the -c option. Optionally, you can add -v for verbose output.

bash-3.2$ tar -cv Screenshots > Screenshots.tar
a Screenshots
a Screenshots/Arch Linux Startup.png
a Screenshots/Arch+Linux+top.png
a Screenshots/Arch+Setup+CLI.png
a Screenshots/Arch+Setup+MDI.png
a Screenshots/Arch-Linux-top.png
a Screenshots/Arch-mc.png
a Screenshots/Arch-top.png
a Screenshots/Arch_Linux_top.png
a Screenshots/crontab.png
a Screenshots/crontab~.png
a Screenshots/Cyberdogs-Level2.png
a Screenshots/Device_manager.png
a Screenshots/Elinks+Arch+Linux.png
a Screenshots/graphics-driver.png
a Screenshots/Installing ReactOS 1.png
a Screenshots/Installing ReactOS 10.png
a Screenshots/Installing ReactOS 11.png
a Screenshots/Installing ReactOS 12.png
a Screenshots/Installing ReactOS 13.png
a Screenshots/Installing ReactOS 14.png
a Screenshots/Installing ReactOS 2.png
a Screenshots/Installing ReactOS 3.png
a Screenshots/Installing ReactOS 4.png
a Screenshots/Installing ReactOS 5.png
a Screenshots/Installing ReactOS 6.png
a Screenshots/Installing ReactOS 7.png
a Screenshots/Installing ReactOS 8.png
a Screenshots/Installing ReactOS 9.png
a Screenshots/irix-3.3-img2.gif
a Screenshots/Log_file_troubleshooting_Slackware.png
a Screenshots/Lynx.png
a Screenshots/mc-menu.png
a Screenshots/mc-mono.png
a Screenshots/memtest.png
a Screenshots/Notepad.png
a Screenshots/pkgtool-1.png
a Screenshots/pkgtool-2.png
a Screenshots/ReactOS 1.png
a Screenshots/ReactOS 2.png
a Screenshots/ReactOS 3.png
a Screenshots/ReactOS 4.png
a Screenshots/ReactOS 5.png
a Screenshots/ReactOS_Command_prompt.png
a Screenshots/ReactOS_grey.png
a Screenshots/sc.png
a Screenshots/screen.png
a Screenshots/Screensaver.png
a Screenshots/serial.png
a Screenshots/solitaire.png
a Screenshots/Spash_screen.png
a Screenshots/Task_manager_1.png
a Screenshots/Task_manager_2.png
a Screenshots/Wheat_theme.png
a Screenshots/Wordpad-bug.png
a Screenshots/WordPad.png

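Alternatively, you could type tar -cvf Screenshots.tar Screenshots for the same result. The -f option names the archive file explicitly, and it is the more portable choice: when -f is left out, tar writes to a default archive that varies between systems and tar implementations (it may be a tape device, and the TAPE environment variable can override it), so the redirect form above only works where that default is standard output. The a in front of each name is BSD tar’s way of marking an added file; GNU tar prints just the names.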
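A minimal sketch of the two equivalent forms, using a hypothetical demo directory: -f with a file name, and -f - (a dash meaning standard output) combined with a shell redirect, which behaves the same way whatever the system’s default archive device is.

```shell
# Hypothetical sample directory for the demonstration
mkdir -p demo
printf 'sample\n' > demo/a.txt

# Form 1: name the archive file with -f
tar -cf demo1.tar demo

# Form 2: -f - sends the archive to standard output; the shell saves it
tar -cf - demo > demo2.tar

# Both archives hold the same members
tar -tf demo1.tar > list1.txt
tar -tf demo2.tar > list2.txt
if cmp -s list1.txt list2.txt; then
  echo "both archives list the same files"
fi
```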

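Afterward, the archive can be compressed with gzip (gzip Screenshots.tar replaces it with Screenshots.tar.gz), or tar can compress as it writes if you add the -z option.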
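A minimal sketch of both routes, with a hypothetical demo directory standing in for Screenshots: compressing an existing archive with gzip, and letting tar compress as it writes with -z. Either result can be read back by adding -z to the listing or extraction command.

```shell
# Hypothetical sample directory
mkdir -p demo
printf 'sample text\n' > demo/a.txt

# Two steps: archive, then compress.
# gzip replaces demo.tar with demo.tar.gz (-f overwrites an old .gz)
tar -cf demo.tar demo
gzip -f demo.tar

# One step: -z runs the archive through gzip as it is written
tar -czf demo-onestep.tgz demo

# Either file can be listed (or extracted) by adding -z
tar -tzf demo.tar.gz
tar -tzf demo-onestep.tgz
```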

Files are extracted from an archive with the -x option.

bash-3.2$ tar -xvf Screenshots.tar -C .
x Screenshots/
x Screenshots/Arch Linux Startup.png
x Screenshots/Arch+Linux+top.png
x Screenshots/Arch+Setup+CLI.png
x Screenshots/Arch+Setup+MDI.png
x Screenshots/Arch-Linux-top.png
x Screenshots/Arch-mc.png
x Screenshots/Arch-top.png
x Screenshots/Arch_Linux_top.png
x Screenshots/crontab.png
x Screenshots/crontab~.png
x Screenshots/._Cyberdogs-Level2.png
x Screenshots/Cyberdogs-Level2.png
x Screenshots/Device_manager.png
x Screenshots/Elinks+Arch+Linux.png
x Screenshots/graphics-driver.png
x Screenshots/Installing ReactOS 1.png
x Screenshots/Installing ReactOS 10.png
x Screenshots/Installing ReactOS 11.png
x Screenshots/Installing ReactOS 12.png
x Screenshots/Installing ReactOS 13.png
x Screenshots/Installing ReactOS 14.png
x Screenshots/Installing ReactOS 2.png
x Screenshots/Installing ReactOS 3.png
x Screenshots/Installing ReactOS 4.png
x Screenshots/Installing ReactOS 5.png
x Screenshots/Installing ReactOS 6.png
x Screenshots/Installing ReactOS 7.png
x Screenshots/Installing ReactOS 8.png
x Screenshots/Installing ReactOS 9.png
x Screenshots/._irix-3.3-img2.gif
x Screenshots/irix-3.3-img2.gif
x Screenshots/Log_file_troubleshooting_Slackware.png
x Screenshots/Lynx.png
x Screenshots/mc-menu.png
x Screenshots/mc-mono.png
x Screenshots/memtest.png
x Screenshots/Notepad.png
x Screenshots/pkgtool-1.png
x Screenshots/pkgtool-2.png
x Screenshots/ReactOS 1.png
x Screenshots/ReactOS 2.png
x Screenshots/ReactOS 3.png
x Screenshots/ReactOS 4.png
x Screenshots/ReactOS 5.png
x Screenshots/ReactOS_Command_prompt.png
x Screenshots/ReactOS_grey.png
x Screenshots/sc.png
x Screenshots/screen.png
x Screenshots/Screensaver.png
x Screenshots/serial.png
x Screenshots/solitaire.png
x Screenshots/Spash_screen.png
x Screenshots/._Task_manager_1.png
x Screenshots/Task_manager_1.png
x Screenshots/Task_manager_2.png
x Screenshots/Wheat_theme.png
x Screenshots/Wordpad-bug.png
x Screenshots/WordPad.png

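This creates a directory called Screenshots in the current directory containing all the files stored in the archive. Here -f names the archive to read, and -C tells tar which directory to extract into; since -C . means the current directory, which is already the default, it could be left out. The ._ entries that appear in this listing but not in the earlier one are AppleDouble files, which the tar bundled with macOS adds to carry extended attributes; setting COPYFILE_DISABLE=1 in the environment when creating the archive leaves them out.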
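A minimal sketch of extracting somewhere other than the current directory, with the hypothetical names demo, demo.tar and demo-restore. tar changes into the -C directory before extracting, so that directory has to exist already; diff -r then confirms the restored copy matches the original.

```shell
# Hypothetical sample tree and archive
mkdir -p demo/sub
printf 'one\n' > demo/one.txt
printf 'two\n' > demo/sub/two.txt
tar -cf demo.tar demo

# -C changes to the given directory before extracting, so create it first
rm -rf demo-restore
mkdir demo-restore
tar -xf demo.tar -C demo-restore

# The extracted copy should match the original exactly
if diff -r demo demo-restore/demo; then
  echo "demo-restore/demo matches demo"
fi
```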

Another option is to list a table of contents for the archive, without extracting it. This is done as follows:

bash-3.2$ tar -tf Screenshots.tar
Screenshots/
Screenshots/Arch Linux Startup.png
Screenshots/Arch+Linux+top.png
Screenshots/Arch+Setup+CLI.png
Screenshots/Arch+Setup+MDI.png
Screenshots/Arch-Linux-top.png
Screenshots/Arch-mc.png
Screenshots/Arch-top.png
Screenshots/Arch_Linux_top.png
Screenshots/crontab.png
Screenshots/crontab~.png
Screenshots/._Cyberdogs-Level2.png
Screenshots/Cyberdogs-Level2.png
Screenshots/Device_manager.png
Screenshots/Elinks+Arch+Linux.png
Screenshots/graphics-driver.png
Screenshots/Installing ReactOS 1.png
Screenshots/Installing ReactOS 10.png
Screenshots/Installing ReactOS 11.png
Screenshots/Installing ReactOS 12.png
Screenshots/Installing ReactOS 13.png
Screenshots/Installing ReactOS 14.png
Screenshots/Installing ReactOS 2.png
Screenshots/Installing ReactOS 3.png
Screenshots/Installing ReactOS 4.png
Screenshots/Installing ReactOS 5.png
Screenshots/Installing ReactOS 6.png
Screenshots/Installing ReactOS 7.png
Screenshots/Installing ReactOS 8.png
Screenshots/Installing ReactOS 9.png
Screenshots/._irix-3.3-img2.gif
Screenshots/irix-3.3-img2.gif
Screenshots/Log_file_troubleshooting_Slackware.png
Screenshots/Lynx.png
Screenshots/mc-menu.png
Screenshots/mc-mono.png
Screenshots/memtest.png
Screenshots/Notepad.png
Screenshots/pkgtool-1.png
Screenshots/pkgtool-2.png
Screenshots/ReactOS 1.png
Screenshots/ReactOS 2.png
Screenshots/ReactOS 3.png
Screenshots/ReactOS 4.png
Screenshots/ReactOS 5.png
Screenshots/ReactOS_Command_prompt.png
Screenshots/ReactOS_grey.png
Screenshots/sc.png
Screenshots/screen.png
Screenshots/Screensaver.png
Screenshots/serial.png
Screenshots/solitaire.png
Screenshots/Spash_screen.png
Screenshots/._Task_manager_1.png
Screenshots/Task_manager_1.png
Screenshots/Task_manager_2.png
Screenshots/Wheat_theme.png
Screenshots/Wordpad-bug.png
Screenshots/WordPad.png
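A short sketch of a few listing variations on a hypothetical demo.tar: the plain -t listing, the long -tv listing, and a quick check for one member made by filtering the listing through grep.

```shell
# Hypothetical sample archive
mkdir -p demo
printf 'sample\n' > demo/a.txt
tar -cf demo.tar demo

# Plain table of contents: one member name per line
tar -tf demo.tar

# Adding -v gives an ls -l style listing (permissions, owner, size, date)
tar -tvf demo.tar

# Check for a single member without extracting anything
if tar -tf demo.tar | grep -qx 'demo/a.txt'; then
  echo "demo/a.txt is in the archive"
fi
```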

Part 2: cpio

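cpio is different from tar. It writes archives in its own cpio format (which comes in several variants), not in tar format. The file name Screenshots.pax used in this section is just a name: pax, short for Portable Archive eXchange, is a separate POSIX utility that can read and write both tar and cpio archives, and .cpio is the more usual extension for a cpio archive. Also unlike tar, cpio in copy-out mode reads the list of files to archive from standard input and writes the archive to standard output.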

Here is a typical cpio command for archiving a directory:

bash-3.2$ ls | cpio -oacvB > Screenshots.pax
Arch Linux Startup.png
Arch+Linux+top.png
Arch+Setup+CLI.png
Arch+Setup+MDI.png
Arch-Linux-top.png
Arch-mc.png
Arch-top.png
Arch_Linux_top.png
Cyberdogs-Level2.png
Device_manager.png
Elinks+Arch+Linux.png
Installing ReactOS 1.png
Installing ReactOS 10.png
Installing ReactOS 11.png
Installing ReactOS 12.png
Installing ReactOS 13.png
Installing ReactOS 14.png
Installing ReactOS 2.png
Installing ReactOS 3.png
Installing ReactOS 4.png
Installing ReactOS 5.png
Installing ReactOS 6.png
Installing ReactOS 7.png
Installing ReactOS 8.png
Installing ReactOS 9.png
Log_file_troubleshooting_Slackware.png
Lynx.png
Notepad.png
ReactOS 1.png
ReactOS 2.png
ReactOS 3.png
ReactOS 4.png
ReactOS 5.png
ReactOS_Command_prompt.png
ReactOS_grey.png
Screensaver.png
Screenshots.pax
cpio: Screenshots.pax: Can't add archive to itself
Spash_screen.png
Task_manager_1.png
Task_manager_2.png
Wheat_theme.png
WordPad.png
Wordpad-bug.png
crontab.png
crontab~.png
graphics-driver.png
irix-3.3-img2.gif
mc-menu.png
mc-mono.png
memtest.png
pkgtool-1.png
pkgtool-2.png
sc.png
screen.png
serial.png
solitaire.png
3484 blocks

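Both the command structure and the output look different from those of tar. Here the output of ls is piped into cpio, and cpio’s output is redirected to the file Screenshots.pax. Because that file is created in the same directory that ls is listing, its own name shows up in the list, and cpio warns that it can’t add the archive to itself and skips it. Note also that the command was run inside the Screenshots directory, so the names are stored without a Screenshots/ prefix (unlike in the tar archive), and that ls does not descend into subdirectories. Besides the list of files, cpio reports the number of blocks written when it finishes; it prints this count even without -v.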

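There are three basic options for cpio: -o (copy-out mode) reads a list of file names from standard input and writes an archive to standard output; -i (copy-in mode) reads an archive from standard input and extracts it; and -p (pass-through mode) reads a list of files from standard input and copies them into a specified directory without creating an archive at all.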

cpio has a very nifty feature: it lets you leave the atime (access time) values of the files you are archiving unchanged. This is done with the -a switch. Reading a file normally updates its atime, so cpio notes each file’s original access time and sets it back after it has finished reading that file. (Resetting the atime is itself an inode change, so each file’s ctime is updated instead.)

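The -c option tells cpio to use a portable ASCII header format instead of the old binary one. Binary headers are written in the byte order of the machine that created them, while ASCII headers can be read anywhere, which makes the archive more portable. Exactly which ASCII variant -c selects differs between cpio implementations, so -H followed by a format name (such as odc or newc) can be used to pick one explicitly.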

-v is, of course, the switch for verbose output: cpio prints each file name as it processes it. Without it, the program works silently apart from the final block count, with no indication of what it’s currently doing.

-B sets cpio’s I/O block size to 5120 bytes, ten times its default of 512 bytes. Larger blocks mean fewer read and write calls, which matters most when writing to tape. If you want a different block size, use the -C switch followed by the number of bytes in each block.

Another difference between cpio and tar is that cpio uses the same basic switch, -i, both for extracting an archive and for displaying its table of contents; adding -t turns extraction into listing.

Here is a cpio command for extracting an archive:

bash-3.2$ cat Screenshots.pax | cpio -iv

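I have omitted the verbose output here because it looks much the same as for the earlier cpio command. Here I have piped the contents of the archive into cpio with cat and used the -i switch to tell it to read an archive from standard input; cpio -iv < Screenshots.pax does the same without cat. There is no need to redirect cpio’s output, because in this mode it writes the extracted files to disk rather than to standard output. Keep in mind that, by default, cpio will not overwrite an existing file that is the same age as or newer than the one in the archive (the -u option forces replacement), and it only creates the subdirectories an archive needs if you add -d.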

Finally, here is the command for displaying the contents without extracting:

bash-3.2$ cat Screenshots.pax | cpio -it

That’s all for now.

NOTE: Feedback and corrections to this tutorial are welcome. I will be happy to correct any mistakes people point out.
