
Check RapidCRC (Windows) md5 checksum files using md5sum (Unix)

I use RapidCRC Unicode (0.3.11) for Windows to generate md5 checksum files for various source files, mainly huge video footage, but it can be anything, of course. Because footage files are large, they can easily be hit by data rot caused by aging or electromagnetic effects that corrupt the stored data. RapidCRC can also check data integrity, and this works flawlessly.

But as soon as the footage file and its md5 checksum file are copied to a Unix system, checking them gets complicated, because the Unix md5sum command doesn't cope well with checksum files created by RapidCRC. The reason is that Windows terminates a line with CR (carriage return, 0x0d) followed by LF (line feed, 0x0a), while Unix systems such as Ubuntu use LF (0x0a) only. No big deal, but md5sum simply doesn't handle the extra CR (see below).

Although RapidCRC states that the generated checksum file is md5sum compatible, this doesn't seem to be the case.
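Before handing a checksum file to md5sum, it can help to check whether it still has Windows line endings. A minimal sketch, assuming only the POSIX tools tr and wc; the function name count_cr is hypothetical:

```sh
# count_cr FILE: print the number of carriage-return bytes (0x0d) in FILE.
# Hypothetical helper; a non-zero count means the file still contains
# Windows (CRLF) line endings.
count_cr() {
    # tr -dc '\r' deletes every byte except CR, wc -c counts what is left.
    # The $(( )) wrapper drops the leading blanks some wc versions print.
    echo $(( $(tr -dc '\r' < "$1" | wc -c) ))
}
```

For the two files shown below, count_cr "movie 1.mpg rapidcrc.md5" would print 1 and count_cr "movie 1.mpg md5sum.md5" would print 0, matching the single 0x0d in the RapidCRC hex dump.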

RapidCRC

This is a RapidCRC generated md5 checksum file (text and hex dump).
Notice the 0x0d and 0x0a at the end of the (only) line.

root@ganymede:/media/raid-main/transfer# cat "movie 1.mpg rapidcrc.md5"
f724a9b3aa7ab2efdf8528daff90a400 *movie 1.mpg
root@ganymede:/media/raid-main/transfer# xxd -c16 -a "movie 1.mpg rapidcrc.md5"
0000000: 6637 3234 6139 6233 6161 3761 6232 6566  f724a9b3aa7ab2ef
0000010: 6466 3835 3238 6461 6666 3930 6134 3030  df8528daff90a400
0000020: 202a 6d6f 7669 6520 312e 6d70 670d 0a     *movie 1.mpg..

Now the md5sum generated checksum file (text and hex dump).
Notice the single 0x0a at the end of the (only) line.

root@ganymede:/media/raid-main/transfer# cat "movie 1.mpg md5sum.md5"
f724a9b3aa7ab2efdf8528daff90a400  movie 1.mpg
root@ganymede:/media/raid-main/transfer# xxd -c16 -a "movie 1.mpg md5sum.md5"
0000000: 6637 3234 6139 6233 6161 3761 6232 6566  f724a9b3aa7ab2ef
0000010: 6466 3835 3238 6461 6666 3930 6134 3030  df8528daff90a400
0000020: 2020 6d6f 7669 6520 312e 6d70 670a         movie 1.mpg.

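Apart from the line termination mentioned above, the * in front of the file name is another difference. It does not disturb md5sum, though: in md5sum's format the character between the checksum and the file name marks the input mode (* for binary, a space for text), and md5sum -c accepts both.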
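Since the * does not disturb md5sum, it only matters when RapidCRC and md5sum generated files are compared byte for byte, e.g. with diff. A minimal sketch of a hypothetical helper, to_text_mode, that rewrites the binary-mode marker into md5sum's default two-space form; it deliberately leaves carriage returns alone:

```sh
# to_text_mode: read md5sum-style lines on stdin and replace the " *"
# (binary mode) separator with "  " (two spaces, text mode).
# Hypothetical helper; carriage returns are NOT removed here.
to_text_mode() {
    # \{32\} matches exactly the 32 hex digits of an MD5 checksum;
    # lines that already use two spaces do not match and pass unchanged.
    sed 's/^\([0-9a-fA-F]\{32\}\) \*/\1  /'
}
```

Combined with CR removal, sed 's/\r$//' "movie 1.mpg rapidcrc.md5" | to_text_mode | diff - "movie 1.mpg md5sum.md5" should print nothing for the files shown above, since their hex dumps differ only in the 0x0d and the *.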

The test directory contains the source files, the RapidCRC generated md5 checksum files and, for movie 1.mpg only, an md5sum generated checksum file.

root@ganymede:/media/raid-main/transfer# ls -l
total 28
-rw-r--r-- 1 root root 22 May  2 23:15 movie 1.mpg
-rw-r--r-- 1 root root 47 May  2 23:17 movie 1.mpg rapidcrc.md5
-rw-r--r-- 1 root root 46 May  2 23:42 movie 1.mpg md5sum.md5
-rw-r--r-- 1 root root 33 May  2 23:15 movie 2.mpg
-rw-r--r-- 1 root root 47 May  2 23:17 movie 2.mpg rapidcrc.md5
-rw-r--r-- 1 root root 31 May  2 23:16 movie 3.mpg
-rw-r--r-- 1 root root 47 May  2 23:17 movie 3.mpg rapidcrc.md5

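Now, if md5sum -c is applied to the RapidCRC checksum file, an error is shown, while the md5sum generated checksum file is accepted without errors. md5sum treats the trailing 0x0d as part of the file name and looks for a file called "movie 1.mpg" followed by a carriage return, which does not exist. When the error is printed, that carriage return moves the cursor back to the start of the line, so the rest of the message overwrites the file name, which is why it is missing from the first two error lines.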

root@ganymede:/media/raid-main/transfer# md5sum -c "movie 1.mpg rapidcrc.md5"
: No such file or directory
: FAILED open or read
md5sum: WARNING: 1 listed file could not be read
root@ganymede:/media/raid-main/transfer# md5sum -c "movie 1.mpg md5sum.md5"
movie 1.mpg: OK

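The following sed command processes all md5 checksum files in the current directory and deletes the 0x0d at the end of the last line of each file, so that line is terminated by a single 0x0a. The -s option makes sed treat each file separately, so the $ address matches the last line of every file instead of only the last line of the last file. Both -s and the \r escape rely on GNU sed.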

root@ganymede:/media/raid-main/transfer# sed -s '$s/\r$//' *.md5
f724a9b3aa7ab2efdf8528daff90a400  movie 1.mpg
f724a9b3aa7ab2efdf8528daff90a400 *movie 1.mpg
93adea3dc473dd7ae029d630050fd9b8 *movie 2.mpg
b56962d703c9de6bf87f8bcc69614e6a *movie 3.mpg
root@ganymede:/media/raid-main/transfer# sed -s '$s/\r$//' *.md5 > output
root@ganymede:/media/raid-main/transfer# xxd -c16 -a output
0000000: 6637 3234 6139 6233 6161 3761 6232 6566  f724a9b3aa7ab2ef
0000010: 6466 3835 3238 6461 6666 3930 6134 3030  df8528daff90a400
0000020: 2020 6d6f 7669 6520 312e 6d70 670a 6637    movie 1.mpg.f7
0000030: 3234 6139 6233 6161 3761 6232 6566 6466  24a9b3aa7ab2efdf
0000040: 3835 3238 6461 6666 3930 6134 3030 202a  8528daff90a400 *
0000050: 6d6f 7669 6520 312e 6d70 670a 3933 6164  movie 1.mpg.93ad
0000060: 6561 3364 6334 3733 6464 3761 6530 3239  ea3dc473dd7ae029
0000070: 6436 3330 3035 3066 6439 6238 202a 6d6f  d630050fd9b8 *mo
0000080: 7669 6520 322e 6d70 670a 6235 3639 3632  vie 2.mpg.b56962
0000090: 6437 3033 6339 6465 3662 6638 3766 3862  d703c9de6bf87f8b
00000a0: 6363 3639 3631 3465 3661 202a 6d6f 7669  cc69614e6a *movi
00000b0: 6520 332e 6d70 670a                      e 3.mpg.

If the output of the sed command is piped into md5sum -c, the RapidCRC md5 checksum files can be used to check data integrity. With --quiet, md5sum prints only failures, so no output means that all files passed.

root@ganymede:/media/raid-main# sed -s '$s/\r$//' *.md5 | md5sum -c
movie 1.mpg: OK
movie 1.mpg: OK
movie 2.mpg: OK
movie 3.mpg: OK
root@ganymede:/media/raid-main# sed -s '$s/\r$//' *.md5 | md5sum -c --quiet
root@ganymede:/media/raid-main#
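The sed expression above removes the 0x0d only from the last line of each file, which is enough for the single-entry checksum files used here. If a RapidCRC checksum file lists several files, all lines but the last would keep their carriage return. A minimal sketch that strips the CR from every line; the function name check_rapidcrc is hypothetical, and it swaps sed for tr -d '\r', which deletes every CR byte and needs no GNU extension:

```sh
# check_rapidcrc FILE...: verify RapidCRC md5 checksum files with md5sum.
# Hypothetical helper; run it from the directory the listed file names
# are relative to (md5sum is assumed to be GNU coreutils).
check_rapidcrc() {
    # tr -d '\r' removes the CR on every line, not just the last one.
    # md5sum -c reads the cleaned list from stdin and, with --quiet,
    # prints only failures. The pipeline's exit status is md5sum's:
    # 0 if every listed file matches, non-zero otherwise.
    cat -- "$@" | tr -d '\r' | md5sum -c --quiet
}
```

Run from the directory holding the footage, check_rapidcrc *.md5 && echo "all files OK" prints either the failures or, if every file matches, the final message.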

References:
String operations (Ubuntu Forums)
RapidCRC Unicode


Check files using md5sum

Data rot is a reality, even with ECC, RAID and other data redundancy mechanisms in place.

md5sum can easily be used to check data integrity, e.g. after file transfers.

Checksums have to be created before they can be verified. The following script generates an md5sum compatible file, which can be used to check integrity at a later stage.

root@ganymede:~# cat ./generate-md5sum.sh
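#!/bin/sh
# Print md5sum lines for every regular file below the directory given
# as first argument ("$1" is quoted so paths with spaces work).
find "$1" -type f -exec md5sum {} \;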
root@ganymede:~# ./generate-md5sum.sh archive/
d6e02966dc93d4b6bbd3a651acea0176  archive/jre-8-ea-bin-b106-linux-i586-05_sep_2013.tar.gz
e868ab86df2eb20a1d98c11e8564e52c  archive/inadyn-mt.v.02.24.38.tar.gz
13a91d9e50695dbfa086ffbacf81cfa6  archive/spigot.jar
6d790745b95d0ece9d1b717c8b4f1d15  archive/cli32
58a014a5a4f2fc3596caf40e60584db0  archive/archttp32
root@ganymede:~# ./generate-md5sum.sh archive/ > archive.md5
root@ganymede:~#

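The script generate-md5sum.sh generates md5 checksums, which can be redirected into a file (archive.md5) for later verification. It takes one parameter: the path where the recursive checksum calculation starts. Because the paths written to archive.md5 are relative (archive/...), md5sum -c has to be run from the same directory the list was generated in.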
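find lists files in directory order, which can change between runs or file systems, so two checksum lists of the same tree may not diff cleanly. A minimal sketch of a sorted variant, assuming GNU find, sort and xargs for the NUL-separated options; generate_md5 is a hypothetical name, not part of the script above:

```sh
# generate_md5 DIR: print md5sum lines for every regular file below DIR,
# sorted by path so that lists from different runs can be compared.
# Hypothetical helper; assumes GNU find, sort and xargs.
generate_md5() {
    # -print0, sort -z and xargs -0 pass names NUL-separated, so spaces
    # in file names survive; LC_ALL=C gives a locale-independent byte
    # order; xargs -r skips md5sum entirely when no files are found.
    find "$1" -type f -print0 | LC_ALL=C sort -z | xargs -0 -r md5sum
}
```

generate_md5 archive/ > archive.md5 writes the same line format as generate-md5sum.sh, so verification with md5sum -c works unchanged.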

root@ganymede:~# md5sum -c archive.md5
archive/jre-8-ea-bin-b106-linux-i586-05_sep_2013.tar.gz: OK
archive/inadyn-mt.v.02.24.38.tar.gz: OK
archive/spigot.jar: OK
archive/cli32: OK
archive/archttp32: OK
root@ganymede:~# md5sum -c --quiet archive.md5
root@ganymede:~#

md5sum verifies data integrity with the -c option, which prints a result for every file. To display only errors, add the --quiet option.
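Because md5sum -c returns a non-zero exit status when any file fails or cannot be read, the check is easy to use from cron jobs or other scripts. A minimal sketch; verify_md5_list is a hypothetical name, and it assumes the paths inside the list are relative to the directory that contains the list, as with ~/archive.md5 above:

```sh
# verify_md5_list LIST: check every file named in LIST and print a summary.
# Hypothetical helper; assumes the paths inside LIST are relative to the
# directory containing LIST (e.g. ~/archive.md5 listing archive/...).
verify_md5_list() (
    # Subshell body: the cd below does not change the caller's directory.
    cd "$(dirname -- "$1")" || exit 2
    list=$(basename -- "$1")
    # --quiet suppresses the "OK" lines; failures are still printed.
    if md5sum -c --quiet -- "$list"; then
        echo "$list: all files OK"
    else
        echo "$list: integrity check FAILED" >&2
        exit 1
    fi
)
```

With this, verify_md5_list ~/archive.md5 works from any current directory, and its exit status can drive an alert.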