I use RapidCRC Unicode (0.3.11) for Windows to generate md5 checksum files for various source files, mainly huge video footage, but it can be anything of course. Because footage files are large, they can easily be subject to data rot due to age or electromagnetic effects that affect data consistency. RapidCRC can also be used to check data integrity, and this works flawlessly. But as soon as a footage file and its md5 checksum file are copied to a Unix system, checking gets complicated, since the Unix md5sum command doesn’t cope very well with checksum files created by RapidCRC. The reason is that Windows terminates a line with CR (carriage return, 0x0d) followed by LF (line feed, 0x0a), while Ubuntu uses LF (line feed, 0x0a) only. No big deal, but md5sum simply doesn’t cope with it (see below).
Despite the fact that RapidCRC states that the generated checksum file is md5sum compatible, this doesn’t seem to be the case.
This is a RapidCRC generated md5 checksum file (text and hex code).
Notice the 0x0d and 0x0a at the end of the (only) line.
root@ganymede:/media/raid-main/# transfer# cat "movie 1.mpg rapidcrc.md5"
f724a9b3aa7ab2efdf8528daff90a400 *movie 1.mpg
root@ganymede:/media/raid-main/# transfer# xxd -c16 -a "movie 1.mpg rapidcrc.md5"
0000000: 6637 3234 6139 6233 6161 3761 6232 6566  f724a9b3aa7ab2ef
0000010: 6466 3835 3238 6461 6666 3930 6134 3030  df8528daff90a400
0000020: 202a 6d6f 7669 6520 312e 6d70 670d 0a     *movie 1.mpg..
Now the md5sum details (text and hex code).
Notice the 0x0a at the end of the (only) line.
root@ganymede:/media/raid-main/# transfer# cat "movie 1.mpg md5sum.md5"
f724a9b3aa7ab2efdf8528daff90a400  movie 1.mpg
root@ganymede:/media/raid-main/# transfer# xxd -c16 -a "movie 1.mpg md5sum.md5"
0000000: 6637 3234 6139 6233 6161 3761 6232 6566  f724a9b3aa7ab2ef
0000010: 6466 3835 3238 6461 6666 3930 6134 3030  df8528daff90a400
0000020: 2020 6d6f 7669 6520 312e 6d70 670a         movie 1.mpg.
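Reading hex dumps gets tedious with many files. As a rough sketch, a small shell function can list the checksum files that still carry a CR; the name list_crlf_md5 is made up for this example and is not part of any tool.

```shell
# list_crlf_md5 is a hypothetical helper: print every *.md5 file in a
# directory (default: current directory) that contains a CR (0x0d) byte.
list_crlf_md5() {
    dir=${1:-.}
    cr=$(printf '\r')              # a literal carriage return
    for f in "$dir"/*.md5; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if grep -q "$cr" -- "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage: list the affected files in the current directory.
list_crlf_md5 .
```

Files written by md5sum itself should not show up in the list, since they contain no 0x0d.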
Apart from the line termination mentioned above, the * in front of the source file name is another difference. In md5sum’s format it marks a file that was read in binary mode, and it doesn’t seem to disturb md5sum.
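To see where the two markers come from, here is a small sketch assuming GNU md5sum, where -b selects binary mode. The temporary directory and the file sample.bin are made up for the demonstration.

```shell
# Show that the marker before the file name only records the read mode.
tmpdir=$(mktemp -d)
printf 'demo' > "$tmpdir/sample.bin"      # hypothetical sample file

(
    cd "$tmpdir" || exit 1
    md5sum sample.bin    > text.md5       # "<hash>  sample.bin" (two spaces)
    md5sum -b sample.bin > binary.md5     # "<hash> *sample.bin" (space, asterisk)
    # Both checksum files verify against the same data.
    md5sum -c text.md5 binary.md5
)
```

On GNU/Linux both modes read the same bytes, so the hash is identical and only the marker differs.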
The test directory shows the source file itself, the RapidCRC generated md5 checksum file and, for movie 1.mpg only, the md5sum created md5 checksum file.
root@ganymede:/media/raid-main/# transfer# ls -l
total 28
-rw-r--r-- 1 root root 22 May 2 23:15 movie 1.mpg
-rw-r--r-- 1 root root 47 May 2 23:17 movie 1.mpg rapidcrc.md5
-rw-r--r-- 1 root root 46 May 2 23:42 movie 1.mpg md5sum.md5
-rw-r--r-- 1 root root 33 May 2 23:15 movie 2.mpg
-rw-r--r-- 1 root root 47 May 2 23:17 movie 2.mpg rapidcrc.md5
-rw-r--r-- 1 root root 31 May 2 23:16 movie 3.mpg
-rw-r--r-- 1 root root 47 May 2 23:17 movie 3.mpg rapidcrc.md5
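Now, if md5sum is applied to the RapidCRC md5 checksum file, an error shows: md5sum treats the trailing 0x0d as part of the file name, so it looks for a file that doesn’t exist (the CR also makes the terminal overwrite the beginning of each message). The md5sum md5 checksum file is accepted without errors.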
root@ganymede:/media/raid-main/# transfer# md5sum -c "movie 1.mpg rapidcrc.md5"
: No such file or directory
: FAILED open or read
md5sum: WARNING: 1 listed file could not be read
root@ganymede:/media/raid-main/# transfer# md5sum -c "movie 1.mpg md5sum.md5"
movie 1.mpg: OK
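The following sed command reads all md5 checksum files in the directory and strips the 0x0d in front of the final 0x0a, so that each line ends in a single 0x0a. The $ address restricts the substitution to the last line, and -s makes sed treat every file separately, which is sufficient here because each RapidCRC file holds exactly one line.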
root@ganymede:/media/raid-main/# transfer# sed -s '$s/\r$//' *.md5
f724a9b3aa7ab2efdf8528daff90a400  movie 1.mpg
f724a9b3aa7ab2efdf8528daff90a400 *movie 1.mpg
93adea3dc473dd7ae029d630050fd9b8 *movie 2.mpg
b56962d703c9de6bf87f8bcc69614e6a *movie 3.mpg
root@ganymede:/media/raid-main/# transfer# sed -s '$s/\r$//' *.md5 > output
root@ganymede:/media/raid-main/# transfer# xxd -c16 -a output
0000000: 6637 3234 6139 6233 6161 3761 6232 6566  f724a9b3aa7ab2ef
0000010: 6466 3835 3238 6461 6666 3930 6134 3030  df8528daff90a400
0000020: 2020 6d6f 7669 6520 312e 6d70 670a 6637    movie 1.mpg.f7
0000030: 3234 6139 6233 6161 3761 6232 6566 6466  24a9b3aa7ab2efdf
0000040: 3835 3238 6461 6666 3930 6134 3030 202a  8528daff90a400 *
0000050: 6d6f 7669 6520 312e 6d70 670a 3933 6164  movie 1.mpg.93ad
0000060: 6561 3364 6334 3733 6464 3761 6530 3239  ea3dc473dd7ae029
0000070: 6436 3330 3035 3066 6439 6238 202a 6d6f  d630050fd9b8 *mo
0000080: 7669 6520 322e 6d70 670a 6235 3639 3632  vie 2.mpg.b56962
0000090: 6437 3033 6339 6465 3662 6638 3766 3862  d703c9de6bf87f8b
00000a0: 6363 3639 3631 3465 3661 202a 6d6f 7669  cc69614e6a *movi
00000b0: 6520 332e 6d70 670a                      e 3.mpg
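The $ in the sed script above limits the substitution to the last line of each file. For a checksum file with several entries, a variant without that address might be more suitable. This is only a sketch: strip_cr is a hypothetical helper name, and the \r escape assumes GNU sed, as in the command above.

```shell
# strip_cr is a hypothetical helper: remove a trailing CR from every line
# of the given files and write the result to stdout.
strip_cr() {
    # No "$" address, so the substitution runs on each line, not only
    # on the last one.
    sed -e 's/\r$//' -- "$@"
}

# Usage (assumes checksum files in the current directory):
#   strip_cr *.md5 | md5sum -c --quiet
```

The original files stay untouched; only the stream handed to md5sum is converted.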
If the stdout of the sed command is piped to the md5sum command, the RapidCRC md5 checksum files can be used to check data integrity.
root@ganymede:/media/raid-main# sed -s '$s/\r$//' *.md5 | md5sum -c
movie 1.mpg: OK
movie 1.mpg: OK
movie 2.mpg: OK
movie 3.mpg: OK
root@ganymede:/media/raid-main# sed -s '$s/\r$//' *.md5 | md5sum -c --quiet
root@ganymede:/media/raid-main#
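The pipe only works when it is run from the directory that holds the footage, because md5sum resolves the listed names relative to the current directory. A possible wrapper, sketched under the assumption of GNU sed and md5sum, checks one directory from anywhere and reports which checksum file failed. The name verify_md5_dir is hypothetical.

```shell
# verify_md5_dir is a hypothetical helper: check every *.md5 file in a
# directory and return non-zero if any of them fails.
verify_md5_dir() {
    dir=${1:-.}
    rc=0
    for f in "$dir"/*.md5; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # Run in a subshell so the caller's working directory is unchanged.
        (
            cd "$dir" || exit 1
            sed -e 's/\r$//' -- "$(basename -- "$f")" | md5sum -c --quiet
        ) || { printf 'FAILED: %s\n' "$f" >&2; rc=1; }
    done
    return "$rc"
}

# Usage: verify_md5_dir /path/to/footage && echo "all checksums OK"
```

The exit status makes it usable in scripts, for example after a copy job has finished.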
References:
String operations (ubuntu forums)
RapidCRC Unicode