Cross-Platform Data Archives

Zipping up a directory is one of the easiest ways to back up data and move it around on flash storage or over a network. But unexpected things happen when you move between operating systems (Windows, Linux, and OS X), and you come to find out that ZIP archives aren't always cross-platform.

I've learned over the last few years, through painful lessons, that:

  • VFAT and FAT32 are not the same thing (VFAT is the long-file-name extension to FAT, FAT32 is the 32-bit FAT variant); only VFAT seems to work consistently everywhere.
  • If you are using flash storage, many drives will not mount properly under Linux, even when they come from the same vendor as drives that do.
  • Different file systems (NTFS, HFS, ext4, etc.) support different character sets. In other words, a file name that is valid on one file system may not work on another.
  • The Linux kernel source tree, especially, has file names that use the same letters but in different cases. This breaks on other file systems because, while they often preserve case, they compare names case-insensitively and treat names with the same sequence of letters as the same file. When you decompress an archive created on Linux, you may end up losing files. Similarly, if you sync a source repo from one platform to another, you may end up with missing files, or with files that always show as modified because the second file with the same sequence of letters overwrote the first.
  • If you create a compressed ZIP archive under OS X, and it happens to be large and to contain those Linux files whose names share the same sequence of letters, it may not decompress properly under Linux or Windows.
  • A large ZIP archive is just a bad idea. Files are easily corrupted, and rebuilding the archive with another utility has a low chance of success.
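
The case problem above can be checked before an archive ever leaves Linux. Here is a minimal Python sketch, not a definitive tool: the function names find_case_collisions and list_relative_paths are my own, and folding the whole relative path with casefold() is only an approximation of what a case-insensitive file system does (real file systems may also normalize Unicode, which this ignores).

```python
import os
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case.

    Returns a list of groups; each group is a sorted list of paths that
    a case-insensitive file system would likely treat as one file.
    """
    groups = defaultdict(list)
    for path in paths:
        # casefold() is a more aggressive lower() meant for caseless matching.
        groups[path.casefold()].append(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def list_relative_paths(root):
    """Yield every file under root as a path relative to root, using '/'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root).replace(os.sep, "/")
```

To scan a tree, pass the walker's output to the checker, for example find_case_collisions(list_relative_paths("linux-src")), where "linux-src" is a placeholder directory name. An empty result means no two files in the tree differ only by case.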

Self, always remember:

  1. Use an archive format that allows recovery.
  2. Use plain ASCII letters in file names instead of extended characters or Unicode.
  3. Back up your files in many places, uncompressed if you have enough storage.
  4. Putting all your eggs in one basket is a bad idea.
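
Two of these reminders are easy to automate. The sketch below is a rough Python illustration under my own assumptions: the helper names non_ascii_names, sha256_of, and copies_match are made up for this note, "plain ASCII" is my reading of "common letters", and SHA-256 is just one reasonable choice of checksum for comparing backup copies.

```python
import hashlib


def non_ascii_names(names):
    """Return the names that contain characters outside plain ASCII."""
    # str.isascii() requires Python 3.7 or newer.
    return [name for name in names if not name.isascii()]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths):
    """True if every file in paths has the same SHA-256 digest."""
    return len({sha256_of(p) for p in paths}) == 1
```

Running copies_match on the same file as stored in each backup location tells you whether the copies still agree. It cannot tell you which copy is the good one when they differ, so keeping more than two copies helps.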