File systems for backup and long term storage

I save my work onto WD Black hard disk drives inside the MacPro...

.. and then all of the stuff on those drives would be saved automatically using CCC to a couple of RAIDs, etc...

.. which all have HGST Enterprise hard disk drives inside.

-
Creating images to tell a story... just for you!
Cheers,
Ashley.
 
Giving SSDs to the kids is a great idea.
Is SSD Good for Long Term Storage?

"SSDs are reliable when installed in a machine that is regularly powered on and not left unpowered for long stretches. So SSDs are not recommended for long-term storage outside the computer. If you do want to use an SSD as a long-term storage device on the shelf, you should store it at a proper temperature and power it on regularly so the firmware can do its housekeeping work."

HDDs are reliable at storing data for long periods of time without being supplied power and are a preferred method of storage for backups.

The Best External Hard Drives - by Scott Gilbertson on Wired.

-
Creating images to tell a story... just for you!
Cheers,
Ashley.
With SSDs, the electrical charges in the cells leak when the drive is left unpowered for a long time, which can lead to data corruption.

With HDDs, data is stored as magnetic charges, which can decay and demagnetize at the bit level, corrupting data.

Both HDDs and SSDs also suffer from UREs (Uncorrectable Read Errors), though these happen rarely.

Prevention: Power on the HDD / SSD at least once a year, leaving it running for a day or so. SSDs will do garbage collection / TRIM if needed, and HDDs will run SMART checks and keep everything in order.
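As a sketch of what that annual check-up looks like on Linux, here is a small dry-run helper assuming smartmontools is installed. The function name `health_check` and the device names are made up for illustration; it only prints the `smartctl` commands you would run (the real commands need root):

```shell
# Dry-run sketch of an annual disk health check (assumes smartmontools).
# health_check prints the smartctl invocations for each device given;
# remove the echos to run them for real (requires root).
health_check() {
    for dev in "$@"; do
        echo "smartctl -t long $dev"   # start a long SMART self-test (takes hours)
        echo "smartctl -a $dev"        # later: review attributes and the test log
    done
}
```

On an SSD, simply leaving the drive powered for a day lets the firmware do its housekeeping; on Linux, TRIM can also be triggered by hand with `fstrim -v /mountpoint`.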

Buy quality HDDs / SSDs: Do not buy cheap QLC SSDs; they do not last long. The best solution is to buy enterprise-grade SSDs, which come in 1.92 / 3.84 / 7.68TB sizes and are only slightly more expensive than standard SSDs. They have high-quality cells, a higher tolerance to UREs, and exceptional write endurance, and are likely to outlast consumer SSDs by 2-3 times. Intel, Micron, and Samsung all make good enterprise drives.

The same goes for HDDs. Buy enterprise-grade hard drives; they are very reliable and last a long time. They have quality components, high workload ratings (for example: 55TB per year for a normal desktop drive, 180TB for a NAS drive, 550TB for an enterprise drive), and high MTBF.
Agreed, and I got Seagate IronWolf NAS drives at a pretty reasonable price to start out with my first NAS box. They got pretty good reviews in NAS drive roundups as well. It really doesn't have to be super expensive to get started.
 
We've had a lot of productive discussions about backup and image data storage. Some people have brought up data corruption. WRT that, not all file systems are created equal. I use ZFS for much of my storage needs. ZFS has features that mitigate some of the issues that others have raised.

From Wikipedia's ZFS article:
  • Designed for long-term storage of data, and indefinitely scaled datastore sizes with zero data loss, and high configurability.
  • Hierarchical checksumming of all data and metadata, ensuring that the entire storage system can be verified on use, and confirmed to be correctly stored, or remedied if corrupt. Checksums are stored with a block's parent block, rather than with the block itself. This contrasts with many file systems where checksums (if held) are stored with the data so that if the data is lost or corrupt, the checksum is also likely to be lost or incorrect.
  • Can store a user-specified number of copies of data or metadata, or selected types of data, to improve the ability to recover from data corruption of important files and structures.
  • Automatic rollback of recent changes to the file system and data, in some circumstances, in the event of an error or inconsistency.
  • Automated and (usually) silent self-healing of data inconsistencies and write failure when detected, for all errors where the data is capable of reconstruction. Data can be reconstructed using all of the following: error detection and correction checksums stored in each block's parent block; multiple copies of data (including checksums) held on the disk; write intentions logged on the SLOG (ZIL) for writes that should have occurred but did not occur (after a power failure); parity data from RAID/RAID-Z disks and volumes; copies of data from mirrored disks and volumes.
  • Native handling of standard RAID levels and additional ZFS RAID layouts ("RAID-Z"). The RAID-Z levels stripe data across only the disks required, for efficiency (many RAID systems stripe indiscriminately across all devices), and checksumming allows rebuilding of inconsistent or corrupted data to be minimized to those blocks with defects;
  • Native handling of tiered storage and caching devices, which is usually a volume related task. Because ZFS also understands the file system, it can use file-related knowledge to inform, integrate, and optimize its tiered storage handling which a separate device cannot;
  • Native handling of snapshots and backup/replication which can be made efficient by integrating the volume and file handling. Relevant tools are provided at a low level and require external scripts and software for utilization.
  • Native data compression and deduplication, although the latter is largely handled in RAM and is memory hungry.
  • Efficient rebuilding of RAID arrays—a RAID controller often has to rebuild an entire disk, but ZFS can combine disk and file knowledge to limit any rebuilding to data which is actually missing or corrupt, greatly speeding up rebuilding;
  • Unaffected by RAID hardware changes which affect many other systems. On many systems, if self-contained RAID hardware such as a RAID card fails, or the data is moved to another RAID system, the file system will lack information that was on the original RAID hardware, which is needed to manage data on the RAID array. This can lead to a total loss of data unless near-identical hardware can be acquired and used as a "stepping stone". Since ZFS manages RAID itself, a ZFS pool can be migrated to other hardware, or the operating system can be reinstalled, and the RAID-Z structures and data will be recognized and immediately accessible by ZFS again.
  • Ability to identify data that would have been found in a cache but has been discarded recently instead; this allows ZFS to reassess its caching decisions in light of later use and facilitates very high cache-hit levels (ZFS cache hit rates are typically over 80%);
  • Alternative caching strategies can be used for data that would otherwise cause delays in data handling. For example, synchronous writes which are capable of slowing down the storage system can be converted to asynchronous writes by being written to a fast separate caching device, known as the SLOG (sometimes called the ZIL – ZFS Intent Log).
  • Highly tunable—many internal parameters can be configured for optimal functionality.
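As a concrete sketch of how a few of those features are driven in practice, here is a helper that prints the core OpenZFS commands. The pool name `tank`, the dataset `tank/photos`, and the `/dev/sdX` names are all made up for illustration, and the printed commands require root on a ZFS-capable system, so treat this as a reference, not a script to run blindly:

```shell
# Reference sketch only: pool "tank", dataset "tank/photos", and the
# /dev/sdX names are illustrative; the printed commands need root.
# zfs_cheatsheet prints the commands behind the features listed above.
zfs_cheatsheet() {
    cat <<'EOF'
# create a single-parity RAID-Z pool from three disks
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc
# walk every block, verify checksums, self-heal what is repairable
zpool scrub tank
zpool status -v tank
# keep two copies of irreplaceable data, even within one vdev
zfs set copies=2 tank/photos
# cheap point-in-time snapshot, with rollback if something goes wrong
zfs snapshot tank/photos@before-import
zfs rollback tank/photos@before-import
EOF
}
```

`zpool scrub` is the command behind the "silent self-healing" bullet: it re-reads every block, compares it to the stored checksum, and repairs from redundancy where it can.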
I haven't read the whole thread, but just be aware that ZFS also has its problems. ZFS uses a lot of RAM (by default, half of what you have), it's not a good idea to fill a ZFS pool beyond 75-85%, and ZFS requires good knowledge of the hardware. And yes, ZFS is highly tunable, but do you really need/want that? Entering this world can be very painful.

About corruption of a hard drive (HDD): don't forget that the controller also has good knowledge of the state of the disk (check smartmontools).

I use ZFS a lot at work, but I wouldn't use it for everything.

Z
 
I had not heard of it, but this is very interesting to me, so I read your post and a couple of articles on the pluses and minuses of this advanced and very good file system. It is very advanced and aimed at advanced users, but it solves many of the data integrity issues of the past.

I don't use RAID and am a simple Windows user.

Jim, what do you think the chances are of me having some corrupted files using my regular old system with 5TB of tens of thousands of image files, copied and synced over and over and over for many years?
Approaching unity.
I don't know what that means,
A probability of one is certainty.
but I follow computer stuff (desktops and laptops) very closely. If this is something that will eventually help the great masses of us who don't run servers, NAS or certainly not RAID, then I am pulling for it to catch on with the masses (and Windows and Apple).
Windows had an improved file system under development many years ago, and cancelled the project when it ran into difficulties and delays. I haven't heard of anything along those lines since. NTFS is pretty long in the tooth by now.
But if I took 2 years and opened every image and checked it, it either opens or not, right? There is no quality degradation.

I'm sure each file has been copied many times. I change disks as they get old or new ones come out. My current main drive is the best 8TB SSD money can buy. But everything on it has been copied to it, hence the question.

Thanks!
The *BEST* way to maximize the chance of keeping your data is to increase the number of copies (and, as someone else said, keep some in other locations in case of disaster). If you can, also diversify the technology (not only Windows NTFS, but also Linux/macOS/BSD file systems). And if you're able to, keep your data on different kinds of media (HDD/SSD/... tapes).

About the phrase "long term": don't forget that technology evolves, so from time to time you need to migrate. You need to take into account that in ten years you will still need the technology to read your data.

Z
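On a plain Windows/NTFS or Linux setup without ZFS, a low-tech substitute for ZFS's checksumming is a checksum manifest: hash every file once, then re-verify after each copy or sync instead of opening images one by one. A minimal sketch, assuming a POSIX shell and GNU coreutils' `sha256sum` (`Get-FileHash` is the rough Windows equivalent); the function and manifest names are my own made-up names:

```shell
# Checksum-manifest sketch using sha256sum (GNU coreutils).
# build_or_verify DIR: the first call writes DIR/manifest.sha256; later
# calls re-hash every file and report any mismatch, catching silent
# corruption without opening each image by hand.
build_or_verify() {
    dir=$1
    if [ ! -f "$dir/manifest.sha256" ]; then
        # First run: record a checksum for every file in the tree,
        # excluding the manifest itself.
        (cd "$dir" && find . -type f ! -name manifest.sha256 \
            -exec sha256sum {} + > manifest.sha256)
        echo "manifest created"
    else
        # Later runs: re-hash everything; --quiet prints only failures.
        (cd "$dir" && sha256sum --quiet -c manifest.sha256) \
            && echo "all files OK"
    fi
}
```

Run it once after filling a drive, and again after every big sync; a nonzero exit with a `FAILED` line tells you exactly which file to restore from another copy.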
 
