The future of data storage is still magnetic tape (2018)

97 points | 3 years | spectrum.ieee.org
tzs3 years ago

I wish tape drives weren't so expensive. I'd love to have on each personal computer a backup system that just mirrors every disk write to tape.

According to the SMART data for my current system I write about 9 TB per year. That would fit on one LTO 8 cartridge.

klodolph3 years ago

When I last calculated the break-even point for a thrifty home user, it was around 50TB of data. Above 50TB and there was a decent chance that tape is cheaper than disk, below 50TB it's unlikely.

I'll just say that the exact tradeoff will depend on a lot of factors, but as a rule of thumb, you can forget about tape below about 50TB.

The media itself is cheap and durable, but the drives are $1,000 or more and sometimes break--in other words, if you really must have your data, you want two drives.

zozbot2343 years ago

> The media itself is cheap and durable, but the drives are $1,000 or more and sometimes break--in other words, if you really must have your data, you want two drives.

Newer drives will also drop support for older LTO standards, so you end up having to rely on flimsy old hardware to recover your older tapes. For most home users, it just doesn't seem worth the hassle. Maybe it starts making sense for casual/non-professional users in the 100TB+ range, but even the most committed data hoarders are nowhere near that.

jacquesm3 years ago

> but even the most committed data hoarders are nowhere near that

Not all that committed, well over that.

ClumsyPilot3 years ago

"even the most committed data hoarders are nowhere near that."

Eh, i do meet people with ~50TB archives

Bluestein3 years ago

> Eh, i do meet people with ~50TB archives

Film? Curious ...

agumonkey3 years ago

Russian librarians say hi

ineedasername3 years ago

Depends on the needs. Tape will keep longer unpowered. A traditional HDD is a bit riskier if left offline and unpowered for a while. Even if they're all powered in a tower, I'd still recommend something like Backblaze for redundancy.

klodolph3 years ago

If you're storing 50TB of data in B2, that's $3k/year. For that price, you can make 2x copies of your data on tape, and then buy two tape drives to read them.
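
A rough check of that comparison, as a sketch only. The B2 rate of $5 per TB per month is an assumption (approximately the list price around that time), and the tape figures ($1,000 per used LTO-5 drive, $15 per 1.5 TB tape) are the ones quoted elsewhere in this thread, not current quotes.

```python
import math

def b2_yearly_cost(tb, usd_per_tb_month=5.0):
    """Yearly cloud storage bill for `tb` terabytes (rate is an assumed list price)."""
    return tb * usd_per_tb_month * 12

def tape_setup_cost(tb, copies=2, drives=2, drive_usd=1000.0,
                    tape_usd=15.0, tape_tb=1.5):
    """One-off cost of `copies` full copies on tape plus `drives` tape drives."""
    tapes = math.ceil(tb * copies / tape_tb)  # whole cartridges only
    return drives * drive_usd + tapes * tape_usd

print(b2_yearly_cost(50))   # 3000.0
print(tape_setup_cost(50))  # 3005.0
```

Under these assumptions the one-off tape setup costs about the same as a single year of cloud storage; bandwidth, labor and drive failures are not modeled.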

userbinator3 years ago

> in other words, if you really must have your data, you want two drives.

On the bright side, since it's removable media, anyone with a working drive can read your tapes for you.

ineedasername3 years ago

They used to be a lot cheaper. When I did admin work for a small company ~2004 they weren't as cheap as a DVD R/W, but they were still pretty inexpensive, and I kept rotating 7-day backups on them along with a monthly rotation, and IIRC the system itself was RAID 3. When I upgraded the server I think the DAT tape drive, 4GB per tape, only added a few hundred dollars.

These days a new tape drive goes into the thousands. It's a shame because it's still very difficult to beat tape for high capacity unpowered offline storage. Some optical formats have longer shelf lives but lower capacities, and also aren't cheap.

I'm glad I don't have to worry about backup/restore in my job anymore. I had to make use of those daily tapes more than once and the stress of that was not fun.

matheusmoreira3 years ago

It's not just the price either. LTO has backwards compatibility for the previous two generations only. There's a risk data stored on LTO cartridges will not be readable by future tape drives.

johnklos3 years ago

Used tape drives with not many hours on them are easy to find on eBay and other places for a fraction of the cost of new drives, and LTO-5 and 6 tapes, for instance, are pretty inexpensive.

srtjstjsj3 years ago

How would you make use of data that was "every write", which could just be blocks of subfiles?

tzs3 years ago

You can restore the state of the drive to an arbitrary time in the past by replaying past writes.

For example, suppose something bad happens (a bug deletes important files, someone installs ransomware, etc) and you want to roll back the system three days.

Restore the last full backup you have that is older than the desired rollback time. Then replay from the write log every write that occurred between the time of that backup and the time you are trying to roll back to.
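
A minimal sketch of that replay step, assuming a toy model in which the disk is a mapping from block number to block contents and each log entry records a timestamp, a block number and the new data. The `Write` record and `restore_to` function are hypothetical names for illustration, not part of any real backup tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    timestamp: float  # when the write reached the disk
    block: int        # block number that was written
    data: bytes       # new contents of that block

def restore_to(full_backup, backup_time, write_log, target_time):
    """Rebuild the disk image (block number -> bytes) as of target_time."""
    if backup_time > target_time:
        raise ValueError("full backup is newer than the rollback target")
    image = dict(full_backup)  # start from the last full backup
    # Replay, in time order, every write after the backup up to the target.
    for w in sorted(write_log, key=lambda w: w.timestamp):
        if backup_time < w.timestamp <= target_time:
            image[w.block] = w.data
    return image

backup = {0: b"super", 1: b"old"}
log = [Write(10, 1, b"good"), Write(20, 1, b"ransomed"), Write(15, 2, b"new file")]
state = restore_to(backup, 5, log, 16)
print(state)  # {0: b'super', 1: b'good', 2: b'new file'}
```

The write at time 20 is skipped because it falls after the rollback target, which is the whole point of the approach.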

Another use for it is restoring or reverting files that were originally created more recently than your last regular (full or incremental) backup.

Example: I do incremental cloud backups once a week, on Saturday. Suppose I create a file on Monday, make various changes to it on Tuesday and Wednesday, accidentally delete it or corrupt it on Thursday, and realize this on Friday and want to recover the file.

With the "every write" log, I could find the file in the write log and recover it.

You might at first think that I would not be able to find it because unlike most backup programs the write log does not explicitly record file information. But it doesn't have to explicitly record file information because when you create a file or modify it the OS stores that information by writing to the disk, and those writes end up in your tape write log.

If I'm trying to restore /home/tzs/important_file to the state it was two days ago, the restore process would scan the write log starting far enough back to find the latest earlier write of the root inode (let's assume it was a classic Unix filesystem). Once it has that it can find /home's inode, then can find the correct contents of that on the target date, and so on until it has the inode of important_file. Then it can find what blocks comprised the file, and pull the right versions of those blocks from the write log.

Note that for this to work, the restore program must intimately understand the filesystem whose write log it is working with. This is one of the reasons you want to still do a more normal full/incremental periodic backup system with a backup program that explicitly stores files. The write log is best used to supplement the traditional approach.

PS: there was a backup program in whatever AT&T Unix was current in the early '80s (System III) that used the write log approach, although it was not continuous and it wrote redundant data. It was used for incremental backups. You started with a full block level backup. Then when you wanted to do an incremental backup it did this:

1. Make an empty list of blocks.

2. Add the block number of the superblock to the list.

3. Scan all the inodes looking for any inode that has a create or modify time later than the previous backup. For any such inode:

3a. Add the number of the block that contains that inode to the list.

3b. Add the block numbers of the file that the inode refers to to the list, including any indirect blocks.

4. Sort the block number list.

5. Copy all the blocks that are referenced in the list to the tape.

(There was probably a "write some metadata about the backup to the tape" step in there somewhere, too).

Note that this approach was redundant, in that if you changed part of a file the whole file would be written to the incremental backup. A pure write log would only write the changed blocks, but most systems don't record sufficiently granular modification information to allow that.

On the plus side, it meant that restoring a file from an incremental backup was easy. The file was either in that incremental entirely or not at all.
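
The numbered steps above can be sketched roughly as follows. This is a toy model, not the System III tool: the `Inode` record, the on-disk layout parameters (`superblock`, `inode_table_start`, `inodes_per_block`) and the function names are all assumptions made for illustration, and a set is used instead of a plain list so that duplicate block numbers collapse.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    number: int
    ctime: float                      # create time
    mtime: float                      # modify time
    data_blocks: list = field(default_factory=list)
    indirect_blocks: list = field(default_factory=list)

def blocks_for_incremental(inodes, last_backup_time, superblock=1,
                           inode_table_start=2, inodes_per_block=8):
    """Return the sorted block numbers an incremental backup would copy."""
    blocks = set()                                    # step 1
    blocks.add(superblock)                            # step 2
    for ino in inodes:                                # step 3
        if max(ino.ctime, ino.mtime) > last_backup_time:
            # 3a: the block holding this inode (hypothetical layout)
            blocks.add(inode_table_start + ino.number // inodes_per_block)
            # 3b: every block of the file, including indirect blocks
            blocks.update(ino.data_blocks)
            blocks.update(ino.indirect_blocks)
    return sorted(blocks)                             # step 4

def copy_blocks(disk, block_numbers):
    """Step 5: pull the listed blocks off the disk in order."""
    return [(n, disk[n]) for n in block_numbers]

inodes = [
    Inode(0, ctime=1, mtime=1, data_blocks=[100, 101]),  # unchanged file
    Inode(9, ctime=1, mtime=50, data_blocks=[300, 120], indirect_blocks=[119]),
]
print(blocks_for_incremental(inodes, last_backup_time=10))  # [1, 3, 119, 120, 300]
```

Note how the changed file contributes all of its blocks, which is the redundancy described above.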

justahuman743 years ago

What's the ball park cost on a drive+tape (say for a desktop)?

klodolph3 years ago

Kind of like asking the cost of a car + groceries.

You can pick up an old tape drive, like an LTO-5, for about $1,000. You can get LTO-5 tapes for about $15 each, with a capacity of 1.5 TB.

This setup puts the cost at around $10/TB, compared to somewhere around $30/TB for disk. Since you save $20/TB by using tape, it takes you 50TB before the total setup is less expensive than disk. Plug in your own numbers if you like.
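
To plug in your own numbers, the break-even calculation can be written out as a short sketch. The function name is made up for illustration, and the model only considers the drive price and per-TB media prices quoted above; interface cards, spare drives and labor are left out.

```python
def break_even_tb(drive_usd, tape_usd_per_tb, disk_usd_per_tb):
    """Capacity at which drive cost is repaid by cheaper tape media."""
    saving_per_tb = disk_usd_per_tb - tape_usd_per_tb
    if saving_per_tb <= 0:
        raise ValueError("tape media is not cheaper per TB; no break-even point")
    return drive_usd / saving_per_tb

# Figures from the comment: $1,000 drive, $15 per 1.5 TB tape, $30/TB disk.
tape_per_tb = 15 / 1.5
print(break_even_tb(1000, tape_per_tb, 30))  # 50.0
```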

You'll want to get a fiber channel card too, but that's easy enough.

Newer generations of LTO have marginally cheaper media (per TB) when you're buying in small quantities, but the drives are more expensive, and it's not uncommon to see poor capacity utilization in tape systems. The big players with lots of data will get newer LTO generations because they can save on library space, vault space, and drive time.

ineedasername3 years ago

The advantage of tape is that you can easily and cheaply keep a daily rotation as well, slot in a monthly, and then an annual to really keep a good cold storage backup. I had to restore from a previous night's tape often enough that for anything critical I'd be paranoid about adequate backup protocols. I don't know the failure rate for tapes, but I think they're lower than mechanical HDD, and definitely last longer unpowered.

Karunamon3 years ago

If you're in LTO5 land, you can get away with U320 SCSI instead of FC, and this can cut your costs in half. (Chances are if you're asking this question, you're the kind of user who won't be impacted much by the slower R/W speeds).

Also, library units are often cheaper than standalone drives for reasons I don't quite understand. I purchased a 12 tape autostacker for around $250 shipped, paid another company about $50 for the upgrade to make it 24 tapes, and another $50 for the HBA.

selfhoster113 years ago

It's worth pointing out that these media prices are very US-centric. In the UK, the prices for the media are exorbitant, even for used tapes.

adrian_b3 years ago

Even in the UK, you can buy them from Amazon Germany or from Amazon France, at only EUR 7 per TB for LTO-8 or around EUR 8 per TB for the older standards.

Previously there were no customs taxes. After Brexit I assume there might be some, but they should not be high enough to raise this low price by much.

d110af5ccf3 years ago

I'm seeing LTO-5 drives on Ebay for far less than $1000 (more like $150 to $300). Am I missing something obvious?

klodolph3 years ago

Used drives. I would be worried about how clean the heads are and whether they are worn down and need to be replaced.

cameldrv3 years ago

However, if you buy external USB hard drives, it's only about $20/TB, you have full random access, you don't need a fiber channel card, and if you buy a bunch of USB hubs, you don't have to change tapes every 1.5 TB.

IMO the niche where tape still has an advantage is pretty small these days.

bombcar3 years ago

If you drop back to used LTO4 you can get an entire setup for less than $500.

dabiged3 years ago

I work with a lot of tape in the oil and gas sector and one of the major issues we see is that industry wide file formats for data exchange are still based on old tapes. These formats have fixed block sizes that haven't changed since the 1950's. Modern tapes have exponentially larger capacities and bandwidth but due to the use of these file formats we rarely see 20% of the theoretical throughput on a brand new tape drive and cartridge. The industry has no desire to change the file format so we are stuck in this situation where tape read/write times grow exponentially with every new tape generation.

kstrauser3 years ago

Can you give an example? What could change if we stopped optimizing for legacy stuff?

dabiged3 years ago

The main issue is that the standard was written back when file formats and tape formats were essentially the same thing. Data was read from tape, processed and written to a second tape. The memory of machines at the time was on the order of 1 block.

Decoupling the tape format from the file format effectively resolves the issue. That is: read the tape in 6k blocks, write the file to a modern file system, set the tape block size to 10 MB, and write the file back to tape as a tar. You get the bandwidth limit of the tape drive.
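
A minimal sketch of the reblocking idea, assuming plain file-like objects stand in for the tape devices. The `reblock` function is hypothetical, the 6000-byte input record size is an assumption based on the "6k blocks" mentioned above, and a real tape workflow would also need to set the drive's block size and wrap the data in a tar, neither of which is shown here.

```python
import io

def reblock(src, dst, in_block=6000, out_block=10 * 1024 * 1024):
    """Read small fixed-size records from src and write large blocks to dst.

    Returns the number of writes issued to dst.
    """
    buf = bytearray()
    writes = 0
    while True:
        record = src.read(in_block)       # one legacy-sized record
        if not record:
            break
        buf += record
        while len(buf) >= out_block:      # emit full large blocks
            dst.write(bytes(buf[:out_block]))
            del buf[:out_block]
            writes += 1
    if buf:                               # flush the short final block
        dst.write(bytes(buf))
        writes += 1
    return writes

legacy = io.BytesIO(bytes(6000 * 10))     # ten 6000-byte records
modern = io.BytesIO()
print(reblock(legacy, modern, out_block=16000))  # 4
```

Ten small reads turn into four larger writes while the byte stream itself is unchanged, which is why software that expects the legacy blocking can no longer read the result directly.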

The issue is almost all legacy industry software is still designed to read and write directly to tape objects so your tape isn't readable in all the software this data is designed for.

skissane3 years ago

Couldn’t this be addressed with virtual tape devices? The application gets presented with something that is indistinguishable in behaviour from a physical tape drive, but actually backed by a file on SSD or hard disk. Then you can copy that file to a real tape device separately

newswasboring3 years ago

I can't say for oil and gas but I encountered the same thing in semiconductor manufacturing. CMP machines still use tape formats even if the file is now written to different media. The challenge during transition was in proving reliability and transition costs. Sometimes letting engineers deal with the hassle is cheaper than letting them fix it.

dabiged3 years ago

I agree. This would work, but you still need to get the data off the tape, and onto the SSD. The only way to do this is 60MB/sec.

macjohnmcc3 years ago

It helps that those tapes can be read in the future easily as well. HP at one point asked the company I worked for at the time if they could use their tape copy program with their new drives so that customers could copy older DAT tapes to a newer version so they didn't have to support older media formats in their newer drives.

The newer media is typically in a new, very fast tape drive, and sometimes it is difficult to saturate those anyway, so they keep stopping and starting and having to reposition over and over.

tyingq3 years ago

That's interesting. So the overhead to assemble the file is just so much that you can't stream output to the tape fast enough? That seems like something that could be overcome.

I get that a better format would make it easier, but it seems like you should be able to, with some optimization or disk cache, stream the output fast enough to outpace the tape drive.

dabiged3 years ago

I am talking about bottlenecks on reading from the tape. Sending the data to /dev/null eliminates any write bottlenecks and we still max out at 60 MB/sec on a 250 MB/sec drive.

tyingq3 years ago

Hrm. That is odd, as I assume these old formats would be something that compresses well, which gets higher read/write speeds.

Edit: Hrm. I believe you can disable hardware compression on some LTO drives. Maybe that would help?

dabiged3 years ago

The data is binary formatted floating point data, generally using the IBM floating point format, though IEEE floats are also allowed. Standard compression like gzip basically does nothing. All tape manufacturers claim their hardware based compression gives 2/2.5/3 times the tape capacity and it is all baloney. A 1TB tape holds 1TB of data.

There have been multiple attempts over the years to implement all sorts of algorithms but they have all failed to gain traction. For example Wavelet based compression works fantastically well for some data (i.e. real signals with band limited frequencies), and catastrophically badly for others (earth models defined with piecewise functions containing non-differentiable points). Some people are happy with lossy compression, others are not.

The only compression I have seen worth anything is in deep water, where there is literally no data in the water column (i.e. 500 consecutive zeroes in a 6000 byte block); gzip gives you that section for free.
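
The effect is easy to reproduce in a sketch. Seeded pseudo-random bytes are used here as a stand-in for noisy floating point samples, which is an assumption: real seismic data has some structure in its exponent bytes, so it would compress slightly better than pure noise. `zlib` implements the same DEFLATE algorithm that gzip uses.

```python
import random
import zlib

BLOCK = 6000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for a block of noisy binary samples (assumption, see above).
noisy = bytes(rng.getrandbits(8) for _ in range(BLOCK))

# Same block, but the first 500 bytes are an empty water column.
water = bytes(500) + noisy[500:]

def ratio(data):
    """Compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(data, 9)) / len(data)

print(f"noisy block: {ratio(noisy):.3f}")
print(f"water block: {ratio(water):.3f}")
```

The noisy block does not shrink at all, while the block with the zero run saves roughly the length of that run and little else.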

lisper3 years ago

Might it be possible to embed the legacy format within a more modern format that gets written to the tape?

lizknope3 years ago

The article briefly mentions HAMR (Heat Assisted Magnetic Recording) hard drives. Seagate has been shipping them since 2020 to some customers. Seagate's roadmap is 120TB drives in 2030.

https://www.anandtech.com/show/16544/seagates-roadmap-120-tb...

If you are a business then tape is great. But as a consumer with around 50TB of data the problem is that current generation LTO tape drives are in the over $3,000 range. The tapes are reasonably priced but the drive cost is so high that you can buy about 200TB of hard drives for the cost of just the tape drive.

dang3 years ago

Discussed at the time (of the article, not the future):

Why the Future of Data Storage Is Still Magnetic Tape - https://news.ycombinator.com/item?id=17864392 - Aug 2018 (155 comments)

8bitsrule3 years ago

People are still working to use lasers to write data much faster. TU/e paper from a year ago:

[https://phys.org/news/2020-07-ultra-fast-laser-based-storage...] "Ultra-fast laser-based writing of data to storage devices"

"researchers ...have demonstrated a new approach that can achieve deterministic single pulse writing in magnetic storage materials, making the writing process much more accurate." (on a 3-layer, 15nm surface)

Paper: [https://www.nature.com/articles/s41467-020-17676-6] "Deterministic all-optical magnetization writing facilitated by non-local transfer of spin angular momentum"

Still writing to 2D. I guess laser-write/read to 3D cubes is still a pipe dream.

luma3 years ago
8bitsrule3 years ago

OK, cool. My browser's blind to MS, but I found an alternative: https://techxplore.com/news/2020-09-microsoft-holographic-so...

bb1013 years ago

A competing but emerging technology to keep an eye on is holographic storage.

"IBM has already demonstrated the possibility of holding 1 TB of data in a crystal the size of a sugar cube and of data access rates of one trillion bits per second"[1]

[1] https://en.wikipedia.org/wiki/Holographic_Data_Storage_Syste...

amelius3 years ago

But not rewritable.

relyks3 years ago

Magnetic tape is typically used for backup or archival purposes most of the time though, so that might not be such a big issue and they'll be used for a similar purpose

adrian_b3 years ago

That is a feature, not a bug, for long-term archival media.

WalterBright3 years ago

The other advantage of tape is that ransomware shouldn't be able to encrypt your backups (if written in append mode).

selfhoster113 years ago

This is what gets me about read-write online backup media. How do you ensure it's not compromised by ransomware? How do you ensure that your snapshots aren't compromised? It boggles the mind to think that defense in depth and software write protection is considered fine these days.

WalterBright3 years ago

I regularly bring up the subject of physical write-enable switches (that hard drives used to have). Inevitably, someone responds that it's a great idea, and their software has a write-enable setting.

The other mind-boggling thing is I argue that remote updating shouldn't be allowed unless there's a physical switch to enable it. The response is that remote updating is necessary to keep systems secure from malicious remote updates.

I have a paper sleeve that goes over my webcam when not in use.

mhh__3 years ago

I would probably pay extra for a switch although a big heavy mechanical switch in the middle of the signal path for a modern SSD would probably not be as cheap as you might expect.

deepstack3 years ago

Floppy days! Way back then, I remember there were bootable Linux or FreeBSD servers running off a DVD. The idea was that the server data is then immutable.

unethical_ban3 years ago

Yes, you said read write, but to answer your question at an industrial level, set your AWS IAM permissions to write only for your normal backup role. No delete or modify.

selfhoster113 years ago

Solving this issue at higher levels of abstraction is precisely what I am against. This should be a feature of the media itself.

kwdc3 years ago

I feel like there's a place for home users to rent a tape drive, write everything to tape and return the drive.

Uber for tape-based storage. You don't need to own the drive. Just the tapes. Maybe not even the tapes. Dropbox using tape storage? Maybe. Personally I'd like to own the actual medium.

Ycombinator! Make it happen.

dabiged3 years ago

Just use a cloud provider. If you are worried about privacy encrypt it.

For example, AWS Glacier deep archive would be years of storage for the cost of shipping a tape drive. If you have bandwidth issues use snowcones.

guerby3 years ago

Proxmox Backup Server has LTO tape support now:

https://pbs.proxmox.com/docs/tape-backup.html

IIRC they rewrote the tape driver in userspace in rust.

avhception3 years ago

Huh. I glossed over the article but found no explanation for why that was necessary. Do you happen to know by any chance?

blakesterz3 years ago

This is from 2018, I'd be curious to hear if anything has changed in the past few years.

solresol3 years ago

Anecdatum:

In 2018 I was running a consulting business helping customers with backup to tape, but even back then most customers were doing backup to a dedupe (magnetic disk) library like the HP StoreOnce, but then often copying that spooled-to-disk data to tape once per week or once per month.

Now (2021) the flavour-of-the-month is backing up to cloud storage (s3 or Azure blobstore), sometimes directly, sometimes as a copy from deduplication storage. This morning's email summary tells me that one of my customers has just recently stopped using their on-site tape library; I have a quote out for some work with another customer to decommission theirs and replace it with writing to a cloud-hosted provider.

So tape is still getting used (assuming that's what Deep Glacier actually is), but it isn't owned by the customer. But if you are backing up to Google's blob storage, then no, it isn't tape, it's just magnetic storage again, just in low-speed access sections of disk.

The product I was working with (DataProtector) has gone through a bit of churn as it changed hands from HP to Microfocus, so this could cloud the numbers a bit.

Kuinox3 years ago

Magnetic tape simply has a far bigger working area than HDDs. An HDD has only its disk area, multiplied by the number of disks. Magnetic tape is wound up, which makes the total surface way, way higher.

Dylan168073 years ago

Not "simply" when the densities are so different.

They both happen to be similar in price, but if the density gap grew by 3x then suddenly tape would be more expensive and only used for certain types of archival.

Kuinox3 years ago

The areal density of tape is far lower, but the volume density is higher, simply because HDDs only work on 2D planes while tape is wound up. If the density gap grew, magnetic tapes would try to put more tape in a single enclosure, like HDDs try to put more disks in a single enclosure.

chunkyks3 years ago

Where I work we moved to a fancy backup solution a few years ago. Which one doesn't matter but suffice to say the name implies the focus isn't on metal

Not a single one of our metal machines ever successfully backed up with it. Tens of TBs per system seems to be where it just fails every single time. We put those systems back on the officially retiring tape system and now all that stuff is backed up without a hitch every time.

axx3 years ago

Kind of semi related: What do you buy if you want to do tape backups at home?

scheme2713 years ago

Probably a LTO drive a few generations old and a SAS card. Current generations cost too much and you aren't going to be able to keep up with the data rate that they prefer.

kmeisthax3 years ago

From personal experience you are not going to be able to saturate even a horrifically obsolete LTO drive and SAS interface without specialized software or purely sequential data. Most of the cheap/free ways of running tape drives aren't optimized for parallel I/O and will horrifically shoe-shine (that's the rev-up and rev-down sound you hear when the tape drive isn't getting enough data), which isn't great for the tapes and massively increases backup time and drive usage.

AFAIK most commercial tape deployments nowadays are disk-to-disk-to-tape arrangements. All the actual data is serialized to an archive on disk first, and then that serialized archive is written to tape at full speed. This minimizes tape wear and ensures your very expensive tape drives are being used efficiently.

adrian_b3 years ago

I am using a LTO-7 drive connected to a FreeBSD server.

FreeBSD always manages to write the tape continuously at about 300 MB per second, which is the maximum speed for LTO-7.

All the files sent to the tape are grouped into large archive files, and for the dd command that writes to tape I use a block size of 128 kB.

The tape commands from FreeBSD are more convenient than those from Linux, which have not seen much maintenance in recent years.

Obviously, you cannot reach tape speed when making the backup directly from a HDD or from a 1 Gb/s Ethernet.

You must write the backup to tape either from a fast SSD, or from 10 Gb/s Ethernet coming from a fast SSD at the other end, or from a RAM disk configured on the server, if you have enough memory.

To avoid unnecessary wear on the SSDs of the server where the tape drive is located, I configure a large RAM disk on the server whenever I write the backup. The backup files coming through Ethernet to the server are written to the RAM disk on the server, then they are copied to the tape.

With this arrangement it is very easy to ensure that the tape drive is written at maximum speed without any hiccups.
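
The spool-then-stream arrangement can be sketched as below. This is an illustration under stated assumptions, not the commenter's actual scripts: `spool_archive` and `stream_to_device` are hypothetical helpers, the spool directory stands in for the RAM disk, and an ordinary file stands in for the tape device (on FreeBSD that would typically be a device node such as /dev/nsa0). The 128 kB block size is the one mentioned above.

```python
import os
import tarfile
import tempfile

BLOCK_SIZE = 128 * 1024  # 128 kB blocks, as used with dd in the comment

def spool_archive(paths, spool_dir):
    """Bundle many files into one tar archive in the spool directory."""
    archive = os.path.join(spool_dir, "backup.tar")
    with tarfile.open(archive, "w") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive

def stream_to_device(archive, device_path, block_size=BLOCK_SIZE):
    """Copy the archive sequentially in fixed-size blocks; return block count."""
    blocks = 0
    # buffering=0 so each write() call maps to one write on the device.
    with open(archive, "rb") as src, open(device_path, "wb", buffering=0) as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            blocks += 1
    return blocks

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "photo.raw")
    with open(sample, "wb") as f:
        f.write(os.urandom(5000))
    archive = spool_archive([sample], tmp)
    fake_tape = os.path.join(tmp, "tape.img")   # stand-in for the tape device
    blocks = stream_to_device(archive, fake_tape, block_size=4096)
    archive_size = os.path.getsize(archive)
    tape_size = os.path.getsize(fake_tape)
print(blocks, archive_size, tape_size)
```

Because the archive is complete before streaming starts, the copy loop never waits on slow sources, which is what keeps the drive from stopping and repositioning.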

bombcar3 years ago

It may even be possible to find on eBay a multi tape loader that has a disk inside and appears to the system as just a dumb tape drive.

aduitsis3 years ago

Very insightful comment on the data rate, if you can't feed the LTO properly, it may be stopping and starting all the time, which could have a detrimental effect to its mechanism. But this is probably easily fixed by spooling (using bacula terminology). You buffer like 100Gbytes, write them in one batch. Plus, some drives support variable speeds.

Also, in addition to SAS, a fibre channel card would also fit the bill, albeit probably a trifle more expensive. But if you go for low speeds (e.g. 8Gbps), those cards can be less than 100$.

adrian_b3 years ago

Yes, that is how it should be done.

For example, I collect all the files that I send to backup in archive files whose size is approximately 50 GB, then I copy to the tape the 50 GB files.

Done this way, the write speed to an LTO-7 drive is always constant at 300 MB per second, which is the maximum possible for that standard (and much higher than what an HDD can sustain, so the 50 GB files must come from a fast SSD or from a RAM disk).

Dylan168073 years ago

> Current generations cost too much and you aren't going to be able to keep up with the data rate that they prefer.

Maybe. Keeping up only really matters if your slowest data rate is going to be between 50MB/s and 120MB/s. If it's below that then you have a strong need for a buffer drive no matter what tape generation. And if it's above that then you shouldn't have any real trouble no matter what tape generation.
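
That rule of thumb can be written down as a tiny sketch. The 50 and 120 MB/s thresholds are the ones from the comment; the function name and the wording of the returned advice are made up for illustration.

```python
def buffering_advice(slowest_source_mb_s, low=50, high=120):
    """Classify a source data rate using the thresholds from the comment."""
    if slowest_source_mb_s < low:
        return "needs a buffer drive on any tape generation"
    if slowest_source_mb_s > high:
        return "fine on any tape generation"
    return "depends on the tape generation"

for rate in (30, 80, 200):
    print(rate, "MB/s:", buffering_advice(rate))
```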

helloworld113 years ago

Three questions as one who is mostly ignorant of magnetic tape:

1. How expensive is this medium per unit of memory (for example, per terabyte)?

2. How well (practical effectiveness) can it connect to home devices and be used to store information such as photos, texts, documents, videos etc? Related sub-question: how well does its lasting power compare to HDD?

3. Where/how would one buy reliable magnetic tape storage devices?

In case anyone who knows the subject well wouldn't mind answering a bit.

williesleg3 years ago

Tape is great because once you write it and forget it, you think the data is safe. Until you try to read it.

rdines3 years ago

The author's bio "Mark Lantz is manager of the Advanced Tape Technologies at IBM Research Zurich."

LOL, ok sure man, the future of storage is definitely tape. :rolls eyes:

Bluestein3 years ago

"It should come as no surprise that recent advances in big-data analytics and artificial intelligence have created strong incentives for enterprises to amass information about every measurable aspect of their businesses. And financial regulations now require organizations to keep records for much longer periods than they had to in the past. So companies and institutions of all stripes are holding onto more and more.

Studies show [PDF] that the amount of data being recorded is increasing at 30 to 40 percent per year. At the same time, the capacity of modern hard drives, which are used to store most of this, is increasing at less than half that rate. Fortunately, much of this information doesn’t need to be accessed instantly. And for such things, magnetic tape is the perfect solution.

Seriously? Tape? The very idea may evoke images of reels rotating fitfully next to a bulky mainframe in an old movie like Desk Set or Dr. Strangelove. So, a quick reality check: Tape has never gone away!"

Bluestein3 years ago

I mean, I really grok tape having detractors. Granted ...

... but, to the point of downvotes? :)