Anyone who has ever had a hard disk failure knows: digital archiving is tough. Of course you promise yourself to make regular backups. Maybe you even have a complicated system in place that secures all your files every day. But when Murphy strikes, chances are that none of this will help you and you'll end up with some digital memory loss.

Losing files sucks - but at least you're not the only one who struggles with this problem. Libraries, broadcasters and other types of archivists are increasingly fighting an uphill battle against data loss. Some even preserve the technology itself in order to preserve data, as the LA Times reported recently:

"The difficulty and cost of the process prompted WGBH, Boston's public broadcasting television station, to hedge its bets. It purchased 6-foot-tall, 1960s-era video recorders and shrink-wrapped them in cold storage to ensure a way to play back a unique collection of Boston Symphony concerts from 1955 and an interview series hosted by Eleanor Roosevelt, featuring such luminaries as then-Sen. John F. Kennedy."

US libraries have now started using P2P technology for digital archiving. Researchers at Stanford University developed a system called LOCKSS - an acronym that pretty much explains it all: "Lots of Copies Keep Stuff Safe".

LOCKSS essentially works as a web proxy and distributed archive for scientific journals that are published online. Once a user of a library computer that has LOCKSS installed accesses a participating journal, the content is stored locally and then distributed over the LOCKSS network.
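To make the "store it when someone reads it" idea a bit more concrete, here is a minimal Python sketch. It is not LOCKSS code: the class name `ArchivingProxy`, the `fetch` callback that stands in for a request to the publisher, and the fallback to the local copy when the publisher can't be reached are all assumptions made for illustration.

```python
from typing import Callable, Dict


class ArchivingProxy:
    """Toy stand-in for a library machine that archives what its users read.

    Hypothetical example, not the real LOCKSS software.
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is an assumed callback that retrieves a URL from the publisher.
        self.fetch = fetch
        # Local archive: URL -> last content successfully retrieved.
        self.archive: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        try:
            content = self.fetch(url)
        except OSError:
            # Assumption: if the publisher is unreachable, serve the stored copy.
            if url in self.archive:
                return self.archive[url]
            raise
        # Every successful access also updates the local archive.
        self.archive[url] = content
        return content
```

Sharing the stored copies with other machines is left out here; the sketch only covers the local half of the process.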

What's interesting about this process is that LOCKSS has a form of quality control built in. The machines on the network compare their versions of a document and then vote on the integrity of the archived material. Incomplete or damaged copies are then replaced by the version that wins the vote.

This picture from the LOCKSS website illustrates this process:

[Image: LOCKSS quality control]
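The compare-and-vote step can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the actual LOCKSS protocol: it assumes every machine's copy is available in one place, uses a plain SHA-256 majority vote, and the names `digest` and `vote_and_repair` are made up for this example.

```python
import hashlib
from collections import Counter
from typing import Dict


def digest(content: bytes) -> str:
    """Fingerprint a copy so machines can compare versions cheaply."""
    return hashlib.sha256(content).hexdigest()


def vote_and_repair(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Give every machine the version that most machines agree on.

    `copies` maps a (hypothetical) machine name to the bytes it holds.
    Ties are not treated specially in this sketch.
    """
    if not copies:
        return {}
    # Each machine "votes" with the fingerprint of its own copy.
    tally = Counter(digest(content) for content in copies.values())
    winning_digest, _votes = tally.most_common(1)[0]
    # Pick any copy that matches the winning fingerprint.
    winner = next(c for c in copies.values() if digest(c) == winning_digest)
    # Damaged or incomplete copies are replaced by the winning version.
    return {machine: winner for machine in copies}


copies = {
    "library_a": b"full article text",
    "library_b": b"full article text",
    "library_c": b"full arti",  # truncated copy
}
repaired = vote_and_repair(copies)
```

After the call, `library_c` holds the same bytes as the other two machines, because two out of three copies agreed on that version.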

Obviously LOCKSS has to deal with copyright issues. Digital versions of scientific journals are often very expensive, and most publishers wouldn't be too happy about a free distributed archive of their content. LOCKSS uses the publishers' original login mechanisms to deal with these aspects.

However, the makers of LOCKSS are also preparing for a time when publishers might not be available to grant access to their content. They are in the early stages of building a much bigger archive called CLOCKSS, which would be free to everyone under certain circumstances. From the website:

"Content archived in CLOCKSS nodes will be made available following a “trigger event” that could result in long-term disruption of availability from the publisher. Upon such a trigger event, the publishers and librarians decide collaboratively whether stored materials should be made available for a limited or an indefinite period. Materials, when available, will be available to all."


The idea apparently is to build a dark archive that will only be made available once the original archives are destroyed or inaccessible due to natural disasters - kind of like a last-resort P2P network.
