The KB, the national library of the Netherlands, has recently started a research project to archive Dutch websites. The objective is to develop a method for archiving websites and making the archived sites accessible. And this is no small beer, as the Dutch web contains an estimated 1.4 million active websites and more than 60 million web pages.
The reason for the project is the findability and preservation of digital heritage. As a website has a life expectancy of no more than 75 days, the risk is great that it will sink into oblivion and yield an ERROR 404 message (the error message is often ascribed to a room at CERN in Switzerland, where the world wide web was developed, in which all the error pages were supposedly collected; forget this story, as there was no room 404 at CERN). And just as books, newspapers and magazines represent the culture of a country and are icons of a particular time, websites will show how we communicated, published and acted at this time.
It is not the first time that the KB has started a programme to preserve websites. In the early days of the internet in the Netherlands the KB played a role in making sites accessible through the national home page. The library even archived sites, asking how large the volume would be in 10 years, whether it had to preserve every site and what kind of problems it would encounter in keeping the sites alive and accessible even when browsers changed (at that time it was still Netscape, as Microsoft had not caught up with the internet yet). Eventually, when the government grant had been spent, the project was dropped.
The KB has always been at the forefront of electronic preservation. In 1994 people at the KB like Mr Jansen and Ms Van Rijswijk started to think about an electronic depot. They started to collect floppy discs, CD-ROMs and CD-Is (a Philips flirt with publishing). It was not easy for them to get copies of these publications, as there is no legal requirement to deposit them as there is with books in the Netherlands. So they missed the experimental CD-ROMs and the first generation of commercial CD-ROMs and CD-Is (they are in my museum; I did not throw them away while moving). By 2002 the KB had an operational e-Depot, where electronic publications, especially scientific articles, and CD-ROMs are collected; the library has agreements with scientific publishers such as Reed Elsevier, Springer and Blackwell.
Using the experience of the e-Depot and experimenting in the remainder of 2006 with technology, organisation and finances, the KB wants to start archiving a collection of 120 selected websites from the Dutch domain. Of course the experiment raises the question of what we mean by the Dutch domain. Does it mean .nl sites only, Dutch-language sites only, or sites produced in the Netherlands? What about a .com site produced in the Netherlands and in Dutch? Will this blog be archived, as it is produced by a Dutch national, but on a .com site and in English? Or even more complicated: what about second-generation immigrants who have become Dutch citizens, with sites in Arabic, Italian or Spanish?
Another question is how often you will archive a site. The site of a Dutch newspaper will change every day, if the editors are lazy, or continually as news never stops.
Of course these are only some of the questions that will pop up. But what about the legal side? In the Netherlands every publisher has to send a number of copies to the national library, which can lend them upon request. But what about a website which has been closed because it was offensive to the royal house or indecent (indecent books the KB has to accept)? For the legal problems the KB will co-operate with eLaw@Leiden, Centre on Law and Information Society of the University of Leiden. The questions will cover copyright, database right, privacy and archive law. This study will result in a model agreement between the KB and the site owner.
It looks like a massive task, even for such a small country with 16.5 million inhabitants and only 1.4 million websites. But it is not new. There is of course already the Internet Archive with its Wayback Machine. And a European archive is on its way. Yet the Dutch project will have many questions of its own and will have to start thinking about a Wayback Machine of its own.
Tags: preservation, digital heritage, digital library
Blog Posting Number: 499