Digital archives keep track of lost websites, lapsed domain names from the early years
Users got their first taste of power in 1994, with public-journal formats that allowed anyone to post their thoughts for the world. We didn’t even call them blogs until 1999, but by 2004, Blog was Merriam-Webster’s Word of the Year.
Updated: Aug 14, 2020 00:09 IST
What do you remember of the first time you used the internet? The screee-beep of the modem connecting? The joy of seeing little computer icons linking to each other in your taskbar, indicating that your dial-up was working? Or paying the cyber café guy Rs 10 to book your train ticket?
When India first got online 25 years ago, it was slow going. Pages took ages to load (with the hourglass icon turning endlessly). But once we got comfortable, we were hooked. You filled your Muzik folder with MP3 files, saw the MSN-Yahoo rivalry unfold like a repeat of the ’80s cola wars, did a web search through Ask Jeeves. And it was all free, and ad-free.
Users got their first taste of power in 1994, with public-journal formats that allowed anyone to post their thoughts for the world. We didn’t even call them blogs until 1999, but by 2004, Blog was Merriam-Webster’s Word of the Year.
The world’s first website, Info.cern.ch, has been saved for posterity by CERN, the lab where the World Wide Web was invented. But where do you go to see the rest of it? When domain names lapse, or companies collapse, websites can vanish without a trace. You can’t pull an old one from the shelf. That’s probably why the Internet Archive is such a gem.
Born 24 years ago, just a year after India connected to the internet, the Internet Archive is an American non-profit organisation that digitises films, books, letters, images, audio, video and software programmes. Its key project, the Wayback Machine, trawls the web, copying pages, to build a library of the internet itself.
The Machine lets you see what a page looked like when it was archived, even if the site has changed or been taken down. More than 458 billion pages have been saved so far.
It’s not nearly enough. The internet has more than 60 trillion web pages. And with social media, there’s more to archive than ever.
Meanwhile, not everyone’s thrilled about the record-keeping. Because the Wayback Machine copies sites without asking, it has raised questions about copyright infringement and privacy. In 2017, the Internet Archive was among 2,600 sites banned by the Indian government as part of the fight against digital piracy. And earlier in the Covid-19 lockdown, publishers sued the non-profit for making its digitised library available globally.
PIECES OF THE PAST
Symbolics.com, the first domain name ever registered on the internet, was already 10 years old when India logged on in 1995. It used to be a computer programming business. But it is now the Big Internet Museum.
Meanwhile, ArchiveTeam.org has, since 2009, been keeping track of sites and companies about to go bust and archiving their online material. Its members describe themselves as “a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage”, and also offer advice on how to archive your own data.
Some governments are taking big steps too. In Sweden, every web page that ends in .se has been saved by the National Library’s Web Archive division. The British Library has archived 6 billion pages.
India doesn’t have such an archive, except for election-related data. But the Wayback Machine has been paying attention to India almost from the start. There are more than 34,000 captures for just the Hindustan Times site, for instance, some dating back to 2001.