Abstract
Many of the challenges inherent in archiving dynamic web content revolve around the capture and playback of ever-evolving social media sites. However, older dynamic web content that has ceased to evolve continues to elude the tools and conventions most widely available to web archivists. This presentation will share work conducted to preserve an online digital humanities project that defied preservation via Archive-It and other crawler-based web archiving tools. It will also explore alternative options for web archivists struggling to preserve sites that depend on server-side processes for their essential functionality.
Suda On Line is a collaborative digital translation of a 10th-century encyclopedia. The interactive project was initially developed in 1998 and built as a website using CGI scripts, a bespoke database suite, and server rules that, taken as a whole, required user input via search fields and clicks to generate pages dynamically. The site holds over 30,000 entries compiled by more than 200 scholars from 20 countries across five continents; the last entry in the Suda was translated 16 years after the site launched. As the primary collaborators responsible for the site retired from academic work, they found an entity willing to host their server temporarily, but that arrangement is winding down, and the project has no apparent long-term home willing to maintain and update the site in perpetuity. As a result, one of the project’s original authors, with the permission of his collaborators, asked their university archives to archive this collective work of scholarship, and the archives accepted the task.
The university’s archives discovered that tools such as Archive-It and WebRecorder could mimic the look of the Suda On Line interface but could not reproduce any of its core functionality. As a secondary means of preserving the site, the archives were granted a copy of the site’s full Ubuntu server image, packaged as two virtual machines within an OVF file. This presentation will unpack the various paths and cul-de-sacs explored in the process of preserving and providing access to Suda On Line, including tools such as ReproZip-Web and Oracle VM VirtualBox. It will also reflect on the feasibility and merit of providing offline access to tools, research, and scholarship that, in their active lives of creation and purpose, were so fundamentally online.
Document Type
Presentation
Publication Date
4-25-2024
Repository Citation
McDonnell, Andrew, "Preserving the Uncrawlable: Serving the Server" (2024). Library Presentations. 273.
https://uknowledge.uky.edu/libraries_present/273
Included in
Archival Science Commons, Byzantine and Modern Greek Commons, Digital Humanities Commons
Notes/Citation Information
International Internet Preservation Consortium (IIPC) Web Archiving Conference, National Library of France, Paris