Start Date
8-11-2016 2:10 PM
Description
This article outlines one in-house model for archiving and providing access to HTML-based news in the Kentucky Digital Newspaper Program (KDNP) at the University of Kentucky (UK). The KDNP already contains news content digitized from analog sources. To make HTML-based news searchable and retrievable alongside that content, the article describes encapsulating HTML content in XML-encoded CDATA strings, which a prototype open-source PHP viewer then reads.
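The general idea of storing harvested HTML inside XML as CDATA can be sketched in Python. This is a minimal illustration under stated assumptions, not the KDNP implementation: the element names (`article`, `title`, `content`), the `id` attribute, and all helper functions are hypothetical, and the PHP viewer itself is not reproduced. The `cdata` helper splits any literal `]]>` in the HTML so it cannot close the CDATA section early. The plain-text extraction and keyword match are a stand-in for whatever search indexing the viewer actually performs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


def cdata(text: str) -> str:
    # Split any literal "]]>" across two CDATA sections so it cannot
    # terminate the section early; parsers rejoin the pieces as one string.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def wrap_article(article_id: str, title: str, html: str) -> str:
    # Hypothetical record layout: metadata as ordinary escaped XML,
    # the harvested HTML page stored untouched inside a CDATA section.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<article id={quoteattr(article_id)}>"
        f"<title>{escape(title)}</title>"
        f"<content>{cdata(html)}</content>"
        "</article>"
    )


def read_article(xml_bytes: bytes) -> dict:
    # The XML parser returns CDATA as plain text (line endings become \n).
    root = ET.fromstring(xml_bytes)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "html": root.findtext("content"),
    }


class _TextExtractor(HTMLParser):
    # Collects visible text from the HTML, skipping <script> and <style>.
    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def plain_text(html: str) -> str:
    # Flatten the HTML to whitespace-normalized text for indexing.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def matches(article: dict, term: str) -> bool:
    # Naive case-insensitive keyword search over the extracted text.
    return term.lower() in plain_text(article["html"]).lower()
```

Because an XML parser hands CDATA content back as ordinary text, the stored HTML is returned unchanged (apart from line-ending normalization) and can either be rendered as-is or stripped to plain text for searching. The trade-off of this design is that the XML layer treats the page as an opaque string, so it cannot validate or query the markup inside the CDATA section.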
Notes
The downloadable item is a presentation-based article published in the conference proceedings. It has a different title (Archiving and Accessing HTML-Based Newspapers Using XML and CDATA Strings) and its copyright information is as follows:
Copyright © 2016 by Eric Weig. This work is made available under the terms of the Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
Harvesting and Parsing an HTML-based Newspaper