Project Description
Scope and motivation
Enhancing Services to Preserve New Forms of Scholarship aimed to investigate a variety of enhanced digital scholarly publications to identify which of their features can be preserved at scale using tools currently available, and which are likely to be lost over time. All of the publications were book-like, in that they were comparable in scope to scholarly books, although publishers and users may not consider some of the publications to be books.[1]
The project included two main activities. First, publishers transferred digital assets and metadata for scholarly publications representing varying content types to the participating preservation service organizations. The preservation practitioners analyzed the submitted materials to determine whether their existing processes could be adapted to reliably preserve a publication as a whole using tools currently available. The content selected for this project ranged from formats such as EPUB with audio or video supplements to bespoke web publications with complex interactive features, annotations, and/or dependencies on third-party platforms. The transfer and analysis of publications happened in three three-month sprints in which publications were grouped by complexity and likely methods for preservation.
Second, these findings were reviewed by an invited team of librarians, archivists, publishers, and technologists to gather input for a set of guidelines. The guidelines advise publishers and scholars on creating enhanced digital publications that are more likely to be preservable, or at least make clear the implications of adding certain features so that alternative paths can be taken when possible.
Partners
Project participants represented scholarly publishers, preservation service organizations, and libraries that may provide publishing services, preservation services, or both. Publishers included NYU Press, Michigan Publishing, the University of Minnesota Press, UBC Press, and Stanford University Press. Four of the five participating publishers also participated as platform developers: NYU Press for Open Square, Michigan Publishing for Fulcrum, the University of Minnesota Press for Manifold, and UBC Press for RavenSpace. Preservation service organizations included CLOCKSS, Portico, and the libraries of the University of Michigan and NYU.
At NYU, a project manager was hired to oversee day-to-day activities for the project. They organized the content transfer sprints as well as the assessment reviews with invited experts. At Portico, a senior research developer analyzed content, assessed publications for preservation readiness within Portico, and contributed to the preservation guidelines for publishers. The CLOCKSS technology development was carried out by the LOCKSS team at Stanford University. They likewise analyzed content and assessed publications for success within the CLOCKSS preservation service.
How the work was organized
Our 18-month-long project was divided into three sprints, with publications assigned to the sprints according to their technical features. The team processed the least complex publications in the first sprint and the most complex publications in the third sprint.
During the first sprint, Portico and CLOCKSS worked with EPUB-based publications from Michigan Publishing’s Fulcrum platform (https://www.fulcrum.org) and NYU Press’ Open Square (https://opensquare.nyupress.org). These publications have been ingested into web-based reading systems, and include a variety of multimedia and supplementary material either within the EPUB itself or as a platform-level resource.
During the second sprint, Portico modeled solutions for preserving web publications with a linear, text-based structure on Manifold (https://manifoldapp.org) and digital publications from Michigan Publishing that are not hosted on its Fulcrum platform. Like the publications in the first sprint, these publications allow for many web-based interactions, but only within a predictable set. However, Manifold publications allow for a broader range of added digital resources, both alongside and apart from the main text. Many of the publications in both the first and second sprints support enhanced features such as annotations, embedded multimedia, and data visualizations.
The third sprint covered the most complex, media-rich, and nonlinear publications, for which an interactive experience is at the forefront. In most of these more dynamic works, third-party dependencies are an integral component. In this sprint, Portico and CLOCKSS worked with publications from UBC Press’s RavenSpace platform (https://ravenspacepublishing.org), Stanford University Press (https://www.sup.org/digital/), and Michigan Publishing.
What we did
The workflow within each of the sprints was designed to capture data from the participants during each phase of submission and evaluation for a publication. (The template used to record this data has been reproduced in Appendix B.)
To begin, publishers submitted publications to be considered for processing. The project manager organized and assigned these to sprint project teams, allowing for several publications to be in flight at the same time. During an initial evaluation phase, the assigned publishers and preservation partners collaborated to perform a detailed review of each publication. Together they defined the core intellectual components of the publication: those that must be preserved for future audiences to fully understand the work’s substance and arguments. Reviews included detailed instructions for the playback or reading experience of the material submitted. They described what an intended audience should be able to do when the archived content is made available in the future. These core intellectual components served as acceptance criteria for the success of the work done in subsequent phases. In addition, description and documentation of these components gave preservation providers a more complete understanding of the context and dependencies for a work.
Pre-Transfer Activities
Pre-transfer activities included a determination of how the publication would be transferred to the preservation partner, followed by a detailed description of the content made available. For submission information packages sent via file transfer, publishers provided a full description of the file types, what each file or group of files represents, and how the files together form the work to be preserved. Included in these files were any available metadata and information about how the metadata is mapped to corresponding files. For web transfers, publishers described in detail the content made available to a web harvester. The publishers and preservation providers worked together to define content sets and starting points. For works that were to be emulated, the publishers described the content made available so that the publication could be recreated on a virtual machine that could run in an emulation environment. In all of these scenarios, publishers noted any external dependencies on media or software that were not part of the content package marked for submission.
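As an illustration only, the kind of description publishers supplied for a file-transfer package could be captured in a small structured manifest. The sketch below is not the template used in the project (that template is reproduced in Appendix B); the class names, field names, and sample values are all hypothetical and chosen simply to mirror the items listed above: file types, what each file represents, metadata mapping, and external dependencies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PackageFile:
    """One file in a submission package (hypothetical structure)."""
    path: str                               # location within the package
    file_type: str                          # e.g. "EPUB", "MP4"
    role: str                               # what the file represents in the work
    metadata_record: Optional[str] = None   # metadata file mapped to this file, if any


@dataclass
class TransferManifest:
    """Publisher-supplied description of a transfer (hypothetical structure)."""
    title: str
    transfer_method: str                    # "file transfer", "web harvest", or "emulation"
    files: List[PackageFile] = field(default_factory=list)
    external_dependencies: List[str] = field(default_factory=list)

    def files_without_metadata(self) -> List[str]:
        """Return paths of files that have no metadata mapped to them."""
        return [f.path for f in self.files if f.metadata_record is None]


# Sample values are invented for illustration.
manifest = TransferManifest(
    title="Example enhanced monograph",
    transfer_method="file transfer",
    files=[
        PackageFile("book.epub", "EPUB", "main text", "book.xml"),
        PackageFile("media/interview.mp4", "MP4", "video supplement"),
    ],
    external_dependencies=["video player hosted on a third-party platform"],
)

print(manifest.files_without_metadata())
```

A structure like this makes two of the questions raised during pre-transfer easy to check mechanically: which files lack a metadata mapping, and which dependencies sit outside the package.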
Preservation Actions
In the preservation action phase, the preservation partners evaluated the submitted materials and either (a) attempted to preserve the work as outlined during the initial evaluation or (b) created a detailed mockup of the proposed preservation process. They documented iterative and final decisions about preservation actions, as well as any concessions made. Any questions, roadblocks, or tasks that warranted further exploration were noted and spun off into issue tickets, to be worked on separately from the work that generated them. The publisher and preservation partner collaborated to ensure that the publication could be successfully recovered from the preservation copy according to the agreed-upon acceptance criteria. Works that progressed through the preservation actions were moved forward for assessment.
Evaluation
At the end of each cycle, the publisher answered questions related to the playback experience of the preservation copy of a work. Their answers captured the degree to which the archived content matched the preservation goals and expectations about what would be preserved. The preservation provider responded to prompts that aimed to capture what was preservable using current tools. They recorded any constraints, such as technical limitations or limits on what was feasible in the time frame provided.
Together, the project team recorded lessons learned from each work, which form the basis of guidelines for better preservability. We made note of modifications that a publisher could have made during the creation of the original work to improve the preservability of the material while maintaining the essential aspects of the content.
[1] Enhancing Services to Preserve New Forms of Scholarship came on the heels of a wave of projects from university presses and academic libraries that built infrastructure and capacity for digital monographs (Waters, 2016). While the work of these projects was of particular interest to us, and we focused solely on long-form outputs, we hope that our research and the resulting guidelines will apply to digital scholarly publications more generally.