“Data journalism stories are among the most innovative and original works being produced by newsrooms today,” observes Katy Boss, Librarian for Journalism, Media, Culture & Communication. Iconic examples she cites include “Old Oil Wells” by the Los Angeles Times, “Where Harvey’s effects were felt the most in Texas” by The Texas Tribune, and “Are Hospitals Near Me Ready for Coronavirus?” by ProPublica, as well as every COVID-19 data visualization built with Mapbox software. “Unfortunately,” she says, “they’re disappearing.”
These data- and image-rich sites are likely to be lost unless someone develops the capability to save them, because current web archiving tools are not up to the task. Katy and Vicky Rampin, Librarian for Research Data Management and Reproducibility, are about to tackle the problem with a $249,999 grant from the Institute of Museum and Library Services (IMLS). The project, Preserving the Dynamic Web, begins in September.
Previously, the IMLS had funded a 2018-19 planning project, also proposed by Katy and Vicky, to develop the first-ever emulation-based web archiving prototype capable of capturing the look, feel, and functionality of a database-reliant news app. The prototype, ReproZip-Web, is an open-source web archiving tool aimed at saving data journalism stories. The new project will extend ReproZip-Web’s capabilities, improve its usability, and build an ecosystem of tools for the preservation of and access to dynamic web applications.
There will be two main streams of work: 1) technical improvements to and expansion of ReproZip-Web and Webrecorder and 2) user experience (UX) testing with digital archivists, data journalists, and computational humanists to optimize adoption and usability. The first stream will be led by two software engineers, Rémi Rampin, lead developer on ReproZip, and Ilya Kreymer, lead developer of pywb and Webrecorder. Katy and Vicky will lead the second stream and hire two graduate student assistants, one for each facet of the project.
The Libraries’ Digital Scholarship Services Department, the ProPublica data desk team, and the data and graphics department of the Los Angeles Times will each provide at least one app for testing and participate in UX sessions. Katy and Vicky expect more organizations to contribute apps as the project gets underway. Their proposal sums up the goal succinctly:
"Newsrooms like ProPublica have a decade of websites they want and need to begin archiving in a thoughtful and responsible way, and this project will make that possible. DH scholars worried about the ephemerality of their work will have a solution to keep it accessible in the long term. With a production-ready version of ReproZip-Web, data journalists in the newsroom and DH scholars will be able to pack up each project into a single archivable, distributable, and preservable .rpz file that can be transferred to libraries for long-term preservation and access."