Policy, Practice, and Law

Melissa Levine, University Library, University of Michigan

§ 1 Introduction

Memory institutions have a long history of curation: collecting, preserving, sharing, describing, and interpreting all kinds of tangible material. To address the legal issues for digital curation projects, it is important to start with the “big picture” questions: what does digital curation in the humanities mean for our collective duty as stewards of memory? How is digital curation in the humanities a fundamental concern for new invention, ideas, and expression? Asking and answering the big questions help shape the way legal issues are addressed in the context of humanities projects. Collecting digital creative or original works prompts concerns about a wide range of possible legal issues – copyright, contract, privacy to name but a few. These legal responsibilities are significant but should be considered in the larger context of the socially-valuable stewardship responsibilities libraries, archives, and museums engage in or new ways of teaching and approaching education.

Where to begin? Start with core principles and encourage decision makers to learn to spot the legal issues, address them in written policy and practice, and rely on the growing body of standards as appropriate to your effort. Is anyone harmed? What is lost if we fail to act? How is our role different from the private sector? How can we collaborate across funding opportunities (whether public, private, or philanthropic) to meet the potential of digital curation in the humanities? Different approaches to legal issues may be needed for different kinds of projects. Are you scanning special collections of distinct analog materials (books, sound recordings, artwork)? – or collecting and preserving digital-only web-based material? – or setting up preservation and emulation collections for digital video games? High-level principles, policies, and standards can help you develop responsible, productive projects even for very different kinds of materials or collections.

§ 2 Planning Questions and Issue Spotting

When people think about legal issues, they commonly think of contracts, copyright, privacy, and similar concerns. However, much of the information with legal significance is also to be found in your organization’s written policies. It may also be legally significant to have written practices and procedures for your specific project. Such documents can be as simple or complex as the situation warrants. They provide a framework for consistency that helps with project management and with orientation for staff working on the project. It is worth investing in these kinds of documents and revising them as needed, tracking version dates. Policies and procedures are important for good communication and for keeping projects and people focused on a given goal. Further, they are important legal evidence – having these kinds of records in place helps minimize legal exposure and helps limit that exposure if questions do arise. This is why there is an emphasis on guides, standards, and planning in the Resources sections. Community standards and practices and the specific policies and procedures associated with a project are important as a functional matter as well as a legal concern.

The following questions are meant to help project managers plan and recognize some of the matters that should be addressed – this list is by no means comprehensive. (This list is influenced by the Digital Preservation Workshop led by Nancy McGovern and Kari Smith: http://www.icpsr.umich.edu/dpm/workshops/instructors.html.)

Content in the information ecosystem

  • Did you create it? Can you share it?
  • Did you consider Creative Commons and open licenses generally as tools for communicating your intentions with regard to your own work?
  • If you did not create it, is there a copyright holder? Do you need permission to copy the work?
  • What level of access will you provide?
  • How will you document all of the known legal characteristics in metadata?
  • Are you licensing materials or software? How do the terms of the licenses affect long-term usability?
  • Did you get information when possible from creators of content e.g. copyright, privacy, donor restrictions?
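The metadata question in the list above can be made concrete with a small rights record kept alongside each item. The following is a minimal sketch in Python, not a community standard: the class name, the field names, and the access level values are all illustrative assumptions, and a real project would adapt them to its own policies.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json

# Hypothetical access levels; a real project would define its own.
ACCESS_LEVELS = ("public", "campus_only", "restricted")

@dataclass
class RightsRecord:
    """Illustrative rights metadata for one item (field names are assumptions)."""
    item_id: str
    created_by_us: bool
    copyright_holder: Optional[str] = None
    license: Optional[str] = None          # e.g. a Creative Commons license name
    permission_on_file: bool = False
    access_level: str = "restricted"
    donor_restrictions: List[str] = field(default_factory=list)
    has_private_information: bool = False
    recorded_on: str = ""                  # date the record was last reviewed

    def open_questions(self) -> List[str]:
        """Return issues that still need a decision before sharing."""
        issues = []
        if self.access_level not in ACCESS_LEVELS:
            issues.append("unknown access level")
        if not self.created_by_us and self.copyright_holder is None:
            issues.append("copyright holder not identified")
        if (not self.created_by_us and self.license is None
                and not self.permission_on_file):
            issues.append("no license or permission on file")
        if self.has_private_information and self.access_level == "public":
            issues.append("private information in a public item")
        return issues

# Example: a third-party letter marked public that contains private details.
record = RightsRecord(item_id="letter-0042", created_by_us=False,
                      access_level="public", has_private_information=True)
print(json.dumps(asdict(record), indent=2))
print(record.open_questions())
```

Keeping the open questions next to the record, rather than in someone's memory, is one way to make the documentation habit described earlier part of routine intake.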

Collecting or depositing in an archive or digital repository

  • Do you have the legal right to collect, preserve, and/or provide access to or use the materials?
  • Are there any copyright concerns in the material being collected or deposited? What are they and how are they documented?
  • Are there privacy concerns related to the material being collected?
  • Have you looked at examples of submission agreements – will you use a model or tailor it to your project? (See Resources for agreements in the TAPER project, ICPSR, Data-PASS, MetaArchive, Deep Blue (UM), HathiTrust.)


  • Who is in charge? Who is the lead? Who are the parties?
  • Is there a steering committee, an operations committee – how are they composed?
  • Who are the authorized representatives for all roles, approvals for replacements; how will you communicate and document these matters?
  • How will new members be handled?
  • What rights do members have to contribute data? Do they retain any copyright or intellectual property rights? Do they have information about copyright, privacy, or other rights, and is it included in metadata or in some other manner?
  • How can members withdraw or be removed? If this is necessary, will their contributions remain with the corpus of the collection? Will they have access to their contributions after their membership ends?
  • How will disputes be handled?
  • How will changes to agreements be handled (in writing)?


  • Is the content your creation or someone else’s?
  • Is there any impetus to secure the resulting collection or body of material or can it be shared and remixed?
  • Do any of the exceptions to copyright apply? — Section 107, Fair Use;
    Section 108, Exceptions for Libraries and Archives; Section 110, TEACH Act.
  • Even if there are no copyright concerns, there may be other legal, ethical, or practice reasons for managing collections and access to collections.

Privacy and Human Subjects

  • Are there any privacy or confidentiality requirements such as medical, financial, or other personal information included in the material?
  • How will private information be protected? Should you collect it at all – if so can you manage the security and access control needed?

Access and Security

  • Are there qualifications or requirements for users of the collected materials or
    “terms of use”?
  • Who will have access to the materials online? Who will have access to the servers/storage?
  • What security restrictions will be needed, over and above the normal security for a data installation?
  • Did you get information for the desired level of access in terms of duration (short, mid, or long term); who will have access?
  • How will you control the material to conform with any rights you have cleared and ensure metadata is accurate over time?
  • Have you addressed accountability in terms of responsibilities and documentation? Do you need to plan for audits for financial matters, security or other areas of concern? Will you do periodic self-audits or self reviews?
  • How will you deal with risk of loss as a legal matter, as a financial matter, and as a technical or practical matter (e.g. redundant backups)?

Derivative research

  • How will derivative research be managed?
  • How will derivative research be made available to others inside and outside your organization?
  • How will publication of derivative research be managed legally and logistically?
  • How will the “remixed” data be maintained with regard to changes made to source data in a digital repository or collection?

Policy considerations

  • Are policies and procedures in writing, publicly available where appropriate?
  • Are there multiple participants? If so, responsibilities should be stated explicitly in agreements with and among the participants.
  • Are policies and procedures reviewed on an ongoing basis to confirm they are accurate, being followed, and to consider needed updates?
  • Are there policies and procedures addressing copying, need for redundant data, authentication systems, firewalls, backup, and disaster preparedness, staff training?

Additional Considerations

  • Is your project consistent with your organization’s policies and procedures?
  • Is your project documented in a way to ensure transparency? Is transparency appropriate in some areas e.g. privacy, security matters? Can you distinguish these?
  • Are there opportunities for open design and if so what are the implications for practical policy and procedure?

§ 3 University of Michigan Library: A Case Study

The environment at the University of Michigan Library is one of “policy in action” in support of the campus, research, and scholarship generally. The culture is proactive, with attention to both opportunities and challenges. There is high value placed on collaboration, cooperation, and shared responsibilities. Many projects are related by an overall culture of creative thinking and possibility combined with nuts and bolts pragmatism (“how” do we do it?). This culture is further supported by vital help from the University’s office of general counsel, which is able and willing to think creatively with the Library to find ways to address new and evolving legal scenarios in a manner consistent with its overarching duty to the University as a whole. In this environment, seemingly distinct projects and services are thematically or functionally related because of this institutional culture.

Deep Blue is the University of Michigan’s institutional repository service. To participate in Deep Blue, authors (as the creators and copyright holders of their work) must enter an agreement. These standard agreements and the intellectual property policy that governs the service are easily accessed at the Deep Blue website. (See Resources.) The agreements are very simple and straightforward. They are significant because they only require authors to grant permission to the repository for us to do necessary tasks – no transfer of copyright is required or needed. Authors provide a simple non-exclusive grant to the repository to “display and distribute the submission including its abstract and descriptive information in electronic format in accordance with the Repository’s policies, copy, convert or migrate the submission to any medium or format for the purpose of preservation and access, keep more than one copy of the submission for purposes of security, back-up, and preservation.” Deep Blue’s approach to author agreements reflects the need for permanence and consistency as well as a commitment to open access to the scholarly work product of the University.

The HathiTrust is another manifestation of this commitment to preservation and access. HathiTrust presents more possible permutations than Deep Blue regarding agreements and copyright. As a partnership of major research institutions and libraries, the HathiTrust works to preserve the cultural record and ensure that it is available in the future. The HathiTrust’s website reflects overarching policy and culture as an informative, transparent, living record of its policy and governance documents, which facilitate collaboration among partner institutions.

HathiTrust preserves a work such as a book as an artifact and as a cohesive object to be read – but also makes it possible to treat the same content as data. This in turn allows for non-consumptive research. Non-consumptive research allows researchers to use “computational analysis of one or more books without the researcher having the ability to reassemble the collection. Rather than reading the material, researchers use specialized algorithms to analyze text as a massive data set…” The Sloan Foundation is funding a research project by Indiana University’s Data To Insight Center (D2I) on non-consumptive research with HathiTrust as a large mass digitized collection. D2I is partnering with the HathiTrust Research Center (HTRC) and the University of Michigan’s Department of Electrical Engineering and Computer Science on the project. This kind of research opens new possibilities for scholarship and discovery.
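To illustrate the idea, here is a toy Python sketch of a non-consumptive query: the analysis function reads the full text, but the researcher receives only aggregate word counts. The function name, the tiny in-memory corpus, and the `min_count` threshold are illustrative assumptions; this does not describe how HTRC or D2I actually implement their services.

```python
import re
from collections import Counter
from typing import Dict, Iterable

def aggregate_word_counts(volumes: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Return corpus-wide word counts, never the text itself.

    Words seen fewer than min_count times are dropped, so the output
    carries less detail about any single passage (an assumed safeguard).
    """
    totals = Counter()
    for text in volumes:
        # Lowercase and keep alphabetic runs only; word order is discarded.
        totals.update(re.findall(r"[a-z]+", text.lower()))
    return {word: n for word, n in totals.items() if n >= min_count}

# Stand-in corpus; in practice the texts would stay inside a secured environment.
corpus = [
    "The whale surfaced. The sea was calm.",
    "A calm sea and a white whale.",
]
counts = aggregate_word_counts(corpus)
print(sorted(counts.items()))
```

Because only counts leave the function, the researcher can study vocabulary across the collection without being handed readable pages.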

For books “as books”, the HathiTrust provides full public access to read works in the public domain – and to books for which permission to share publicly is obtained from the copyright holder. (We make Creative Commons options available to interested copyright holders.) For the public domain determination, we use conservative cutoffs – 1923 for US works, 1870 for non-US works. But there are many works beyond that category that are likely in the public domain.
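The cutoff rule can be written down in a few lines. The sketch below, in Python, uses the two years given in the text; the function name, the reading of each cutoff as "published strictly before that year", and the choice to treat unknown dates as not public domain are assumptions made for illustration rather than HathiTrust’s actual rights-determination code.

```python
from typing import Optional

US_CUTOFF_YEAR = 1923      # from the text: cutoff for US works
NON_US_CUTOFF_YEAR = 1870  # from the text: cutoff for non-US works

def presumed_public_domain(pub_year: Optional[int], published_in_us: bool) -> bool:
    """Apply the conservative cutoff.

    Assumes "cutoff" means published strictly before the cutoff year,
    and that a missing date is never presumed public domain.
    """
    if pub_year is None:
        return False
    cutoff = US_CUTOFF_YEAR if published_in_us else NON_US_CUTOFF_YEAR
    return pub_year < cutoff

# A US book from 1922 passes; a non-US book from 1900 does not.
print(presumed_public_domain(1922, published_in_us=True))
print(presumed_public_domain(1900, published_in_us=False))
```

A rule this simple is cheap to apply at scale, which is the point of a conservative cutoff: it will miss many works that are in fact in the public domain, but it rarely opens a work that is not.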

To research in more depth and identify books that may be in the public domain, the IMLS generously funded the Copyright Review Management System (CRMS). In CRMS, the University of Michigan Library partnered with other research libraries to ascertain the copyright status of works published in the US between 1923 and 1963. In that time frame, there were formalities that were required to maintain a copyright in the US; failure to comply with them meant a work entered the public domain. The analysis is complex, so conservative cutoff dates are typically efficient. However, because the process is transparent, documented, and executed in collaboration with other libraries, every day brings new information about the copyright status of books. Reviewers trained in the process examine works in the HathiTrust – their access is secured and limited to the review process because of the need to protect scans of books that are subject to copyright. Because legal status (and thus metadata) changes, there is a more complex sequence for providing access and for determining the rights status of works in HathiTrust than in, for example, Deep Blue. The IMLS will provide another grant to the Library (the initial grant period concludes November 2011). The new grant allows us to work with a large number of partner institutions to continue our reviews of books published in the US from 1923 to 1963 and to develop a process for learning more about the copyright status of books published in the UK, Canada, Australia, and Spain. These grants are opportunities to think further about legal mechanisms to ensure access and thus relevance for books in our collections. Copyright-related information is an invaluable component of thinking about how we can improve and innovate.

These examples demonstrate the way different perspectives and roles can be fluid and influence each other. A general observation: experience with analog collections offers lessons for ways to approach the multidimensional issues of digital curation and associated legal issues. For example, the context of a collection of analog material may convey meaning as a body of material. Further, there may be significance in each of the items that make up the collection, and further significance in how those things were made as well as the materials they are made from. All of these considerations exist in the realm of digital curation, making for even more multidimensional research opportunity as well as growing complexity. Each of those perspectives may have different legal implications. Get comfortable thinking about your work as continuous problem solving because there are rarely clear or definitive answers; and if there are, they will change over time. Institutional and professional culture as well as overarching policy are key aspects of working through legal questions.

Resources: General

Deep Blue. University of Michigan Library. Web. 9 Dec. 2014.

“Deep Blue is the University of Michigan’s permanent, safe, and accessible service for representing our rich intellectual community. Its primary goal is to provide access to the work that makes Michigan a leader in research, teaching, and creativity.”

Deep Blue Agreements. University of Michigan Library. 22 March 2005. Web. 9 Dec. 2014.

UM Institutional Repository Intellectual Property Policy, Version 1.0, 22 March 2005.

HathiTrust Digital Library HathiTrust. Web. 9 Dec. 2014.

“HathiTrust is a partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. There are more than fifty partners in HathiTrust, and membership is open to institutions worldwide” (http://www.hathitrust.org/about).

Data To Insight Center. Indiana University Pervasive Technology Institute. Web. 9 Dec. 2014.

The Sloan Foundation is funding a research project by Indiana University’s Data To Insight Center (D2I) on non-consumptive research with HathiTrust as a large mass digitized collection. D2I is partnering with the HathiTrust Research Center (HTRC) and the University of Michigan’s Department of Electrical Engineering and Computer Science on the project. The project will ensure the security of the works in HathiTrust. “The HathiTrust repository contains almost 8.6 million digitized volumes, and about 2.2 million of those — roughly 26 percent — are in the public domain and currently available for non-consumptive research.” This means that knowing what is in the public domain is crucial. See also: “IU Data to Insight Center to lead Sloan-funded investigation into non-consumptive research,” Educause blog, August 10, 2011: http://www.educause.edu/blog/rmcdonal/IUDatatoInsightCentertoleadSlo/233661.

Copyright Review Management System (CRMS). University of Michigan Library. Web. 9 Dec. 2014.

Funded with generous support from the IMLS (2008-2011).

Erway, Ricky, ed. Well-intentioned practice for putting digitized collections of unpublished materials online. OCLC 5 May 2010. PDF.

From OCLC, this is a clear, straightforward set of principles to consider guiding reasonable decision-making for digitizing unpublished materials. It distills key concerns to a succinct statement that can be used as a model for thinking concisely about copyright concerns in the context of digital humanities projects. This provides a common-sense approach to making collections available online.

Undue Diligence: Seeking Low-risk Strategies for Making Collections of Unpublished Materials More Accessible (audio recordings of presentations). OCLC, 11 March 2010. Web. 9 Dec. 2014.


Event on 11 March 2010, 10 a.m. – 4 p.m. Pacific Time. Includes audio recordings of presentations.

§ 4 Copyright

Remember that copyright status changes over time, making it a real challenge. When you can get intake agreements directly from creators that allow you to preserve and use material over time, you ease the circulation of material subject to copyright. Agreements to ingest content – like those used for Deep Blue – obtain from the author the rights needed for repository preservation and access while leaving the author with their copyright. The field of copyright is so large and nuanced that I am only providing a few resources here – there are many other comprehensive copyright guides. You should be familiar with basic copyright principles, including exceptions under US law. There is active discussion through IFLA and other library groups regarding copyright on a global scale and its intersection with digital collections of all sorts. There are no absolute solutions. It is important to be comfortable with indefiniteness in this arena and critical to act in the letter and spirit of the law. Given this indefiniteness, your written policies and community practices are legally significant.

Resources: Copyright

Besek, June M. CLIR pub 144: Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives. Washington: Council on Library and Information Resources, 2009. Print.

Abstract: “This report addresses the question of what libraries and archives are legally empowered to do to preserve and make accessible for research their holdings of unpublished pre-1972 sound recordings. The report’s author, June M. Besek, is executive director of the Kernochan Center for Law, Media and the Arts at Columbia Law School.”

Besek, June M. CLIR pub 135: Copyright Issues Relevant to Digital Preservation and Dissemination of Pre-1972 Commercial Sound Recordings by Libraries and Archives. Washington: Council on Library and Information Resources, 2005. Print.

Abstract: “This report addresses the question of what libraries and archives are legally empowered to do to preserve and make accessible for research their holdings of pre-1972 commercial recordings, the large aural legacy that is not protected by federal copyright. As the first in-depth analysis by a nationally known expert in copyright law, this report will also be a timely and authoritative aid to the many librarians and archivists who face decisions daily about how to establish priorities for sound preservation.”

Covey, Denise Troll. CLIR pub 134: Acquiring Copyright Permission to Digitize and Provide Open Access to Books. Washington: Council on Library and Information Resources, 2005. Print.

Abstract: “What are the stumbling blocks to digitization? Is copyright law a major barrier? Is it easier to negotiate with some types of publishers than with others? To what extent does the age of the material influence permission decisions? This report, by Denise Troll Covey, principal librarian for special projects at Carnegie Mellon University, responds to many of these questions. It begins with a brief, cogent overview of U.S. copyright laws, licensing practices, and technological developments in publishing that serve as the backdrop for the current environment. It then recounts in detail three efforts undertaken at Carnegie-Mellon University to secure copyright permission to digitize and provide open access to books with scholarly content.”

Hirtle, Peter B. “Digital Preservation and Copyright.” Copyright and Fair Use. Stanford University Libraries, 10 Nov. 2003. Web. 9 March 2015.

§ 5 Privacy

Think about privacy concerns associated with the content being collected – medical or financial information are flags. These concerns should be identified prior to collecting and on an ongoing basis, and they may need to be documented in written agreements with subjects. Can information be depersonalized? Consider necessary substantive administrative metadata and how to handle them as private information.
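As one possible reading of “depersonalized”, the Python sketch below replaces selected fields in a record with keyed hashes so that the same person maps to the same token without the name appearing in the working copy. The function name, the field names, the `anon-` prefix, and the sample key are hypothetical. This is pseudonymization rather than full anonymization, and whether it is sufficient for a given project is a policy and legal question that the code cannot answer.

```python
import hashlib
import hmac
from typing import Dict, Iterable

def pseudonymize(record: Dict[str, str],
                 private_fields: Iterable[str],
                 secret_key: bytes) -> Dict[str, str]:
    """Return a copy of record with private fields replaced by keyed hashes."""
    cleaned = dict(record)  # leave the original record untouched
    for name in private_fields:
        if name in cleaned:
            digest = hmac.new(secret_key,
                              cleaned[name].encode("utf-8"),
                              hashlib.sha256).hexdigest()
            # Shortened token for readability; the prefix is an arbitrary choice.
            cleaned[name] = "anon-" + digest[:12]
    return cleaned

# Hypothetical oral history record; the key would be stored separately and securely.
interview = {"name": "Jane Example", "topic": "labor history"}
key = b"replace-with-a-secret-key"
safe = pseudonymize(interview, ["name"], key)
print(safe)
```

Because the hash is keyed, someone who sees only the tokens cannot simply hash a list of likely names to reverse them, but anyone holding the key can, so the key needs the same access controls as the original data.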

Resources: Privacy

“Institutional Review Board.” Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 2 Nov. 2014. Web. 9 March 2015.

Excellent overview of a complex subject. Universities have their own “IRBs”; contact yours if you plan any human subject research. Oral histories and similar research like ethnography are typically excepted from this kind of review as they are generally distinct from the kind of research IRBs were designed to address (like biomedical research). Every university interprets these a bit differently.

United States. Department of Health and Human Services. HIPAA: Health Insurance Portability and Accountability Act of 1996. Web. 9 March 2015.

From the introduction: “A major goal of the Security Rule is to protect the privacy of individuals’ health information while allowing covered entities to adopt new technologies to improve the quality and efficiency of patient care. Given that the health care marketplace is diverse, the Security Rule is designed to be flexible and scalable so a covered entity can implement policies, procedures, and technologies that are appropriate for the entity’s particular size, organizational structure, and risks to consumers’ e-PHI [electronic protected health information].”

United States. Department of Health and Human Services. HHS announces proposal to improve rules protecting human research subjects. 22 July 2011. Web. 9 March 2015.

Subtitle of press release: “Changes under consideration would ensure the highest standards of protections for human subjects involved in research, while enhancing effectiveness of oversight.”

United States. Department of Health and Human Services. “Human Subjects Research Protections: Enhancing Protections for Research Subjects and Reducing Burden, Delay, and Ambiguity for Investigators. Federal Register Notice Number: 2011-18792.” Web. 9 March 2015.

This notice provides a comprehensive explanation of the history and structure of IRB rules. The notice is a solicitation for public comment on how to simplify and update the current system.

§ 6 Agreements

Contracts are legally enforceable promises to do or not do something based on some reliance and/or exchange. A license, a memorandum of understanding, an agreement – all of these may be legally binding contracts regardless of what they are called. Approach contracts as an opportunity for communication, which may lessen the risk or need for legal action (or the need to respond to legal action) once your project is underway. Agreements may be needed between you and other organizations that are collecting or contributing to the body of material, along with licenses for software to support the whole, intake agreements, and more. Some general comments on contracts: state what each party is expected to do and reference any relevant procedures or practice standards. Contracts should address duties, how warranties will be made, indemnifications if any (that is, what happens if the party making the promise fails to meet its duties or breaches), and how that will be funded (insurance requirements). Sometimes contracts are silent about certain issues depending on the nature of what is at stake, whether it could be cured, and the nature of the legal entities who are party to the contract – but that should be a conscious choice, not an oversight. Think about how agreements can help address rights or duties from the moment of creation of the content as well as through a lifecycle.

Resource: Agreements

“Scholar’s Copyright Addendum Engine.” Science Commons. Creative Commons, n.d. Web. 10 March 2015.

From the description: “The Scholar’s Copyright Addendum Engine will help you generate a PDF form that you can attach to a journal publisher’s copyright agreement to ensure that you retain certain rights.”

§ 7 Data

In the digital realm, content is data. You can look at an object in many ways – as the whole, as the sum of its parts. It quickly becomes rather metaphysical. The sciences provide many resources for scholars and project managers useful for digital curation in the humanities because scientists are accustomed to thinking about data. In the humanities, one needs to think about the same material from multiple perspectives.

Resources: Data

MIT Libraries subject guide: Data Management and Publishing, Ethical and Legal Issues: Massachusetts Institute of Technology Libraries.

Introduction and resources for issues of confidentiality and intellectual property.

The Need for Clear Data Licenses: Andrew Turner.

A succinct explanation of and argument for clear licenses for data.

Open Data Commons. Open Knowledge Foundation, n.d. Web. 10 March 2015.

Discusses legal solutions for open data. Site has interesting section discussing law versus society norms and how they differ.

§ 8 Open Ideas, Open Access

Ideas associated with open access and tools like Creative Commons licenses may help shape the way you approach projects. Applying these concepts in different arenas may help with interoperability, access, and long-term preservation, to name a few areas. As a legal matter, “open” approaches are more fluid and easier to administer. Open access policies functionally remove copyright concerns, easing transactional costs and expanding opportunity in situations where sharing is desired by the copyright holder (typically the author). That said, one also needs to keep in mind privacy responsibilities. You can have content with no copyright issue that still needs careful management or access limits for privacy reasons. It’s important to keep these different strands in mind and distinct from one another. Resources are provided as background reading.

Resources: Open access

ARL-SPARC news: News and announcements from the ARL Scholarly Publishing and Academic Resources Coalition: Scholarly Publishing and Academic Resources Coalition.

Recent announcements and site updates from Scholarly Publishing and Academic Resources Coalition (SPARC), developed by the Association of Research Libraries (ARL).

CIC Environmental Scan Final Report: October Ivins, Judy Luther.

Survey of faculty practices and attitudes about open access at universities in the Committee on Institutional Cooperation (CIC), a consortium of the Big Ten universities and the University of Chicago.

Open Access News: Peter Suber.

A good resource for happenings within the Open Access movement. The blog stopped updating in 2010, and the author began working on the Open Access Tracking Project (see below).

Open Access Overview: Peter Suber.

An introduction to the concept of Open Access geared for those who have little to no familiarity.

Open Access Tracking Project (OATP): Editors and administrators: Robin Peek, Jean-Claude Guedon, David Goodman, Athanasia Pontika, Terry Plum. Editorial board: Charles W, Leslie Chan, Heather Joseph, Melissa Hagemann, Peter Suber, Alma Swan, John Wilbanks.

Community-maintained catalog of Open Access-related web resources, hosted at the Graduate School of Library and Information Science, Simmons College.

Resources: Articles Addressing the Benefits of Open Access

Citation Advantage of Open Access Articles: G. Eysenbach.

From the abstract: “Articles published as an immediate OA article on the journal site have higher impact than self-archived or otherwise openly accessible OA articles. We found strong evidence that, even in a journal that is widely available in research libraries, OA articles are more immediately recognized and cited by peers than non-OA articles published in the same journal. OA is likely to benefit science by accelerating dissemination and uptake of research findings.”

Open access publishing, article downloads, and citations: randomised controlled trial: Davis PM, Lewenstein BV, Simon DH, Booth JG, Connolly MJL.

From the abstract: “Conclusions: Open access publishing may reach more readers than subscription access publishing. No evidence was found of a citation advantage for open access articles in the first year after publication. The citation advantage from open access reported widely in the literature may be an artefact of other causes.”

Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact: Hajjem, C., Harnad, S., Gingras, Y.

The authors trace over 1,300,000 articles in ten academic disciplines to find a correlation between open access designation and higher frequency of citations.

An Approach to Open Access Author Payment: Donald W. King.

Abstract: “There have been hundreds of articles in recent years exhorting the strengths and warning of the weaknesses of Open Access through author payment. This article discusses a few of the favorable and unfavorable issues and proposes an approach that takes advantage of the favorable aspects and overcomes some of the unfavorable ones. It requires extensive government support, which may or may not be feasible, but the approach is presented here nevertheless. Some evidence is given for the potential savings that would be achieved by scientists, publishers and libraries in the US.”

A Defense of Textbooks: Joshua Kim.

Discusses reasons to use textbooks in combination with OER; also see comments.

Science Commons: Author’s Addendum: Creative Commons.

A form for use in changing an author’s agreement with a publisher to retain some freedoms to use an article and to post it online.