Standards

Deborah Anderson, University of California, Berkeley

§ 1 Introduction

For digital humanities data, a standard has been defined as “codified rules and guidelines for the creation, description and management of digital resources” (Gill and Miller, 2002). Standards can be classified as de jure standards, which may be mandated by law (the term may also designate a formal standard), or as de facto standards, such as the Text Encoding Initiative, which enjoys widespread use and acceptance.

There are a number of introductions to standards that provide a background on standards and how to participate. Guides are frequently available from national and international standards organizations. In terms of data curation, the basic underlying theme is that standards encourage interoperability, although most guides are written from the viewpoint of business, and not humanities projects.

Those standards of interest to digital curation projects are largely in information and communication technology (ICT). Today, ICT standards are developed by formal Standards Developing Organizations (SDOs), such as the International Organization for Standardization (ISO), as well as national and regional standards organizations.

Finding the relevant standards for humanities data curation may be difficult, because the number of ICT standards is growing and the standards can be fragmented across industry, consortia, academic groups, and standards organizations. Most standards organizations have search portals (such as those provided by ISO, ANSI, and NISO), but a more effective retrieval method might be to locate the list of standards maintained by a relevant community, such as the list for librarians and archivists maintained by the Library of Congress (or the portal developed for archivists, maintained by the Society of American Archivists).

There are many reasons for abiding by such standards for data curation. Standards make data interchange possible across different programs, application software, or computer systems, especially if the standard has been widely adopted by industry and the academy. Standards also help preserve data for the long term because the data does not follow an ad hoc system, which may not be recoverable in the future. As a standard matures through time and eventually becomes outdated, migration to the newer standard is easier when a project initially followed a standard.

Resources: Introduction to Standards

ANSI. StandardsLearn.org: American National Standards Institute.
This website offers three free short e-learning courses with very basic information on standards and tales from the history of standardization, as described from the American viewpoint and stressing the value in terms of global competitiveness in business. ANSI is the national standards body that accredits other standards organizations in the U.S.

Citation: American National Standards Institute. StandardsLearn.org.

ANSI. The Handbook of Standardization: A Guide to Understanding Standards Development Today: American National Standards Institute.
This 15-page booklet is freely available and provides an overview of standards development from the American perspective, as provided by the American National Standards Institute.

Citation: American National Standards Institute. The Handbook of Standardization: A Guide to Understanding Standards Development Today.

Re-inventing the Wheel? Standards, Interoperability and Digital Cultural Content: Gill, Tony, and Paul Miller.
This brief article, though dated, articulates the reasons for the use of standards on projects in the humanities, and identifies some of the problems, such as having too many standards to choose from, which itself creates problems for interoperability.

Citation: Gill, Tony, and Paul Miller. “Re-inventing the Wheel? Standards, Interoperability and Digital Cultural Content.” D-Lib Magazine, January 2002, vol. 8, no. 1.

The Tao of IETF: A Novice’s Guide to the Internet Engineering Task Force: Hoffman, Paul.
A “gentle” guide to the standards group Internet Engineering Task Force, specifically geared for newbies. IETF is broadly defined as a group whose mandate is to produce technical documents that make the Internet work better. While there are several areas of current work, a few may be applicable to those in data curation, including a Working Group on IRIs (Internationalized Resource Identifiers), which are used to identify resources. This guide discusses the history of IETF, its structure, the standards process, and useful tidbits, such as how Working Groups reach consensus and the dress code at meetings. The guide mentions the value of participation for computer science students and faculty, but no reference is made to those working in the humanities. It serves as a background document on various facets of standardization, and could act as a useful template for guides to standards development in other organizations. Other guides to IETF standardization are available on the IETF website.

Citation: Hoffman, Paul. “The Tao of IETF: A Novice’s Guide to the Internet Engineering Task Force: draft-hoffman-tao4677bis-13.” Internet-Draft. October 2011.

My ISO Job: Guidance for delegates and experts: International Organization for Standardization.
This introductory level booklet is a guide for delegates to the International Organization for Standardization (ISO), so it is geared specifically to those participating in ISO-level standards work. For those who may be first-time ISO participants, the booklet usefully summarizes the different stages of its projects, which can be confusing for newcomers, and also explains the business (or “market”) aspect of standards development, which is important to recognize, even for those coming from academia or libraries.

Citation: International Organization for Standardization. My ISO Job: Guidance for delegates and experts. Geneva: ISO, 2008.

Joining in: Participating in International Standardization: Updegrove, Andrew.
This booklet expands beyond the information in the booklet My ISO Job, by providing additional details on ISO-level standards work for its delegates. It would be primarily of interest for those attending ISO meetings.

Citation: Updegrove, Andrew. Joining in: Participating in International Standardization. Geneva: ISO, 2007.

The Essential Guide to Standards: Updegrove, Andrew.
This freely available handbook on standards is primarily written from a business perspective. Chapter 2, “Participating in Standards Setting Organizations: Value Propositions, Roles And Strategies,” contains only a short section on universities (3.2.2.). A humanities project might, for example, use Unicode (an international standard) for its character encoding, provide metadata in the header according to the TEI P5 guidelines (a de facto standard), and use a Cascading Style Sheet (a format developed by the World Wide Web Consortium, a standards organization).

Citation: Updegrove, Andrew. The Essential Guide to Standards. ConsortiumInfo.org, 2007.
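
The combination of standards described in the entry above (Unicode for the characters, a TEI P5 header for the metadata) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function name build_tei_header, the title, and the two prose notes are hypothetical, and the header contains only a bare fileDesc (title statement, publication statement, source description), which a real project would extend considerably.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return the namespace-qualified form of a TEI element name."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei_header(title: str, publication_note: str, source_note: str) -> ET.Element:
    """Build a bare-bones teiHeader (hypothetical helper, for illustration only)."""
    header = ET.Element(tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))

    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title

    publication_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication_stmt, tei("p")).text = publication_note

    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_note
    return header


# Hypothetical metadata; the non-ASCII title is stored as Unicode text.
header = build_tei_header(
    "Œuvres complètes (hypothetical title)",
    "Unpublished project draft.",
    "Transcribed from a manuscript.",
)

# Serialize as UTF-8, one of the encoding forms defined by Unicode.
xml_bytes = ET.tostring(header, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```

Because the metadata sits in a documented header structure and the text is stored in a Unicode encoding, another tool can read the file back without knowledge of the project's internal conventions.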

The Unicode Standard: The Unicode Consortium.
The Unicode Standard is an international character encoding standard. For data curation, relying on Unicode as the basis for the letters and symbols used in text strongly supports interoperability, because this standard is widely adopted in computers and software and in various projects. Unicode contains the same characters as ISO/IEC 10646, which is its ISO counterpart. The “unicode.org” website is the most readily accessible way to view the standard today. Properly speaking, The Unicode Standard also includes additional specifications beyond character codepoints, such as line-breaking information.

Citation: The Unicode Consortium. The Unicode Standard.
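
As a small illustration of why the specifications that accompany the Unicode code points matter for interchange, the Python sketch below compares two encodings of the same accented letter. Normalization is an example chosen here, not one discussed in the entry above; the helper name to_nfc is hypothetical, while unicodedata is part of the Python standard library.

```python
import unicodedata

# The same visible letter "é" can be stored in two ways.
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT


def to_nfc(text: str) -> str:
    """Normalize text to Normalization Form C (hypothetical helper name)."""
    return unicodedata.normalize("NFC", text)


# Compared code point by code point, the two strings differ ...
print(composed == decomposed)                   # False

# ... but after normalization they compare equal.
same = to_nfc(composed) == to_nfc(decomposed)
print(same)                                     # True

# Every code point has a standardized name, and the normalized text
# has a single, predictable UTF-8 byte sequence.
print(unicodedata.name(composed))               # LATIN SMALL LETTER E WITH ACUTE
utf8 = to_nfc(decomposed).encode("utf-8")
print(utf8)                                     # b'\xc3\xa9'
```

A project that normalizes its text consistently before storing or comparing it avoids spurious mismatches when data moves between systems that happen to produce different but equivalent sequences.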

Preservation: Smith, Abby.
This article is a very useful resource focusing on the value of standards for humanities data curation. Several funding agencies, including NEH and NSF, are now requiring a data management plan, which includes documentation of the data, metadata format, and content standards being used.

Citation: Smith, Abby. “Preservation.” A Companion to Digital Humanities. ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.

International Organization for Standardization (ISO): International Organization for Standardization.
ISO is the largest standards body today. The work on information technology, which touches data curation, is carried on by its Joint Technical Committee, JTC1, which is ISO’s main organ for standards development in information technology. Under JTC1 are various Working Groups (WG), Special Working Groups (SWG), and Subcommittees (SC), whose work covers topics relevant to humanities data curation, including coded character sets, digitally recorded media for information interchange and storage, document description and processing languages, computer graphics, and image processing.

Citation: International Organization for Standardization. ISO – International Organization for Standardization.