Glossary

An effort has been made to write this document in plain language. Some terms, however, are becoming common in digital librarianship but may still require explanation. This is not an exhaustive glossary of "digital library" terms; rather, it focuses on terms used in this document. It also includes a few terms that do not appear in these guidelines but are helpful to know because they will most likely crop up during a digital project.

Access: As addressed in this document, access refers to enabling a computer user to "get at" digitized materials.

Access architecture: Access architecture generally refers to the construction and structure of a Web site through which a computer user would view and/or interact with digitized material.

Analog: In the context of this document, analog refers to materials that have a physical existence in the real world, such as photographs (paper and gel), books (paper and ink), paintings (canvas and paint), and other objects.

Archival file: This is a digital version of an object designated for long-term maintenance and use. Archival files are usually created at the largest file size and resolution possible to ensure a high-quality copy.

Bandwidth: This is the capacity that a telecommunications medium has for carrying data.

Batch processing: Batch processing is the execution of a series of non-interactive jobs as a group, without user intervention. Usually batch jobs are stored up during working hours and then executed during the evening or whenever computers are idle.

Browser: A browser is software that allows computer users to read and/or interact with documents on the World Wide Web; examples are Netscape Navigator and Internet Explorer.

Convert: In the context of this document, convert refers to taking physical materials and making digital versions of them.

Deliver: In the context of this document, to deliver something means to present a digital object through an electronic interface, usually a Web site.

Delivery copy: As addressed in this document, a delivery copy of something is the digital image that will be presented on the Web site. For example, you may have a very high-quality digital copy made of an item, but it may be so large in file size that it would take too long to download on a Web site. Therefore, a smaller copy of that file is made as the delivery copy for the Web site.

Depositing agent: The depositing agent is the organization responsible for depositing digital materials (and related metadata) to a digital repository. It may also be the scanning facility responsible for the actual reformatting of materials to digital form. The depositing agent may be the owner of the materials, but often the agent is an imaging lab working on behalf of the owning organization (such as the HCL Digital Imaging Group).

Digital: In the context of this document, digital refers to something that is stored in a form a computer can read and display.

Digital library: The meaning of the phrase digital library varies tremendously, but one simple definition is the use of computers to store library materials appearing in electronic (digital) format.

Digital Repository Service (DRS): The DRS is a product of the Library Digital Initiative at Harvard University and offers a place for libraries, museums, and archives to store their digital objects. The DRS operates as a part of the Office for Information Systems (OIS).

Digitization: In the context of this document, digitization refers to the process of taking a physical item and making a digital copy of it.

Document type definition (DTD): A DTD provides a standard way to define the SGML or XML tags used within a document. For example, if an XML document uses the tag <name>Martin</name>, then the DTD declares that "name" is a permitted element and specifies where it may appear and what it may contain. As an example of how this tagging can be used, to get a list of all names within a document, a program can be written to compile a record of everything that appears within the <name></name> tags.
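
The name-listing program described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the guidelines: the sample document and its <letter> tag are invented, and the standard-library parser used here reads the tags but does not check the document against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the <letter> and <name> tags are illustrative only.
SAMPLE = "<letter><name>Martin</name> wrote to <name>Eliza</name> in May.</letter>"


def collect_names(xml_text):
    """Return the text of every <name> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter("name") walks the whole tree and yields only <name> elements.
    return [(element.text or "").strip() for element in root.iter("name")]


print(collect_names(SAMPLE))  # ['Martin', 'Eliza']
```

Because the names are tagged rather than merely formatted, the program does not need to guess which words are names.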

DPI: An acronym for Dots Per Inch, dpi is a measure of resolution for printers, scanners, and displays. The typical laser printer reaches 300 dpi, though 600 dpi is becoming more common.

GIF: An acronym for Graphics Interchange Format, GIF is an image file format. GIF images are limited to only 256 colors, whereas JPEG images can contain up to 16 million colors.
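
The two color limits follow from how many bits are stored for each pixel. As a rough sketch, assuming 8 bits per pixel for GIF and 24 bits per pixel (8 each for red, green, and blue) for a typical JPEG:

```python
def color_count(bits_per_pixel):
    """Number of distinct colors that a given bit depth can represent."""
    return 2 ** bits_per_pixel


print(color_count(8))   # 256 (GIF palette, assuming 8 bits per pixel)
print(color_count(24))  # 16777216 (about 16 million, assuming 24 bits per pixel)
```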

Grayscale: In the context of this document, grayscale refers to a graphical image composed of shades of gray. This contrasts with two-tone images, which consist only of black and white.

Interface: In the context of this document, an interface is the method used by a computer user to interact with a software application.

JPEG: JPEG, an acronym for Joint Photographic Experts Group, refers to an image compression format used to transfer color photographs and images over computer networks. Along with GIF, JPEG is one of the most common ways photographs are delivered over the Web. JPEG compresses images of photographic color depth better than competing file formats like GIF while retaining a high degree of color fidelity, which makes JPEG files smaller and therefore quicker to download.

Metadata: Metadata is the structured description of an object or collection of objects and is similar to what is found in a standard cataloging record. There are three kinds of metadata:

Descriptive metadata is information that describes the item, such as title, author, publisher, subject, physical dimensions, et cetera.

Administrative metadata may include information about acquisition, access restrictions, provenance, preservation, and treatment decisions, et cetera. Other types of administrative metadata about a digital item may include resolution, bit depth, type of equipment used, and so forth.

Structural metadata is information about how the item is put together or arranged, such as the table of contents page, individual page numbers, illustration and plates pages, et cetera. It basically describes the structure of an item, such as a book, so that all of the pages of that item can be displayed in the correct order. Structural metadata may also include information that supports navigation among the components of a complex object. Examples include turning pages of a book, jumping to a particular chapter or page, or switching between images and corresponding text.

Migration: Migration is the periodic transformation of files to ensure continuing compatibility between file formats and associated applications -- for management and delivery -- as technologies change.

Name: As presented in this document, a name refers to the name of a digital file. Naming is important because file names become persistent, location-independent identifiers for digital files. Name resolution is the process of mapping from a given name to a URL that represents that particular resource.
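
Name resolution can be pictured as a lookup table. The sketch below is an assumption made for illustration only: the name, the URLs, and the resolve function are hypothetical and do not describe how any particular repository works.

```python
# Hypothetical resolver table: the name and URLs are made up for illustration.
resolver_table = {
    "example-0001": "https://files.example.org/store-a/example-0001.tif",
}


def resolve(name, table):
    """Map a persistent name to the URL where the file currently lives."""
    if name not in table:
        raise LookupError(f"no URL registered for name: {name}")
    return table[name]


print(resolve("example-0001", resolver_table))

# The file moves to new storage: only the table entry changes, not the name.
resolver_table["example-0001"] = "https://files.example.org/store-b/example-0001.tif"
print(resolve("example-0001", resolver_table))
```

Anyone who recorded the name "example-0001" can still find the file after the move, which is what makes the name location-independent.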

Object: In the context of this document, the term object can sometimes be used to refer to the digitized version of a physical item.

OCR: OCR is an acronym for Optical Character Recognition. OCR scanners and/or software "read" print and transform it into editable text. OCR scanning differs from regular scanning in that it results in text that can be manipulated and searched, like a Word document, rather than just a "digital photocopy" as with regular scanning.

Outsource: In the context of this document, outsourcing is the practice of contracting with an outside company in order to provide a service or product that otherwise might be too expensive, complicated, or time-consuming for the institution to do internally. A common example of outsourcing is that of copy machines, which are usually rented and/or maintained by an outside agency.

PDF: An acronym for Portable Document Format, PDF is a file format developed by Adobe Systems. Documents that have been formatted in PDF appear on any computer monitor exactly as originally intended. This is especially useful for sharing documents between computers with different operating systems, such as PCs and Macs.

Refresh: To refresh files means to make sure that they are still viable and haven't been corrupted. This is usually accomplished by copying the data onto new media and then copying it back again.
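
One common way to confirm that a copy has not been corrupted is to compare checksums before and after copying. The checksum step is an assumption added here for illustration; the guidelines do not prescribe it. The file names below are placeholders created in a temporary folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source, destination):
    """Copy source to destination; return True if the copy is identical."""
    checksum_before = sha256_of(source)
    shutil.copyfile(source, destination)
    return sha256_of(destination) == checksum_before


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        old_media = Path(folder) / "old_media.bin"  # placeholder file
        new_media = Path(folder) / "new_media.bin"  # placeholder file
        old_media.write_bytes(b"example archival data")
        print(refresh(old_media, new_media))
```

If the two checksums differ, the copy should not be trusted and the refresh should be repeated.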

Resolution: In the context of this document, resolution is a measure of the detail that a printer can print, a scanner can scan, or a monitor can display. In printers and scanners, resolution is measured in DPI (dots per inch): the number of dots a device can fit into an inch of space. A monitor's resolution refers to the number of pixels in the entire image, since the number of dots per inch can vary depending on the size of the screen.
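
The relationship between physical size, DPI, and pixel count is simple multiplication. A minimal sketch, using an 8 x 10 inch photograph as an invented example:

```python
def pixel_dimensions(width_inches, height_inches, dpi):
    """Pixel width and height of a scan of the given physical size."""
    return round(width_inches * dpi), round(height_inches * dpi)


print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
print(pixel_dimensions(8, 10, 600))  # (4800, 6000)
```

Doubling the DPI doubles each dimension, so the scan holds four times as many pixels.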

Scan: In the context of this document, scanning refers to the process of creating a "digital photocopy" of a document. Scanning works much like photocopying in that a document is placed between a sheet of glass and the lid of the scanner (in the case of flatbed scanners), but the result is a digital copy of the document instead of a paper copy of it.

Scanner: In the context of this document, a scanner is a piece of hardware used to scan a document, i.e., create a digital copy. Although flatbed scanners are the most common type and operate much like a photocopy machine, there are many types of scanners, including some that never touch the document itself.

Search engine: A search engine is a computer program that receives search requests and returns data (answers) that are formatted and displayed on a user's monitor.

Serve: In the context of this document, to serve something means to present it, usually through a Web site. It is synonymous with "deliver."

Server: In the context of this document, a server is the computer on which digital files are saved and from which they are usually served.

Service copy: In the context of this document, a service copy of something is the copy that is presented on a Web site. It is synonymous with "delivery copy."

SGML: SGML, an acronym for Standard Generalized Markup Language, is a system for organizing and tagging elements of a document for representation in electronic form. SGML does not specify any particular formatting (as HTML does); rather, it specifies the rules for tagging elements. These tags can then be interpreted to format elements in different ways using a DTD.

Source material: In the context of this document, source material refers to the original, physical material, such as a book or painting, from which digital copies are made.

Storage: As addressed in this document, storage refers to the digital storage of computer files.

Surrogate: In the context of this document, surrogate refers to the digital copy of physical material.

Thumbnail: A thumbnail image is a small version of an image that is used to give the viewer an idea of what the full-sized image is like. Typically, clicking on the thumbnail image will cause the full-sized image to download.

URL: A URL, an abbreviation of Uniform Resource Locator, describes the address of documents and other resources on the Internet.

XML: XML, the Extensible Markup Language, is a system for designing custom-made markup languages. Whereas HTML is a pre-defined markup language that primarily dictates the display of information on the Web, XML allows users to define a markup language for themselves. For example, to display names that occur within a document in bold, one would enclose each name with the HTML tags <strong></strong>. In XML, names within a document can be defined with their own designation, such as <name></name>, and then designated so that every time something falls within the <name></name> tags, it will be displayed in bold type. Alternatively, one could designate that those names be displayed in italics, or in the color green, or that they be compiled into a list of names. In this way, XML is a much more powerful and versatile tool than HTML in that it defines an item rather than just specifying how it should be displayed.
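
The idea that one tagged document can be displayed in several ways can be sketched as follows. This is an illustration under stated assumptions: the sample text and the <note> tag are invented, and the function handles only <name> tags placed directly inside the outer tag.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample; the <note> and <name> tags are illustrative only.
SAMPLE = "<note>A letter from <name>Martin</name> to <name>Eliza</name>.</note>"


def render_names(xml_text, html_tag="strong"):
    """Return HTML in which every top-level <name> is wrapped in html_tag."""
    root = ET.fromstring(xml_text)
    parts = [escape(root.text or "")]
    for child in root:
        text = escape(child.text or "")
        if child.tag == "name":
            parts.append(f"<{html_tag}>{text}</{html_tag}>")
        else:
            parts.append(text)
        # The tail is the plain text that follows the child element.
        parts.append(escape(child.tail or ""))
    return "".join(parts)


print(render_names(SAMPLE))        # names in bold
print(render_names(SAMPLE, "em"))  # same document, names in italics
```

The XML itself never changes; only the rule applied to the <name> tags does.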

Some information for this glossary was taken from the "OIS Glossary of Terms" (http://hul.harvard.edu/ois/services/pubs/glossary.html) and the "DRS Policy Guide" (http://hul.harvard.edu/ois/systems/drs/policyguide.html#glossary).