Matt F's Wiki Report
From LIS5313
Introduction to Digitization
Digitization is the process by which a physical object, often analog media, is represented as a digital object. Most commonly, digitization is performed to make a single object available to multiple users, as well as to support its restoration and preservation (Jones 2001). Libraries and archives are increasingly turning to digitization as a means of promoting the educational value of their unique holdings. This report looks at how various institutions are using digitization tools, considering both the digitization process and the media being digitized, and how they use the web (through blogs, wikis, podcasts, and so on) to promote these materials to the public.
Why Digitization?
We live in a society that constantly promotes a digital culture (digital television, digital phone service, DSL, and so on). There are far more valuable reasons to digitize something than "I want to put a home movie I shot on VHS onto a DVD so I can watch it in my DVD player," although that certainly is a valid exercise in digitization for the average person.
The most apparent reason to digitize an item is the dramatic increase in accessibility. Imagine that a library in a small town in Arkansas holds a diary kept by President Bill Clinton as a child. Since only one copy of this original document exists, one would have to travel to this town to view its contents. There could well be restrictions on photocopying pages, and one would almost certainly be unable to check out the document. Now, if a skilled technician were to carefully digitize this document with a high-resolution scanner and place the file on a public web server, the entire world could access this important historical journal. One could view, print, copy and paste, or enlarge it without ever needing to see or handle the original. While President Clinton's diary might make for an entertaining read, the far more important use here is for researchers, who can obtain rare materials without needing to travel from archive to archive or library to library.

Another example: suppose a researcher is working on a report about the average number of people living in one household in 1860 in each town that was part of the Confederacy. It would take an incredible amount of time to visit each town's archive and find the materials. If the records were properly indexed online so that anyone could search each archive's database, the researcher could find the census information, limit it to 1860, and have the results instantly.
The second reason, equally important to researchers, is the preservation and restoration that digitization makes possible. Everything from artwork to handwritten papers to VHS cassettes gradually deteriorates over time, whether from excess humidity, overly strong light, or mishandling. Having a digital copy of physical media is essentially like having a snapshot that won't fade away. Digital files should ideally be backed up on another hard drive, as CD and DVD discs deteriorate over time as well (NFPF 2001). Once an item, a painting for example, is available in a high-resolution digital format, one could view the work magnified hundreds of times, make color corrections, 'fix' rips and tears, perform whatever other restoration work may be necessary, and then save a 'like-new' copy of what the artist originally created.
For further reading
- John Unsworth, Dean of the Graduate School of Library and Information Science at the University of Illinois, Urbana-Champaign, has written an important resource outlining when digitization is most beneficial, The Value of Digitization for Libraries and Humanities Scholarship.
- The May 2006 issue of Government Computer News has an article about an audio digitization project at the Library of Congress that looks at the benefits, possible disadvantages, and necessity of establishing guidelines and universal standards.
Starting A Digitization Project
The Abraham Lincoln Historical Digitization Project is a fascinating digitized collection of texts written by and about Lincoln, along with images, videos, and audio clips of speeches and stories about him. While it is a very important resource for researchers and curious citizens, its audio and video files are available to stream or download only as low-resolution, compressed RealMedia files. As RealMedia's popularity has tapered off over the last 5 years following the rise of MP3, these files are already somewhat outdated. Converting them to MP3, or, more ambitiously, going back and providing the uncompressed media files, would require a staff member to go through each and every article. In a November 28th interview, Drew VandeCreek, the Director of Digital Projects at the Northern Illinois University Libraries, explained that because there is so much more to be digitized in other projects, there simply isn't enough time right now to revisit old ones.
Consider, for example, the UCSB Cylinder Preservation and Digitization Project, which has digitized some 7,000 music cylinders over the past 5 years. What began as a pilot project quickly received $205,000 in funding from the Institute of Museum and Library Services and has become a very important and popular resource on the web. In a November 27th interview, David Seubert, the curator of the collection, explained that the only real obstacle in the course of the ongoing project was having enough staff to continue digitizing new acquisitions once the funding ran out. Obtaining the Archeophone and computers was easy, and removing clicks, hisses, pops, and buzzes from the audio recordings is made easy by the specialized hardware the project uses, the CEDAR Series X+.
If You Digitize It, Will They Come?
A digitization project is only as successful as the service it eventually provides to the public, and today, informing the public that the materials exist means being familiar with indexing, message boards, listservs, and Web 2.0. The Internet Archive, which started in July of 2002, grew from 2 registered members in its first month to 93 in its second month and 2,311 by the end of its third month, all without ever paying for an advertisement. At a time when Wikipedia had only 20,000 articles (compared to more than 2 million today) and blogs had just begun to enter the average internet user's vocabulary, the Internet Archive's founders relied on old-fashioned word of mouth, posting on various message boards and listservs.
As David Seubert pointed out in the previously mentioned interview, ensuring that every single web page in the Cylinder Project is properly indexed is vital to making a collection easily accessible. Because search engine spiders have indexed these pages, a user can find a digitized vaudeville recording from the late 1890s simply by searching for the title of the program in a popular search engine. If a page is not indexed, one could reach these materials only by first finding the project's main website, which may not always be easy to do.
For those looking for innovative ways to promote a digitization project, the best example to follow is MBooks, the University of Michigan's library digitization initiative. Whether or not the partnership with Google has anything to do with the fact that Paul Courant, the curator of the project, and John Wilkin, the Associate University Librarian, have started blogs and maintained a wiki page is not important. What is important is that these very simple but informative tools open communication in both directions. Readers can post comments on blog posts and learn more about why the leaders of this project feel so strongly about it, while the leaders gain an outlet to defend themselves against criticism (and there is plenty of criticism of Google) and to enlighten the public on the actual process of digitizing the materials. The blogs also provide a place to discuss related issues, such as Mr. Wilkin's take on "Next Generation Library Systems" or the "Future of LIS Programs."
For more information on the controversy surrounding Google's digitization project, see Corinna Baksik's extensive report in Libraries and the Academy 6 (4) (2006), pages 399-415.
RSS feeds are the one Web 2.0 technology that seems to be eluding the digitization world. The main problem, according to Paul Courant in a November 30th interview, is organization. At the collection level, it would be overwhelming and practically worthless to have a feed update every time something is digitized. In the case of MBooks, Google plans to digitize 7,000,000 volumes of text over the course of 6 years, or roughly 3,200 texts every day for the length of the project. Conversely, at too narrow a level, for example a feed for "Joseph Heller," who wrote only 7 novels and a few other texts, one would see on average just 1 item added to the feed roughly every 150 days. Setting up an RSS feed takes time, which means additional staff to maintain it, which in turn means additional funding. At this time, RSS may not be a practical technology for digitization projects done on such a large scale.

Among websites with smaller collections, the Prelinger Archives, part of the Internet Archive, has a very useful, clutter-free RSS feed. Over the past 8 months, approximately 50 items have been digitized, roughly 1 to 2 every week. The feed tells the user when each item was digitized, gives a brief description, and lists the available viewing formats. This information is presented in an organized, easy-to-access list that lets users know whenever something new is available to view, rather than having to check the website every day and search.
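The feed-volume figures in the RSS discussion come down to simple division. The Python sketch below reproduces the MBooks and Prelinger rates from the numbers quoted in the text. It assumes a 365-day year and approximates 8 months as 240 days; the function names are hypothetical, chosen only for illustration.

```python
# Rough feed-volume arithmetic using the figures quoted in the text.
# Assumptions (not from the source): a 365-day year, and 8 months ~ 240 days.

def items_per_day(total_items: int, years: float) -> float:
    """Average number of items added per day over a project's lifetime."""
    return total_items / (years * 365)


def items_per_week(total_items: int, days: float) -> float:
    """Average number of items added per week over a span of days."""
    return total_items / days * 7


# MBooks: 7,000,000 volumes over 6 years
mbooks_daily = items_per_day(7_000_000, 6)

# Prelinger Archives: about 50 items over roughly 8 months (8 * 30 days)
prelinger_weekly = items_per_week(50, 8 * 30)

print(f"MBooks: about {mbooks_daily:,.0f} items per day")         # MBooks: about 3,196 items per day
print(f"Prelinger: about {prelinger_weekly:.1f} items per week")  # Prelinger: about 1.5 items per week
```

At roughly 3,200 new items a day, a collection-level feed would scroll past faster than anyone could read it, while one or two items a week is a pace a subscriber can easily follow.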
Like RSS feeds, podcasts remain an under-utilized tool for digitization projects. For a project such as UCSB's Cylinder Preservation, a weekly podcast that discusses various recordings, or perhaps features interviews with authorities on specific genres of music, seems like an ideal companion. There are podcast episodes that deal with digitization, such as Eric Olsen on Scanning and Digitization, which discusses new scanning technologies, but podcasts produced by digitization projects themselves remain rare, and for the most part funding shortcomings are to blame. One exception is a podcast from the Joint Information Systems Committee (JISC) that discusses a £22 million digitization project. That podcast, however, was produced in September of 2007 and has not been updated since, despite offering a subscription link to an RSS feed. (There is a great blog dedicated to discussing the JISC project that is worth checking out.)
The Future of Digitization
Over time, libraries will increasingly recognize the importance of digitization for the reasons discussed in this report. With easier access, both financially and in terms of availability, to the technologies that make digitization possible, it seems inevitable that projects will expand at both major and minor levels. Universities across the world that each hold original works can define standards and contribute to a very large project. At the same time, a small library in a rural town will have less trouble taking on a project to make public records accessible online. In fact, Michael Boock and Ruth Vondracek's recent essay "Organizing for Digitization: A Survey" reports that of the 40 libraries that responded to their mass e-mail seeking information on digitization projects, "95% are involved in digitizing locally owned, print-based content." Their report also finds that 76% of these libraries created at least one new position to carry out their digitization efforts, a number that holds great promise at a time when printed material must compete with so many other forms of media (Boock and Vondracek 2006).
Planning a digitization project for the future can be tricky, mainly because funding is not always available. For other (often unstated) reasons, some libraries choose not to use digitization as a means of increasing accessibility and preservation. In 2005, the Florida State University Libraries released its Electronic Information and Digital Services Policy, which states that "...FSU will never digitize its entire collection, the Digital Library Center will endeavor to provide a critical mass of digital information from parts of its collections." There are also the intricacies of copyright law, which will continue to change along with the ways we reproduce and distribute copyrighted works over the internet. An excellent resource that outlines some possible future changes is Georgia Harper's blog on Mass Digitization, hosted by the Texas Digital Library.
Gallery of Digitized Materials
The following represents a variety of digitized materials from library digitization projects that are either complete or in progress. Besides video and audio components, this gallery also presents two photographs and text-based materials, including sheet music and a court document. So long as technology exists to play back (in the case of audio or video) or to scan (text, artwork, photographs, etc.) an item, there is no limit to what can be digitized. Feel free to click through to each participating library's website, some of which have been discussed earlier on this page, to see other digitized materials from the same collection.
| | | |
|---|---|---|
| A 1937 film of the Hindenburg explosion. This film was digitized by the Prelinger Archives and has been downloaded over 49,000 times. | A 1919 recording of American singing legend Al Jolson, found in the 78rpm Collection at the Internet Archive. | An 1858 image of Abraham Lincoln, from the period when his famous Debates with Stephen Douglas were taking place. This image is part of the Abraham Lincoln Historical Digitization Project. |
| A page from Joseph Labitzky's sheet music for an 1848 waltz composed for the piano, part of the University of North Carolina at Chapel Hill's 19th Century American Sheet Music Digitization Project. | A 1906 photograph of the damage caused by the San Francisco Earthquake, part of the University of Southern California's Digital Archive Project. | An excerpt of the decision from an 1805 court case in St. Louis concerning Louis Bompart; this is an artifact of the St. Louis Probate Court Digitization Project, 1802-1900. |
Sources
- Boock, Michael and Vondracek, Ruth. (2006). Organizing for Digitization: A Survey. Libraries and the Academy, 6 (2), 197-217.
- Jones, Trevor. (2001). An Introduction to Digital Projects for Libraries, Museums and Archives. Available: http://images.library.uiuc.edu/resources/introduction.htm. Last accessed 24 November 2007.
- Sanett, Shelby. (2003). The Cost to Preserve Authentic Electronic Records in Perpetuity: Comparing Costs across Cost Models and Cost Frameworks. RLG DigiNews, 7 (4), 8-14.
- Troll, Denise. (2002). How and Why Libraries are Changing: What We Know and What We Need to Know. Libraries and the Academy, 2 (1), 99-123.
- Unknown. (1999). Why Preserve Film? Available: http://www.filmpreservation.org/preservation/why_preserve.html. Last accessed 25 November 2007.

