| Archiving and Preservation | ISCII | PURL |
| Digitizing the Collection | Dublin Core | Web Accessibility (WAI) |
| Unicode | XHTML |
The Digital South Asia Library project (DSAL) and the Digital Dictionaries of South Asia (DDSA) have instituted a plan for archiving and preserving their resources. This plan draws on much of the recent literature concerning these matters (see Works Referenced). The problems of digital archiving are widely described elsewhere and are only summarized here.
To avoid the difficulties that arise when compression software proliferates and becomes obsolete, DSAL avoids compressing and decompressing data as much as possible. Some of the files are large, but the continuing reductions in the price of digital media make this approach practical. Images on the Web site are currently either gifs or jpegs. However, DSAL archives the tiff files used to prepare these images, because the tiffs retain more of the scanned information than the gif or jpeg derivatives. DSAL does not touch up tiffs, gifs or jpegs.
The server for DSAL and DDSA uses two tape systems as well as another server to back up the entire Web site. Backups occur daily. To further secure the data against damage to the facility where the backups occur, a copy of the data is stored away from the site each week. In addition to the data on the Web site, archived material such as the tiff files used to prepare gifs and jpegs is also backed up on CDs. The data storage plan also includes the following procedures to ensure the integrity of the storage media.
Naming conventions are established and documented so that resources can be consistently identified and found. DSAL naming conventions include:
Due to the variety of materials involved in the project, various scanners are used.
The microfilm materials are scanned using a Wicks and Wilson 4100 Roll Film Scanstation. Film is scanned at 400 dpi in grayscale. The resulting tiffs are compressed with LZW, a widely recognized lossless compression algorithm.
Unbound books are put through a Canon DR-3020 Document Scanner. Tiff images are created at 300 dpi.
Serials and manuscripts are scanned using a Minolta PS 7000 overhead scanner.
Photographs are scanned using a Hewlett Packard ScanJet 4C3C flatbed scanner.
35 mm. film is scanned using a Nikon Super Coolscan 4000. This scanner is also used to scan mounted slides.
Tiffs are created for all images. Three types of tiffs are created. In some instances, uncompressed tiffs are created. In addition, two categories of compressed tiffs are created: LZW lossless tiffs and G4 fax compression tiffs. The latter are used with Optical Character Recognition (OCR) software for the creation of full text items. All original, untouched tiffs are archived on at least two types of media: DLT tapes or CDs, as well as copies on the server.
Adobe Photoshop is used to create the jpegs and/or gifs for Web delivery. Images in grayscale are saved as gifs with 8 colors. Using grayscale alleviates some of the bleed-through (type from the back of a page showing through on the front) often associated with older materials.
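Keeping the same tiff on several media is only useful if the copies can be shown to be identical. The following is a minimal sketch of one common way to check this, by comparing SHA-256 checksums. It is not a description of the DSAL procedure; the function names `sha256_of` and `copies_match` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so that large tiffs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(master: Path, copies: Iterable[Path]) -> bool:
    """Return True when every copy has the same digest as the master file."""
    expected = sha256_of(master)
    return all(sha256_of(copy) == expected for copy in copies)
```

A digest recorded when a file is first archived can also be compared against a digest computed later from the tape or CD copy, which reveals silent corruption of the medium.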
Oversized maps are photographed at one of the labs at the University of Chicago Hospitals. 4 x 6 positives are created from this process and these are then scanned with a full flood scanner. In addition, maps may be digitized on a Contex full scale color scanner located in the Advanced Lab of Social Science Research Computing at the University of Chicago.
In order for any language to be displayed on a computer, the symbols (letters, numerals and other characters) used to write that language must be represented as numbers, because computers store all data in digital form. Before computers were widely and intensively networked, there was less impetus for uniformity in the development of digital representations for the various languages of the world. A plethora of diverse digital character sets were developed for specific languages. These character sets could not practically represent the symbols of other languages and were not compatible with a variety of operating systems or software applications. With the advent of the Internet and the World Wide Web, the desirability of a uniform system for encoding the symbols of all the languages of the world became evident. To this end, a single character set covering the symbols of all these languages, Unicode, was developed. Because Unicode assigns a unique value to each symbol, it can be used across a variety of computer operating systems and software applications. For a full description of Unicode, please see the Unicode Consortium's Web site, http://www.unicode.org. Given the diversity of languages in South Asia and the global audience envisioned for DSAL, the use of Unicode for material presented on the Web site is imperative.
DSAL and DDSA use the Indian Standard Code for Information Interchange, ISCII, for the encoding of Indic texts into data files. ISCII was developed and adopted by the Government of India and is widely used in South Asia for languages written in the various scripts derived from the Brahmi script. Indic languages written in Perso-Arabic script cannot be encoded with ISCII. The ISCII files are then translated into Unicode for display on the Web. The translation from ISCII to Unicode is made simpler by the participation of the Government of India in the Unicode Consortium. According to the Unicode Consortium, "for any given Indic script, the consonant and vowel letter codes of Unicode are based on ISCII" (http://www.unicode.org/unicode/faq/indic.html#1). For more information concerning ISCII, see the information provided by the Indian government at http://tdil.mit.gov.in/faq.htm#6.
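As a small illustration of unique values for symbols, the following Python sketch prints the Unicode code point and official character name for a Latin letter and a Devanagari letter. The helper name `describe` is hypothetical; `ord` and `unicodedata.name` are part of the Python standard library.

```python
import unicodedata


def describe(text):
    """Return (character, code point, official name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for ch, code_point, name in describe("Aक"):
    print(ch, code_point, name)
# A U+0041 LATIN CAPITAL LETTER A
# क U+0915 DEVANAGARI LETTER KA
```

The code point is the same on every platform; only the byte encoding used for storage or transmission varies. In UTF-8, for instance, `"क".encode("utf-8")` yields the three bytes `e0 a4 95`.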
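The kind of translation described above can be sketched as a table lookup. The example below is illustrative only and is not the converter used by DSAL. It assumes Devanagari text, covers only a handful of characters, and uses byte values that should be checked against the ISCII standard (IS 13194:1991) before any real use. The names `ISCII_TO_DEVANAGARI` and `iscii_to_unicode` are hypothetical.

```python
# Partial, illustrative ISCII -> Unicode table (Devanagari only).
# Assumption: these byte values follow IS 13194:1991; verify before relying on them.
ISCII_TO_DEVANAGARI = {
    0xA4: "\u0905",  # DEVANAGARI LETTER A
    0xB3: "\u0915",  # DEVANAGARI LETTER KA
    0xCC: "\u092E",  # DEVANAGARI LETTER MA
    0xD1: "\u0932",  # DEVANAGARI LETTER LA
    0xDA: "\u093E",  # DEVANAGARI VOWEL SIGN AA
    0xE8: "\u094D",  # DEVANAGARI SIGN VIRAMA (halant)
}


def iscii_to_unicode(data: bytes) -> str:
    """Translate ISCII bytes to a Unicode string using the partial table above."""
    out = []
    for byte in data:
        if byte < 0x80:
            # The lower half of ISCII is 7-bit ASCII and passes through unchanged.
            out.append(chr(byte))
        elif byte in ISCII_TO_DEVANAGARI:
            out.append(ISCII_TO_DEVANAGARI[byte])
        else:
            raise ValueError(f"byte 0x{byte:02X} is not in the illustrative table")
    return "".join(out)


print(iscii_to_unicode(bytes([0xB3, 0xCC, 0xD1])))  # the word "kamal" in Devanagari
```

A complete converter needs more than a one-to-one table, since it must also handle script switching and multi-byte sequences. The sketch shows only why the shared letter ordering makes the basic mapping straightforward.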
The technical term metadata refers to data describing information sources such as those on a Web site like DSAL. This data allows search engines to find and classify resources. The creation of widely accepted standards for the structure and content of metadata has been an important recent development in information technology. The DSAL and DDSA projects have decided to implement the Dublin Core as the metadata system for these projects in accordance with guidelines outlined by the Library of Congress, http://lcweb.loc.gov/marc/dccross.html. Among the various metadata standards, Dublin Core is being widely adopted in the academic world. DSAL also incorporates generic metadata to ensure that crawler-based search engines can access the metadata. All DSAL and DDSA first and second level pages will have metadata.
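For illustration, Dublin Core elements are commonly embedded in a Web page as `meta` tags whose names carry a `DC.` prefix, alongside a `link` tag that declares the Dublin Core element set. The sketch below generates such tags. It is an assumption about how a page might be marked up, not a copy of the DSAL templates; the function name `dublin_core_meta` and the sample values are hypothetical.

```python
from html import escape

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as XHTML-style head tags."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}" />']
    for element, value in record.items():
        # Escape quotes and ampersands so the attribute value stays well formed.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}" />')
    return "\n".join(lines)


# Hypothetical sample record, for illustration only.
print(dublin_core_meta({
    "title": "Digital South Asia Library",
    "type": "Text",
    "language": "en",
}))
```

Generic tags such as `<meta name="description" ... />` can sit beside these for crawlers that do not read the `DC.` names.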
One important issue in the evolution of digital resources on the World Wide Web has been the ability to track a resource in spite of changes in location. For a variety of reasons, the address or Uniform Resource Locator (URL) of a resource may change. In order to facilitate the tracking of resources, DSAL and DDSA have decided to use OCLC's PURL service. PURL stands for Persistent Uniform Resource Locator. Instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL remains constant even if the underlying URL changes, so users can link to a resource without updating their links. Further information on PURLs can be found at the PURL home page, http://purl.oclc.org/.
As part of the effort to conform to the World Wide Web Consortium (W3C) standards and recommendations, DSAL and DDSA are attempting to prepare for the use of XML (eXtensible Markup Language). Most documents on the Web are written in a form of HTML (HyperText Markup Language), but shortcomings in HTML with regard to data exchange and manipulation have led to a call by the W3C for the eventual use of XML. In the short term, the W3C has recommended the adoption of XHTML (eXtensible HyperText Markup Language) as a step toward that end. XHTML is designed to be compatible with XML standards as they are more widely adopted. For more information regarding the changes to HTML required by this standard, please refer to http://www.w3.org/XML/Activity. DSAL and DDSA strive to keep abreast of recommended practices in order to ensure that the site remains easy to use.
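The indirection described above can be sketched as a lookup table that answers each request with an HTTP redirect. This is a simplified model and not OCLC's implementation. The class name `PurlResolver`, the path `/dsal/example` and the `example.org` addresses are all hypothetical.

```python
class PurlResolver:
    """Toy model of a PURL resolution service: persistent path -> current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, url):
        """Create a PURL or update it when the resource moves."""
        self._targets[purl_path] = url

    def resolve(self, purl_path):
        """Return (HTTP status, redirect location) for a requested PURL path."""
        url = self._targets.get(purl_path)
        if url is None:
            return 404, None
        # A redirect sends the browser on to wherever the resource lives now.
        return 302, url


resolver = PurlResolver()
resolver.register("/dsal/example", "http://old-host.example.org/page.html")
# The resource moves; only the resolver's table changes, not the published PURL.
resolver.register("/dsal/example", "http://new-host.example.org/page.html")
print(resolver.resolve("/dsal/example"))
```

Links that cite the PURL keep working after the move because the only record that had to change is the one held by the resolver.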
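One practical difference is that XHTML must be well-formed XML, so every element has to be closed explicitly. The sketch below uses Python's standard XML parser to show a fragment that browsers accept as HTML but that fails this test. It checks well-formedness only, not validity against an XHTML document type; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True when the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


html_style = "<p>First line<br>Second line</p>"     # <br> is never closed
xhtml_style = "<p>First line<br />Second line</p>"  # <br /> closes itself

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```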
In 1998, Congress amended the Rehabilitation Act to require Federal agencies to make their electronic and information technology accessible to the disabled. Recent legal interpretations of the act have extended the scope of the law to all state-controlled colleges and universities. The criteria for compliance with regard to Web-based technology and information are based on access guidelines developed by the Web Accessibility Initiative of the World Wide Web Consortium (W3C). These standards consider the needs of users with a variety of disabilities. People with visual disabilities will find it easier to use reading browsers because of the labeled graphics, described video, marked up tables, and guidelines for the use of color and movement. People with hearing disabilities will benefit from the captions included with audio files. Those with physical disabilities that limit their capacity to use a mouse can more easily navigate Web sites by means of keyboard and/or single-switch support for menu commands. Because users with cognitive or neurological disabilities often need a more consistent structure of information, WAI includes recommendations about data structure and display such as consistent navigation, concise language, and the elimination of flickering tags.
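Labeled graphics are among the easiest of these points to check automatically. The following sketch lists the images in a page that have no `alt` attribute at all. It is one illustrative check, not a full WAI audit; the class and function names and the sample file names are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that lacks an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # An empty alt="" is acceptable for purely decorative images,
            # so only a missing attribute is reported.
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(markup):
    """Return the src values of images without alternative text."""
    finder = MissingAltFinder()
    finder.feed(markup)
    finder.close()
    return finder.missing


page = '<p><img src="map.gif" alt="Map of South Asia" /><img src="logo.gif" /></p>'
print(images_missing_alt(page))  # ['logo.gif']
```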
The World Wide Web Consortium has identified three levels of compliance with WAI. Level One compliance will remove the major barriers for users with specific disabilities. Levels Two and Three will incrementally improve the transfer of information. DSAL and DDSA Web pages will strive for Level Two compliance.
There are coincidental benefits of the WAI for Web users without disabilities. For example, the various modalities of the WAI offer text-only options for the users of Web phones or palm pilots with small or even text-only display screens. Clear and concise Web pages are beneficial to all users.
After reading and reviewing a number of articles on the WAI, DSAL and DDSA have put together a brief list of the main points of WAI that affect our site.