Production Path Handbook

Below is an outline of the production path, from start to finish. The detailed sections contain various remarks pertaining to considerations and decisions, time frames, misc. notes, shortcuts, and processes.

** Caveat on times. These are very approximate. Variations in the quality of the text and computer problems (crashes, etc.) greatly affect the time spent on any given document. Also, these times do not reflect the learning curve for each part of the process, which requires some degree of technical proficiency in:

Operating Systems: PC, Mac, Unix, Linux
General Apps.: MS Word, Web Browsers, EndNote
Specialized Apps.: Digital Camera and PhaseOne software, Fetch, Adobe Photoshop, Telnet, Adaptec Jewel Case Creator, X11, Adaptec Toast (CD burn), XEmacs, OmniPage, Dreamweaver

Capture Image
Time involved: The PhaseOne digital camera usually requires at least 1/2 hour to prepare the initial settings. After that, the time per page varies with several factors: whether you can capture two pages at once or only one page at a time; the quality of the page (focusing can be difficult, especially with warped pages or pages where the text is too close to the binding); whether you have to refocus (if the text is long, the focal distance changes); and, if you're doing a text one page at a time, how long it takes to reposition the text and refocus for the verso pages.

-- Book should face wall on the right in order that you don't have to inverse the image in Photoshop.
-- Keep a record of all the camera and scanner settings: height, f-stop, shutter speed, med. reolut...
-- Numbering conventions: use "00" for the page that includes the color bar; use 01-... so the images list in order in folders; if the text has roman numerals at the beginning, use 001 and then use 010 as page 1, 020 as page 2, and so on.
-- On first page: Include Color Bar & label (Emory University; Woodruff Library Special Collections; French Revolutionary Pamphlets; Call #, Vol. #, Pamph. #)
-- Rescan 1st page without above info
-- Either scan 2 pages at a time, or scan recto pages, then verso
-- Save as TIFF images with PC byte order
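
The numbering convention above can be sketched in code. This is only one possible reading of it: we assume roman-numeral (front-matter) pages are labelled 001, 002, ... and body pages 010, 020, ..., and the function names `page_label` and `ordered_labels` are hypothetical. The "00" color-bar image is left out of the sketch.

```python
def page_label(page_number, front_matter=False):
    """Return a three-digit image label for one page.

    Assumed reading of the convention: front-matter (roman-numeral)
    pages become 001, 002, ...; body pages become 010, 020, ...
    """
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    if front_matter:
        if page_number > 9:
            raise ValueError("this scheme leaves room for only 9 front-matter pages")
        return f"{page_number:03d}"
    if page_number > 99:
        raise ValueError("three digits cover body pages 1-99 only")
    return f"{page_number * 10:03d}"


def ordered_labels(front_pages, body_pages):
    """List the labels for a whole text in reading order."""
    labels = [page_label(n, front_matter=True) for n in range(1, front_pages + 1)]
    labels += [page_label(n) for n in range(1, body_pages + 1)]
    return labels


if __name__ == "__main__":
    labels = ordered_labels(front_pages=3, body_pages=12)
    print(labels)
    # Equal-width, zero-padded labels sort alphabetically in page order,
    # which is why the images list in order in a folder.
    print(sorted(labels) == labels)
```

Because every label has the same width, a plain alphabetical sort in a file browser matches the page order. Under this reading the scheme runs out of room after 9 front-matter pages or 99 body pages.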


Jewel Case Creation
Time involved: Jewel case creation can be finicky. The initial design template can take some time, but after that it takes perhaps 10 minutes per case.

-- Make jewel cases (2 for each pamphlet: ... copy 1, ... copy 2)
-- Make one text file with all info relevant to the document and the digitization process


CD production
Time involved: One full CD takes about 20 minutes to burn. For my project I burned a working copy and two master archival copies. The number of CDs also depends on whether the images contain one or two pages and on the total number of pages in the text. 26 or 27 files (at 24.5 MB per image) will fit on a single CD, with room left over for the .txt, .jwl, and .jpg files (see below).

Burn CDs: two copies of each pamphlet
-- Include: image files (.tifs), txt file, jewel case file, jewel case image file (.jpg).
-- Under data-tabs: choose Joliet; don't choose XA
-- Burn as ISO 9660
-- Label CDs only around center using a Sharpie
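
The "26 or 27 files per CD" figure can be checked with a little arithmetic. The sketch below is illustrative only: the nominal disc capacities (650 MB and 700 MB) and the space reserved for the txt, jewel case, and jpg files are assumptions, not values recorded for this project, and the function names are hypothetical.

```python
import math


def images_per_cd(capacity_mb, image_mb, reserve_mb):
    """How many images fit once space is reserved for the other files."""
    return math.floor((capacity_mb - reserve_mb) / image_mb)


def cds_needed(num_images, per_cd):
    """How many discs one copy of a text requires."""
    return math.ceil(num_images / per_cd)


if __name__ == "__main__":
    IMAGE_MB = 24.5  # size per image, from the text above

    # Assumed: a 650 MB disc with 10 MB reserved for the small files.
    print(images_per_cd(650, IMAGE_MB, reserve_mb=10))
    # Assumed: a 700 MB disc with 30 MB reserved for the small files.
    print(images_per_cd(700, IMAGE_MB, reserve_mb=30))
    # A hypothetical 60-image text at 26 images per disc.
    print(cds_needed(60, 26))
```

Multiply the result of `cds_needed` by the number of copies you burn to estimate both the number of blank discs and the total burn time at about 20 minutes per full disc.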


OCR
Time involved: To scan and OCR 10 pages of text, ending up with a Microsoft Word document, takes approximately minutes.

-- Settings: French language, plain txt, single column
-- You can load 10-20 pages at a time into OmniPage, then manually draw the text zones to be OCRed
-- Then you can hit the Auto123 button to process all of the pages at once
-- Save as "ASCII txt with line breaks"


JPG images
Time involved: To convert the 10 tiff images into jpgs takes approximately minutes. On this project we decided not to take the time to crop the images, so the images include the book cradle apparatus, etc. These page images are available alongside the digital version, and they are also useful during the proofreading process.

-- Use Photoshop's batch-function to make jpgs from the tiffs
1. Choose Window>Show Actions in order to display the Actions palette
2. Choose Create New Action (small icon on bottom-right of Actions palette) and give it a name
3. Hit Record button
4. Click through your steps then hit Stop button (you can rearrange the order by drag/drop)
5. Then to Batch: Choose File>Automate>Batch
6. Choose the Set, Action, Source and Destination


Proofreading
-- Use the .jpgs (although you may need recourse to the tif images if parts of the text are really unclear)


-- Rebecca has made a replace-accents script which you can find in the mkazanj home folder on Alfonso
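
We have not reproduced Rebecca's script here, and the following does not claim to show how it works. As a rough sketch of what an accent-replacement step might look like, assuming the goal is to turn accented characters into named character entities (for example, é into `&eacute;`), something like this would do; the function name `replace_accents` is hypothetical.

```python
from html.entities import codepoint2name


def replace_accents(text):
    """Replace non-ASCII characters with named entities where one exists.

    Assumption: the target markup accepts the standard Latin-1 style
    entity names (eacute, agrave, ccedil, ...). Characters without a
    known entity name are left unchanged.
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if code > 127 and code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")
        else:
            pieces.append(ch)
    return "".join(pieces)


if __name__ == "__main__":
    print(replace_accents("liberté, égalité, fraternité"))
    print(replace_accents("les Français"))
```

Plain ASCII characters, including `&`, `<`, and `>`, pass through untouched, so existing markup in the file is not altered.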

Display processing
-- the mkazanj home folder has an alias which allows the parse command to function
-- Follow this order:
1. Until we move to XML and are able to show the page images, comment out the images at the very end of the document: <!-- xxxxxxxxxx -->
2. Enter entity names: add image filenames to the list in entitymap.sgml (in Revolution/Index) & to tei2drama.dtd (in Chaucer/data/dtd)
3. Parse and fix errors
4. Build:
a. ssh to the Index folder, cmd = Build Revolution (the build cmd only functions from this folder)
b. If the cmd can't recognize a file, change permissions: chmod 664 *
(or use the filename instead of the *, or 775 instead of 664)
sudo -u username chmod 775 filename
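
For readers unsure what the two permission modes above grant, this small sketch prints them in the familiar `ls -l` notation. It only decodes the numbers and does not touch any files; `describe_mode` is a hypothetical helper name.

```python
import stat


def describe_mode(mode):
    """Show an octal permission mode the way `ls -l` would for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    # 664: owner and group can read and write; everyone else can only read.
    print(describe_mode(0o664))  # -rw-rw-r--
    # 775: as above, plus execute permission for everyone.
    print(describe_mode(0o775))  # -rwxrwxr-x
```

In short, 664 is the usual choice for data files that the group must be able to edit, while 775 adds the execute bit, which directories and scripts need.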


Web interface
-- Web design by Katherine Ellison
-- Produce Search and Browse functions
-- Figure out and Incorporate User and Referer Stats
-- Mouseover Footnotes on the Overview page: See Eric Bosrup's JavaScript site. All single and double quotes must be coded, and the text of the footnote must be one long text string (no hard returns). The two JavaScript library files must be in the same directory as the html file. Remember to put the 2-line Script info just after the <body> tag.
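
Preparing footnote text by hand is error-prone, so a small helper can do the two clean-up steps described above. This is a sketch under one assumption: that "coded" means replacing quotes with HTML character references (`&quot;` and `&#39;`). The function name `prepare_footnote` is hypothetical and is not part of the mouseover library.

```python
def prepare_footnote(text):
    """Make footnote text safe to paste into a one-line mouseover string.

    1. Collapse hard returns and runs of whitespace into single spaces.
    2. Replace double and single quotes with character references
       (assumed meaning of "coded").
    """
    one_line = " ".join(text.split())
    return one_line.replace('"', "&quot;").replace("'", "&#39;")


if __name__ == "__main__":
    raw = 'Voir "l\'Ami\n du peuple"'
    print(prepare_footnote(raw))
```

Any HTML tags inside the footnote are left alone, although quotes inside tag attributes would be converted too, so the helper is best used on plain footnote text.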

Cataloging and Links
-- Get links to site on main Beck Ctr. page and on main Special Collections page
-- Get Euclid records made for the texts which have been digitized
-- Get RLIN records made (make individual records for the digitized texts available to OCLC/RLIN as a way of preventing other libraries from duplicating our efforts)

Contact and announce existence of site to:
-- Special Collections
-- Woodruff Library
-- University PR people
-- Judith Miller in History Department
-- HLists and Balzac-list
-- Gallica: if we list them, will they list us?
-- Mark at Chicago
-- Get the site registered as a non-profit with search engine sites


Pedagogical considerations
-- Contact Kavanagh
-- Contact Judith Miller
-- Contact Stephen Biek

Final Report
-- This document (...Handbook), Tagging Guidelines, handout flyer for use in Special Collections

