Books in LUNA: The New BookReader
Introduction
The BookReader integrates the discovery and display of compound objects such as books, manuscripts, and newspapers in LUNA. LUNA's new BookReader allows users to:
- search text within a book
- turn its pages
- zoom into the text
- add individual pages to a Media Group
- view book in Full View
- view book pages as thumbnails
Creating a BookReader object consists of compiling a text-searchable PDF, a folder of images, a converted XML file, and a book.properties file, all explained in more detail in the following sections.
In addition to their descriptive data, BookReader objects can be indexed for full text searching. A PDF whose contents have not been OCR'd cannot be indexed for searching as a BookReader object in LUNA. If your BookReader object does not need to be indexed for full text search, you can still create and upload it by skipping the step that converts the PDF to XML.
To convert your scanned document into a text-searchable PDF file there are a number of OCR products to select from, such as OmniPage Pro and ABBYY FineReader. Adobe Acrobat Professional has OCR capabilities as well.
The LUNA BookReader technology is based on the Internet Archive's BookReader format and allows you to add books into LUNA collections using Insight Studio in two different ways:
Option 1: Create and Upload New BookReader Objects
Option 2: Represent Publicly Available BookReader Objects
Either way, you'll be able to work with your BookReader objects as you do all other media in LUNA.
Getting Started
BookReader objects can be added to an existing collection, or can be the basis for a new collection. Either way, before you start building and importing BookReader objects, you first need a collection to upload them to.
If you are adding to an existing collection and have a cataloged CSV data file:
- Open Studio, select a collection, and select the "Import Data" task from the Main Task Menu.
- Upload the CSV file.
If you're creating a new collection, follow the instructions from User's Guide to Basic Collection Building on how to build a collection in Studio. These steps will include uploading the CSV file.
Now that you have a collection to add BookReader objects to, you can follow Option 1 or Option 2.
Option 1: Create and Upload New BookReader Objects
There are 2 major steps to this option:
A. Preparing the BookReader objects for upload to Studio
You'll create a top level directory containing:
- an images sub directory
- a book.properties file
- a text-searchable PDF*
- an XML file converted from the PDF*
*these are optional, and applicable only if your BookReader object will have full text search capabilities.
B. Uploading BookReader objects to Studio
Once you've prepared all your BookReader objects, you can then upload them to Studio and publish them to LUNA.
A. Preparing the BookReader objects for upload to Studio
1. Create a directory (folder) for each book
Create a top level directory (or folder) for each book you wish to publish. The folder will contain the components needed to create and upload each book to LUNA through Studio.
Each of these top level directories (or folders) represents one book. You need to perform steps 1-6 for each BookReader object you wish to create.
Note: Make sure the name of the directory that holds your book files contains no spaces or special characters. Use _ (underscore) to represent a space. For example: ISBN_1587266032_Great_Expectations
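The naming rule in the note above can be sketched in a few lines of Python. This is only an illustration: the function name safe_directory_name is hypothetical, and the guide does not list which characters count as "special", so the sketch assumes that only letters, digits, and underscores are safe.

```python
import re


def safe_directory_name(label):
    """Turn a free-text label into a folder name without spaces or special characters.

    Assumption: only ASCII letters, digits, and underscores are kept.
    """
    # Replace each run of whitespace with a single underscore.
    name = re.sub(r"\s+", "_", label.strip())
    # Drop everything that is not a letter, digit, or underscore.
    return re.sub(r"[^A-Za-z0-9_]", "", name)


if __name__ == "__main__":
    print(safe_directory_name("ISBN 1587266032 Great Expectations"))
```

If your institution relies on other characters in identifiers, adjust the second pattern and test the result in Studio before processing a large batch.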
2. Prepare images
Prepare all the images that will make up your book. Each image represents one page. JPEG and TIFF files are supported and must be numbered sequentially, starting with 0001. If you have already named your files with unique identifiers, we suggest copying them into a new folder and applying the sequential numbering described in this section so that a BookReader object can be created. The file names must be 4 digits long, padded with leading zeros:
0001.tif
0002.tif
0003.tif
Note: If you want your BookReader object to have full text search capabilities, the number of images in the folder must match the total number of pages in the text-searchable PDF, and the images must follow the exact sequence of those pages.
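If your page images already carry other file names, the copy-and-renumber approach suggested in this step could be scripted roughly as follows. This is a sketch under stated assumptions, not part of Studio: the function names are hypothetical, the existing file names are assumed to sort alphabetically into the correct page order, and the extensions .jpg, .jpeg, .tif, and .tiff are assumed to cover your JPEG and TIFF files.

```python
import shutil
from pathlib import Path

# Assumption: these extensions cover the JPEG and TIFF files you want to include.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}


def plan_sequential_names(filenames):
    """Pair each image file name with a 4-digit sequential name, keeping its extension.

    Assumption: sorting the existing names alphabetically gives the page order.
    """
    pages = sorted(f for f in filenames if Path(f).suffix.lower() in IMAGE_EXTENSIONS)
    if len(pages) > 9999:
        raise ValueError("more than 9999 pages cannot be numbered with 4 digits")
    return [(name, f"{number:04d}{Path(name).suffix}")
            for number, name in enumerate(pages, start=1)]


def copy_as_pages(source_dir, images_dir):
    """Copy the originals into images_dir under their new names; originals are untouched."""
    source, target = Path(source_dir), Path(images_dir)
    target.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in source.iterdir() if p.is_file()]
    for old_name, new_name in plan_sequential_names(names):
        shutil.copy2(source / old_name, target / new_name)
```

Because the files are copied rather than renamed, your original identifiers remain available. Review the planned pairs before copying if your file names do not sort into page order.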
3. Place images in sub-directory called "images"
Place image files in a sub-directory (or folder) called images within the top level directory created in step 1. For example: ISBN_1587266032_Great_Expectations/images/0001.tif
4. Create a text file called book.properties
Create a text file called "book.properties" and place it in the top level directory. When you save the book.properties file there should be no extension appended to it.
In this book.properties file include two properties:
BookIdentifier = ISBN_1587266032_Great_Expectations
ThumbnailImage = 0005
Here, "BookIdentifier" is the name of the folder created in step 1, and "ThumbnailImage" indicates which image file will be used as the thumbnail to represent the book in LUNA. Do NOT include the file extension; indicate the image with its four digit number only. For example: 0005
Note: More than one book can have the same BookIdentifier as long as they are not processed in the same batch.
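When preparing many books, the book.properties file can be generated rather than typed by hand. The following is a minimal sketch: the function names are hypothetical, and the UTF-8 encoding and the trailing newline are assumptions, since this guide only specifies the two properties and the file name.

```python
from pathlib import Path


def book_properties_text(book_identifier, thumbnail_page):
    """Build the two-line contents of book.properties.

    thumbnail_page is an integer page number; it is written as four digits
    without a file extension, as required by this step.
    """
    return (f"BookIdentifier = {book_identifier}\n"
            f"ThumbnailImage = {thumbnail_page:04d}\n")


def write_book_properties(book_dir, thumbnail_page):
    """Write book.properties into the top level directory of one book.

    The BookIdentifier is taken from the directory name so the two always match.
    """
    book_dir = Path(book_dir)
    text = book_properties_text(book_dir.name, thumbnail_page)
    # Assumption: UTF-8 is an acceptable encoding for Studio.
    (book_dir / "book.properties").write_text(text, encoding="utf-8")
```

For example, write_book_properties("ISBN_1587266032_Great_Expectations", 5) would produce the two properties shown in this step.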
5. Convert your text-searchable PDF into an XML file
If your PDF has not been OCR'd and you do not need the contents of the PDF to be searchable, skip this step. To convert your text-searchable PDF into an XML file that contains all the data and word locations in the book, download the utility called PDFtoXML and install it. (This utility is only available for Windows.)
- This utility makes use of the open source utility pdftohtml but is not the same thing.
After you've installed the utility:
- Copy and paste the PDF into the PDF folder (located in the install directory of PDFtoXML).
- Click on "Convert_PDF_to_XML.bat".
- Open the XML folder, and grab the newly created XML file.
Note: Do not edit the name of the newly created XML file. The utility automatically adds "_text" to the file name.
Each time you perform this conversion, first make sure that both the PDF and the XML folders are empty.
6. Finalize directory for publishing BookReader objects to LUNA
Place your text-searchable PDF file and the optional (only needed for Full Text Search) XML document into the top level directory of the book.
Now is a good time to review the contents of your directory. It should contain ONLY the following:
- an images sub directory containing sequential files beginning with 0001.jpg or 0001.tif.
- a book.properties file containing BookIdentifier (named exactly the same as the directory) and ThumbnailImage.
- a text-searchable PDF*
- an XML file converted from the PDF (named exactly as the PDFtoXML utility named it)*
*these are optional, and applicable only if your BookReader object will have full text search capabilities.
You're now ready to move on to Step B below and upload these BookReader objects to Studio.
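The review described in this step can also be done with a small script before you open Studio. The sketch below is an illustration only: the function names are hypothetical, and it assumes that book.properties uses "Name = value" lines as shown in step 4 and that page images end in .jpg, .jpeg, .tif, or .tiff.

```python
import re
from pathlib import Path

# Assumption: page files are four digits plus a JPEG or TIFF extension.
PAGE_NAME = re.compile(r"^(\d{4})\.(jpe?g|tiff?)$", re.IGNORECASE)


def find_problems(dir_name, top_level_names, image_names, properties_text):
    """Return a list of human-readable problems; an empty list means the layout looks right."""
    problems = []
    for required in ("images", "book.properties"):
        if required not in top_level_names:
            problems.append(f"missing {required}")
    for name in top_level_names:
        optional = name.lower().endswith((".pdf", ".xml"))
        if name not in ("images", "book.properties") and not optional:
            problems.append(f"unexpected item: {name}")
    # Page images must be numbered 0001, 0002, ... with no gaps or duplicates.
    matches = [PAGE_NAME.match(name) for name in image_names]
    numbers = sorted(int(m.group(1)) for m in matches if m)
    if len(numbers) != len(image_names):
        problems.append("images folder contains unexpected file names")
    if not numbers or numbers != list(range(1, len(numbers) + 1)):
        problems.append("page images are not numbered sequentially from 0001")
    # Parse "Name = value" lines from book.properties.
    props = {}
    for line in properties_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    if props.get("BookIdentifier") != dir_name:
        problems.append("BookIdentifier does not match the directory name")
    if "ThumbnailImage" not in props:
        problems.append("ThumbnailImage is not set")
    return problems


def check_book_dir(path):
    """Collect the names from disk and run the checks for one book directory."""
    book = Path(path)
    images = book / "images"
    props_file = book / "book.properties"
    return find_problems(
        book.name,
        [p.name for p in book.iterdir()],
        [p.name for p in images.iterdir()] if images.is_dir() else [],
        props_file.read_text(encoding="utf-8") if props_file.is_file() else "",
    )
```

Calling check_book_dir on each top level directory and printing the returned list gives a quick report. The checks cover only the rules stated in this guide, so a clean result does not guarantee that the import will succeed.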
B. Uploading BookReader objects to Studio
1. Select a collection, select "Import Book"
Open Studio and select the collection you'll upload these BookReader objects to.
Select the "Import Book" task from the Studio Main Task Menu. The steps that follow are nearly identical to processing other forms of media in Insight. The main difference is that you are selecting the top level book directory (folder) instead of individual files to process.
2. Select BookReader objects for import
Click New to create a new batch. A batch can contain as many books as you like. Each book is represented by the folder you created in Step 1 of Creating a BookReader Object.
The batch in the left panel will show the date and time it was created. You must create a batch or select an existing batch before adding content.
- Under the Process List tab make sure to check the Build Book box.
Either drag each folder containing the book components from where they are stored on your file system and drop them onto the Process List panel to the right of the Batch panel; or find the folders containing each book by clicking Browse and importing them from your file system to the Process List.
3. Import BookReader objects
Once the Process List contains the books you wish to import, click Import. A status message will appear to indicate Pending, Processing, Uploading, or Complete for each book. When the entire batch has been processed completely, Finished Importing Media will appear at the top.
Since each book may contain many images, this import process may take a while to complete. A book of 200 pages takes about as long to process as 200 individual images.
4. Add a Linking file
This step ensures your cataloged data will be correctly linked to your BookReader objects. If you did not upload a cataloged CSV file for these BookReader objects skip this step and manually link your records to BookReader objects using Inscribe.
Select the batch that you want to apply the links to. Click on the Linking tab and select "Use External Mapping File".
This linking file is a simple text file containing two columns. The first column must be the same as what you used for BookIdentifier in the book.properties file. The second column is the value in the database you are linking to, taken from the field you will select in the Mapping Field Name pull down menu (for example: Identifier; Title; etc).
After uploading the linking file, select the Mapping Field Name.
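A two-column linking file like the one described in this step could be generated as sketched below. This guide does not state which column separator Studio expects, so the delimiter is left as a required argument rather than assumed; the function name linking_lines is hypothetical.

```python
def linking_lines(pairs, delimiter):
    """Build the text of a two-column linking file.

    pairs is a list of (BookIdentifier, value of the mapped field) tuples.
    delimiter is the column separator; confirm the one your Studio version expects.
    """
    lines = []
    for book_identifier, field_value in pairs:
        # A value containing the separator would produce a third column.
        if delimiter in book_identifier or delimiter in field_value:
            raise ValueError(f"value contains the delimiter: {book_identifier!r}, {field_value!r}")
        lines.append(f"{book_identifier}{delimiter}{field_value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustration only: a tab is used here as an example separator.
    print(linking_lines([("ISBN_1587266032_Great_Expectations", "Great Expectations")], "\t"))
```

Generating the file from the same list you used to create the book directories keeps the first column identical to each BookIdentifier.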
5. Publish BookReader objects
Proceed to the Publish tab. Select "All unpublished Media" and then "Apply Changes".
Congratulations, you've now created and uploaded BookReader objects to Insight. To view these objects in LUNA proceed to the steps involved in Publishing a Collection to LUNA. Once you've completed the steps involved in publishing a new collection to LUNA, or updating an existing collection, your users will then see the BookReader objects in LUNA.
Option 2: Represent Publicly Available BookReader Objects
You can add your books to the Open Library (http://openlibrary.org/) and/or the Internet Archive and also access them in LUNA. Just follow the instructions on the Open Library site to add books or find existing books in the Open Library, and then follow the simple steps below to add them to your LUNA collection.
1. Find books to upload
Locate the books you want to import into LUNA in the Open Library and/or Internet Archive. Choose "Read Online" by clicking the Read icon next to a book. (Note: if a Read icon does not appear next to the book in the Open Library search results, the book is not available to add.)
2. Create a spreadsheet
This spreadsheet will contain only two columns. The first row MUST contain text, so label Column 1 "URL", and Column 2 "Page Number" for example.
Copy the URL for each book and paste into the first column of the spreadsheet.
Find the thumbnail image number by viewing the book in One-Page view mode, and count until you've reached the image you want. This thumbnail image number is the image you want to represent the book in the Thumbnail View in LUNA, and will be the number you will enter in column 2 of your spreadsheet.
For each book, add a new line in your spreadsheet. Save your spreadsheet as a CSV (Comma Separated Values) file.
Note: Alternatively, instead of creating a spreadsheet and converting it to a CSV file, you can create a text file with one line for each book. After the URL to the book, include a comma and then the page number you want to be the thumbnail image.
Example: http://www.archive.org/stream/greatexpectatio00dickgoog#page/n10/mode/2up,12
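For a long list of books, the CSV file described in this step can be produced with Python's standard csv module. This is a minimal sketch: the function name books_csv is hypothetical, and the header labels follow the "URL" and "Page Number" suggestion given above.

```python
import csv
import io


def books_csv(rows):
    """Return CSV text with a header row followed by one (URL, thumbnail page) row per book."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The first row must contain text, so write column labels.
    writer.writerow(["URL", "Page Number"])
    for url, page_number in rows:
        writer.writerow([url, page_number])
    return buffer.getvalue()


if __name__ == "__main__":
    text = books_csv([
        ("http://www.archive.org/stream/greatexpectatio00dickgoog#page/n10/mode/2up", 12),
    ])
    # Write with newline="" so the line endings are not translated a second time.
    with open("books.csv", "w", newline="", encoding="utf-8") as handle:
        handle.write(text)
```

The output file name books.csv is only an example; any name can be used when you add the file to the Process List.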
3. Load the spreadsheet into Studio
Open Studio and select the collection you'll add the books to. Select the "Import Book" task from the Studio Main Task Menu.
Click New to create a new batch. The new batch will just contain the spreadsheet created in step 2. The batch in the left panel will show the date and time it was created. If you do not create a new batch, a batch will be automatically generated whenever you add content to the Process List.
- Under the Process List tab make sure to leave the Build Book box unchecked.
Either drag the CSV file from where it is stored on your file system and drop it onto the Process List panel to the right of the Batch panel, or find the file by clicking Browse and importing it from your file system to the Process List.
4. Import the spreadsheet
Once the Process List contains the books you wish to import, click Import. A status message will appear to indicate Pending, Processing, Uploading, or Complete for each book. When the entire batch has been processed completely, Finished Importing Media will appear at the top.
5. Add a Linking file
This step ensures your cataloged data will be correctly linked to your BookReader objects. If you did not upload a cataloged CSV file for these BookReader objects skip this step and manually link your records to BookReader objects using Inscribe.
Select the batch that you want to apply the links to. Click on the Linking tab and select "Use External Mapping File".
This linking file is a simple text file containing two columns. The first column must be the unique identifier or URL for each book.
The unique identifier is part of the URL. In this example URL, http://www.archive.org/stream/greatexpectation03dick#page/n3/mode/2up, the unique identifier is greatexpectation03dick.
The second column is the value in the database you are linking to, taken from the field you will select in the Mapping Field Name pull down menu (for example: Identifier; Title; etc).
After uploading the linking file, select the Mapping Field Name.
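If you prefer to use the unique identifier in the first column of the linking file, it can be pulled out of each URL programmatically. The sketch below assumes URLs of the /stream/ form shown in the example above; the function name archive_identifier is hypothetical.

```python
from urllib.parse import urlparse


def archive_identifier(url):
    """Extract the unique identifier from a URL of the form .../stream/<identifier>#page/..."""
    # urlparse separates the path from the "#page/..." fragment.
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2 or parts[0] != "stream":
        raise ValueError(f"not a /stream/ URL: {url}")
    return parts[1]


if __name__ == "__main__":
    print(archive_identifier(
        "http://www.archive.org/stream/greatexpectation03dick#page/n3/mode/2up"
    ))
```

URLs in any other form raise an error instead of returning a guess, so they can be reviewed by hand.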
6. Publish BookReader objects
Proceed to the Publish tab. Select "All unpublished Media" and then "Apply Changes".
Congratulations, you've now created and uploaded BookReader objects to Insight. To view these objects in LUNA proceed to the steps involved in Publishing a Collection to LUNA. Once you've completed the steps involved in publishing a new collection to LUNA, or updating an existing collection, your users will now see the BookReader objects in LUNA.