Books in LUNA: The new BookReader

Introduction


The BookReader integrates the discovery and display of compound objects such as books, manuscripts, and newspapers in LUNA. LUNA's new BookReader allows users to:

  • search text within a book

  • turn its pages

  • zoom into the text

  • add individual pages to a Media Group

  • view book in Full View

  • view book pages as thumbnails


Creating a BookReader object consists of compiling a text-searchable PDF, a folder of images, a converted XML file, and a book.properties file, all explained in more detail in the following pages.
In addition to their descriptive data, BookReader objects can be indexed for full text searching. If the contents of the PDF have not been OCR'd, it cannot be indexed for searching as a BookReader object in LUNA. If your BookReader object does not need to be indexed for full text search, you can still create and upload it by skipping the step that converts the PDF to XML.
To convert your scanned document into a text-searchable PDF file, there are a number of OCR products to select from, such as OmniPage Pro and ABBYY FineReader. Adobe Acrobat Professional has OCR capabilities as well.
The LUNA BookReader technology is based on the Internet Archive's BookReader format and allows you to add books to LUNA collections using Insight Studio in two different ways:
Option 1: Create and Upload New BookReader Objects
Option 2: Represent Publicly Available BookReader Objects
Either way, you'll be able to work with your BookReader objects as you do all other media in LUNA.

Getting Started


BookReader objects can be added to an existing collection, or can be the basis for a new collection. Either way, before you start building and importing these BookReader objects you need to first have a collection to upload these objects to.
If you are adding to an existing collection and have a cataloged CSV data file:

  1. Open Studio, select a collection, and select the "Import Data" task from the Main Task Menu.

  2. Upload the CSV file.


If you're creating a new collection, follow the instructions in the User's Guide to Basic Collection Building on how to build a collection in Studio. These steps will include uploading the CSV file.
Now that you have a collection to add BookReader objects to, you can follow Option 1 or 2.

Option 1: Create and Upload New BookReader Objects


There are two major steps to this option:
A. Preparing the BookReader objects for upload to Studio
You'll create a top level directory containing:

  • an images sub directory

  • a book.properties file

  • a text-searchable PDF*

  • an XML file converted from the PDF*


*These are optional and apply only if your BookReader object will have full text search capabilities.
B. Uploading BookReader objects to Studio
Once you've prepared all your BookReader objects, you can then upload them to Studio and publish them to LUNA.
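Before uploading, it can help to confirm that a book folder contains the pieces listed under step A. The following is only a minimal sketch in Python, not part of LUNA or Studio. The function name check_book_folder is hypothetical, and the rule that a PDF and an XML file should appear together is an assumption drawn from the note that both are needed for full text search.

```python
from pathlib import Path


def check_book_folder(book_dir):
    """Return a list of problems found in one book's top level directory."""
    book = Path(book_dir)
    problems = []

    # Required parts: the images sub-directory and the book.properties file.
    if not (book / "images").is_dir():
        problems.append("missing images sub-directory")
    if not (book / "book.properties").is_file():
        problems.append("missing book.properties file")

    # Optional parts (assumption): for full text search we expect the PDF and
    # the converted XML file to be present together. Lowercase extensions are
    # assumed here; on case-sensitive file systems ".PDF" is not matched.
    has_pdf = any(book.glob("*.pdf"))
    has_xml = any(book.glob("*.xml"))
    if has_pdf != has_xml:
        problems.append("full text search needs both a PDF and an XML file")

    return problems
```

An empty list means the folder has the expected layout; otherwise each entry names something to fix before uploading.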

A. Preparing the BookReader objects for upload to Studio
1. Create a directory (folder) for each book
Create a top level directory (or folder) for each book you wish to publish. The folder will contain the components needed to create and upload each book to LUNA through Studio.
Each of these top level directories (or folders) represents one book. You need to perform steps 1-6 for each BookReader object you wish to create.
Note: Make sure the name of the directory where your book files are placed contains no spaces or special characters. Use _ (underscore) to represent a space. For example: ISBN_1587266032_Great_Expectations
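If you have many books, a folder name that follows this rule could be derived from a title with a small script. This is a sketch only. The function name safe_folder_name is hypothetical, and because the guide does not list which characters count as special, the sketch assumes that only letters, digits, and underscores are safe.

```python
import re


def safe_folder_name(title: str) -> str:
    """Build a folder name with no spaces or special characters."""
    # Spaces become underscores, as the note above describes.
    name = title.strip().replace(" ", "_")
    # Assumption: anything other than A-Z, a-z, 0-9 and _ is "special".
    name = re.sub(r"[^A-Za-z0-9_]", "", name)
    # Collapse runs of underscores left behind by removed characters.
    return re.sub(r"_+", "_", name)


print(safe_folder_name("ISBN 1587266032 Great Expectations"))
# prints: ISBN_1587266032_Great_Expectations
```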

2. Prepare images
Prepare all the images that will make up your book. Each image represents one page. JPEG and TIFF files are supported. The files must be numbered sequentially, starting with 0001, and the file names must be 4 digits long with leading zeros. If you have already named your files with unique identifiers, we suggest that you copy them into a new folder and apply the sequential numbering there so that a BookReader object can be created. For example:
0001.tif
0002.tif
0003.tif
Note: If you want your BookReader object to have full text search capabilities, the number of images in the folder must match the total number of pages in the text-searchable PDF, and the images must follow the exact sequence of those pages.
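The copy-and-renumber approach suggested in this step can be scripted. The sketch below is one possible way to do it in Python and is not a LUNA tool. The function name copy_with_sequential_names is hypothetical, and it assumes that sorting the original file names alphabetically gives the correct page order and that files ending in .jpg, .jpeg, .tif, or .tiff are the page images. Check both assumptions against your own files.

```python
import shutil
from pathlib import Path

# Assumption: these extensions cover the supported JPEG and TIFF files.
SUPPORTED = {".jpg", ".jpeg", ".tif", ".tiff"}


def copy_with_sequential_names(source_dir, images_dir):
    """Copy page images into images_dir as 0001.ext, 0002.ext, and so on."""
    source = Path(source_dir)
    target = Path(images_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Assumption: alphabetical order of the original names is the page order.
    pages = sorted(
        (p for p in source.iterdir()
         if p.is_file() and p.suffix.lower() in SUPPORTED),
        key=lambda p: p.name,
    )
    if len(pages) > 9999:
        raise ValueError("four-digit names allow at most 9999 pages")

    copied = []
    for number, page in enumerate(pages, start=1):
        # Zero-padded four-digit name; the original extension is kept as-is.
        destination = target / f"{number:04d}{page.suffix}"
        shutil.copy2(page, destination)  # the original file is left untouched
        copied.append(destination.name)
    return copied
```

Because the originals are copied rather than renamed, your uniquely named files stay intact in their own folder.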

3. Place images in sub-directory called "images"
Place the image files in a sub-directory (or folder) called images within the top level directory created in step 1. For example:

ISBN_1587266032_Great_Expectations
    images
        0001.tif
        0002.tif
        0003.tif

4. Create a text file called book.properties
Create a text file called "book.properties" and place it in the top level directory. When you save the book.properties file there should be no extension appended to it.

In this book.properties file include two properties:
BookIdentifier = ISBN_1587266032_Great_Expectations
ThumbnailImage = 0005
Here, "BookIdentifier" is the name of the folder created in step 1, and "ThumbnailImage" indicates which image file will be used as the thumbnail to represent the book in LUNA. Indicate the image by its four digit number and do NOT include the file extension. For example: 0005
Note: More than one book can have the same BookIdentifier as long as they are not processed in the same batch.
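The book.properties file can also be generated rather than typed by hand, which avoids a stray .txt extension. The following is a minimal sketch in Python under stated assumptions: the function name write_book_properties is hypothetical, the file is written as UTF-8 (the guide does not specify an encoding), and the BookIdentifier is taken from the folder name as described above.

```python
from pathlib import Path


def write_book_properties(book_dir, thumbnail_page):
    """Write book.properties into the book's top level directory."""
    book = Path(book_dir).resolve()
    lines = [
        # BookIdentifier is the name of the folder created in step 1.
        f"BookIdentifier = {book.name}",
        # ThumbnailImage is the four digit page number, without extension.
        f"ThumbnailImage = {thumbnail_page:04d}",
    ]
    # The file is named exactly "book.properties", with nothing appended.
    properties_file = book / "book.properties"
    properties_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return properties_file
```

For example, calling write_book_properties("ISBN_1587266032_Great_Expectations", 5) on an existing folder of that name would write the two properties shown in this step.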

5. Convert your text-searchable PDF into an XML file
If your PDF has not been OCR'd and you do not need the contents of the PDF to be searchable, skip this step. To convert your text-searchable PDF into an XML file that contains all the data and word locations in the book, download and install the utility called PDFtoXML. (This utility is only available for Windows.)