wiki:NewBookIndexingAlgorithm
Last modified on 06/14/08 04:34:01

Description of a new book indexing algorithm

The beginnings of this are in AmisCore/dtb/DtbIndex

  • Create a sketch of the book with the SMIL file names in one column and empty lists for their IDs in the other column
 -------------------------------
| file_1.smil   |  (no IDs yet) |
|               |               |
|-------------------------------|
| file_2.smil   |  (no IDs yet) |
|               |               |
|-------------------------------|
| file_3.smil   |  (no IDs yet) |
|               |               |
 -------------------------------
  • Receive a request to load a SMIL URL (file_1.smil#a)
  • See if that file has already been indexed
  • If not, run QuickDataSmilFileReader to get a list of all the IDs in the file
  • Note that QuickDataSmilFileReader also records the SMIL addresses and their text SRCs in a map
SMIL IDs Map:
 -------------------------------
| file_1.smil   |  a, b, c, d   |
|               |               |
|-------------------------------|
| file_2.smil   |  (no IDs yet) |
|               |               |
|-------------------------------|
| file_3.smil   |  (no IDs yet) |
|               |               |
 -------------------------------

SMIL address to text SRC Map:
 -------------------------------
| file_1.smil#a |  text.html#a  |
|               |               |
|-------------------------------|
| file_1.smil#b |  text.html#b  |
|               |               |
|-------------------------------|
| file_1.smil#c |  text.html#c  |
|               |               |
 -------------------------------
  • The next step is to map each set of SMIL IDs to NavNodes.
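
The steps above (up to, but not including, the NavNode mapping) could be sketched roughly as follows. This is only an illustration, not the DtbIndex code: the BookIndex class, the read_smil hook and the sample SMIL markup are all made up for the example, and a plain XML parse stands in for QuickDataSmilFileReader.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Invented sample markup: four IDs, of which only three carry a text SRC.
SAMPLE_SMIL = """<smil><body><seq>
  <par id="a"><text src="text.html#a"/></par>
  <par id="b"><text src="text.html#b"/></par>
  <par id="c"><text src="text.html#c"/></par>
  <par id="d"><audio src="audio.mp3"/></par>
</seq></body></smil>"""


def _local(tag):
    """Drop an XML namespace prefix such as '{uri}par', if present."""
    return tag.rsplit("}", 1)[-1]


class BookIndex:
    """Lazy per-file SMIL index (hypothetical stand-in for DtbIndex)."""

    def __init__(self, smil_files, read_smil):
        # Sketch of the book: every SMIL file, with no IDs recorded yet.
        self.smil_ids = {name: None for name in smil_files}
        # SMIL address -> text SRC, filled in as files are indexed.
        self.text_src = {}
        # read_smil(name) returns the file's XML as a string (assumed hook).
        self._read_smil = read_smil

    def is_indexed(self, smil_file):
        return self.smil_ids[smil_file] is not None

    def load(self, smil_url):
        """Handle a request such as 'file_1.smil#a', indexing on first use."""
        smil_file, fragment = urldefrag(smil_url)
        if not self.is_indexed(smil_file):
            self._index_file(smil_file)
        return smil_file, fragment

    def _index_file(self, smil_file):
        root = ET.fromstring(self._read_smil(smil_file))
        ids = []
        for elem in root.iter():
            elem_id = elem.get("id")
            if elem_id is None:
                continue
            ids.append(elem_id)
            # Record the first <text src="..."> at or below this element.
            for child in elem.iter():
                if _local(child.tag) == "text" and child.get("src"):
                    self.text_src[f"{smil_file}#{elem_id}"] = child.get("src")
                    break
        self.smil_ids[smil_file] = ids


index = BookIndex(
    ["file_1.smil", "file_2.smil", "file_3.smil"],
    read_smil=lambda name: SAMPLE_SMIL,
)
index.load("file_1.smil#a")
print(index.smil_ids)
print(index.text_src)
```

After the single load call, only file_1.smil has an ID list; the other two files stay unindexed until something requests them.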

Issues

  • How do we tell the range belonging to a NavNode before all the data is present? We cannot be sure that each NavNode has its own SMIL file (this is common in practice, but neither standard requires it).
  • Remember that the tables will get filled in out of order -- the first table will have some blank right-hand cells if the user jumps around
  • Out-of-order filling shouldn't matter for the second table; it's only a data map
  • If a text search result ends up in the middle of the book, how do we get there? One idea is to parse the SMIL files in order, beginning at our starting point and continuing until the search result SRC is reached. I think this can be considered a "best guess".
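
The "best guess" idea for text search results might look something like the sketch below. The function name, the index_file hook and the sample IDs (e, g) are invented for illustration; the hook is assumed to return one file's SMIL-address-to-text-SRC map.

```python
def find_smil_address(smil_files, start_file, target_src, index_file):
    """Index files in book order from start_file until target_src turns up.

    index_file(name) is a hypothetical hook returning that file's
    {smil_address: text_src} map. Returns the first matching SMIL address,
    or None if the end of the book is reached without a match.
    """
    start = smil_files.index(start_file)
    for name in smil_files[start:]:
        for address, src in index_file(name).items():
            if src == target_src:
                return address
    return None


# Invented per-file maps, standing in for real parsing.
maps = {
    "file_1.smil": {"file_1.smil#a": "text.html#a"},
    "file_2.smil": {"file_2.smil#e": "text.html#e"},
    "file_3.smil": {"file_3.smil#g": "text.html#g"},
}
order = list(maps)
print(find_smil_address(order, "file_1.smil", "text.html#g", maps.__getitem__))
```

As written, the walk only goes forward, so a search result that lies before the starting point is not found. That is one reason to treat the answer as a best guess.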

Another approach

Can the book be indexed in the background while the user starts reading?
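
One possible shape for this, as a sketch only: a worker thread walks the book in order while the reading code can still ask for any file on demand. The BackgroundIndexer class and its index_file hook are hypothetical, and a single lock is the simplest (not necessarily the best) way to keep the two from indexing the same file twice.

```python
import threading


class BackgroundIndexer:
    """Index SMIL files on a worker thread, with on-demand access."""

    def __init__(self, smil_files, index_file):
        self._files = list(smil_files)
        # index_file(name) returns the list of IDs in that file (assumed hook).
        self._index_file = index_file
        self._lock = threading.Lock()
        self.smil_ids = {}

    def ensure_indexed(self, name):
        """Index one file if nobody has yet; callable from either thread."""
        with self._lock:
            if name not in self.smil_ids:
                self.smil_ids[name] = self._index_file(name)
            return self.smil_ids[name]

    def _index_all(self):
        for name in self._files:
            self.ensure_indexed(name)

    def start(self):
        """Begin walking the whole book in order on a daemon thread."""
        worker = threading.Thread(target=self._index_all, daemon=True)
        worker.start()
        return worker


indexer = BackgroundIndexer(
    ["file_1.smil", "file_2.smil", "file_3.smil"],
    index_file=lambda name: ["a", "b", "c", "d"],  # placeholder parser
)
worker = indexer.start()
# The user jumps straight to file_3.smil while the worker is still running.
indexer.ensure_indexed("file_3.smil")
worker.join()
print(sorted(indexer.smil_ids))
```

Because the lock is held while a file is parsed, a foreground request can wait for the file the worker is currently on. Whether that delay is acceptable depends on how long one SMIL file takes to read.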