On 26 December 2013 10:54, Fatima Issawi <issa...@qu.edu.qa> wrote:
> Hello,
>
> First off, I apologize if this was sent twice. I was having issues
> subscribing to the list.
>
> I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me
> figure out how to implement Solr in my project. I have gone through some
> tutorials online and I was able to import and query text in some Arabic PDF
> documents.
>
> We have some scans of Historical Handwritten Arabic documents that will have
> text extracted into a database (or PDF). We would like the user to be able to
> search the document for text, then have the scanned image show up in a viewer
> with the text highlighted.
This will not work for scanned images, which do not themselves contain any
text. If you have the text of the documents, the best you can do is break it
into pages corresponding to the scanned images, and index into Solr the text
of each page together with a reference to the scanned image that should be
linked to it. For a user search, you will then need to show the scanned image
of the entire page: highlighting the search term in an image is not possible
without optical character recognition (OCR), since you otherwise have no
record of where on the page each word appears.

Similarly, if you are indexing from PDFs, you will need to ensure that they
contain text, and not just images.

Regards,
Gora
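
P.S. A minimal sketch of the page-by-page indexing suggested above, in case it
helps. It assumes you already have the extracted text split into one string
per page. The field names (id, doc_id, page, text, image_url), the image URL
pattern, and the core name in the usage note are all made up for illustration
and would have to match your own schema and setup.

```python
import json
import urllib.request


def build_page_docs(doc_id, page_texts, image_url_template):
    """Return one Solr document per scanned page.

    page_texts: list of strings, one per page, in page order.
    image_url_template: format string with {doc_id} and {page} placeholders
    (a hypothetical naming scheme for the scanned images).
    """
    docs = []
    for page_no, text in enumerate(page_texts, start=1):
        docs.append({
            # Hypothetical field names; they must exist in your schema.
            "id": f"{doc_id}_p{page_no}",
            "doc_id": doc_id,
            "page": page_no,
            "text": text,
            "image_url": image_url_template.format(doc_id=doc_id, page=page_no),
        })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to a Solr update handler."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

You would call build_page_docs("ms001", pages, "http://example.org/scans/{doc_id}/{page}.png")
and pass the result to post_to_solr(docs, "http://localhost:8983/solr/manuscripts/update?commit=true"),
where the host, port, and core name are placeholders. A hit on the text field
then gives you image_url, which the viewer can use to show the whole page.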