On 30 December 2013 11:27, Fatima Issawi <issa...@qu.edu.qa> wrote:
> Hi again,
>
> We have another program that will be extracting the text, and it will be
> extracting the top right and bottom left corners of the words. You are right,
> I do expect to have a lot of data.
>
> When would Solr start experiencing performance issues? Is it better to:
>
> INDEX:
> - document metadata
> - words
>
> STORE:
> - document metadata
> - words
> - coordinates
>
> in Solr rather than in the database? How would I set up the schema in order
> to store the coordinates?
You do not mention the number of documents, but for a few tens of thousands
of documents your problem should be tractable in Solr. I am not sure what
document metadata you have, or whether you need to search through it, but
what I would do is index the words and store the coordinates in Solr, the
assumption being that words are searched but not retrieved from Solr, while
coordinates are retrieved but never searched.

Off the top of my head, each record can be:

  <doc1> <pg1> <word1> <coord_x1> <coord_y1> <coord_x2> <coord_y2>
  <doc1> <pg1> <word2> ...
  ...
  <doc1> <pg2> ...
  ...
  <doc2> ...

* <doc_id> and <pg_id> from Solr search results let you retrieve the image
  from the filesystem
* The coordinates allow post-processing to highlight the word in the image

As always, set up a prototype system with a subset of the records in order
to measure performance.

> If storing the coordinates in solr is not recommended, what would be the best
> process to get the coordinates after indexing the words and metadata? Do I
> search in solr and then use the documentID to then search the database for
> the words and coordinates?

You could do that, but Solr by itself should be fine.

Regards,
Gora
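As a follow-up, the record layout above could be expressed as schema.xml field
definitions along these lines. This is only a sketch: the field names (doc_id,
page_id, word, x1..y2) are illustrative, and it assumes the stock string, int,
and text_general field types from the default Solr schema. Note indexed vs.
stored: the word is indexed for search, while the coordinates are stored only,
never searched.

```xml
<!-- Sketch only: field names are illustrative; one Solr document per word -->
<fields>
  <!-- unique key per word occurrence, e.g. "doc1_pg1_17" -->
  <field name="id"      type="string" indexed="true" stored="true" required="true"/>
  <!-- stored so search results can locate the page image on the filesystem -->
  <field name="doc_id"  type="string" indexed="true" stored="true"/>
  <field name="page_id" type="string" indexed="true" stored="true"/>
  <!-- searched, but need not be stored -->
  <field name="word"    type="text_general" indexed="true" stored="false"/>
  <!-- coordinates: retrieved for highlighting, never searched -->
  <field name="x1" type="int" indexed="false" stored="true"/>
  <field name="y1" type="int" indexed="false" stored="true"/>
  <field name="x2" type="int" indexed="false" stored="true"/>
  <field name="y2" type="int" indexed="false" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```

A query on the word field would then return doc_id, page_id, and the four
coordinates for each hit, which is all the post-processing step needs to draw
the highlight on the page image.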