I think we may have up to 100,000 books, but I don't think the site will have a lot of traffic.
Thank you for your help. I think it is a little more clear and will try to implement it now.

> -----Original Message-----
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Monday, December 30, 2013 11:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to use Solr in my project
>
> On 30 December 2013 11:27, Fatima Issawi <issa...@qu.edu.qa> wrote:
> > Hi again,
> >
> > We have another program that will be extracting the text, and it will be
> > extracting the top right and bottom left corners of the words. You are right,
> > I do expect to have a lot of data.
> >
> > When would solr start experiencing issues in performance? Is it better to:
> >
> > INDEX:
> > - document metadata
> > - words
> >
> > STORE:
> > - document metadata
> > - words
> > - coordinates
> >
> > in Solr rather than in the database? How would I set up the schema in order
> > to store the coordinates?
>
> You do not mention the number of documents, but for a few tens of
> thousands of documents, your problem should be tractable in Solr. Not sure
> what document metadata you have, and if you need to search through it, but
> what I would do is index the words, and store the coordinates in Solr, the
> assumption being that words are searched but not retrieved from Solr, while
> coordinates are retrieved but never searched.
>
> Off the top of my head, each record can be:
> <doc1> <pg1> <word1> <coord_x1> <coord_y1> <coord_x2> <coord_y2>
> <doc1> <pg1> <word2> ....
> ...
> <doc1> <pg2> ...
> ...
> <doc2> ...
>
> * <doc_id> and <pg_id> from Solr search results let you retrieve the image
>   from the filesystem
> * The coordinates allow post-processing to highlight the word in the image
>
> As always, set up a prototype system with a subset of the records in order to
> measure performance.
>
> > If storing the coordinates in solr is not recommended, what would be the
> > best process to get the coordinates after indexing the words and metadata?
> > Do I search in solr and then use the documentID to then search the database
> > for the words and coordinates?
>
> You could do that, but Solr by itself should be fine.
>
> Regards,
> Gora
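
For reference, here is a rough sketch of how I read the record layout suggested above: one flat record per word occurrence, carrying the document, the page, and the two corners. This is only my interpretation, not a tested setup. The field names (doc_id, pg_id, word, coord_x1, ...), the id format, and the helper functions are all hypothetical, and the list filter near the end merely stands in for a real Solr query on the word field. Which fields are indexed and which are only stored would still be decided in the Solr schema, which this sketch does not cover.

```python
from collections import defaultdict
from typing import NamedTuple


class WordBox(NamedTuple):
    """One extracted word with two opposite corners (hypothetical layout)."""
    doc_id: str
    pg_id: int
    word: str
    x1: int
    y1: int
    x2: int
    y2: int


def to_solr_doc(rec: WordBox, seq: int) -> dict:
    """Flatten one word occurrence into a dict, one per Solr document.

    The "id" format is an assumption; any unique key would do.
    """
    return {
        "id": f"{rec.doc_id}_{rec.pg_id}_{seq}",
        "doc_id": rec.doc_id,
        "pg_id": rec.pg_id,
        "word": rec.word,
        "coord_x1": rec.x1,
        "coord_y1": rec.y1,
        "coord_x2": rec.x2,
        "coord_y2": rec.y2,
    }


def boxes_by_page(hits: list) -> dict:
    """Group hit rectangles by (doc_id, pg_id) for later highlighting.

    Corners are sorted so it does not matter which two opposite corners
    the extractor reported; each box comes out as (min_x, min_y, max_x, max_y).
    """
    pages = defaultdict(list)
    for hit in hits:
        min_x, max_x = sorted((hit["coord_x1"], hit["coord_x2"]))
        min_y, max_y = sorted((hit["coord_y1"], hit["coord_y2"]))
        pages[(hit["doc_id"], hit["pg_id"])].append((min_x, min_y, max_x, max_y))
    return dict(pages)


# Made-up sample data: top-right corner first, then bottom-left.
records = [
    WordBox("doc1", 1, "solr", 120, 40, 80, 55),
    WordBox("doc1", 1, "index", 200, 40, 130, 55),
    WordBox("doc1", 2, "solr", 90, 300, 50, 315),
]
docs = [to_solr_doc(rec, seq) for seq, rec in enumerate(records)]

# Stand-in for a Solr search on the word field.
hits = [doc for doc in docs if doc["word"] == "solr"]
print(boxes_by_page(hits))
```

The (doc_id, pg_id) keys are what I would use to find the page image on the filesystem, and the rectangles are what the post-processing step would draw over that image.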