Using solr for image retrieval - very long response time

2014-07-04 Thread Yossi Biton
Hello there,

Recently I have been trying to implement the bag-of-words model for image
retrieval using Solr. In short, this model consists of extracting "visual
words" from images and then using a tf-idf scheme for fast querying
(usually followed by a re-ranking stage).
I found Solr to be a suitable platform (I hope I'm not wrong), as it
provides tf-idf ranking.
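
To make the model concrete, here is a minimal sketch of tf-idf scoring over
visual-word IDs in plain Java. It is only an illustration: the class name
VisualWordTfIdf and the plain ln(N/df) weighting are assumptions of this
sketch, and Lucene's own similarity applies additional normalization
factors that are not modeled here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VisualWordTfIdf {
    /** Counts how often each visual-word ID occurs in one image. */
    static Map<Integer, Integer> termFrequencies(int[] words) {
        Map<Integer, Integer> tf = new HashMap<>();
        for (int w : words) {
            tf.merge(w, 1, Integer::sum);
        }
        return tf;
    }

    /** idf(w) = ln(N / df(w)), where df(w) is the number of images containing w. */
    static Map<Integer, Double> inverseDocumentFrequencies(List<int[]> images) {
        Map<Integer, Integer> df = new HashMap<>();
        for (int[] image : images) {
            for (int w : termFrequencies(image).keySet()) {
                df.merge(w, 1, Integer::sum);
            }
        }
        Map<Integer, Double> idf = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) images.size() / e.getValue()));
        }
        return idf;
    }

    /** Sums tf(query) * tf(image) * idf^2 over the words both sides share. */
    static double score(int[] query, int[] image, Map<Integer, Double> idf) {
        Map<Integer, Integer> queryTf = termFrequencies(query);
        Map<Integer, Integer> imageTf = termFrequencies(image);
        double total = 0.0;
        for (Map.Entry<Integer, Integer> e : queryTf.entrySet()) {
            Integer tfInImage = imageTf.get(e.getKey());
            Double weight = idf.get(e.getKey());
            if (tfInImage != null && weight != null) {
                total += e.getValue() * tfInImage * weight * weight;
            }
        }
        return total;
    }
}
```

Words that occur in few images get a large idf and dominate the score,
while an image that shares no word with the query scores zero.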

Currently I'm facing the following problem:
My images usually contain about 1,000 words, so the query consists of
1,000 terms.
When using a simple select query with 1,000 terms joined by OR, I get a
very long response time (100 s for an index with 2M images).

Is there an efficient way to build the query in this case?
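
For reference, the kind of query described above can be assembled roughly
as follows. This is only a sketch of the request being discussed, not a
faster alternative: the field name visual_words, the base URL and the
class name are hypothetical, and word IDs are assumed to be non-negative
integers (a leading minus sign would be read as an exclusion by the query
parser). Note also that the stock solrconfig.xml caps boolean queries at
1024 clauses (maxBooleanClauses), which a 1,000-term query comes close to.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.StringJoiner;

public class VisualWordQueryBuilder {
    /** Joins the word IDs into field:(w1 OR w2 OR ...). */
    static String buildOrQuery(String field, int[] wordIds) {
        StringJoiner joiner = new StringJoiner(" OR ", field + ":(", ")");
        for (int id : wordIds) {
            joiner.add(Integer.toString(id));
        }
        return joiner.toString();
    }

    /** Builds a /select URL with the query URL-encoded as the q parameter. */
    static String buildSelectUrl(String baseUrl, String query, int rows) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, buildOrQuery("visual_words", new int[]{12, 57, 903}) yields
visual_words:(12 OR 57 OR 903).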


Re: Using solr for image retrieval - very long response time

2014-07-04 Thread Yossi Biton
1. debugQuery shows that almost all of the time is spent in query.

2. I can't look at the heap right now, but I remember that I allocated
4 GB to the JVM and it's far from being fully used.
Regarding GC, I'm not sure how to check it (gc.log?).

3. The whole index fits in memory during the query.
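
On the GC question in point 2, one possible way to see how much collection
is going on is the JVM's standard management beans. The sketch below reads
the counters of the JVM it runs in; the class name GcCounters is
hypothetical. For a running Solr instance the same beans are exposed over
JMX (for example in jconsole), and GC logging can alternatively be
switched on with a JVM flag such as -verbose:gc.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounters {
    /** Total number of collections so far, summed over all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined.
            total += Math.max(0L, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the time is undefined.
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    /** One line per collector: name, collection count, accumulated time. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```

Comparing the counters before and after a slow query gives a rough idea of
whether collections account for a meaningful share of the 100 s.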
On Jul 4, 2014 3:31 PM, "Jack Krupansky" wrote:

> I would expect an excessively long query (greater than dozens or low
> hundreds of terms) to run significantly slower, but 100 seconds does seem
> excessively slow even for a 1000-term query.
>
> Add the debugQuery=true parameter to your query request and check the
> timing section to see if the time is spent in the query process or some
> other stage of processing.
>
> How is your JVM heap usage? Make sure you have enough heap but not too
> much. Are a lot of GCs occurring?
>
> Does your index fit entirely in OS system memory for file caching? If not,
> you could be incurring tons of IO.
>
> -- Jack Krupansky


What does the getSearcher method of SolrQueryRequest mean?

2014-07-08 Thread Yossi Biton
Hello there,

I'm using a project named LIRE for image retrieval based on the Solr
platform. There is a part of the code that I can't understand, so maybe
you could help me.

The project implements a request handler named lireq:
public class LireRequestHandler extends RequestHandlerBase

The search method in this handler consists of a Lucene search followed by
re-ranking.
The first part goes like this:
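public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
        throws Exception {
    ...
    // One optional (SHOULD) clause per hash; each term is the hex string
    // of the hash value.
    BooleanQuery query = new BooleanQuery();
    for (int i = 0; i < numHashes; i++) {
        query.add(new BooleanClause(
                new TermQuery(new Term(paramField,
                        Integer.toHexString(hashes[i]))),
                BooleanClause.Occur.SHOULD));
    }

    // The index searcher associated with this request; the query is run
    // against it directly and returns the top candidateResultNumber hits.
    SolrIndexSearcher searcher = req.getSearcher();
    TopDocs docs = searcher.search(query, candidateResultNumber);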


Re: What does the getSearcher method of SolrQueryRequest mean?

2014-07-08 Thread Yossi Biton
(Sorry - my mail was sent half-finished.)

hashes is an array of hash values generated somehow from the image.

So my question is: what query is being run in this part?
I tried to reconstruct it on my own by building a select query with the
hash values separated by OR, but the results were different.
Can anyone tell me why?

The source code is here: http://code.google.com/p/lire/
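
One detail worth checking when reproducing the handler's query by hand:
the loop passes every hash through Integer.toHexString, so the indexed
terms are lowercase hex strings rather than decimal numbers. The sketch
below builds the corresponding OR query string under that reading; the
class name and the field name hash_field are hypothetical. A select query
also goes through the query parser and the field's analyzer, whereas
TermQuery matches the raw term as given, which may be a further source of
differences.

```java
import java.util.StringJoiner;

public class HashQueryString {
    /** Mirrors the handler's loop: one hex-encoded term per hash, joined by OR. */
    static String toQueryString(String field, int[] hashes) {
        StringJoiner joiner = new StringJoiner(" OR ", field + ":(", ")");
        for (int hash : hashes) {
            // Same conversion as in the handler; negative values become
            // eight hex digits (two's complement), e.g. -1 -> ffffffff.
            joiner.add(Integer.toHexString(hash));
        }
        return joiner.toString();
    }
}
```

For example, toQueryString("hash_field", new int[]{255, 4096, -1}) yields
hash_field:(ff OR 1000 OR ffffffff).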






-- 

יוסי