Hi,

I am familiar with Lucene and am trying out Solr.

I have an index that was created outside Solr. The index is fairly
simple, with two fields: document_id and content. The query result
needs to return the document IDs of all matching documents. The
results need not be ordered by score. For this, in Lucene, I use a
custom hit collector with search to get results quickly. The index has
a few million documents, and queries returning hundreds of thousands
of documents are not uncommon, so speed is crucial here.

Since retrieving the document_id for each document is slow, I am using
FieldCache to hold the values of document_id. For all the results
collected (in a bitset) by the hit collector, the document_id is
looked up in the FieldCache.
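
To make the approach described above concrete, here is a rough sketch
in plain Java. It deliberately does not use the Lucene classes: the
class name DocIdLookupSketch, its methods, and the String[] that stands
in for the FieldCache array are all hypothetical and only illustrate
the idea of collecting hits into a bitset without scores and then
resolving document_id from an in-memory array.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java stand-in for the approach; this is NOT the Lucene API.
final class DocIdLookupSketch {

    // Stand-in for the FieldCache array (assumption): cachedIds[n] holds
    // the document_id value of the document with internal number n.
    private final String[] cachedIds;

    // Internal numbers of matching documents; no scores are kept.
    private final BitSet hits;

    DocIdLookupSketch(String[] cachedIds) {
        this.cachedIds = cachedIds;
        this.hits = new BitSet(cachedIds.length);
    }

    // Plays the role of the hit collector callback: the score is ignored.
    void collect(int doc, float score) {
        hits.set(doc);
    }

    // Resolve every collected hit through the cached array instead of
    // loading the field from the index for each document.
    List<String> documentIds() {
        List<String> ids = new ArrayList<>(hits.cardinality());
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            ids.add(cachedIds[doc]);
        }
        return ids;
    }
}
```

The IDs come back in internal document order rather than score order,
which is acceptable here since the results need not be ranked.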

1. How can I effectively disable scoring? I have read that
ConstantScoreQuery is quite fast, but from the code I see that it is
used only for wildcard queries. How can I use ConstantScoreQuery for
all query types (boolean, term, phrase, ...)? Also, is
ConstantScoreQuery as fast as a custom hit collector?

2. How can Solr take advantage of the FieldCache when returning the
document_id field? The documentation says the FieldCache can be
explicitly auto-warmed in Solr. If the FieldCache is available and
initialized at startup, will Solr look into the cache to retrieve the
fields to be returned?

3. If there is an additional field, stemmed_content, on which search
needs to use a different analyzer, I suppose that could be specified
through the fieldType in the schema.

Thank you,

--shashi
