Hi,
1) Solr has several types of caches, and for each one we can specify how
many entries it holds at a time.
e.g. if queryResultWindowSize=50,
a window of 50 results is cached in the queryResultCache for a query.
If the user then asks for results beyond the first 50, a new request is
sent to the server, and the server retrieves the next 50 results into
the cache.
http://wiki.apache.org/solr/SolrCaching
Yes, Solr looks into its cache (the documentCache, which holds the stored
fields of recently fetched documents) to retrieve the fields to be returned.
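For reference, the cache settings mentioned above live in the <query>
section of solrconfig.xml. A minimal sketch, assuming the stock
solr.LRUCache implementation; the sizes are illustrative values, not tuned
recommendations:

```xml
<query>
  <!-- Caches ordered lists of document IDs per query (sizes are assumed values) -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="0"/>

  <!-- Caches stored fields of fetched documents -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>

  <!-- A request for rows 10-19 collects and caches IDs 0-49 -->
  <queryResultWindowSize>50</queryResultWindowSize>
</query>
```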
2) Yes, we can have different tokenizers or filters for indexing and
searching. We need not create a different fieldType for that. We configure
the index and query analyzer sections of the same fieldType differently.
e.g.
<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.SynonymFilterFactory"
        synonyms="Synonyms.txt" ignoreCase="true" expand="false"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
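If stemmed_content needs its own analysis chain, as question 3 supposes, it
can be given its own fieldType and populated from content. A rough sketch
for schema.xml; the name textStemmed, the choice of the English Snowball
stemmer, and the copyField are assumptions for illustration, not something
taken from the existing index:

```xml
<!-- Hypothetical fieldType: one analyzer, applied at both index and query time -->
<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- Searchable only; the original text stays in the content field -->
<field name="stemmed_content" type="textStemmed" indexed="true" stored="false"/>

<!-- Feed the same source text through the stemming analyzer -->
<copyField source="content" dest="stemmed_content"/>
```

Note that copyField only takes effect when documents are indexed through
Solr, so an index built outside Solr would need to be reindexed for this
field to be filled.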
Regards,
Abhay
On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore <[email protected]> wrote:
> Hi,
>
> I am familiar with Lucene and trying out Solr.
>
> I have an index which was created outside Solr. The index is fairly
> simple with two fields - document_id & content. The query result needs
> to return all the document IDs. The result need not be ordered by the
> score. For this, in Lucene, I use custom hit collector with search to
> get results quickly. The index has a few million documents and queries
> returning hundreds of thousands of documents are not uncommon. So, the
> speed is crucial here.
>
> Since retrieving the document_id for each document is slow, I am using
> FieldCache to store the values of document_id. For all the results
> collected (in a bitset) with hit collector, document_id field is
> retrieved from the fieldcache.
>
> 1. How can I effectively disable scoring? I have read that
> ConstantScoreQuery is quite fast, but from the code, I see that it is
> used only for wildcard queries. How can I use ConstantScoreQuery for
> all the queries (boolean, term, phrase, ..)? Also, is
> ConstantScoreQuery as fast as a custom hit collector?
>
> 2. How can Solr take advantage of the fieldcache while returning the
> field document_id? The documentation says, fieldcache can be
> explicitly auto warmed with Solr. If fieldcache is available and
> initialized at the beginning, will solr look into the cache to
> retrieve the fields to be returned?
>
> 3. If there is an additional field for stemmed_content on which search
> needs to use different analyzer, I suppose, that could be specified by
> fieldType attribute in the schema.
>
> Thank you,
>
> --shashi
>