Hi,

1) Solr has several types of caches, and for each one we can specify how many entries it holds at a time. For example, with queryResultWindowSize=50, 50 results are cached in the queryResultCache for a query. If the user then asks for results beyond the first 50, a new request goes to the server, which runs the query again and caches a window covering the requested range.

http://wiki.apache.org/solr/SolrCaching

Yes, Solr looks into its cache when retrieving the fields to be returned (stored fields of recently fetched documents are kept in the documentCache).
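As a rough sketch of the cache settings mentioned in point 1, the relevant part of the <query> section in solrconfig.xml might look like the following. The element names are the ones used in the stock example config; the sizes are only illustrative values I picked, not recommendations.

```xml
<query>
  <!-- Caches ordered lists of document ids for a query -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- Caches the stored fields of documents; it cannot be autowarmed,
       because internal document ids change between searchers -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- Round each request up to a multiple of 50 when filling queryResultCache -->
  <queryResultWindowSize>50</queryResultWindowSize>
</query>
```

With values like these, a request for rows 0-9 should cache the ids for rows 0-49, so paging through the first 50 results would not need another search.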
2) Yes, we can have different tokenizers or filters for indexing and searching. There is no need to create a separate field type; within the same fieldType (datatype), configure the index and query analyzer sections differently, e.g.

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.SynonymFilterFactory" synonyms="Synonyms.txt" ignoreCase="true" expand="false"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Regards,
Abhay

On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore <shashik...@gmail.com> wrote:

> Hi,
>
> I am familiar with Lucene and trying out Solr.
>
> I have an index which was created outside Solr. The index is fairly
> simple with two fields - document_id & content. The query result needs
> to return all the document IDs. The result need not be ordered by the
> score. For this, in Lucene, I use a custom hit collector with search to
> get results quickly. The index has a few million documents and queries
> returning hundreds of thousands of documents are not uncommon. So, the
> speed is crucial here.
>
> Since retrieving the document_id for each document is slow, I am using
> FieldCache to store the values of document_id. For all the results
> collected (in a bitset) with the hit collector, the document_id field is
> retrieved from the fieldcache.
>
> 1. How can I effectively disable scoring? I have read that
> ConstantScoreQuery is quite fast, but from the code, I see that it is
> used only for wildcard queries. How can I use ConstantScoreQuery for
> all the queries (boolean, term, phrase, ..)? Also, is
> ConstantScoreQuery as fast as a custom hit collector?
>
> 2. How can Solr take advantage of the fieldcache while returning the
> field document_id? The documentation says the fieldcache can be
> explicitly auto warmed with Solr. If the fieldcache is available and
> initialized at the beginning, will Solr look into the cache to
> retrieve the fields to be returned?
>
> 3. If there is an additional field for stemmed_content on which search
> needs to use a different analyzer, I suppose that could be specified by
> the fieldType attribute in the schema.
>
> Thank you,
>
> --shashi