Re: Retrieving a field from all result documents & couple of more queries

Shashikant Kore Wed, 16 Sep 2009 06:29:19 -0700

No, I don't wish to put a custom Similarity.  Rather, I want an
equivalent of HitCollector where I can bypass the scoring altogether.
And I prefer to do it by changing the configuration.
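
To make "bypass the scoring" concrete, here is a minimal sketch in plain Java. It assumes a collect(doc, score) style callback and a preloaded array of document_id values standing in for the field cache. The names UnscoredCollectorSketch, BitSetCollector and resolveIds are hypothetical, made up for illustration; they are not Lucene or Solr API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for illustration only; not Lucene or Solr API.
final class UnscoredCollectorSketch {

    // Mirrors the shape of a collect(doc, score) callback.
    // The score argument is accepted but never used, so no ranking happens.
    static final class BitSetCollector {
        private final BitSet hits;

        BitSetCollector(int maxDoc) {
            hits = new BitSet(maxDoc);
        }

        void collect(int doc, float score) {
            hits.set(doc); // record the match, ignore the score
        }

        BitSet hits() {
            return hits;
        }
    }

    // Maps internal doc numbers to external ids through a preloaded array,
    // which stands in for a field cache on document_id (an assumption).
    static List<String> resolveIds(BitSet hits, String[] idCache) {
        List<String> ids = new ArrayList<>(hits.cardinality());
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            ids.add(idCache[doc]);
        }
        return ids;
    }

    public static void main(String[] args) {
        String[] idCache = {"A-1", "B-2", "C-3", "D-4"}; // made-up ids
        BitSetCollector collector = new BitSetCollector(idCache.length);
        collector.collect(2, 0.9f);
        collector.collect(0, 0.1f);
        System.out.println(resolveIds(collector.hits(), idCache)); // prints [A-1, C-3]
    }
}
```

The ids come back in index order rather than score order, which is all that is needed here since the result need not be ranked.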


--shashi

On Wed, Sep 16, 2009 at 6:36 PM, rajan chandi <chandi.ra...@gmail.com> wrote:
> You might be talking about modifying the similarity object to modify scoring
> formula in Lucene!
>
>  $searcher->setSimilarity($similarity);
>  $writer->setSimilarity($similarity);
>
>
> This can very well be done in Solr, as SolrIndexWriter inherits from Lucene's
> IndexWriter class.
> You might want to download the Solr Source code and take a look at the
> SolrIndexWriter to begin with!
>
> It's in the package - org.apache.solr.update
>
> Thanks
> Rajan
>
> On Wed, Sep 16, 2009 at 5:42 PM, Shashikant Kore <shashik...@gmail.com> wrote:
>
>> Thanks, Abhay.
>>
>> Can someone please throw light on how to disable scoring?
>>
>> --shashi
>>
>> On Wed, Sep 16, 2009 at 11:55 AM, abhay kumar <abhay...@gmail.com> wrote:
>> > Hi,
>> >
>> > 1) Solr has various types of caches. We can specify how many documents
>> > a cache can hold at a time.
>> >       e.g. if windowsize=50,
>> >           50 results will be cached in the queryResultCache.
>> >           If the user requests results beyond the first 50 documents,
>> >           a new request is sent to the server, and the server retrieves
>> >           the next 50 results into the cache.
>> >       http://wiki.apache.org/solr/SolrCaching
>> >       Yes, Solr looks into the cache to retrieve the fields to be
>> >       returned.
>> >
>> > 2) Yes, we can have different tokenizers or filters for index & search.
>> > We need not create a different fieldtype. We need to configure the
>> > index & search analyzer sections of the same fieldtype (datatype)
>> > differently.
>> >
>> >   e.g.
>> >
>> >    <fieldType name="textSpell" class="solr.TextField"
>> >               positionIncrementGap="100" stored="false" multiValued="true">
>> >      <analyzer type="index">
>> >        <tokenizer class="solr.StandardTokenizerFactory"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >
>> >        <!--<filter class="solr.SynonymFilterFactory"
>> >                    synonyms="Synonyms.txt" ignoreCase="true" expand="false"/>-->
>> >        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >                words="stopwords.txt"/>
>> >        <filter class="solr.StandardFilterFactory"/>
>> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >      </analyzer>
>> >      <analyzer type="query">
>> >        <tokenizer class="solr.StandardTokenizerFactory"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >
>> >        <filter class="solr.StandardFilterFactory"/>
>> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >      </analyzer>
>> >    </fieldType>
>> >
>> >
>> >
>> > Regards,
>> > Abhay
>> >
>> > On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore <shashik...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I am familiar with Lucene and trying out Solr.
>> >>
>> >> I have an index which was created outside Solr. The index is fairly
>> >> simple, with two fields: document_id & content. The query result needs
>> >> to return all the document IDs. The result need not be ordered by the
>> >> score. For this, in Lucene, I use a custom hit collector with search to
>> >> get results quickly. The index has a few million documents and queries
>> >> returning hundreds of thousands of documents are not uncommon. So, the
>> >> speed is crucial here.
>> >>
>> >> Since retrieving the document_id for each document is slow, I am using
>> >> FieldCache to store the values of document_id. For all the results
>> >> collected (in a bitset) with the hit collector, the document_id field is
>> >> retrieved from the FieldCache.
>> >>
>> >> 1. How can I effectively disable scoring? I have read that
>> >> ConstantScoreQuery is quite fast, but from the code, I see that it is
>> >> used only for wildcard queries. How can I use ConstantScoreQuery for
>> >> all the queries (boolean, term, phrase, ..)?  Also, is
>> >> ConstantScoreQuery as fast as a custom hit collector?
>> >>
>> >> 2. How can Solr take advantage of the fieldcache while returning the
>> >> field document_id? The documentation says, fieldcache can be
>> >> explicitly auto warmed with Solr.  If fieldcache is available and
>> >> initialized at the beginning, will solr look into the cache to
>> >> retrieve the fields to be returned?
>> >>
>> >> 3. If there is an additional field for stemmed_content on which search
>> >> needs to use different analyzer, I suppose, that could be specified by
>> >> fieldType attribute in the schema.
>> >>
>> >> Thank you,
>> >>
>> >> --shashi
>> >>
>> >
>>
>
