
TermVector (TF-IDF Scores) From Subset of Documents

2009-10-28 Thread peelman


I have an index of about 3 million documents, and a specific list of document
IDs within those 3 million (somewhere around 20-50 documents on
average).  With my filtered list of documents I want to be able to get
TF-IDF scores calculated based on only that small subset, instead of the
scores from the entire 3 million document index.
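One way to get subset-local scores without building a second index is to fetch the raw term frequencies for just those 20-50 documents and compute the IDF part over the subset yourself. Below is a minimal sketch. It assumes you already have per-document term counts, for example from the TermVectorComponent with `tv.tf=true`. The function name `subset_tfidf` and the input layout are hypothetical. The formulas mirror Lucene's classic DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(N / (df + 1))), but N and df are counted over the subset only.

```python
import math
from collections import Counter

def subset_tfidf(term_counts):
    """Compute TF-IDF weights using only the documents passed in.

    term_counts: dict mapping doc id -> dict of term -> raw frequency
    (hypothetical layout, e.g. assembled from TermVectorComponent tv.tf output).
    Returns: dict mapping doc id -> dict of term -> weight.
    """
    n_docs = len(term_counts)

    # Document frequency within the subset: how many of these docs contain each term.
    df = Counter()
    for terms in term_counts.values():
        df.update(term for term, freq in terms.items() if freq > 0)

    # Lucene-classic-style idf, but with N and df taken from the subset only.
    idf = {term: 1.0 + math.log(n_docs / (count + 1)) for term, count in df.items()}

    # Lucene-classic-style tf (square root of the raw frequency) times subset idf.
    scores = {}
    for doc_id, terms in term_counts.items():
        scores[doc_id] = {
            term: math.sqrt(freq) * idf[term]
            for term, freq in terms.items()
            if freq > 0
        }
    return scores
```

With only 20-50 documents, a term that appears in nearly every document of the subset gets an idf near (or slightly below) 1. The weights therefore favor terms that distinguish documents within the subset rather than within the full index.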

Is there an easy way to do this using a filtered/subquery, or via any other
means?

Presently I am testing by creating a new index out of the subset of
documents to get the TF-IDF scores, but obviously that is not going to work
or scale in a finished implementation.

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/TermVector-%28TF-IDF-Scores%29-From-Subset-of-Documents-tp26105328p26105328.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: TermVector (TF-IDF Scores) or MoreLikeThis From Subset of Documents

2009-10-28 Thread peelman




peelman wrote:
> 
> I have an index of about 3 million documents, and a specific list of
> document IDs within those 3 million (somewhere around 20-50
> documents on average).  With my filtered list of documents I want to be
> able to get TF-IDF scores or run a MoreLikeThis query against ONE
> particular document but calculated based on only that small subset,
> instead of the scores from the entire 3 million document index.
> 
> Is there an easy way to do this using a filtered/subquery, or via any
> other means?
> 
> Presently I am testing by creating a new index out of the subset of
> documents to get the TF-IDF scores, but obviously that is not going to
> work or scale in a finished implementation.
> 
> Thanks in advance.
> 
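Lucene's MoreLikeThis chooses a source document's "interesting terms" by weighting each candidate term roughly as tf * idf and keeping the highest-scoring ones. Below is a rough, subset-only imitation of that term-selection step. It is not Solr's MoreLikeThis code, and `top_terms`, `k`, `min_tf` and the input layout are hypothetical. Here idf is plain ln(N / df), with N and df counted over the subset.

```python
import heapq
import math
from collections import Counter

def top_terms(target_id, term_counts, k=10, min_tf=1):
    """Rank the target document's terms by TF-IDF computed over the subset only.

    term_counts: dict doc id -> dict term -> raw frequency (hypothetical layout).
    Returns up to k (term, score) pairs, highest score first.
    """
    n_docs = len(term_counts)

    # Document frequency counted only across the documents in the subset.
    df = Counter()
    for terms in term_counts.values():
        df.update(t for t, f in terms.items() if f > 0)

    scored = []
    for term, tf in term_counts[target_id].items():
        # Skip absent or too-rare terms; this also guarantees df[term] >= 1.
        if tf <= 0 or tf < min_tf:
            continue
        idf = math.log(n_docs / df[term])  # terms present in every doc score 0
        scored.append((term, tf * idf))

    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```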


Re: TermVector (TF-IDF Scores) From Subset of Documents

2009-10-29 Thread peelman


Indeed I have used this already, but unless I am missing something this will
always return scores based on the entire index.  I see no way in the
documentation to have it recalculate TF-IDF scores using only a subset of
documents.  Am I missing something?

Are you saying I can apply a filter query using fq= and then use this request
handler to get different TF-IDF scores?
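For reference, a TermVectorComponent request restricted to a known list of IDs might be built as below. This is a sketch only. The host/port, the `tvrh` handler name (taken from the TermVectorComponent wiki example), the `id` field and the `text` field are assumptions about your setup. The `fq` clause only limits which documents come back. As noted above, document-frequency-based values such as `tv.df` and `tv.tf_idf` still reflect the whole index, so the raw `tv.tf` counts are the part you would re-weight client-side.

```python
from urllib.parse import urlencode

def tv_request_url(doc_ids, base="http://localhost:8983/solr/select",
                   id_field="id", tv_field="text"):
    """Build a TermVectorComponent request for a fixed list of document ids.

    base, id_field and tv_field are assumptions about the deployment; tv_field
    must be indexed with termVectors="true" for term vectors to be returned.
    IDs containing spaces or query syntax would need quoting/escaping first.
    """
    id_clause = " OR ".join(str(d) for d in doc_ids)
    params = [
        ("q", "*:*"),
        ("fq", f"{id_field}:({id_clause})"),  # restricts results to the subset
        ("qt", "tvrh"),             # handler name from the wiki example (assumed)
        ("tv", "true"),
        ("tv.tf", "true"),          # raw per-document term frequencies
        ("tv.fl", tv_field),        # field(s) to return term vectors for
        ("rows", str(len(doc_ids))),  # default is 10, which could truncate the subset
    ]
    return base + "?" + urlencode(params)
```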


Grant Ingersoll-6 wrote:
> 
> Have a look at the TermVectorComponent:
> http://wiki.apache.org/solr/TermVectorComponent . That might help.
> 
> On Oct 28, 2009, at 10:30 PM, peelman wrote:
> 
>>
>> I have an index of about 3 million documents, and a specific list of document
>> IDs within those 3 million (somewhere around 20-50 documents on
>> average).  With my filtered list of documents I want to be able to get
>> TF-IDF scores calculated based on only that small subset, instead of the
>> scores from the entire 3 million document index.
>>
>> Is there an easy way to do this using a filtered/subquery, or via any other
>> means?
>>
>> Presently I am testing by creating a new index out of the subset of
>> documents to get the TF-IDF scores, but obviously that is not going to work
>> or scale in a finished implementation.
>>
>> Thanks in advance.
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 

