TermVector (TF-IDF Scores) From Subset of Documents
I have an index of about 3 million documents and a specific list of document IDs drawn from that index (around 20-50 documents on average). For that filtered list, I want TF-IDF scores calculated from the small subset alone, rather than from the entire 3-million-document index.

Is there an easy way to do this using a filter query or subquery, or by any other means?

At present I am testing by building a new index from the subset of documents and reading the TF-IDF scores from it, but that will not work or scale in a finished implementation.

Thanks in advance.

--
View this message in context: http://www.nabble.com/TermVector-%28TF-IDF-Scores%29-From-Subset-of-Documents-tp26105328p26105328.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: TermVector (TF-IDF Scores) or MoreLikeThis From Subset of Documents
peelman wrote:
>
> I have an index of about 3 million documents, and a specific list of
> document ids that belong in that 3 million (somewhere around 20-50
> documents on average). With my filtered list of documents I want to be
> able to get TF-IDF scores or run a MoreLikeThis query against ONE
> particular document but calculated based on only that small subset,
> instead of the scores from the entire 3 million document index.
>
> Is there an easy way to do this using a filtered/subquery, or via any
> other means?
>
> Presently I am testing by creating a new index out of the subset of
> documents to get the TF-IDF scores, but obviously that is not going to
> work or scale in a finished implementation.
>
> Thanks in advance.
>
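For readers following the thread, the calculation being asked for can be sketched client-side. The Python below is only an illustration under stated assumptions: it takes per-document term frequencies for the subset as a plain dictionary, recomputes document frequency within that subset, and uses the textbook weighting tf * log(N / df). That weighting is an assumption made for illustration, not the formula Lucene or Solr uses, and the function name `subset_tf_idf` and the sample data are hypothetical.

```python
import math
from collections import Counter


def subset_tf_idf(term_freqs):
    """Compute TF-IDF using only the documents passed in.

    term_freqs maps doc_id -> {term: raw term frequency in that doc}.
    Document frequency is counted inside this subset, not the full index.
    """
    n_docs = len(term_freqs)

    # Count, for each term, how many subset documents contain it.
    df = Counter()
    for tf_map in term_freqs.values():
        df.update(tf_map.keys())

    # Textbook weighting (an assumption): tf * log(N / df).
    return {
        doc_id: {
            term: tf * math.log(n_docs / df[term])
            for term, tf in tf_map.items()
        }
        for doc_id, tf_map in term_freqs.items()
    }


# Hypothetical two-document subset.
subset = {
    "doc1": {"solr": 3, "index": 1},
    "doc2": {"solr": 1, "lucene": 2},
}
scores = subset_tf_idf(subset)
```

A term that occurs in every document of the subset gets a weight of zero under this formula, which is the intended effect of computing IDF locally: terms that are rare in the 3 million document index but common within the 20-50 documents no longer stand out.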
Re: TermVector (TF-IDF Scores) From Subset of Documents
Indeed, I have used this already, but unless I am missing something, it will always return scores based on the entire index. I see no way in the documentation to have it recalculate TF-IDF scores using only a subset of documents. Am I missing something? Are you saying I can apply a filter query using fq= and then use this request handler to get different TF-IDF scores?

Grant Ingersoll-6 wrote:
>
> Have a look at the TermVectorComponent:
> http://wiki.apache.org/solr/TermVectorComponent
> That might help.
>
> On Oct 28, 2009, at 10:30 PM, peelman wrote:
>
>> I have an index of about 3 million documents, and specific list of
>> document ids that belong in that 3 million (somewhere around 20-50
>> documents on average). With my filtered list of documents I want to
>> be able to get TF-IDF scores calculated based on only that small
>> subset, instead of the scores from the entire 3 million document
>> index.
>>
>> Is there an easy way to do this using a filtered/subquery, or via
>> any other means?
>>
>> Presently I am testing by creating a new index out of the subset of
>> documents to get the TF-IDF scores, but obviously that is not going
>> to work or scale in a finished implementation.
>>
>> Thanks in advance.
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
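To make the fq= idea concrete, here is a minimal Python sketch that builds a TermVectorComponent request restricted to a list of document IDs. Several things in it are assumptions rather than facts from this thread: the handler path `/tvrh` depends on how the component is registered in solrconfig.xml, the unique key field is assumed to be `id`, and the field name `text` and the helper `build_tv_request` are hypothetical. As far as I can tell, and consistent with what is reported above, the filter only narrows which documents are returned; any document frequencies the component reports would still be index-wide, so the request asks for raw term frequencies (`tv.tf`) only, leaving the subset-local weighting to the client.

```python
from urllib.parse import urlencode


def build_tv_request(base_url, doc_ids, field):
    """Build a URL asking for term frequencies of the given documents.

    Assumes the unique key field is named "id" and that the IDs contain
    no characters needing query-syntax escaping.
    """
    id_clause = " OR ".join(doc_ids)
    params = {
        "q": "*:*",
        "fq": f"id:({id_clause})",  # restrict results to the subset
        "tv": "true",               # enable the TermVectorComponent
        "tv.tf": "true",            # raw term frequency per document
        "tv.fl": field,             # field whose term vectors are wanted
        "rows": len(doc_ids),
        "wt": "json",
    }
    # "/tvrh" is an assumed handler path; adjust to your solrconfig.xml.
    return f"{base_url}/tvrh?{urlencode(params)}"


url = build_tv_request("http://localhost:8983/solr", ["doc1", "doc2"], "text")
```

The field must be indexed with termVectors="true" for the component to return anything. The term frequencies in the response can then be reduced to per-document {term: tf} maps and weighted against the subset alone.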