Hi,

Thanks Chris. For every document that matches the query, I want to be able to compute the following set of features for each query-document pair:
- LuceneScore (the vector space score that Lucene gives to each document)
- LinkScore (computed by Nutch)
- OpicScore (computed by Nutch)
- coord in title, content, anchor, url
- weight of entity in title, content, anchor, url
- length of title, content, anchor, url
- sum-of-tf in title, content, anchor, url
- sum-of-norm-tf in title, content, anchor, url
- min-of-tf in title, content, anchor, url
- max-of-tf in title, content, anchor, url
- variance-of-tf in title, content, anchor, url
- sum-of-tf-idf in title, content, anchor, url
- site-reputation-score
- entity-support-score
- domain-score
- url-click-count
- query-url-click-count
- num-of-slashes-in-url

Based on these features, I want to build a machine-learned model that learns to rank/score the documents. I am trying to understand how to compute the features efficiently on the fly. Looking into the index and computing these features seems to be very slow, so for the time being I wanted to implement this by looking only at the top K documents. A few of these features have to be computed on the fly, while others are computed at indexing time and stored in the index. I need to be able to look at all of the features to score/rank the final set of documents.

Thanks,
Pom

On Sat, Apr 27, 2013 at 5:43 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : used to call the Lucene IndexSearcher. As the documents are collected in
> : TopDocs in Lucene, before they are passed back to Nutch, I used to look
> : into the top K matching documents, consult some external repository,
> : further score the top K documents, and reorder them in the TopDocs array.
> : These reordered TopDocs are passed to Nutch. All of this reordering code
> : was implemented by extending the Lucene IndexSearcher class.
>
> 1) That's basically the same info you provided before -- it still doesn't
> really tell us anything about what your current logic does with the top K
> documents and how/why/when you decide to reorder them or by how much --
> details that are kind of important in being able to provide you with any
> meaningful advice on how to achieve your goal using the plugin hooks
> available in Solr.
>
> 2) If you only care about re-ordering the top K documents using some
> secret sauce, then I would suggest you just set rows=K and let Solr do
> its thing, then post-process the results -- either in your client, or in a
> SearchComponent that modifies the SolrDocumentList produced by
> QueryComponent.
>
> : > can you elaborate on what exactly your "some logic" involves?
> ...
> : > https://people.apache.org/~hossman/#xyproblem
> : > XY Problem
> : >
> : > Your question appears to be an "XY Problem" ... that is: you are dealing
> : > with "X", you are assuming "Y" will help you, and you are asking about "Y"
> : > without giving more details about the "X" so that we can understand the
> : > full issue. Perhaps the best solution doesn't involve "Y" at all?
> : > See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss
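Following the rows=K suggestion, here is a minimal sketch of client-side re-ranking of the top K hits. It assumes each hit already carries its Lucene score, the term frequency of each query term in one field (title), and its URL. The class `TopKReranker`, the `Hit` record, the feature order, and the hand-set linear weights are all hypothetical; a trained model would replace the weights. Only a few of the listed features are covered: sum/min/max/variance of tf, sum of tf-idf, a coord-style term overlap (analogous to the coord factor in Lucene's classic similarity), and the URL slash count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side re-ranker for the top K hits (fetched with rows=K).
class TopKReranker {

    // One hit: its Lucene score, the term frequency of each query term in the
    // title field, and its URL. Field names are illustrative only.
    record Hit(String id, float luceneScore, Map<String, Integer> titleTf, String url) {}

    // Aggregate tf features over the query terms for one field:
    // [sum-of-tf, min-of-tf, max-of-tf, variance-of-tf, sum-of-tf-idf, coord]
    static double[] tfFeatures(List<String> queryTerms, Map<String, Integer> tf,
                               Map<String, Double> idf) {
        int n = queryTerms.size();
        if (n == 0) return new double[6];
        double sum = 0, min = Double.MAX_VALUE, max = 0, sumTfIdf = 0;
        int matched = 0;
        for (String t : queryTerms) {
            int f = tf.getOrDefault(t, 0);
            sum += f;
            min = Math.min(min, f);
            max = Math.max(max, f);
            sumTfIdf += f * idf.getOrDefault(t, 0.0);
            if (f > 0) matched++;
        }
        double mean = sum / n;
        double var = 0;
        for (String t : queryTerms) {
            double d = tf.getOrDefault(t, 0) - mean;
            var += d * d;
        }
        // Population variance; coord = fraction of query terms present in the field.
        return new double[] {sum, min, max, var / n, sumTfIdf, (double) matched / n};
    }

    // num-of-slashes-in-url, counting every '/' including the scheme's "//".
    static int slashCount(String url) {
        return (int) url.chars().filter(c -> c == '/').count();
    }

    // Feature vector: [luceneScore, six tf features, slashCount] (length 8).
    static double[] features(Hit h, List<String> q, Map<String, Double> idf) {
        double[] tf = tfFeatures(q, h.titleTf(), idf);
        double[] f = new double[tf.length + 2];
        f[0] = h.luceneScore();
        System.arraycopy(tf, 0, f, 1, tf.length);
        f[f.length - 1] = slashCount(h.url());
        return f;
    }

    // Linear model: dot product of features and weights (stand-in for a learned model).
    static double score(double[] f, double[] w) {
        double s = 0;
        for (int i = 0; i < f.length; i++) s += f[i] * w[i];
        return s;
    }

    // Score each hit once, then sort by descending model score. List.sort is
    // stable, so ties keep Solr's original order.
    static List<Hit> rerank(List<Hit> topK, List<String> q, Map<String, Double> idf,
                            double[] weights) {
        Map<Hit, Double> scores = new IdentityHashMap<>();
        for (Hit h : topK) scores.put(h, score(features(h, q, idf), weights));
        List<Hit> out = new ArrayList<>(topK);
        out.sort(Comparator.comparingDouble((Hit h) -> scores.get(h)).reversed());
        return out;
    }
}
```

Features stored at index time (LinkScore, OpicScore, click counts) could be returned as stored fields and appended to the same vector; the per-field tf values are the part that would need to come from term vectors or a custom component, which is where the slowness mentioned above would show up.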