Hi,

Thanks, Chris. For every document that matches the query, I want to be able
to compute the following set of features for each query-document pair:

    LuceneScore (the vector space score that Lucene gives to each doc)
    LinkScore   (computed from Nutch)
    OpicScore   (computed from Nutch)
    co-rd in title,content,anchor,url
    wt of Entity in title,content,anchor,url
    length of title,content,anchor,url
    sum-of-tf in title,content,anchor,url
    sum-of-norm-tf in title,content,anchor,url
    min-of-tf in title,content,anchor,url
    max-of-tf in title,content,anchor,url
    variance-of-tf in title,content,anchor,url
    sum-of-tf-idf in title,content,anchor,url
    site-reputation-score
    entity-support-score
    domain score
    url-click-count
    query-url-click-count
    num-of-slashes-in-url

Based on the features above, I want to build a machine-learned model that
will learn to rank/score the documents. I am trying to understand how to
compute the features efficiently on the fly. Looking into the index and
computing these features seems to be very slow, so for the time being I
want to implement this by looking only at the top K documents. A few of
these features have to be computed on the fly, and some of them are computed
while indexing and stored in the index. I need to be able to look at all of
the features to score/rank the final set of documents.
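
To make the idea concrete, here is a rough sketch of the tf-based features
and the top-K re-scoring step. It is plain Python, not Lucene, Nutch or Solr
API: the names tf_features and rerank_top_k, the shape of the top_k tuples,
and the model callable are all made up for illustration, and I am assuming
the field tokens for the top K docs are already available.

```python
from collections import Counter
from statistics import pvariance


def tf_features(query_terms, field_tokens):
    """Per-field term-frequency features for one query/document pair."""
    counts = Counter(field_tokens)
    # Counter returns 0 for query terms that do not occur in the field.
    tfs = [counts[t] for t in query_terms]
    length = len(field_tokens)
    return {
        "length": length,
        "sum_tf": sum(tfs),
        "sum_norm_tf": sum(tfs) / length if length else 0.0,
        "min_tf": min(tfs) if tfs else 0,
        "max_tf": max(tfs) if tfs else 0,
        "variance_tf": pvariance(tfs) if tfs else 0.0,  # population variance
    }


def rerank_top_k(query_terms, top_k, model):
    """Re-score only the K docs returned by the search, best first.

    top_k: list of (doc_id, lucene_score, {field_name: [tokens]}) tuples
           (hypothetical shape, assumed for this sketch).
    model: hypothetical callable mapping a feature dict to a float score.
    """
    rescored = []
    for doc_id, lucene_score, fields in top_k:
        features = {"lucene_score": lucene_score}
        for name, tokens in fields.items():
            for key, value in tf_features(query_terms, tokens).items():
                features[f"{name}_{key}"] = value
        rescored.append((model(features), doc_id))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in rescored]
```

The features stored at index time (link score, OPIC score, click counts and
so on) would just be extra entries in the feature dict before the model is
called.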

Thanks,
Pom..

On Sat, Apr 27, 2013 at 5:43 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

> : used to call the lucene IndexSearcher . As the documents are collected in
> : TopDocs in Lucene , before that is passed back to Nutch , i used to look
> : into the top K matching documents , consult some external repository
> : and further score the Top K documents and reorder them in the TopDocs
> : array. These reordered TopDocs is passed to Nutch. All these reordering
> : code was implemented by Extending the lucene IndexSearcher class.
>
> 1) that's basically the same info you provided before -- it still doesn't
> really tell us anything about what your current logic does with the top K
> documents and how/why/when you decide to reorder them or by how much --
> details that are kind of important in being able to provide you with any
> meaningful advice on how to achieve your goal using the plugin hooks
> available in Solr.
>
> 2) if you only care about re-ordering the Top K documents using some
> secret sauce, then i would suggest you just set rows=K and let Solr do
> its thing, then post-process the results -- either in your client, or in a
> SearchComponent that modifies the SolrDocumentList produced by
> QueryComponent.
>
> : > can you elaborate on what exactly your "some logic" involves?
>         ...
> : > https://people.apache.org/~hossman/#xyproblem
> : > XY Problem
> : >
> : > Your question appears to be an "XY Problem" ... that is: you are
> : > dealing with "X", you are assuming "Y" will help you, and you are
> : > asking about "Y" without giving more details about the "X" so that we
> : > can understand the full issue.  Perhaps the best solution doesn't
> : > involve "Y" at all?
> : > See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss
>
