Customizing solr search: SpanQueries (revisited)

seanoc5 Sat, 10 Oct 2009 15:01:57 -0700

Hi all,
    I am trying to use SpanQueries to save *all* hits for a custom query type
(e.g. defType=fooSpanQuery), along with token positions. I have this working
in straight lucene, so my challenge is to implement it half-intelligently in
solr. At the moment, I can't figure out where and how to customize the
'inner' search process.

So far, I have my own SpanQParser, and SpanQParserPlugin, which
successfully return a hard-coded span query (but this is not critical for my
current challenge, I believe).

I also have managed to configure solr to call my custom
SpanQueryComponent, which I believe is the focus of my challenge. At this
initial stage, I have simply extended QueryComponent and overridden
QueryComponent.process() while I am trying to find my way through the code
:-).

So, with all that setup, can someone point me in the right direction for
custom processing of a query (or just the query results)? A few differences
for my use-case are:
-- I want to save every hit along with position information. I believe this
means I want to use SpanQueries (like I have in lucene), but perhaps there
are other options.
-- I do not need to build much in the way of a response. This is an
automated analysis, so no user will see the solr results. I will save them
to a database, but for simplicity just a
log.info("Score:{}, Term:{}, TokenNumber:{}",...)
would be great at the moment.
-- I will always process every span, even those with a near-zero 'score'.
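A minimal sketch of the "save every span with its positions" requirement, assuming Lucene 2.9's Spans API (SpanQuery.getSpans(IndexReader), then a loop over next(), doc(), start(), end()). To keep it runnable without Lucene on the classpath, the SpanCursor interface and the fromArray helper below are hypothetical stand-ins that mirror those methods; in real code you would loop over the Spans object directly. Note that Spans reports positions but no score, so the 'Score' field in the log line would presumably have to come from elsewhere (e.g. the query's scorer).

```java
import java.util.ArrayList;
import java.util.List;

public class SpanDump {

    /** Hypothetical stand-in mirroring Lucene 2.9's Spans methods. */
    interface SpanCursor {
        boolean next();   // advance to the next match; false when exhausted
        int doc();        // document id of the current match
        int start();      // first token position of the match
        int end();        // one past the last token position of the match
    }

    /** One saved hit: document id plus token positions (no score). */
    static final class SpanHit {
        final int doc, start, end;

        SpanHit(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "Doc:" + doc + ", TokenStart:" + start + ", TokenEnd:" + end;
        }
    }

    /** Drains the cursor and keeps every span; nothing is ranked or truncated. */
    static List<SpanHit> collectAll(SpanCursor spans) {
        List<SpanHit> hits = new ArrayList<>();
        while (spans.next()) {
            hits.add(new SpanHit(spans.doc(), spans.start(), spans.end()));
        }
        return hits;
    }

    /** Hypothetical test helper: a cursor over fixed {doc, start, end} triples. */
    static SpanCursor fromArray(final int[][] triples) {
        return new SpanCursor() {
            int i = -1;
            public boolean next() { return ++i < triples.length; }
            public int doc()   { return triples[i][0]; }
            public int start() { return triples[i][1]; }
            public int end()   { return triples[i][2]; }
        };
    }

    public static void main(String[] args) {
        int[][] fake = { {0, 3, 5}, {0, 17, 19}, {4, 1, 3} };
        for (SpanHit h : collectAll(fromArray(fake))) {
            System.out.println(h);  // stand-in for log.info(...) or a database insert
        }
    }
}
```

Because collectAll only appends, low-scoring or zero-scoring spans are kept exactly like the rest.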

I think I want to focus on SpanQueryComponent.process(), probably overriding
the functionality in (SolrIndexSearcher)searcher.search(result,cmd)
which seems to just call
getDocListC(qr,cmd); // ?? is this my main focus point??

Does this seem like a reasonable approach? If so, how do I do it? I
think I'm missing something obvious; perhaps there is an easy way to extend
SolrIndexSearcher in solrconfig.xml to have my custom SpanQueryComponent
call a custom IndexSearcher where I simply override getDocListC()?

And for extra-karma-credit: any thoughts on performance gains (or loss?)
if I basically drop most of the advanced optimizations, like TopDocsCollector
and such? If I have thousands of queries, and want to save *every* span for
each query, is there likely to be significant overhead from the
optimizations which are intended for users to 'page' through windows of
hits?
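The trade-off in the performance question can be modelled in plain Java. This is a simplified illustration, not Lucene's actual TopDocsCollector: a bounded min-heap keeps only the best k hits (O(log k) per hit), while keeping everything is a plain append with no comparisons. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class CollectStrategies {

    /** Keeps only the k highest scores, best first; each insert costs O(log k). */
    static List<Float> topK(float[] scores, int k) {
        List<Float> out = new ArrayList<>();
        if (k <= 0) return out;
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap: weakest kept hit at the head
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();   // evict the weakest kept hit
                heap.add(s);
            }
        }
        out.addAll(heap);
        Collections.sort(out, Collections.reverseOrder()); // best first
        return out;
    }

    /** Keeps every hit in arrival order; amortized O(1) per hit, no comparisons. */
    static List<Float> all(float[] scores) {
        List<Float> out = new ArrayList<>(scores.length);
        for (float s : scores) out.add(s);
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.01f, 0.4f, 0.0f, 0.7f};
        System.out.println(topK(scores, 2)); // [0.9, 0.7]
        System.out.println(all(scores));     // [0.9, 0.01, 0.4, 0.0, 0.7]
    }
}
```

When every hit is kept anyway, the heap does no useful work, so skipping it should not hurt. Whether the saving is significant next to the cost of enumerating spans is something only profiling on real data can answer.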

Also, thanks to Grant for replying to my previous inquiry
(http://osdir.com/ml/solr-dev.lucene.apache.org/2009-05/msg00010.html). This
email is partly me trying to implement his suggestion, and partly just
trying to understand basic Solr customization. I tried sending out a
previous draft of this message yesterday, but haven't seen it on the lists,
so my apologies if this becomes a duplicate post.
Thank you,

Sean
--
View this message in context:
http://www.nabble.com/Customizing-solr-search%3A-SpanQueries-%28revisited%29-tp25838412p25838412.html
Sent from the Solr - User mailing list archive at Nabble.com.
