Sanity check: ResponseWriter directly to a database?

2009-09-03 Thread seanoc5

Hello all,
Are there any hidden gotchas--or even basic suggestions--regarding
implementing something like a DBResponseWriter that puts responses right
into a database? My specific questions are:

1) Any problems adding non-trivial jars to a Solr plugin? I'm thinking JDBC
and then perhaps Hibernate libraries?
I don't believe so, but I have just enough understanding to be dangerous at
the moment.

2) Is JSONResponseWriter a reasonable copy/paste starting point for me?  Is
there anything that might match better, especially regarding initialization
and connection pooling?
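
To make the pooling question concrete: the checkout/check-in pattern I have in mind boils down to something like the following. This is a generic, self-contained sketch of my own (class and method names are mine, not Solr's); in real code the pooled objects would be JDBC Connections created once at init time, or I'd just use an existing pool library such as DBCP or c3p0.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Generic checkout/check-in pool sketch. The pooled type is generic so the
// example stays self-contained; in the plugin it would hold
// java.sql.Connection objects created once when the plugin initializes.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(Collection<T> resources) {
        this.idle = new ArrayBlockingQueue<T>(Math.max(1, resources.size()));
        this.idle.addAll(resources);
    }

    // Blocks until a pooled resource is free.
    T acquire() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted waiting for a resource", e);
        }
    }

    // Callers should return the resource in a finally block.
    void release(T resource) {
        idle.add(resource);
    }

    int available() {
        return idle.size();
    }
}
```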

3) Say I have a read-write single-core Solr server: a vanilla out-of-the-box
example install. Can I concurrently update the underlying index safely with
EmbeddedSolrServer? (This is my backup approach, and less preferred.)
I assume "no", one of them has to be read-only, but I've learned not to
underestimate the Lucene/Solr developers.

I'm starting by adapting JSONResponseWriter and the
http://wiki.apache.org/solr/SolrPlugins wiki notes. The docs seem to
indicate that all I need to do is package the appropriate supporting (JDBC)
jar files into my MyDBResponse.jar and drop it into the ./lib dir (e.g.
c:\solr-svn\example\solr\lib). Of course, I also need to update my
solrconfig.xml to use the new DBResponseWriter.
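
For reference, my understanding is that the solrconfig.xml registration would look roughly like this (the writer name and class are placeholders for my own plugin, not anything shipped with Solr):

```xml
<!-- in solrconfig.xml; "dbwriter" and the class name are placeholders -->
<queryResponseWriter name="dbwriter" class="com.example.solr.DBResponseWriter"/>
```

Requests would then select it with `&wt=dbwriter`.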

Straight JDBC seems like the easiest starting point. If that works, perhaps
I'll move the DB code to Hibernate. Does anyone have a "best practice"
suggestion for database access inside a plugin? I rather expect the answer
might be "use JNDI and well-configured Hibernate; no special problems
related to being 'inside' a Solr plugin." I will eventually be interested in
saving both query results and document indexing information, so I expect to
do this in both a (custom) ResponseWriter and ... um ... a
DocumentAnalysisRequestHandler?
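
The straight-JDBC version of "save a hit" that I have in mind is nothing fancier than this sketch (illustrative only: the table, columns, and class name are made up, and it obviously needs a live database and driver to run):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

// Hypothetical persistence helper; the schema and names are placeholders.
public class HitSaver {
    public void saveHit(Connection conn, String docId, float score) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO query_hits (doc_id, score) VALUES (?, ?)");
        try {
            ps.setString(1, docId);
            ps.setFloat(2, score);
            ps.executeUpdate();
        } finally {
            ps.close();
        }
    }
}
```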

I realize embedded Solr might be a better choice (performance has been a big
issue in my current implementation), and I am looking into that as well. If
feasible, I'd like to keep Solr "in charge" of the database content through
plugins and extensions, rather than keeping both Solr and the DB synced from
my (Grails) app.
Thanks,

Sean


-- 
View this message in context: 
http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25284734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sanity check: ResponseWriter directly to a database?

2009-09-03 Thread seanoc5

Avlesh,
Great response, just what I was looking for. 

As far as QueryResponseWriter vs. RequestHandler goes: you're absolutely
right, request handling is the way to go. It looks like I can start with
something like:
public class SearchSavesToDBHandler extends RequestHandlerBase implements
SolrCoreAware
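
Sketched out, that skeleton would look something like the following. This is based on my reading of the Solr 1.x plugin APIs; the exact packages and required abstract methods should be checked against the Solr version in use:

```java
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.util.plugin.SolrCoreAware;

public class SearchSavesToDBHandler extends RequestHandlerBase implements SolrCoreAware {

    public void inform(SolrCore core) {
        // called once the core is ready -- a good place to set up the
        // JDBC DataSource / connection pool
    }

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        // run the search here, then persist the results via JDBC
    }

    @Override
    public String getDescription() {
        return "handler that saves search results to a database";
    }

    // RequestHandlerBase in 1.x also requires getSource()/getSourceId()/getVersion()
}
```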

I am still weighing keeping this logic in my app. However, with Solr Cell
coming along nicely, and given the nature of my queries (95% pre-defined for
content analysis), I am leaning toward the extra work of embedding the
processing in Solr. I'm still unclear which path is best, but I think that's
fairly specific to my app.

Great news about the flexibility of having both approaches be able to work
on the same index. That may well save me if I run out of time on the plugin
development.
Thanks for your reply, it was a great help,

Sean



Avlesh Singh wrote:
> 
>>
>> Are there any hidden gotchas--or even basic suggestions--regarding
>> implementing something like a DBResponseWriter that puts responses right
>> into a database?
>>
> Absolutely not! A QueryResponseWriter with an empty "write" method
> fulfills all interface obligations. My only question is: why do you want a
> ResponseWriter to do this for you? Why not write something outside Solr
> that gets the response and then puts it in the database? If it has to be a
> Solr utility, then maybe a RequestHandler.
> The only reason I am asking is that your QueryResponseWriter will have to
> implement a method called "getContentType", which sounds illogical in your
> case.
> 
>> Any problems adding non-trivial jars to a Solr plugin?
>>
> None. I have tonnes of them.
> 
>> Is JSONResponseWriter a reasonable copy/paste starting point for me? Is
>> there anything that might match better, especially regarding
>> initialization and connection pooling?
>>
> As I have tried to explain above, a QueryResponseWriter with an empty
> "write" method is just perfect. You can use any one of the well-known
> writers as a starting point.
> 
>> Say I have a read-write single-core Solr server: a vanilla out-of-the-box
>> example install. Can I concurrently update the underlying index safely
>> with EmbeddedSolrServer?
>
> Yes you can! Other searchers will only come to know of the changes when
> they are "re-opened".
> 
> Cheers
> Avlesh
> 
> On Fri, Sep 4, 2009 at 3:26 AM, seanoc5  wrote:
>
>> [...]

Customizing solr search: SpanQueries (revisited)

2009-10-10 Thread seanoc5

Hi all,
I am trying to use SpanQueries to save *all* hits for a custom query type
(e.g. defType=fooSpanQuery), along with token positions. I have this working
in straight Lucene, so my challenge is to implement it half-intelligently in
Solr. At the moment, I can't figure out where and how to customize the
'inner' search process.

So far, I have my own SpanQParser and SpanQParserPlugin, which
successfully return a hard-coded span query (but this is not critical for my
current challenge, I believe).

I also have managed to configure Solr to call my custom
SpanQueryComponent, which I believe is the focus of my challenge. At this
initial stage, I have simply extended QueryComponent and overridden
QueryComponent.process() while I am trying to find my way through the code
:-).

So, with all that setup, can someone point me in the right direction for
custom processing of a query (or just the query results)? A few differences
for my use case are:
-- I want to save every hit along with position information. I believe this
means I want to use SpanQueries (as I have in Lucene), but perhaps there
are other options.
-- I do not need to build much in the way of a response. This is an
automated analysis, so no user will see the Solr results. I will save them
to a database, but for simplicity just a
log.info("Score:{}, Term:{}, TokenNumber:{}", ...)
would be great at the moment.
-- I will always process every span, even those with a near-zero 'score'.

 I think I want to focus on SpanQParser.process(), probably overriding
the functionality in (SolrIndexSearcher) searcher.search(result, cmd),
which seems to just call
getDocListC(qr, cmd);  // ?? is this my main focus point ??

Does this seem like a reasonable approach? If so, how do I do it? I
think I'm missing something obvious; perhaps there is an easy way, via
solrconfig.xml, to have my custom SpanQueryComponent call a custom
IndexSearcher where I simply override getDocListC()?
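
For the position-saving part itself, the Lucene-level loop I'm porting is essentially the following (Lucene 2.x-era API, current when this was written; the getSpans() signature changed in later versions, so check the javadoc for your release):

```java
// Fragment, not a complete class: spanQuery and reader come from the
// surrounding component; log is an slf4j Logger as in the snippet above.
org.apache.lucene.search.spans.Spans spans = spanQuery.getSpans(reader);
while (spans.next()) {
    // doc() is the document id; start()/end() are token positions
    // (end() is one past the last matching token)
    log.info("Doc:{}, Start:{}, End:{}",
             new Object[] { spans.doc(), spans.start(), spans.end() });
}
```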

And for extra karma credit: any thoughts on performance gains (or
losses?) if I basically drop most of the advanced optimizations like
TopDocsCollector and such? If I have thousands of queries, and want to save
*every* span for each query, is there likely to be significant overhead from
the optimizations intended to let users 'page' through windows of hits?

Also, thanks to Grant for replying to my previous inquiry
(http://osdir.com/ml/solr-dev.lucene.apache.org/2009-05/msg00010.html). This
email is partly me trying to implement his suggestion, and partly just
trying to understand basic Solr customization. I tried sending out a
previous draft of this message yesterday, but haven't seen it on the lists,
so my apologies if this becomes a duplicate post.
Thank you,

Sean



Re: Customizing solr search: SpanQueries (revisited)

2009-10-13 Thread seanoc5

I'm fairly sure I did a custom (Hit)Collector in lucene-java, but all I can
find at the moment are my retro implementations (w/o collectors). I won't
bore (or scare?) you with the details, but I follow some of what you're
suggesting. 

I have been able to get straight SpanQueries to work in my custom
QueryComponent. I think I followed the same path you describe, but correct
me if I misunderstand. I've more-or-less replaced the
QueryComponent.process() code:
    SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
    searcher.search(result, cmd);
    rb.setResult(result);

with (in my overridden process() method):

    // the subset of fields I am interested in
    String[] selectFields = {"id", "fileName"};
    // custom span query, and many/all hits
    TopDocs results = searcher.search(cmd.getQuery(), 10);
    // save hit info (doc & score)
    // maybe process SpanQuery.getSpans() here, but perhaps try a
    // "doc-oriented results" processing approach(?) for tokenization
    // caching/optimization

The code above _seems_ to work, but I am still in the initial stages at the
moment. When I get to the point where I have a better understanding of the
challenges you mention, I will share the thoughts and insights I've gained
along the way :-).

Thanks for your time and help,

Sean




hossman wrote:
> 
> 
> : (e.g. defType=fooSpanQuery), along with token positions. I have this
> working
> : in straight lucene, so my challenge is to implement it
> half-intelligently in
> : solr. At the moment, I can't figure out where and how to customize the
> : 'inner' search process.
> 
> the first step is to really make sense of how you do this with
> lucene-java, so we can find the best corresponding points in Solr.
>
> I suspect you are using a custom (Hit)Collector, which is an area in Solr
> that isn't easily customizable. Some other issues have brought up the need
> to allow custom code to provide a Collector that Solr would use in
> addition to its own Collectors (for building up DocList and DocSet
> structures), but no clear picture has surfaced as to what that API should
> really look like to be useful to plugin writers, and still be performant
> in the common case.
> 
> The most straightforward way to add custom logic like this would be to
> use SolrIndexSearcher just like a regular IndexSearcher, passing your
> Collector to the same old methods you would use outside of Solr -- this
> bypasses Solr's internal DocList & DocSet caching, but in many cases this
> may be exactly what you want -- it's the main problem that makes a
> generalized pluggable Collector implementation hard to implement: if the
> Collector has side effects, it's not clear that cached results are useful
> unless they can reproduce the same side effects on cache read.
>
> For response purposes (if you care) you can always then take the results
> of your own data structure, and use it to generate a DocList or a DocSet.
> 
> 
> -Hoss
> 
> 
> 
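
Hoss's suggestion of passing a custom Collector straight to the searcher might look roughly like this. This is a sketch against the Lucene 2.9-era Collector API (earlier releases used HitCollector.collect(int, float) instead), so the exact methods should be checked against the version in use; the class name and persistence step are mine:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Collects every hit, with no paging or top-N pruning. Persisting the
// (doc, score) pairs is exactly the side effect Hoss warns makes caching
// such results tricky.
public class SaveAllCollector extends Collector {
    private Scorer scorer;
    private int docBase;

    @Override public void setScorer(Scorer scorer) { this.scorer = scorer; }
    @Override public void setNextReader(IndexReader reader, int docBase) { this.docBase = docBase; }
    @Override public boolean acceptsDocsOutOfOrder() { return true; }

    @Override
    public void collect(int doc) throws IOException {
        int globalDoc = docBase + doc;   // doc is relative to the current segment
        float score = scorer.score();
        // save (globalDoc, score) to the database here
    }
}
// usage: searcher.search(query, new SaveAllCollector());
```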
