Sanity check: ResponseWriter directly to a database?
Hello all,

Are there any hidden gotchas--or even basic suggestions--regarding implementing something like a DBResponseWriter that puts responses right into a database? My specific questions are:

1) Any problems adding non-trivial jars to a solr plugin? I'm thinking JDBC and then perhaps Hibernate libraries? I don't believe so, but I have just enough understanding to be dangerous at the moment.

2) Is JSONResponseWriter a reasonable copy/paste starting point for me? Is there anything that might match better, especially regarding initialization and connection pooling?

3) Say I have a read-write single-core solr server: a vanilla out-of-the-box example install. Can I concurrently update the underlying index safely with EmbeddedSolrServer? (This is my backup approach, less preferred.) I assume "no", one of them has to be read-only, but I've learned not to underestimate the lucene/solr developers.

I'm starting by adapting JSONResponseWriter and the http://wiki.apache.org/solr/SolrPlugins wiki notes. The docs seem to indicate all I need to do is package the appropriate supporting (jdbc) jar files into my MyDBResponse.jar and drop it into the ./lib dir (e.g. c:\solr-svn\example\solr\lib). Of course, I also need to update my solrconfig.xml to use the new DBResponseWriter.

Straight JDBC seems like the easiest starting point. If that works, perhaps I'll move the DB code to Hibernate. Does anyone have a "best practice" suggestion for database access inside a plugin? I rather expect the answer might be "use JNDI and well-configured Hibernate; no special problems related to being 'inside' a solr plugin."

I will eventually be interested in saving both query results and document-indexing information, so I expect to do this in both a (custom) ResponseWriter and... um... a DocumentAnalysisRequestHandler? I realize embedded solr might be a better choice (performance has been a big issue in my current implementation), and I am looking into that as well.
If feasible, I'd like to keep solr "in charge" of the database content through plugins and extensions, rather than keeping both solr and the db synced from my (grails) app.

Thanks,
Sean

--
View this message in context: http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25284734.html
Sent from the Solr - User mailing list archive at Nabble.com.
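For what it's worth, the solrconfig.xml change mentioned above follows the same registration pattern as the built-in writers; the package and class name below are just this thread's hypothetical example:

```xml
<!-- solrconfig.xml: register the custom writer alongside the built-ins.
     Requests then select it with the wt parameter, e.g. ...&wt=dbwriter -->
<queryResponseWriter name="dbwriter" class="com.example.DBResponseWriter"/>
```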
Re: Sanity check: ResponseWriter directly to a database?
Avlesh,

Great response, just what I was looking for. As far as QueryResponseWriter vs RequestHandler: you're absolutely right, request handling is the way to go. It looks like I can start with something like:

    public class SearchSavesToDBHandler extends RequestHandlerBase implements SolrCoreAware

I am still weighing keeping this logic in my app. However, with solr-cell coming along nicely, and the nature of my queries (95% pre-defined for content analysis), I am leaning toward the extra work of embedding the processing in solr. I'm still unclear which is the best path, but I think that's fairly specific to my app.

Great news about the flexibility of having both approaches able to work on the same index. That may well save me if I run out of time on the plugin development.

Thanks for your reply, it was a great help,
Sean

Avlesh Singh wrote:
>
>> Are there any hidden gotchas--or even basic suggestions--regarding
>> implementing something like a DBResponseWriter that puts responses right
>> into a database?
>
> Absolutely not! A QueryResponseWriter with an empty "write" method fulfills
> all interface obligations. My only question is: why do you want a
> ResponseWriter to do this for you? Why not write something outside Solr,
> which gets the response and then puts it in the database? If it has to be a
> Solr utility, then maybe a RequestHandler.
> The only reason I am asking is that your QueryResponseWriter will have to
> implement a method called "getContentType", which sounds illogical in your
> case.
>
>> Any problems adding non-trivial jars to a solr plugin?
>
> None. I have tonnes of them.
>
>> Is JSONResponseWriter a reasonable copy/paste starting point for me? Is
>> there anything that might match better, especially regarding
>> initialization and connection pooling?
>
> As I have tried to explain above, a QueryResponseWriter with an empty
> "write" method is just perfect. You can use any one of the well-known
> writers as a starting point.
>
>> Say I have a read-write single-core solr server: a vanilla out-of-the-box
>> example install. Can I concurrently update the underlying index safely
>> with EmbeddedSolrServer?
>
> Yes you can! Other searchers will only come to know of changes when they
> are "re-opened".
>
> Cheers
> Avlesh
>
> On Fri, Sep 4, 2009 at 3:26 AM, seanoc5 wrote:
>
>> [original message quoted in full above; snipped]
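Avlesh's point about the empty "write" method can be sketched as follows. This is a hypothetical sketch, not working Solr code: RowSink and the Map-based "document" stand in for a pooled JDBC DataSource and Solr's response/document types, so the flow compiles without Solr on the classpath.

```java
import java.util.List;
import java.util.Map;

class DbWriterSketch {
    // Stand-in for a JDBC DataSource/DAO; in a real plugin this would wrap
    // a connection pool initialized during plugin init (e.g. via JNDI).
    interface RowSink {
        void insert(String table, Map<String, Object> row);
    }

    // Hypothetical analogue of a QueryResponseWriter that persists the
    // response instead of serializing it back to the client.
    static class DBResponseWriter {
        private final RowSink sink;

        DBResponseWriter(RowSink sink) {
            this.sink = sink;
        }

        // Mirrors the role of QueryResponseWriter.write(): walk the result
        // docs and push each one into the database; nothing goes to the client.
        void write(List<Map<String, Object>> docs) {
            for (Map<String, Object> doc : docs) {
                sink.insert("search_results", doc);
            }
        }

        // The interface still demands a content type even though the body is
        // empty -- the "illogical" method Avlesh mentions.
        String getContentType() {
            return "text/plain";
        }
    }
}
```

In real plugin code the RowSink role would typically be a pooled DataSource, matching the "JNDI and well-configured Hibernate" expectation in the original post.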
Customizing solr search: SpanQueries (revisited)
Hi all,

I am trying to use SpanQueries to save *all* hits for a custom query type (e.g. defType=fooSpanQuery), along with token positions. I have this working in straight lucene, so my challenge is to implement it half-intelligently in solr. At the moment, I can't figure out where and how to customize the 'inner' search process.

So far, I have my own SpanQParser and SpanQParserPlugin, which successfully return a hard-coded span query (but this is not critical for my current challenge, I believe). I also have managed to configure solr to call my custom SpanQueryComponent, which I believe is the focus of my challenge. At this initial stage, I have simply extended QueryComponent and overridden QueryComponent.process() while I am trying to find my way through the code :-).

So, with all that setup, can someone point me in the right direction for custom processing of a query (or just the query results)? A few differences for my use-case are:

-- I want to save every hit along with position information. I believe this means I want to use SpanQueries (like I have in lucene), but perhaps there are other options.
-- I do not need to build much in the way of a response. This is an automated analysis, so no user will see the solr results. I will save them to a database, but for simplicity just a log.info("Score:{}, Term:{}, TokenNumber:{}", ...) would be great at the moment.
-- I will always process every span, even those with a near-zero 'score'.

I think I want to focus on SpanQParser.process(), probably overriding the functionality in

    (SolrIndexSearcher) searcher.search(result, cmd)

which seems to just call

    getDocListC(qr, cmd); // ?? is this my main focus point ??

Does this seem like a reasonable approach? If so, how do I do it? I think I'm missing something obvious; perhaps there is an easy way to extend SolrIndexSearcher in solrconfig.xml to have my custom SpanQueryComponent call a custom IndexSearcher where I simply override getDocListC()?
And for extra-karma-credit: any thoughts on performance gains (or losses?) if I basically drop most of the advanced optimization like TopDocsCollector and such? If I have thousands of queries, and want to save *every* span for each query, is there likely to be significant overhead from the optimizations intended to let users 'page' through windows of hits?

Also, thanks to Grant for replying to my previous inquiry (http://osdir.com/ml/solr-dev.lucene.apache.org/2009-05/msg00010.html). This email is partly me trying to implement his suggestion, and partly just me trying to understand basic Solr customization. I tried sending out a previous draft of this message yesterday, but haven't seen it on the lists, so my apologies if this becomes a duplicate post.

Thank you,
Sean

--
View this message in context: http://www.nabble.com/Customizing-solr-search%3A-SpanQueries-%28revisited%29-tp25838412p25838412.html
Sent from the Solr - User mailing list archive at Nabble.com.
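Outside of Solr, the "save every hit with its token position" requirement amounts to something like the sketch below: a toy in-memory positional index over whitespace tokens. It is only meant to pin down what gets saved per hit (docId, term, tokenNumber); in Solr/Lucene the real positions would come from SpanQuery.getSpans(), not from a hand-rolled tokenizer like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpanSketch {
    // One saved hit: the (doc, term, token position) triple the log.info()
    // call in the post would record -- every match, not just the top N.
    static class Hit {
        final int docId;
        final String term;
        final int tokenNumber;

        Hit(int docId, String term, int tokenNumber) {
            this.docId = docId;
            this.term = term;
            this.tokenNumber = tokenNumber;
        }
    }

    static class PositionalIndex {
        private final Map<String, List<Hit>> postings = new HashMap<>();

        // Naive whitespace tokenizer; records every occurrence with its position.
        void addDoc(int docId, String text) {
            String[] tokens = text.toLowerCase().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>())
                        .add(new Hit(docId, tokens[pos], pos));
            }
        }

        // Every hit for the term across all docs -- no scoring, no paging.
        List<Hit> allHits(String term) {
            return postings.getOrDefault(term.toLowerCase(), List.of());
        }
    }
}
```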
Re: Customizing solr search: SpanQueries (revisited)
I'm fairly sure I did a custom (Hit)Collector in lucene-java, but all I can find at the moment are my retro implementations (w/o collectors). I won't bore (or scare?) you with the details, but I follow some of what you're suggesting.

I have been able to get straight SpanQueries to work in my custom QueryComponent. I think I followed the same path you describe, but correct me if I misunderstand. I've more-or-less replaced the QueryComponent.process() code:

    SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
    searcher.search(result, cmd);
    rb.setResult(result);

with (in my overridden process() method):

    String[] selectFields = {"id", "fileName"};  // the subset of fields I am interested in
    TopDocs results = searcher.search(cmd.getQuery(), 10);  // custom spanquery, and many/all hits
    /* save hit info (doc & score) */
    /* maybe process SpanQuery.getSpans() here, but perhaps try a "doc oriented results"
       processing approach(?) for tokenization caching/optimization? */

The code above _seems_ to work, but I am still in the initial stages at the moment. When I get to the point where I have a better understanding of the challenges you mention, I will share the thoughts and insights I've gained along the way :-).

Thanks for your time and help,
Sean

hossman wrote:
>
> : (e.g. defType=fooSpanQuery), along with token positions. I have this working
> : in straight lucene, so my challenge is to implement it half-intelligently in
> : solr. At the moment, I can't figure out where and how to customize the
> : 'inner' search process.
>
> The first step is to really make sense of how you do this with
> lucene-java, so we can find the best corresponding points in Solr.
>
> I suspect you are using a custom (Hit)Collector, which is an area in Solr
> that isn't easily customizable.
> Some other issues have brought up the need
> to allow custom code to provide a Collector that Solr would use in
> addition to its own Collectors (for building up DocList and DocSet
> structures), but no clear picture has surfaced as to what that API should
> really look like to be useful to plugin writers and still be performant
> in the common case.
>
> The most straightforward way to add custom logic like this would be to
> use SolrIndexSearcher just like a regular IndexSearcher, passing your
> Collector to the same old methods you would use outside of Solr -- this
> bypasses Solr's internal DocList & DocSet caching, but in many cases this
> may be exactly what you want. It's also the main problem that makes a
> generalized pluggable Collector implementation hard to implement: if the
> Collector has side effects, it's not clear that cached results are useful
> unless they can reproduce the same side effects on cache read.
>
> For response purposes (if you care), you can always take the results
> of your own data structure and use it to generate a DocList or a DocSet.
>
> -Hoss

--
View this message in context: http://www.nabble.com/Customizing-solr-search%3A-SpanQueries-%28revisited%29-tp25884771p25884771.html
Sent from the Solr - User mailing list archive at Nabble.com.
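For the record, the contrast behind the earlier extra-karma question (dropping TopDocsCollector and such) can be sketched in plain Java. The Collector interface below is a hypothetical stand-in for Lucene's, not the real API: one implementation keeps every hit, the other keeps a bounded top-k heap the way paging-oriented collectors conceptually do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class CollectorSketch {
    // Hypothetical analogue of Lucene's per-hit Collector callback.
    interface Collector {
        void collect(int docId, float score);
    }

    // What this thread needs: retain every hit, no pruning. Memory grows
    // with the number of matches -- the real cost of skipping top-k logic.
    static class AllHitsCollector implements Collector {
        final List<Integer> docs = new ArrayList<>();

        public void collect(int docId, float score) {
            docs.add(docId);
        }
    }

    // What a TopDocsCollector does conceptually: a bounded min-heap keeps
    // only the k best scores, so users paging through hits never see the rest.
    static class TopKCollector implements Collector {
        private final int k;
        private final PriorityQueue<float[]> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a[0], b[0])); // min-heap on score

        TopKCollector(int k) {
            this.k = k;
        }

        public void collect(int docId, float score) {
            heap.offer(new float[]{score, docId});
            if (heap.size() > k) heap.poll(); // evict the current worst hit
        }

        List<Integer> topDocs() {
            List<Integer> out = new ArrayList<>();
            for (float[] entry : heap) out.add((int) entry[1]);
            return out;
        }
    }
}
```

The AllHitsCollector is exactly the side-effecting case Hoss describes: because collecting is the whole point, a cached DocList can't reproduce the per-hit work on a cache read.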