Thank you, Erick, for your answer.
I read your post and found it very interesting.
Unfortunately, it is not suitable for my use case:
* security is not an issue, since the DBs will be fully replicated within the same infrastructure.
* the data set is not huge (something like 300K HTML documents).
* if I chose the client-side approach, I'd have to write it twice (the Solr index is a merge of 2 DBs).
* I'd like to pull the data from within Solr unless that is absolutely impossible (that was the reason I chose Solr over Lucene).
* last but not least, ATM my real issue is finding a reusable solution for indexing hierarchical data (unless one already exists).

Twitter :http://www.twitter.com/m_cucchiara
G+ :https://plus.google.com/107903711540963855921
Linkedin :http://www.linkedin.com/in/mauriziocucchiara
VisualizeMe: http://vizualize.me/maurizio.cucchiara?r=maurizio.cucchiara

Maurizio Cucchiara


On 3 October 2012 14:06, Erick Erickson <erickerick...@gmail.com> wrote:
> Maurizio:
>
> DIH is great for its intended purpose, but when things get complex I generally
> prefer writing something in SolrJ; it gives much finer-grained control
> over "special circumstances". Plus, you can see everything that
> happens. Here's a blog post with a skeletal SolrJ program; you can just
> pull out all the local-tika stuff.
>
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
>
> The take-away IMO is that once you've spent some time working with
> DIH without getting what you need, something like using an independent
> client (SolrJ in this example) is worth considering.
>
> Best
> Erick
>
> On Tue, Oct 2, 2012 at 12:59 PM, Maurizio Cucchiara
> <mcucchi...@apache.org> wrote:
>> Hi all,
>> I'm trying to import some hierarchical data (stored in MySQL) into Solr,
>> using DataImportHandler.
>> Unfortunately, as most of you already know, MySQL has no support for
>> recursive queries, so there is no way to fetch, with a single query, a
>> hierarchy stored as an adjacency list.
>> So I considered writing a custom DIH transformer which, given a
>> specified SQL query (like select * from categories) and a value (e.g.
>> category_id):
>> * fetches all data
>> * builds a hierarchical representation of the fetched data
>> * optionally caches the hierarchical data structure
>> * then returns 2 multi-valued lists which contain the 2 full paths (as
>> String and as Number)
>>
>> Is there something out of the box?
>> Alternatively, does the above approach sound good?
>>
>> TIA
>>
>> Maurizio Cucchiara
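
For illustration only, the steps listed in the original question (fetch all rows, build the hierarchy in memory, keep it as a cache, return the full path both as numbers and as strings) could be sketched roughly as below. This is plain Python, not the DIH Transformer API, and the row layout `(category_id, parent_id, name)` as well as the function names `build_index` and `full_paths` are assumptions made for the example; the thread does not describe the real schema.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row shape, as if returned by "select * from categories":
# (category_id, parent_id, name); parent_id is None for a root category.
Row = Tuple[int, Optional[int], str]
Index = Dict[int, Tuple[Optional[int], str]]


def build_index(rows: List[Row]) -> Index:
    """Load every row once so the hierarchy can be walked in memory.

    The returned dict plays the role of the optional cache: build it once
    and reuse it for every document being indexed.
    """
    return {cat_id: (parent_id, name) for cat_id, parent_id, name in rows}


def full_paths(index: Index, category_id: int) -> Tuple[List[int], List[str]]:
    """Return the root-to-node path twice: as ids and as names."""
    ids: List[int] = []
    names: List[str] = []
    seen = set()
    current: Optional[int] = category_id
    while current is not None:
        if current in seen:
            # Guard against corrupt data where a category is its own ancestor.
            raise ValueError(f"cycle detected at category {current}")
        seen.add(current)
        parent_id, name = index[current]
        ids.append(current)
        names.append(name)
        current = parent_id
    # The walk goes from the node up to the root, so flip both lists.
    ids.reverse()
    names.reverse()
    return ids, names


if __name__ == "__main__":
    rows = [(1, None, "Books"), (2, 1, "Computing"), (3, 2, "Search")]
    index = build_index(rows)
    print(full_paths(index, 3))
```

The two returned lists would map to the two multi-valued fields mentioned in the question. Walking parent pointers in memory sidesteps the need for a recursive SQL query, at the cost of loading the whole category table up front.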