Marek,

I've wanted to do something like this in the past as well. However, a rewrite 
that supports the same XML syntax might be better. There are several problems 
with the design of the Data Import Handler that make it not quite suitable:

- Not designed for multi-threading
- Bad implementation of XPath

Another issue is that one of the big advantages of the Data Import Handler goes 
away once it is pulled out of Solr: it is hosted within Solr and has a UI for 
testing within the Solr admin.

A better open-source Java solution might be to connect Solr with Apache Camel: 
http://camel.apache.org/solr.html.

If you are not absolutely tied to pure open source, and freemium products will 
do, then you might look at Pentaho Spoon and Kettle. Although Talend is much 
more established in the market, I find Pentaho's XML-based ETL a bit easier to 
integrate as a developer, and easier to unit test. Talend does better when you 
have a full infrastructure set up, but then the attention it requires for unit 
tests and Git integration seems over the top.

Another powerful way to get things done, depending on what you are indexing, is 
to use Logstash and couple that with document processing chains. Many of our 
projects benefit from having a single RDBMS view, perhaps a materialized view, 
that is used for the index. Logstash does just fine here, pulling from the 
RDBMS and posting each row to Solr. The hierarchical execution of the Data Import 
Handler is very nice, but this can often be handled on the RDBMS side by 
creating a view, maybe using functions to provide some rows. Many RDBMS 
systems also support federation and the import of XML from files, which 
brings XML processing into the picture as well.
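To make the view-based approach a little more concrete, here is a minimal sketch in Python using only the standard library, with SQLite standing in for the RDBMS. Everything named in it is hypothetical: the `article` and `keyword` tables, the `article_index` view, the helper functions, and the core name `mycore` in the example URL. In a real setup Logstash (or another loader) would do the pulling and posting. The sketch shows a one-to-many relation, the kind of thing DIH handles with nested entities, being flattened by a view into one row per document. The child rows are then split back into a multi-valued field, and the result is serialized as a JSON array of documents, which Solr's `/update` handler accepts with `Content-Type: application/json`.

```python
import json
import sqlite3
import urllib.request

# Hypothetical schema: a parent table and a child table, standing in for the
# nested entities a DIH data-config.xml would otherwise iterate over.
SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE keyword (article_id INTEGER, term TEXT);
-- Flatten the one-to-many relation into exactly one row per Solr document.
CREATE VIEW article_index AS
SELECT a.id AS id, a.title AS title, group_concat(k.term, '|') AS keywords
FROM article a LEFT JOIN keyword k ON k.article_id = a.id
GROUP BY a.id, a.title;
"""


def build_demo_db():
    """Create an in-memory database with a little made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO article VALUES (?, ?)",
                     [(1, "First article"), (2, "Second article")])
    conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                     [(1, "analgesic"), (1, "NSAID")])
    return conn


def rows_to_solr_docs(conn):
    """Turn each row of the flattened view into a Solr JSON document."""
    docs = []
    for doc_id, title, keywords in conn.execute(
            "SELECT id, title, keywords FROM article_index ORDER BY id"):
        docs.append({
            "id": str(doc_id),
            "title": title,
            # Multi-valued field: split the concatenated child rows apart.
            # Assumes no term itself contains the '|' separator.
            "keywords": keywords.split("|") if keywords else [],
        })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to a Solr update URL.

    update_url is an assumption, e.g.
    http://localhost:8983/solr/mycore/update?commit=true
    """
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(update_url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Print the payload instead of posting it, so no Solr server is needed.
    print(json.dumps(rows_to_solr_docs(build_demo_db()), indent=2))
```

The design point is that the join logic lives in the database, so the loader only ever sees flat rows and stays trivial. Note that SQLite does not guarantee the order of values inside `group_concat`, so the multi-valued field should be treated as unordered.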

Hoping this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-----Original Message-----
From: Marek Ščevlík [mailto:mscev...@codenameprojects.com]
Sent: Friday, November 18, 2016 9:29 AM
To: solr-user@lucene.apache.org
Subject: Data Import Request Handler isolated into its own project - any suggestions?

Hello. My name is Marek Scevlik.

Currently I am working for a small company where we are interested in 
implementing your Solr 6.3 search engine.

We are hoping to take the Data Import Request Handler out of the original 
source package, move it into its own project, and build a usable .jar file from it.

It should then serve as a tool that would allow us to connect to a remote server 
and return data to our other application, which would use the returned data.

What do you think? Would anything like this be possible? Could the Data Import 
Request Handler be isolated into its own standalone project?

If we could achieve this, we would be happy to share this new feature with the 
community.

I realize this is a first email and may lead to several hundred more, so for a 
start my request is very simple and not highly detailed, but I am sure you 
realize it may turn out to be quite complex.

So I wonder whether anyone will reply.

Thanks a lot for any replies and further info or guidance.

Thanks.

Regards, Marek Scevlik
