Marek, I've wanted to do something like this in the past as well. However, a rewrite that supports the same XML syntax might be better. There are several problems with the design of the Data Import Handler that make it not quite suitable:
- Not designed for multi-threading
- Bad implementation of XPath

Another issue is that one of the big advantages of the Data Import Handler goes away at this point, which is that it is hosted within Solr and has a UI for testing within the Solr admin.

A better open-source Java solution might be to connect Solr with Apache Camel - http://camel.apache.org/solr.html.

If you are not tied absolutely to pure open source, and freemium products will do, then you might look at Pentaho Spoon and Kettle. Although Talend is much more established in the market, I find Pentaho's XML-based ETL a bit easier to integrate as a developer, and to unit test and such. Talend does better when you have a full infrastructure set up, but then the attention required for unit tests and Git integration seems over the top.

Another powerful way to get things done, depending on what you are indexing, is to use LogStash and couple that with document processing chains. Many of our projects benefit from having a single RDBMS view, perhaps a materialized view, that is used for the index. LogStash does just fine here, pulling from the RDBMS and posting each row to Solr.

The hierarchical execution of the Data Import Handler is very nice, but this can often be handled on the RDBMS side by creating a view, maybe using functions to provide some rows. Many RDBMS systems also support federation and the import of XML from files, which brings XML processing into the picture.

Hoping this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-----Original Message-----
From: Marek Ščevlík [mailto:mscev...@codenameprojects.com]
Sent: Friday, November 18, 2016 9:29 AM
To: solr-user@lucene.apache.org
Subject: Data Import Request Handler isolated into its own project - any suggestions?

Hello. My name is Marek Scevlik. Currently I am working for a small company where we are interested in implementing your Solr 6.3 search engine.
We are hoping to extract the Data Import Request Handler from the original source package into its own project and build a usable .jar file out of it. It should then serve as a tool that would allow us to connect to a remote server and return data to our other application, which would use the returned data.

What do you think? Would anything like this be possible? To isolate the Data Import Request Handler into its own standalone project? If we could achieve this, we would not mind sharing this new feature with the community.

I realize this is a first email and may lead to several hundred, so for a start my request is very simple and not very detailed, but I am sure you realize it may become quite complex. So I wonder if anyone will reply.

Thanks a lot for any replies and further info or guidance.

Thanks.
Regards
Marek Scevlik