I ran my jar application alongside a running Solr instance, from which I want to trigger a DIH import. I tried this approach:
String urlString1 = "http://localhost:8983/solr/db/dataimport";
SolrClient solr1 = new HttpSolrClient.Builder(urlString1).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
SolrRequest request = new QueryRequest(params);
solr1.request(request);

... and it now returns:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://localhost:8983/solr/db/dataimport: Expected
mime type application/octet-stream but got text/html.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/db/dataimport/select. Reason:
<pre>    Not Found</pre></p>
</body>
</html>

So I am still confused ... What do you think? Any ideas? I am trying to
figure it out. The silly thing is that when I build a simple URL call from
the same URL string used in those Solr request objects and fire it off in
Java, it does the desired thing. Weird, I think. Thanks for any replies or
help.

2016-11-26 20:03 GMT+01:00 Marek Ščevlík <mscev...@codenameprojects.com>:

> Actually, to be honest, I realized that I only needed to trigger a data
> import handler from a jar file. Previously this was done in earlier
> versions via the SolrServer object.
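[Editor's note] The 404 above is consistent with how SolrJ resolves request paths: QueryRequest uses the "qt" parameter as the request path when it starts with "/", and otherwise appends the default "/select" to the client's base URL. Since the base URL above already ends in "/dataimport", the final URL becomes /solr/db/dataimport/select — exactly the path in the error page. A minimal pure-JDK sketch of that resolution logic (the helper names are mine, not SolrJ's):

```java
public class SolrPathDemo {
    // Mimics QueryRequest path resolution: a "qt" value beginning with "/"
    // is used as the request path, otherwise the default "/select" is used.
    static String requestPath(String qt) {
        return (qt != null && qt.startsWith("/")) ? qt : "/select";
    }

    static String fullUrl(String baseUrl, String qt) {
        return baseUrl + requestPath(qt);
    }

    public static void main(String[] args) {
        // Original attempt: base URL already contains /dataimport, no qt set.
        System.out.println(fullUrl("http://localhost:8983/solr/db/dataimport", null));
        // -> http://localhost:8983/solr/db/dataimport/select  (the 404 above)

        // Likely fix: point the client at the core and pass qt=/dataimport.
        System.out.println(fullUrl("http://localhost:8983/solr/db", "/dataimport"));
        // -> http://localhost:8983/solr/db/dataimport
    }
}
```

In SolrJ itself the equivalent fix would be to build the client against the core URL (http://localhost:8983/solr/db) and call params.set("qt", "/dataimport") before constructing the QueryRequest. I have not re-verified this against 6.3 specifically, so treat it as a starting point rather than a definitive answer.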
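[Editor's note] Since a hand-built URL fired from Java reportedly does the right thing, one pragmatic workaround is to trigger DIH with a plain HTTP GET and keep SolrJ for querying only. A sketch using only the JDK; the class and method names are illustrative, and only the command parameter from the thread is assumed:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class DihTrigger {

    // Assembles e.g. http://localhost:8983/solr/db/dataimport?command=full-import
    static String buildDihUrl(String solrBase, String core, Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.append(query.length() == 0 ? "?" : "&")
                 .append(encode(e.getKey()))
                 .append('=')
                 .append(encode(e.getValue()));
        }
        return solrBase + "/" + core + "/dataimport" + query;
    }

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (java.io.UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    // Fires the GET request; DIH kicks off the import in the background
    // and returns a short status response immediately.
    static String trigger(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import");
        String url = buildDihUrl("http://localhost:8983/solr", "db", params);
        System.out.println(url);
        // -> http://localhost:8983/solr/db/dataimport?command=full-import
        // System.out.println(trigger(url)); // uncomment with Solr running
    }
}
```

Note that DIH returns as soon as the import is started; to watch progress you would poll the same handler with command=status.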
> Now I am thinking, is this OK?:
>
> String urlString1 = "http://localhost:8983/solr/";
> SolrClient solr1 = new HttpSolrClient.Builder(urlString1).build();
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("db", "/dataimport");
> params.set("command", "full-import");
> System.out.println(params.toString());
> QueryResponse qresponse1 = solr1.query(params);
>
> System.out.println("response = " + qresponse1);
>
> The output I get from this is:
>
> response = {responseHeader={status=0,QTime=0,params={wt=javabin,version=2,db=/dataimport,command=full-import}},response={numFound=0,start=0,docs=[]}}
>
> There is a core named db which comes with the examples in the Solr 6.3
> package. It is loaded. From the web admin UI I can operate it and run the
> DIH reindex process.
>
> I wonder whether this could work? What do you think? I am trying to call
> DIH while Solr is running. This code is in a separate jar file that runs
> beside the Solr instance.
>
> So far this is not working for me, and I wonder why. What do you think?
> Should this work at all? Or perhaps someone else could help out.
>
> Thanks anyone for any help.
> ========================
>
> 2016-11-25 19:50 GMT+01:00 Marek Ščevlík <mscev...@codenameprojects.com>:
>
>> I forgot to mention I am creating a jar file beside a running Solr 6.3
>> instance, to which I am hoping to attach with Java via the
>> SolrDispatchFilter to get at the cores, so that I could then work with
>> data in code.
>>
>> 2016-11-25 19:31 GMT+01:00 Marek Ščevlík <mscev...@codenameprojects.com>:
>>
>>> Hi Daniel. Thanks for the reply. I wonder, is it still possible with
>>> the release of Solr 6.3 to get hold of the running instance of the Jetty
>>> server that is part of the solution? I found some code for previous
>>> versions where it was captured with this code, and one could then obtain
>>> cores for a running Solr instance ...
>>>
>>> SolrDispatchFilter solrDispatchFilter = (SolrDispatchFilter) jetty
>>>         .getDispatchFilter().getFilter();
>>>
>>> I was trying to implement it this way, but that is not working out very
>>> well now. I can't seem to get the Jetty server object for the running
>>> instance. I tried several combinations, but none seemed to work.
>>>
>>> Can you perhaps point me in the right direction?
>>>
>>> Perhaps you know more than I do at the moment.
>>>
>>> Any help would be great.
>>>
>>> Thanks a lot.
>>> Regards, Marek Scevlik
>>>
>>> 2016-11-18 15:53 GMT+01:00 Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov>:
>>>
>>>> Marek,
>>>>
>>>> I've wanted to do something like this in the past as well. However, a
>>>> rewrite that supports the same XML syntax might be better. There are
>>>> several problems with the design of the Data Import Handler that make
>>>> it not quite suitable:
>>>>
>>>> - Not designed for multi-threading
>>>> - Bad implementation of XPath
>>>>
>>>> Another issue is that one of the big advantages of the Data Import
>>>> Handler goes away at this point: it is hosted within Solr and has a UI
>>>> for testing within the Solr admin.
>>>>
>>>> A better open-source Java solution might be to connect Solr with Apache
>>>> Camel - http://camel.apache.org/solr.html.
>>>>
>>>> If you are not tied absolutely to pure open source, and freemium
>>>> products will do, then you might look at Pentaho Spoon and Kettle.
>>>> Although Talend is much more established in the market, I find
>>>> Pentaho's XML-based ETL a bit easier to integrate as a developer, and
>>>> to unit test and such. Talend does better when you have a full
>>>> infrastructure set up, but then the attention required for unit tests
>>>> and Git integration seems over the top.
>>>>
>>>> Another powerful way to get things done, depending on what you are
>>>> indexing, is to use LogStash and couple that with document processing
>>>> chains.
>>>> Many of our projects benefit from having a single RDBMS view, perhaps
>>>> a materialized view, that is used for the index. LogStash does just
>>>> fine here, pulling from the RDBMS and posting each row to Solr. The
>>>> hierarchical execution of the Data Import Handler is very nice, but
>>>> this can often be handled on the RDBMS side by creating a view, maybe
>>>> using functions to provide some rows. Many RDBMS systems also support
>>>> federation and the import of XML from files, which brings XML
>>>> processing into the picture.
>>>>
>>>> Hoping this helps,
>>>>
>>>> Dan Davis, Systems/Applications Architect (Contractor),
>>>> Office of Computer and Communications Systems,
>>>> National Library of Medicine, NIH
>>>>
>>>> -----Original Message-----
>>>> From: Marek Ščevlík [mailto:mscev...@codenameprojects.com]
>>>> Sent: Friday, November 18, 2016 9:29 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Data Import Request Handler isolated into its own project -
>>>> any suggestions?
>>>>
>>>> Hello. My name is Marek Scevlik.
>>>>
>>>> Currently I am working for a small company where we are interested in
>>>> implementing your Solr 6.3 search engine.
>>>>
>>>> We are hoping to take the Data Import Request Handler out of the
>>>> original source package into its own project and create a usable .jar
>>>> file out of it.
>>>>
>>>> It should then serve as a tool that would allow us to connect to a
>>>> remote server and return data to our other application, which would
>>>> use the returned data.
>>>>
>>>> What do you think? Would anything like this be possible? To isolate
>>>> the Data Import Request Handler into its own standalone project?
>>>>
>>>> If we could achieve this, we wouldn't mind sharing this new feature
>>>> with the community.
>>>>
>>>> I realize this is a first email and may lead to several hundred more,
>>>> so to start with, my request is very simple and not very detailed, but
>>>> I am sure you realize it may turn out to be quite complex.
>>>>
>>>> So I wonder if anyone will reply.
>>>>
>>>> Thanks a lot for any replies and further info or guidance.
>>>>
>>>> Thanks.
>>>> Regards, Marek Scevlik
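[Editor's note] As a footnote to Dan Davis's LogStash suggestion above, here is a minimal sketch of the view-plus-LogStash pipeline he describes, assuming the logstash-input-jdbc and logstash-output-solr_http plugins are installed. The view name, driver path, database, and credentials below are placeholders, not values from the thread:

```
input {
  jdbc {
    # Driver and connection details are placeholders - adjust for your RDBMS.
    jdbc_driver_library    => "/path/to/postgresql.jar"
    jdbc_driver_class      => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/appdb"
    jdbc_user              => "etl"
    jdbc_password          => "secret"
    # One flat row per document: a (materialized) view does the joins,
    # replacing DIH's hierarchical entity execution.
    statement => "SELECT id, title, body FROM solr_index_view"
  }
}
output {
  solr_http {
    solr_url => "http://localhost:8983/solr/db"
  }
}
```

Each row returned by the statement becomes one document posted to the db core, which matches the "single RDBMS view used for the index" pattern described above.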