Matthew,
Do you have some sort of script calling the XSLT? Sorry, I do not know Scala and I 
did not have time to look into your Spark utils. The script or Scala code could 
then shell out to curl, or if it is Python it could use the requests library to 
send a doc to Solr. Extra points for batching the documents.
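
A rough sketch of the batching idea in Python, not a tested recipe. It assumes the transformed documents are already available as an iterable of XML strings (how they are read out of HDFS is left out), and that a single-node Solr accepts XML on its update handler. The helper names (batched, merge_adds, post_batch, index_all), the batch size, and the URL in the usage note are made up for illustration. It uses the standard library's urllib.request so there is nothing to install; the requests library would do the same job with requests.post.

```python
import urllib.request
import xml.etree.ElementTree as ET
from itertools import islice


def batched(items, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def merge_adds(xml_docs):
    """Combine several <add>...</add> strings into one <add> payload (bytes)."""
    merged = ET.Element("add")
    for text in xml_docs:
        root = ET.fromstring(text)
        # Accept either a full <add> wrapper or a bare <doc> element.
        docs = [root] if root.tag == "doc" else root.findall("doc")
        merged.extend(docs)
    return ET.tostring(merged, encoding="utf-8")


def post_batch(payload, update_url):
    """POST one XML payload to the given Solr update URL; return the HTTP status."""
    req = urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_all(xml_docs, update_url, batch_size=500):
    """Send all documents in batches; batch_size is an arbitrary starting point."""
    for chunk in batched(xml_docs, batch_size):
        post_batch(merge_adds(chunk), update_url)
```

Usage would look something like index_all(docs, "http://localhost:8983/solr/mycore/update"), where "mycore" is a placeholder core name. Commits are not handled here; they would come from autoCommit settings or a separate commit request once the run finishes.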

Erick,
The last time I used the post tool, it was spinning up a JVM each time I called 
it (natch). Is there a simple way to launch it from a Java app server so you 
can call it repeatedly without the start-up overhead? It has been a few years, 
so maybe I am wrong.
Cheers -- Rick

On December 6, 2017 5:36:51 PM EST, Erick Erickson <erickerick...@gmail.com> 
wrote:
>Perhaps the bin/post tool? See:
>https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
>
>On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth <mgrot...@gmail.com>
>wrote:
>> Hi All,
>>
>> Is there a DIH for HDFS? I see this old feature request [0
>> <https://issues.apache.org/jira/browse/SOLR-2096>] that never seems to have
>> gone anywhere. Google searches and searches on this list don't get me too
>> far.
>>
>> Essentially my workflow is that I have many thousands of XML documents
>> stored in HDFS. I run an XSLT transformation in Spark [1
>> <https://github.com/elsevierlabs-os/spark-xml-utils>]. This transforms to
>> the expected Solr input of <add><doc><field ... /></doc></add>. This is
>> then written back to HDFS. Now how do I get it back to Solr? I suppose
>> I could move the data back to the local fs, but on the surface that feels
>> like the wrong way.
>>
>> I don't need to store the documents in HDFS after the Spark transformation,
>> so I wonder if I can write them using SolrJ. However, I am not really familiar
>> with SolrJ. I am also running a single node. Most of the material I have
>> read on spark-solr expects you to be running SolrCloud.
>>
>> Best,
>> Matt
>>
>>
>>
>> [0] https://issues.apache.org/jira/browse/SOLR-2096
>> [1] https://github.com/elsevierlabs-os/spark-xml-utils

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 
