Matthew,

Do you have some sort of script that calls the XSLT? Sorry, I do not know Scala, and I have not had time to look into your Spark utils. The script (or the Scala code) could then shell out to curl, or, if it is Python, it could use the requests library to send each document to Solr. Extra points for batching the documents.
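A minimal sketch of the batching idea, assuming the XSLT output is one `<add><doc>...</doc></add>` string per file and that Solr's XML update handler is reachable over HTTP. It uses the standard-library `urllib.request` instead of the requests library so it has no dependencies. The core URL and the helper names (`batched`, `merge_add_docs`, `post_xml`, `index_all`) are hypothetical, and the batch size of 500 is an arbitrary starting point, not a tuned value.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List

# Hypothetical core name; point this at your own core's update handler.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch


def merge_add_docs(xml_docs: Iterable[str]) -> bytes:
    """Merge several <add><doc>...</doc></add> strings into one <add> payload."""
    merged = ET.Element("add")
    for text in xml_docs:
        root = ET.fromstring(text)
        # Assumes each XSLT output is an <add> root with <doc> children.
        merged.extend(root.findall("doc"))
    return ET.tostring(merged, encoding="utf-8")


def post_xml(url: str, payload: bytes, timeout: float = 60.0) -> int:
    """POST an XML update message to Solr; urlopen raises on HTTP errors."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def index_all(xml_docs: Iterable[str], url: str = SOLR_UPDATE_URL,
              batch_size: int = 500) -> None:
    """Send documents one batch per request, then commit once at the end."""
    for batch in batched(xml_docs, batch_size):
        post_xml(url, merge_add_docs(batch))
    post_xml(url, b"<commit/>")
```

Usage might look like `index_all(open(p, encoding="utf-8").read() for p in paths)`, where `paths` is whatever list of transformed files you have. Committing once at the end, rather than per batch, avoids opening a new searcher for every request on a single node.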
Erick,

The last time I used the post tool, it was spinning up a JVM each time I called it (natch). Is there a simple way to launch it from a Java app server so you can call it repeatedly without the start-up overhead? It has been a few years, so maybe I am wrong.

Cheers -- Rick

On December 6, 2017 5:36:51 PM EST, Erick Erickson <erickerick...@gmail.com> wrote:
>Perhaps the bin/post tool? See:
>https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
>
>On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth <mgrot...@gmail.com>
>wrote:
>> Hi All,
>>
>> Is there a DIH for HDFS? I see this old feature request [0] that never
>> seems to have gone anywhere. Google searches and searches on this list
>> don't get me too far.
>>
>> Essentially my workflow is that I have many thousands of XML documents
>> stored in HDFS. I run an XSLT transformation in Spark [1]. This
>> transforms them to the expected Solr input of
>> <add><doc><field ... /></doc></add>. This is then written back to
>> HDFS. Now how do I get it back to Solr? I suppose I could move the
>> data back to the local fs, but on the surface that feels like the
>> wrong way.
>>
>> I don't need to store the documents in HDFS after the Spark
>> transformation, so I wonder if I can write them using SolrJ. However,
>> I am not really familiar with SolrJ. I am also running a single node.
>> Most of the material I have read on spark-solr expects you to be
>> running SolrCloud.
>>
>> Best,
>> Matt
>>
>> [0] https://issues.apache.org/jira/browse/SOLR-2096
>> [1] https://github.com/elsevierlabs-os/spark-xml-utils

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com