Hi All,

Is there a DIH (DataImportHandler) for HDFS? I see this old feature request [0] that never seems to have gone anywhere. Google searches and searches on this list don't get me too far.
Essentially, my workflow is that I have many thousands of XML documents stored in HDFS. I run an XSLT transformation in Spark [1], which produces the expected Solr input of <add><doc><field ... /></doc></add>. This is then written back to HDFS.

Now how do I get it into Solr? I suppose I could move the data back to the local fs, but on the surface that feels like the wrong way. I don't need to store the documents in HDFS after the Spark transformation, so I wonder if I can write them to Solr directly using SolrJ. However, I am not really familiar with SolrJ. I am also running a single node, and most of the material I have read on spark-solr expects you to be running SolrCloud.

Best,
Matt

[0] https://issues.apache.org/jira/browse/SOLR-2096
[1] https://github.com/elsevierlabs-os/spark-xml-utils
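
For illustration, the "write directly to Solr instead of back to HDFS" idea could be sketched as below. This is only a rough sketch, not a tested solution: it posts each transformed <add>...</add> string to Solr's XML update handler over plain HTTP rather than through SolrJ. The base URL, the core name "mycore", and the helper names build_update_request and post_partition are all assumptions made up for the example.

```python
import urllib.request
from urllib.parse import urlencode


def build_update_request(base_url, core, add_xml, commit=False):
    """Build a POST request that sends one <add>...</add> document to Solr.

    base_url and core are placeholders for a single-node install.
    """
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit:
        # Ask Solr to commit right after this update.
        url += "?" + urlencode({"commit": "true"})
    return urllib.request.Request(
        url,
        data=add_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_partition(records, base_url="http://localhost:8983/solr", core="mycore"):
    """Post every XML string in an iterator to Solr.

    Meant to be handed to something like rdd.foreachPartition, where the
    RDD holds the XSLT output as strings (an assumption about the job).
    """
    for add_xml in records:
        request = build_update_request(base_url, core, add_xml)
        with urllib.request.urlopen(request, timeout=30) as response:
            response.read()  # drain the body; HTTP errors raise HTTPError
```

In a PySpark job this might be called as transformed_rdd.foreachPartition(post_partition), where transformed_rdd is a hypothetical RDD of the transformed XML strings, followed by a single explicit commit once all partitions have finished.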