Try the absolute path on your *Solr* server. That's where DIH runs, so baseDir is resolved against the Solr node's filesystem, not the ZooKeeper server's.

Erik
> On Dec 2, 2016, at 08:36, Chris Rogers <chris.rog...@bodleian.ox.ac.uk> wrote:
>
> Hi all,
>
> A question regarding using the DIH FileListEntityProcessor with SolrCloud
> (solr 6.3.0, zookeeper 3.4.8).
>
> I get that the config in SolrCloud lives on the Zookeeper node (a different
> server from the solr nodes in my setup).
>
> With this in mind, where is the baseDir attribute in the
> FileListEntityProcessor config relative to? I’m seeing the config in the Solr
> GUI, and I’ve tried setting it as an absolute path on my Zookeeper server,
> but this doesn’t seem to work… any ideas how this should be setup?
>
> My DIH config is below:
>
> <dataConfig>
>   <dataSource type="FileDataSource"/>
>   <document>
>     <!-- this outer processor generates a list of files satisfying the
>          conditions specified in the attributes -->
>     <entity name="f" processor="FileListEntityProcessor"
>             fileName=".*xml"
>             newerThan="'NOW-5YEARS'"
>             recursive="true"
>             rootEntity="false"
>             dataSource="null"
>             baseDir="/home/bodl-zoo-svc/files/">
>
>       <!-- this processor extracts content using Xpath from each file found -->
>
>       <entity name="tei" processor="XPathEntityProcessor"
>               forEach="/TEI" url="${f.fileAbsolutePath}"
>               transformer="RegexTransformer">
>         <field column="manuscript_title" name="manuscript_title"
>                xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>         <field column="repository" name="repository"
>                xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>         <field column="id" name="id"
>                xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
>       </entity>
>
>     </entity>
>
>   </document>
> </dataConfig>
>
>
> This same script worked as expected on a single solr node (i.e. not in
> SolrCloud mode).
>
> Thanks,
> Chris
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk
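For illustration, a minimal sketch of the change to the outer entity in the quoted config. It assumes the TEI files have been copied to (or mounted at) /var/solr/tei-files/, which is a hypothetical path chosen for this example. Under that assumption, the files must be readable at that path on the Solr node that actually handles the import request; in SolrCloud that means every node that might run it. Only baseDir differs from the quoted config.

```xml
<!-- baseDir is resolved on the Solr node running DIH, not on ZooKeeper.
     /var/solr/tei-files/ is a hypothetical path; the files must exist there
     on that Solr node. -->
<entity name="f" processor="FileListEntityProcessor"
        fileName=".*xml"
        newerThan="'NOW-5YEARS'"
        recursive="true"
        rootEntity="false"
        dataSource="null"
        baseDir="/var/solr/tei-files/">
  <!-- inner "tei" XPathEntityProcessor entity unchanged from the config above -->
</entity>
```

The config itself can still live in ZooKeeper; only the data files need to be on the Solr side.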