Hi Chris,

I've never used the DIH, but maybe the "fileName" pattern is wrong?

fileName=".*xml"

Should be:

fileName="*.xml"

Regards,
Felipe.

On Mon, Dec 5, 2016 at 9:43 AM, Chris Rogers <chris.rog...@bodleian.ox.ac.uk> wrote:

> Hi all,
>
> Just bumping my question again, as doesn’t seem to have been picked up by
> anyone. Any help would be much appreciated.
>
> Chris
>
> On 02/12/2016, 16:36, "Chris Rogers" <chris.rog...@bodleian.ox.ac.uk>
> wrote:
>
> Hi all,
>
> A question regarding using the DIH FileListEntityProcessor with
> SolrCloud (solr 6.3.0, zookeeper 3.4.8).
>
> I get that the config in SolrCloud lives on the Zookeeper node (a
> different server from the solr nodes in my setup).
>
> With this in mind, where is the baseDir attribute in the
> FileListEntityProcessor config relative to? I’m seeing the config in the
> Solr GUI, and I’ve tried setting it as an absolute path on my Zookeeper
> server, but this doesn’t seem to work… any ideas how this should be setup?
>
> My DIH config is below:
>
> <dataConfig>
>   <dataSource type="FileDataSource"/>
>   <document>
>     <!-- this outer processor generates a list of files satisfying the
>          conditions specified in the attributes -->
>     <entity name="f" processor="FileListEntityProcessor"
>             fileName=".*xml"
>             newerThan="'NOW-5YEARS'"
>             recursive="true"
>             rootEntity="false"
>             dataSource="null"
>             baseDir="/home/bodl-zoo-svc/files/">
>
>       <!-- this processor extracts content using Xpath from each file
>            found -->
>
>       <entity name="tei" processor="XPathEntityProcessor"
>               forEach="/TEI" url="${f.fileAbsolutePath}"
>               transformer="RegexTransformer">
>         <field column="manuscript_title" name="manuscript_title"
>                xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>         <field column="repository" name="repository"
>                xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>         <field column="id" name="id"
>                xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
>       </entity>
>
>     </entity>
>
>   </document>
> </dataConfig>
>
> This same script worked as expected on a single solr node (i.e. not in
> SolrCloud mode).
>
> Thanks,
> Chris
>
> --
> Chris Rogers
> Digital Projects Manager
> Bodleian Digital Library Systems and Services
> chris.rog...@bodleian.ox.ac.uk
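
A quick way to compare the two fileName values mentioned in this thread, `.*xml` (from the quoted config) and the glob-style `*.xml`, is to try them directly as Java regular expressions. The sketch below assumes that FileListEntityProcessor treats fileName as a Java regex, as the Solr reference guide describes. The class name, helper methods, and sample file names are hypothetical and are not taken from Solr itself, and the use of `find()` rather than `matches()` is also an assumption.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of Solr: tries candidate fileName
// values as Java regular expressions against sample file names.
public class FileNamePatternCheck {

    // True if the string compiles as a Java regular expression.
    static boolean isValidRegex(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // True if the pattern is a valid regex and is found somewhere in the name.
    // Assumption: find() semantics; an invalid pattern simply yields false here.
    static boolean matchesFileName(String pattern, String fileName) {
        if (!isValidRegex(pattern)) {
            return false;
        }
        return Pattern.compile(pattern).matcher(fileName).find();
    }

    public static void main(String[] args) {
        // ".*xml" is the value from the quoted config; "*.xml" is the glob-style form;
        // ".*\\.xml$" is a stricter variant added here for comparison only.
        String[] patterns = {".*xml", "*.xml", ".*\\.xml$"};
        // Sample file names are made up for illustration.
        String[] names = {"manuscript.xml", "notes.txt"};

        for (String p : patterns) {
            System.out.println("pattern " + p + " valid regex: " + isValidRegex(p));
            for (String n : names) {
                System.out.println("  " + n + " -> " + matchesFileName(p, n));
            }
        }
    }
}
```

Under that assumption, `.*xml` is a usable pattern, while a leading `*` has nothing to repeat and is rejected by `Pattern.compile`. If a stricter match on the extension is wanted, something like `.*\.xml$` is one option.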