Hi all,

Just bumping my question again, as it doesn’t seem to have been picked up by 
anyone. Any help would be much appreciated.

Chris

On 02/12/2016, 16:36, "Chris Rogers" <chris.rog...@bodleian.ox.ac.uk> wrote:

    Hi all,
    
    A question about using the DIH FileListEntityProcessor with SolrCloud (Solr 6.3.0, ZooKeeper 3.4.8).
    
    I understand that in SolrCloud the config lives in ZooKeeper (which runs on a different server from the Solr nodes in my setup).
    
    With this in mind, what is the baseDir attribute in the FileListEntityProcessor config relative to? I can see the config in the Solr GUI, and I’ve tried setting baseDir to an absolute path on my ZooKeeper server, but this doesn’t seem to work. Any ideas how this should be set up?
    
    My DIH config is below:
    
    <dataConfig>
      <dataSource type="FileDataSource"/>
      <document>
        <!-- this outer processor generates a list of files satisfying the
             conditions specified in the attributes -->
        <entity name="f" processor="FileListEntityProcessor"
                fileName=".*xml"
                newerThan="'NOW-5YEARS'"
                recursive="true"
                rootEntity="false"
                dataSource="null"
                baseDir="/home/bodl-zoo-svc/files/">
    
          <!-- this processor extracts content using XPath from each file found -->
    
          <entity name="tei" processor="XPathEntityProcessor"
                  forEach="/TEI" url="${f.fileAbsolutePath}"
                  transformer="RegexTransformer">
            <field column="manuscript_title" name="manuscript_title"
                   xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
            <field column="repository" name="repository"
                   xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
            <field column="id" name="id"
                   xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
          </entity>
    
        </entity>
    
      </document>
    </dataConfig>
    
    
    This same config worked as expected on a single Solr node (i.e. not in SolrCloud mode).
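
    One way to narrow this down is to check, on each machine involved, whether the baseDir path exists there and which files the fileName pattern would pick up. The following is only a minimal sketch: list_matching_files is a hypothetical helper, not part of Solr, and it approximates the fileName and recursive attributes from the config above rather than reproducing the processor's exact behaviour (newerThan is ignored).

```python
import os
import re


def list_matching_files(base_dir, file_name_regex, recursive=True):
    """Return sorted absolute paths under base_dir whose file name matches the regex.

    Hypothetical diagnostic helper; it only approximates the fileName and
    recursive attributes of the DIH config and ignores newerThan.
    """
    pattern = re.compile(file_name_regex)
    matches = []
    if not os.path.isdir(base_dir):
        # The directory is not visible from this machine.
        return matches
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            if pattern.search(name):
                matches.append(os.path.abspath(os.path.join(root, name)))
        if not recursive:
            # os.walk yields the top directory first, so stop after it.
            break
    return sorted(matches)


if __name__ == "__main__":
    # Values copied from the DIH config above.
    for path in list_matching_files("/home/bodl-zoo-svc/files/", r".*xml"):
        print(path)
```

    Running this on the ZooKeeper server and on each Solr node in turn shows which hosts can actually see the files; an empty result means the directory is missing on that host or nothing in it matches the pattern.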
    
    Thanks,
    Chris
    
    --
    Chris Rogers
    Digital Projects Manager
    Bodleian Digital Library Systems and Services
    chris.rog...@bodleian.ox.ac.uk
    
