Hi All,

The DataImportHandler is the most fantastic thing that has recently come to Solr. Thank you.

I'm noticing that when I use variables in nested entities, the variable's value is wrapped in square brackets. For example, ${x.url} used in the "tika" entity below resolves to [http://publicdomain.ca/content/Sample.pdf] (note the square brackets), so I get this error in my log:

SEVERE: Exception thrown while getting data
java.net.MalformedURLException: no protocol: [http://publicdomain.ca/content/Sample.pdf]
        at java.net.URL.<init>(URL.java:567)
        at java.net.URL.<init>(URL.java:464)
        at java.net.URL.<init>(URL.java:413)
        at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:78)
        at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
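A quick way to see where the brackets could come from: if the value stored under `x.url` is a `java.util.List` rather than a plain `String` (an assumption on my part, not something I've confirmed in the DIH source), then turning it into text with `toString()` produces exactly the bracketed form, and `java.net.URL` rejects that with the same message as in the log. This is a plain-Java sketch, not DataImportHandler code; the class and the `resolve`/`tryUrl` helpers are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class BracketDemo {
    // Hypothetical stand-in for variable substitution: the value is
    // simply converted to text via String.valueOf(), i.e. toString().
    static String resolve(Object value) {
        return String.valueOf(value);
    }

    // Try to build a URL and report either success or the exception message.
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs
    static String tryUrl(String spec) {
        try {
            return "OK: " + new URL(spec);
        } catch (MalformedURLException e) {
            return "MalformedURLException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumed shape: a single-element List, as a multi-valued field might produce.
        List<String> asList = new ArrayList<>();
        asList.add("http://publicdomain.ca/content/Sample.pdf");
        // The same value held as a plain String.
        String asString = "http://publicdomain.ca/content/Sample.pdf";

        String fromList = resolve(asList);     // "[http://publicdomain.ca/content/Sample.pdf]"
        String fromString = resolve(asString); // "http://publicdomain.ca/content/Sample.pdf"

        System.out.println(tryUrl(fromList));   // fails: no protocol: [http://...]
        System.out.println(tryUrl(fromString)); // succeeds
    }
}
```

The list case prints `MalformedURLException: no protocol: [http://publicdomain.ca/content/Sample.pdf]`, matching the log, while the plain string parses fine.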

I ran into this before when I tried to concatenate fields from different entities into one field; I worked around it by gathering the fields with an XSL stylesheet. Not being able to resolve the URL for Tika is more of a problem, though.

*Is this a bug? If not, how do I remove the brackets so that I can use my variable as intended?*

<dataConfig>

   <dataSource type="BinURLDataSource" name="bin"/>

   <dataSource type="FileDataSource" name="fileReader"/>

   <document>

       <entity name="f" processor="FileListEntityProcessor" baseDir="/home/pgwillia/content"
               dataSource="null" fileName=".*xml" rootEntity="false">

           <entity name="x" processor="XPathEntityProcessor" dataSource="fileReader"
                   transformer="TemplateTransformer,RegexTransformer" forEach="/RDF/Description"
                   url="${f.fileAbsolutePath}">

               ...

               <field column="url" xpath="/RDF/Description/identifier"
                      regex="http://privatedomain:8080/content/" replaceWith="http://publicdomain.ca/content/"/>

               <entity name="tika" processor="TikaEntityProcessor" url="${x.url}"
                       dataSource="bin" format="text">

                   <field column="fulltext" name="text"/>

               </entity>

           </entity>

       </entity>

   </document>

</dataConfig>
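If the bracketed value really is a list (an assumption, not confirmed), one possible workaround would be to unwrap it to its first element before it is used as the Tika `url`. The sketch below shows only that unwrapping logic in plain Java; the `unwrapSingle` helper and the row map are hypothetical. In DIH this logic would presumably have to live in a custom `Transformer` on the "x" entity, but I haven't tried that, so treat it as untested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnwrapDemo {
    // Hypothetical helper: if the value is a List, return its first element
    // (or null when the list is empty); otherwise return the value unchanged.
    static Object unwrapSingle(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            return list.isEmpty() ? null : list.get(0);
        }
        return value;
    }

    public static void main(String[] args) {
        // A row as it might look after the "x" entity (assumed shape).
        Map<String, Object> row = new HashMap<>();
        List<String> urls = new ArrayList<>();
        urls.add("http://publicdomain.ca/content/Sample.pdf");
        row.put("url", urls);

        System.out.println(row.get("url")); // bracketed: the List's toString()
        row.put("url", unwrapSingle(row.get("url")));
        System.out.println(row.get("url")); // plain URL string
    }
}
```

Note that taking the first element silently drops any extra values; if a record can carry several identifiers, emitting one Tika row per URL would be the safer design.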


Many thanks,
Tricia
