Hi All,
The DataImportHandler is the most fantastic thing that has recently
come to Solr. Thank you.
I'm noticing that when I use a variable from a parent entity inside a
nested entity, the resolved value is wrapped in square brackets. For
example, ${x.url} used in the "tika" entity below resolves as
[http://publicdomain.ca/content/Sample.pdf] (note the square brackets),
so I get this error in my log:
SEVERE: Exception thrown while getting data
java.net.MalformedURLException: no protocol: [http://publicdomain.ca/content/Sample.pdf]
    at java.net.URL.<init>(URL.java:567)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:78)
    at org.apache.solr.handler.dataimport.BinURLDataSource.getData(BinURLDataSource.java:38)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
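For what it's worth, the exception itself is easy to reproduce outside
Solr. The sketch below is only my guess at the cause: it assumes the
variable holds a single-element list whose toString() adds the
brackets, which I have not confirmed in the DataImportHandler source.
The class name BracketedUrlDemo and the stripBrackets helper are
hypothetical and not part of Solr.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class BracketedUrlDemo {

    // Hypothetical helper (not a Solr API): removes one leading '[' and
    // one trailing ']' when both are present, otherwise returns the input.
    static String stripBrackets(String value) {
        if (value.length() >= 2 && value.startsWith("[") && value.endsWith("]")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Returns true if java.net.URL accepts the string, false if it
    // throws MalformedURLException (as in the log above).
    static boolean isParsableUrl(String value) {
        try {
            new URL(value);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the resolved variable is a one-element list, so its
        // string form carries the brackets seen in the error message.
        List<String> values =
                Collections.singletonList("http://publicdomain.ca/content/Sample.pdf");
        String resolved = values.toString();

        System.out.println(resolved);
        System.out.println(isParsableUrl(resolved));
        System.out.println(isParsableUrl(stripBrackets(resolved)));
    }
}
```

If that assumption holds, the brackets come from the list's string form
rather than from the data itself, and the URL parses fine once they are
removed.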
I encountered this previously when I tried to concatenate fields
from different entities into one field. I worked around that by
gathering the fields with an XSL stylesheet. Not being able to resolve
the URL for Tika is a little more problematic.
*Is this a bug? If not, how do I remove the brackets so that I can
use my variable as intended?*
<dataConfig>
  <dataSource type="BinURLDataSource" name="bin"/>
  <dataSource type="FileDataSource" name="fileReader"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="/home/pgwillia/content"
            dataSource="null" fileName=".*xml" rootEntity="false">
      <entity name="x" processor="XPathEntityProcessor" dataSource="fileReader"
              transformer="TemplateTransformer,RegexTransformer" forEach="/RDF/Description" url="${f.fileAbsolutePath}">
        ...
        <field column="url" xpath="/RDF/Description/identifier"
               regex="http://privatedomain:8080/content/" replaceWith="http://publicdomain.ca/content/"/>
        <entity name="tika" processor="TikaEntityProcessor" url="${x.url}"
                dataSource="bin" format="text">
          <field column="fulltext" name="text"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
Many thanks,
Tricia