---------- Forwarded message ---------
From: Martin Frank Hansen (MHQ) <[email protected]>
Date: Wed, Oct 10, 2018, 11:15
Subject: DIH for TikaEntityProcessor
To: [email protected] <[email protected]>


Hi,



I am trying to read documents from a file system into Solr using the
DataImportHandler, but I keep getting the following errors:



Exception while processing: files document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
         at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
         at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
         ... 9 more

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
         at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
         at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
         ... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
         ... 6 more
Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
         ... 9 more
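
As far as I can tell, the innermost "Caused by" is an ordinary Java cast failure: something hands TikaEntityProcessor a character Reader where it expects a byte InputStream. Here is a minimal sketch that reproduces the same kind of exception outside Solr. The class name CastDemo and the method failsToCast are hypothetical names made up for illustration; this is not Solr code, and it only assumes that the line in the trace performs a cast to java.io.InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr or the DataImportHandler.
public class CastDemo {

    // Returns true when the given object cannot be cast to InputStream.
    static boolean failsToCast(Object data) {
        try {
            // Assumed to be the same kind of cast that fails in the trace above.
            InputStream ignored = (InputStream) data;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);

        // A byte stream: the cast succeeds.
        Object stream = new ByteArrayInputStream(bytes);

        // A character reader wrapping a byte stream: InputStreamReader extends
        // java.io.Reader, not java.io.InputStream, so the cast fails.
        Object reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);

        System.out.println("stream fails: " + failsToCast(stream));
        System.out.println("reader fails: " + failsToCast(reader));
    }
}
```

So my guess is that the inner entity ends up with a data source that returns a Reader rather than a binary stream, but I have not been able to confirm that.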





My data-config file looks as follows:



<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <field column="fileAbsolutePath" name="id" />

      <entity name="read_file"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}">
        <field column="text" name="content" />
      </entity>
    </entity>
  </document>
</dataConfig>



In the schema I basically have two fields:



<field name="Id" type="string" indexed="true" stored="true" required="true"
multiValued="false"/>

<field name="text" type="text_general" indexed="true" stored="false"
multiValued="true"/>



Any help is appreciated.





*Martin Frank Hansen*


