Also, just wondering, have you tried specifying dataSource="bin" for
read_file?
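
To be concrete, here is a sketch of what I mean, adapted from the config in my
earlier mail below. I have not tested it against your setup. The only changes
are the dataSource attributes: I am assuming the cast error means read_file is
being handed a character-based source (one that returns a Reader) instead of
the binary one, so the inner entity names "bin" explicitly, and the outer
entity uses dataSource="null" because FileListEntityProcessor only lists files.
The baseDir path is a placeholder.

```xml
<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- FileListEntityProcessor only enumerates files, so no data source is needed here -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="null" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Assumption: pointing TikaEntityProcessor at the binary source by name avoids the cast error -->
      <entity name="read_file" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" dataSource="bin">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If that changes nothing, it would at least rule out data source selection as
the cause.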

On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau <kamuela....@gmail.com> wrote:

> Hi,
>
> I was unable to reproduce the error that you got with the information
> provided.
> Below are the data-config.xml and managed-schema fields I used; the
> data-config is mostly the same as yours
> (I think that FileListEntityProcessor doesn't actually require a dataSource,
> so it should be safe to put dataSource="null" on that entity):
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource"/>
>   <document>
>       <entity name="files" processor="FileListEntityProcessor"
> baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
> rootEntity="false" dataSource="bin" onError="skip">
>         <field column="fileAbsolutePath" name="id"/>
>         <entity name="read_file" processor="TikaEntityProcessor"
> url="${files.fileAbsolutePath}">
>           <field column="text" name="text"/>
>         </entity>
>       </entity>
>   </document>
> </dataConfig>
>
> And from the managed schema:
>     <field name="id" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>     <!-- docValues are enabled by default for long type so we don't need
> to index the version field  -->
>     <field name="_version_" type="plong" indexed="false" stored="false"/>
>     <field name="_root_" type="string" indexed="true" stored="false"
> docValues="false" />
>     <field name="text" type="text_general" indexed="true" stored="true"
> multiValued="true"/>
>
> When I had field column="text" name="content", the documents were still
> indexed, but the text/content was not (as I had no content field in the
> schema).
> I used the default config, and Solr version 7.5.0; I was able to import
> the data just fine (I also tested with .*DOC). Is there any other
> information you can provide that can help me reproduce this error?
>
>
>
>
> On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <m...@kmd.dk>
> wrote:
>
>> Hi again,
>>
>> Can anybody help me? Any suggestions as to why I am getting the error below?
>>
>> *Martin Frank Hansen*, Senior Data Analyst
>>
>> Data, IM & Analytics
>>
>>
>> Lautrupparken 40-42, DK-2750 Ballerup
>> E-mail m...@kmd.dk  Web www.kmd.dk
>> Mobile +4525571418
>>
>>
>>
>> *From:* Martin Frank Hansen (MHQ)
>> *Sent:* 10 October 2018 10:15
>> *To:* solr-user <solr-user@lucene.apache.org>
>> *Subject:* DIH for TikaEntityProcessor
>>
>>
>>
>> Hi,
>>
>>
>>
>> I am trying to read documents from a file system into Solr, using
>> dataimporthandler but keep getting the following errors:
>>
>>
>>
>> Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>     ... 9 more
>>
>> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>     ... 4 more
>> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>     ... 6 more
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>     ... 9 more
>>
>> My data-config file looks as follows:
>>
>> <dataConfig>
>>   <dataSource name="bin" type="BinFileDataSource" />
>>   <document>
>>       <entity name="files" processor="FileListEntityProcessor"
>>               baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
>>               rootEntity="false" dataSource="bin" onError="skip">
>>         <field column="fileAbsolutePath" name="id" />
>>
>>         <entity
>>          name="read_file"
>>          processor="TikaEntityProcessor"
>>          url="${files.fileAbsolutePath}"
>>          >
>>           <field column="text" name="content" />
>>         </entity>
>>       </entity>
>>   </document>
>> </dataConfig>
>>
>> And in the schema I basically have two fields:
>>
>> <field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
>> <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
>>
>> Any help is appreciated.
>>
>>
>>
>>
>>
>> *Martin Frank Hansen*
>>
>>
>>
>
