Hi Kamuela,

Thanks for your answer.

I still get the same error, so I think I will try the techproducts
example to see if it works there, as Alexendre suggested in the mail above.

Martin Frank Hansen,

-----Original Message-----
From: Kamuela Lau <kamuela....@gmail.com>
Sent: 12 October 2018 11:38
To: solr-user@lucene.apache.org
Subject: Re: DIH for TikaEntityProcessor

Hi,

I was unable to reproduce the error that you got with the information provided.
Below are the data-config.xml and managed-schema fields I used. The data-config
is mostly the same as yours (I think that FileListEntityProcessor doesn't
actually require a dataSource, so it should be safe to put dataSource="null"
on that entity):

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <entity name="read_file" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
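
A minimal sketch of the dataSource="null" variant mentioned in the parenthetical above. This is an untested assumption, not the configuration used for the test in this mail: the outer FileListEntityProcessor entity is given no data source, and the binary source is instead named explicitly on the TikaEntityProcessor entity. The baseDir is the same placeholder path as above.

```xml
<dataConfig>
  <!-- Binary data source that hands raw file bytes to Tika -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Assumption: the file lister only walks the directory, so it needs no data source -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="null" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Assumption: the binary source is referenced on the Tika entity itself -->
      <entity name="read_file" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Whether this changes the behaviour reported in the original mail has not been verified here.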

And from the managed schema:
    <field name="id" type="string" indexed="true" stored="true"
           required="true" multiValued="false" />
    <!-- docValues are enabled by default for long type so we don't need to
         index the version field -->
    <field name="_version_" type="plong" indexed="false" stored="false"/>
    <field name="_root_" type="string" indexed="true" stored="false"
           docValues="false" />
    <field name="text" type="text_general" indexed="true" stored="true"
           multiValued="true"/>

When I had field column="text" name="content", the documents were still 
indexed, but the text/content was not (as I had no content field in the schema).
I used the default config, and Solr version 7.5.0; I was able to import the 
data just fine (I also tested with .*DOC). Is there any other information you 
can provide that can help me reproduce this error?
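
To illustrate the point about the missing content field: if data-config.xml maps column="text" to name="content", the schema needs a field with that name for the extracted text to be indexed. A minimal sketch, assuming the same text_general type as the text field above (the attribute values are illustrative and not taken from the original poster's schema):

```xml
<!-- Hypothetical schema entry so that name="content" in data-config.xml has a target field -->
<field name="content" type="text_general" indexed="true" stored="true"
       multiValued="true"/>
```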




On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <m...@kmd.dk>
wrote:

> Hi again,
>
> Can anybody help me? Any suggestions as to why I am getting the error below?
>
> *Martin Frank Hansen*, Senior Data Analyst
> Data, IM & Analytics
>
> Lautrupparken 40-42, DK-2750 Ballerup
> E-mail m...@kmd.dk  Web www.kmd.dk
> Mobile +4525571418
>
> *From:* Martin Frank Hansen (MHQ)
> *Sent:* 10 October 2018 10:15
> *To:* solr-user <solr-user@lucene.apache.org>
> *Subject:* DIH for TikaEntityProcessor
>
> Hi,
>
> I am trying to read documents from a file system into Solr using the
> DataImportHandler, but I keep getting the following errors:
>
> Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>     ... 9 more
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>     ... 4 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>     ... 6 more
> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>     ... 9 more
>
> My data-config file looks as follows:
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource" />
>   <document>
>     <entity name="files" processor="FileListEntityProcessor"
>             baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
>             rootEntity="false" dataSource="bin" onError="skip">
>       <field column="fileAbsolutePath" name="id" />
>       <entity name="read_file"
>               processor="TikaEntityProcessor"
>               url="${files.fileAbsolutePath}">
>         <field column="text" name="content" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> And in the schema I basically have two fields:
>
> <field name="Id" type="string" indexed="true" stored="true"
>        required="true " multiValued="false"/>
> <field name="text" type="text_general" indexed="true" stored="false"
>        multiValued="true"/>
>
> Any help is appreciated.
>
>
>
>
>
> *Martin Frank Hansen*
>
