You, sir, just made my day!!!

It worked!!! Thanks a million!


Martin Frank Hansen,

-----Original Message-----
From: Kamuela Lau <kamuela....@gmail.com>
Sent: 12 October 2018 11:41
To: solr-user@lucene.apache.org
Subject: Re: DIH for TikaEntityProcessor

Also, just wondering, have you tried specifying dataSource="bin" for
read_file?
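
For reference, here is a minimal sketch of that suggestion applied to the data-config from the original message below. The only change is the dataSource="bin" attribute on the read_file entity; everything else is copied from that config. The trace shows TikaEntityProcessor being handed an InputStreamReader where it expects an InputStream, which is consistent with the inner entity not picking up the binary data source. That reading is an assumption and is not confirmed anywhere in this thread.

```xml
<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: only lists the files under baseDir -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Suggested change: name the binary data source explicitly here -->
      <entity name="read_file" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that the inner field still maps to name="content"; as mentioned further down the thread, the extracted text is only indexed if the schema has a field with that name.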

On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau <kamuela....@gmail.com> wrote:

> Hi,
>
> I was unable to reproduce the error that you got with the information
> provided.
> Below are the data-config.xml and managed-schema fields I used; the
> data-config is mostly the same (I think that FileListEntityProcessor doesn't
> actually require a dataSource, so I think it's safe to put
> dataSource="null"):
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource"/>
>   <document>
>       <entity name="files" processor="FileListEntityProcessor"
>               baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
>               rootEntity="false" dataSource="bin" onError="skip">
>         <field column="fileAbsolutePath" name="id"/>
>         <entity name="read_file" processor="TikaEntityProcessor"
>                 url="${files.fileAbsolutePath}">
>           <field column="text" name="text"/>
>         </entity>
>       </entity>
>   </document>
> </dataConfig>
>
> And from the managed schema:
>     <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>     <!-- docValues are enabled by default for long type so we don't need to index the version field -->
>     <field name="_version_" type="plong" indexed="false" stored="false"/>
>     <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
>     <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
>
> When I had field column="text" name="content", the documents were
> still indexed, but the text/content was not (as I had no content field
> in the schema).
> I used the default config, and Solr version 7.5.0; I was able to
> import the data just fine (I also tested with .*DOC). Is there any
> other information you can provide that can help me reproduce this error?
>
>
>
>
> On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <m...@kmd.dk>
> wrote:
>
>> Hi again,
>>
>>
>>
>> Can anybody help me? Any suggestions as to why I am getting the error below?
>>
>>
>>
>>
>>
>> *Martin Frank Hansen*, Senior Data Analyst
>>
>> Data, IM & Analytics
>>
>>
>> Lautrupparken 40-42, DK-2750 Ballerup E-mail m...@kmd.dk  Web
>> www.kmd.dk Mobile +4525571418
>>
>>
>>
>> *From:* Martin Frank Hansen (MHQ)
>> *Sent:* 10 October 2018 10:15
>> *To:* solr-user <solr-user@lucene.apache.org>
>> *Subject:* DIH for TikaEntityProcessor
>>
>>
>>
>> Hi,
>>
>>
>>
>> I am trying to read documents from a file system into Solr using the
>> DataImportHandler, but I keep getting the following errors:
>>
>>
>>
>> Exception while processing: files document :
>> null:org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be
>> cast to java.io.InputStream
>>          at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>          at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>          at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>          at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>          at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>          at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>          at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>          ... 9 more
>>
>> Full Import failed:java.lang.RuntimeException:
>> java.lang.RuntimeException:
>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be
>> cast to java.io.InputStream
>>          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>>          at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>          at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>          at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>          at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>>          at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>          ... 4 more
>> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>          at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>          ... 6 more
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>>          at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>          ... 9 more
>>
>>
>>
>>
>>
>> My data-config file looks as follows:
>>
>>
>>
>> <dataConfig>
>>   <dataSource name="bin" type="BinFileDataSource" />
>>   <document>
>>       <entity name="files" processor="FileListEntityProcessor"
>>               baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
>>               rootEntity="false" dataSource="bin" onError="skip">
>>         <field column="fileAbsolutePath" name="id" />
>>         <entity name="read_file" processor="TikaEntityProcessor"
>>                 url="${files.fileAbsolutePath}">
>>           <field column="text" name="content" />
>>         </entity>
>>       </entity>
>>   </document>
>> </dataConfig>
>>
>>
>>
>> And in the Schema I basically have two fields:
>>
>>
>>
>> <field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
>> <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
>>
>>
>>
>> Any help is appreciated.
>>
>>
>>
>>
>>
>> *Martin Frank Hansen*
>>
>>
>
