Hi Kamuela,

Thanks for your answer.
I still get the same error, so I think I will try the techproducts example to see if it works there, as Alexendre suggested in the mail above.

Martin Frank Hansen

-----Original message-----
From: Kamuela Lau <kamuela....@gmail.com>
Sent: 12 October 2018 11:38
To: solr-user@lucene.apache.org
Subject: Re: DIH for TikaEntityProcessor

Hi,

I was unable to reproduce the error that you got with the information provided. Below are the data-config.xml and managed-schema fields I used; the data-config is mostly the same (I think that FileListEntityProcessor doesn't actually require a dataSource, so I think it's safe to put dataSource="null" on that entity):

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <entity name="read_file" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

And from the managed schema:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- docValues are enabled by default for long type so we don't need to index the version field -->
<field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

When I had <field column="text" name="content"/>, the documents were still indexed, but the text/content was not (as I had no content field in the schema).

I used the default config and Solr version 7.5.0; I was able to import the data just fine (I also tested with .*DOC).

Is there any other information you can provide that can help me reproduce this error?
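
A minimal sketch of the dataSource="null" variant mentioned above, in case it helps narrow things down. It reuses the names from the config above. Setting dataSource="bin" explicitly on the inner entity is an assumption and untested here; the idea is only to rule out the Tika entity being handed a character stream instead of a binary one, which is what the InputStreamReader cast error would suggest:

```xml
<dataConfig>
  <!-- Binary data source; TikaEntityProcessor reads an InputStream from it -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- FileListEntityProcessor only lists files, so it needs no data source -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="null" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Assumption: name the binary source explicitly on the Tika entity -->
      <entity name="read_file" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The name attribute of each <field> still has to match a field defined in the schema ("id" and "text" here); otherwise the documents may be indexed without their extracted text, as described above.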
On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <m...@kmd.dk> wrote:

> Hi again,
>
> Can anybody help me? Any suggestions as to why I am getting the error below?
>
> *Martin Frank Hansen*, Senior Data Analyst
> Data, IM & Analytics
>
> *From:* Martin Frank Hansen (MHQ)
> *Sent:* 10 October 2018 10:15
> *To:* solr-user <solr-user@lucene.apache.org>
> *Subject:* DIH for TikaEntityProcessor
>
> Hi,
>
> I am trying to read documents from a file system into Solr using the
> DataImportHandler, but I keep getting the following errors:
>
> Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>     ... 9 more
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>     ... 4 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>     ... 6 more
> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>     ... 9 more
>
> My data-config file looks as follows:
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource" />
>   <document>
>     <entity name="files" processor="FileListEntityProcessor"
>             baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
>             rootEntity="false" dataSource="bin" onError="skip">
>       <field column="fileAbsolutePath" name="id" />
>       <entity name="read_file"
>               processor="TikaEntityProcessor"
>               url="${files.fileAbsolutePath}">
>         <field column="text" name="content" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> And in the schema I basically have two fields:
>
> <field name="Id" type="string" indexed="true" stored="true" required="true " multiValued="false"/>
> <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
>
> Any help is appreciated.
>
> *Martin Frank Hansen*