Also, just wondering, have you tried to specify dataSource="bin" for read_file?
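
In case it is useful, here is a rough sketch of what I mean, based on the data-config quoted below. It assumes the ClassCastException comes from TikaEntityProcessor being handed a character Reader rather than the binary InputStream that BinFileDataSource provides; I have not confirmed that this is the cause. The baseDir value is a placeholder, and dataSource="null" on the outer entity is my assumption, not something taken from your config.

```xml
<!-- Sketch only: same layout as the quoted data-config, with
     dataSource="bin" set on the Tika entity. baseDir is a placeholder. -->
<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Assumption: this entity only lists files, so it needs no data source -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="null" onError="skip">
      <field column="fileAbsolutePath" name="id"/>
      <!-- The Tika entity reads the file contents, so it refers to "bin" explicitly -->
      <entity name="read_file" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the error persists with this change, the cause likely lies elsewhere.
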
On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau <kamuela....@gmail.com> wrote:

> Hi,
>
> I was unable to reproduce the error that you got with the information
> provided.
> Below are the data-config.xml and managed-schema fields I used; the
> data-config is mostly the same
> (I think that BinFileDataSource doesn't actually require a dataSource, so
> I think it's safe to put dataSource="null"):
>
> <dataConfig>
>   <dataSource name="bin" type="BinFileDataSource"/>
>   <document>
>     <entity name="files" processor="FileListEntityProcessor"
>             baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
>             rootEntity="false" dataSource="bin" onError="skip">
>       <field column="fileAbsolutePath" name="id"/>
>       <entity name="read_file" processor="TikaEntityProcessor"
>               url="${files.fileAbsolutePath}">
>         <field column="text" name="text"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> And from the managed schema:
>
> <field name="id" type="string" indexed="true" stored="true"
>        required="true" multiValued="false" />
> <!-- docValues are enabled by default for long type so we don't need
>      to index the version field -->
> <field name="_version_" type="plong" indexed="false" stored="false"/>
> <field name="_root_" type="string" indexed="true" stored="false"
>        docValues="false" />
> <field name="text" type="text_general" indexed="true" stored="true"
>        multiValued="true"/>
>
> When I had field column="text" name="content", the documents were still
> indexed, but the text/content was not (as I had no content field in the
> schema).
> I used the default config, and Solr version 7.5.0; I was able to import
> the data just fine (I also tested with .*DOC). Is there any other
> information you can provide that can help me reproduce this error?
>
> On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <m...@kmd.dk>
> wrote:
>
>> Hi again,
>>
>> Can anybody help me? Any suggestions to why I am getting the error below?
>>
>> *Martin Frank Hansen*, Senior Data Analyst
>> Data, IM & Analytics
>>
>> *From:* Martin Frank Hansen (MHQ)
>> *Sent:* 10 October 2018 10:15
>> *To:* solr-user <solr-user@lucene.apache.org>
>> *Subject:* DIH for TikaEntityProcessor
>>
>> Hi,
>>
>> I am trying to read documents from a file system into Solr, using
>> dataimporthandler but keep getting the following errors:
>>
>> Exception while processing: files document :
>> null:org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
>> java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be
>> cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>     ... 9 more
>>
>> Full Import failed:java.lang.RuntimeException:
>> java.lang.RuntimeException:
>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
>> java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException:
>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
>> java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>     ... 4 more
>> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
>> java.io.InputStream
>>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>     ... 6 more
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot
>> be cast to java.io.InputStream
>>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>     ... 9 more
>>
>> My data-config file looks as follows:
>>
>> <dataConfig>
>>   <dataSource name="bin" type="BinFileDataSource" />
>>   <document>
>>     <entity name="files" processor="FileListEntityProcessor"
>>             baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
>>             rootEntity="false" dataSource="bin" onError="skip">
>>       <field column="fileAbsolutePath" name="id" />
>>       <entity
>>         name="read_file"
>>         processor="TikaEntityProcessor"
>>         url="${files.fileAbsolutePath}"
>>       >
>>         <field column="text" name="content" />
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> And in the Schema I basically have two fields:
>>
>> <field name="Id" type="string" indexed="true" stored="true"
>>        required="true" multiValued="false"/>
>> <field name="text" type="text_general" indexed="true" stored="false"
>>        multiValued="true"/>
>>
>> Any help is appreciated.
>>
>> *Martin Frank Hansen*