Glad to help :)

On Fri, Oct 12, 2018 at 21:10, Martin Frank Hansen (MHQ) <m...@kmd.dk> wrote:
> You sir just made my day!!!
>
> It worked!!! Thanks a million!
>
>
> Martin Frank Hansen,
>
> -----Original message-----
> From: Kamuela Lau <kamuela....@gmail.com>
> Sent: October 12, 2018 11:41
> To: solr-user@lucene.apache.org
> Subject: Re: DIH for TikaEntityProcessor
>
> Also, just wondering, have you tried to specify dataSource="bin" for
> read_file?
>
> On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau <kamuela....@gmail.com> wrote:
>
> > Hi,
> >
> > I was unable to reproduce the error that you got with the information
> > provided.
> > Below are the data-config.xml and managed-schema fields I used; the
> > data-config is mostly the same (I think that FileListEntityProcessor
> > doesn't actually require a dataSource, so I think it's safe to put
> > dataSource="null" on the files entity):
> >
> > <dataConfig>
> >   <dataSource name="bin" type="BinFileDataSource"/>
> >   <document>
> >     <entity name="files" processor="FileListEntityProcessor"
> >             baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
> >             rootEntity="false" dataSource="bin" onError="skip">
> >       <field column="fileAbsolutePath" name="id"/>
> >       <entity name="read_file" processor="TikaEntityProcessor"
> >               url="${files.fileAbsolutePath}">
> >         <field column="text" name="text"/>
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > And from the managed schema:
> >
> > <field name="id" type="string" indexed="true" stored="true"
> >        required="true" multiValued="false" />
> > <!-- docValues are enabled by default for long type so we don't
> >      need to index the version field -->
> > <field name="_version_" type="plong" indexed="false" stored="false"/>
> > <field name="_root_" type="string" indexed="true" stored="false"
> >        docValues="false" />
> > <field name="text" type="text_general" indexed="true" stored="true"
> >        multiValued="true"/>
> >
> > When I had field column="text" name="content", the documents were
> > still indexed, but the text/content was not (as I had no content field
> > in the schema).
> > I used the default config and Solr version 7.5.0; I was able to
> > import the data just fine (I also tested with .*DOC). Is there any
> > other information you can provide that can help me reproduce this error?
> >
> > On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) <m...@kmd.dk>
> > wrote:
> >
> >> Hi again,
> >>
> >> Can anybody help me? Any suggestions as to why I am getting the error
> >> below?
> >>
> >> Martin Frank Hansen, Senior Data Analyst
> >> Data, IM & Analytics
> >> Lautrupparken 40-42, DK-2750 Ballerup
> >> E-mail m...@kmd.dk  Web www.kmd.dk  Mobile +4525571418
> >>
> >> From: Martin Frank Hansen (MHQ)
> >> Sent: October 10, 2018 10:15
> >> To: solr-user <solr-user@lucene.apache.org>
> >> Subject: DIH for TikaEntityProcessor
> >>
> >> Hi,
> >>
> >> I am trying to read documents from a file system into Solr using the
> >> DataImportHandler, but I keep getting the following errors:
> >>
> >> Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> >>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> >>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> >>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> >>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> >>     at java.lang.Thread.run(Thread.java:748)
> >> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
> >>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> >>     ... 9 more
> >>
> >> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
> >>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> >>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> >>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> >>     at java.lang.Thread.run(Thread.java:748)
> >> Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> >>     ... 4 more
> >> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
> >>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> >>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> >>     ... 6 more
> >> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to java.io.InputStream
> >>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
> >>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> >>     ... 9 more
> >>
> >> My data-config file looks as follows:
> >>
> >> <dataConfig>
> >>   <dataSource name="bin" type="BinFileDataSource" />
> >>   <document>
> >>     <entity name="files" processor="FileListEntityProcessor"
> >>             baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
> >>             rootEntity="false" dataSource="bin" onError="skip">
> >>       <field column="fileAbsolutePath" name="id" />
> >>
> >>       <entity
> >>         name="read_file"
> >>         processor="TikaEntityProcessor"
> >>         url="${files.fileAbsolutePath}">
> >>         <field column="text" name="content" />
> >>       </entity>
> >>     </entity>
> >>   </document>
> >> </dataConfig>
> >>
> >> And in the schema I basically have two fields:
> >>
> >> <field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
> >> <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
> >>
> >> Any help is appreciated.
> >>
> >> Martin Frank Hansen
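
For reference, here is a minimal sketch of what the data-config.xml would presumably look like after the fix suggested in this thread, that is, with dataSource="bin" set explicitly on the read_file entity. It is assembled from the two configs quoted above and is not a config anyone posted. The assumptions: the baseDir and fileName are taken from Martin's original message; the id and text field names follow Kamuela's schema (Martin's posted schema spells the key field Id, so the mapping would need to match whatever the schema actually defines). The ClassCastException suggests the inner entity was being handed a Reader rather than an InputStream, which naming the BinFileDataSource explicitly would avoid.

```xml
<dataConfig>
  <!-- BinFileDataSource supplies an InputStream, which TikaEntityProcessor expects -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="D:/CAPTIA/docs/19107" fileName=".*DOC" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <!-- Assumption: the schema's unique key field is named "id" -->
      <field column="fileAbsolutePath" name="id"/>
      <!-- The change suggested in the thread: name the data source explicitly -->
      <entity name="read_file" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" dataSource="bin">
        <!-- Assumption: the schema defines a "text" field; mapping to a field
             that does not exist (e.g. "content") leaves the text unindexed -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```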