Hi Erick and Tim,

Thanks for your answers. I can see that my mail got messed up on the way through the server; it looked much more readable at my end 😉 The attachment simply included my build-path.
@Erick I am compiling the program using Netbeans at the moment. I updated to tika-1.7 but that did not help, and I haven't tried maven yet but will probably have to give that a chance. I just find it a bit odd that I can see the dependencies are included in the jar files I added to the project, but I must be missing something?

My buildpath looks as follows:

Tika-parsers-1.4.jar
Tika-core-1.4.jar
Commons-io-2.5.jar
Httpclient-4.5.3
Httpcore-4.4.6.jar
Httpmime-4.5.3.jar
Slf4j-api1-7-24.jar
Jcl-over--slf4j-1.7.24.jar
Solr-cell-7.5.0.jar
Solr-core-7.5.0.jar
Solr-solrj-7.5.0.jar
Noggit-0.8.jar

-----Original Message-----
From: Tim Allison <talli...@apache.org>
Sent: 25 October 2018 20:21
To: solr-user@lucene.apache.org
Subject: Re: Reading data using Tika to Solr

To follow up w Erick’s point, there are a bunch of transitive dependencies from tika-parsers. If you aren’t using maven or similar build system to grab the dependencies, it can be tricky to get it right. If you aren’t using maven, and you can afford the risks of jar hell, consider using tika-app or, better perhaps, tika-server.

Stay tuned for SOLR-11721...

On Thu, Oct 25, 2018 at 1:08 PM Erick Erickson <erickerick...@gmail.com> wrote:
> Martin:
>
> The mail server is pretty aggressive about stripping attachments, your
> png didn't come through. You might also get a more informed answer on
> the Tika mailing list.
>
> That said (and remember I can't see your png so this may be a silly
> question), how are you executing the program vs. compiling it? You
> mentioned the "build path".
> I'm usually lazy and just execute it in
> IntelliJ for development and have forgotten to set my classpath on
> _numerous_ occasions when running it from a command line ;)
>
> Best,
> Erick
>
> On Thu, Oct 25, 2018 at 2:55 AM Martin Frank Hansen (MHQ) <m...@kmd.dk> wrote:
> >
> > Hi,
> >
> > I am trying to read the content of msg-files using Tika and index these in
> > Solr, however I am having some problems with the OfficeParser(). I
> > keep getting the error java.lang.NoClassDefFoundError for the
> > OfficeParser, even though both tika-core and tika-parsers are included in
> > the build path.
> >
> > I am using Java with the following code:
> >
> > public static void main(final String[] args) throws IOException, SAXException, TikaException {
> >     processDocument(pathtofile);
> > }
> >
> > private static void processDocument(String pathfilename) {
> >     try {
> >         File file = new File(pathfilename);
> >         Metadata meta = new Metadata();
> >         InputStream input = TikaInputStream.get(file);
> >         BodyContentHandler handler = new BodyContentHandler();
> >         Parser parser = new OfficeParser();
> >         ParseContext context = new ParseContext();
> >         parser.parse(input, handler, meta, context);
> >         String doccontent = handler.toString();
> >         System.out.println(doccontent);
> >         System.out.println(meta);
> >     } catch (IOException | SAXException | TikaException e) {
> >         e.printStackTrace();
> >     }
> > }
> >
> > In the buildpath I have the following dependencies:
> >
> > Any help is appreciated.
> >
> > Thanks in advance.
> >
> > Best regards,
> >
> > Martin Hansen
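One way to narrow down a NoClassDefFoundError like the one discussed in this thread is to ask the JVM, at run time, which classes it can actually load. The following is only a minimal sketch, not something posted by anyone in the thread: the class name `ClasspathCheck` and the helper `isLoadable` are made up for illustration, and the list of class names is an assumption about what the OfficeParser typically needs from Apache POI (it may differ between Tika versions). Run it with exactly the same classpath as the failing program.

```java
/**
 * Hypothetical diagnostic helper: reports whether selected classes are
 * visible on the classpath the JVM was actually started with.
 */
class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here too
            return false;
        }
    }

    public static void main(String[] args) {
        // The classpath used to run the program, which may differ from the IDE build path
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        // Assumed class names: the parser itself plus two POI classes it is likely to need
        String[] names = {
            "org.apache.tika.parser.microsoft.OfficeParser",
            "org.apache.poi.poifs.filesystem.POIFSFileSystem",
            "org.apache.poi.hsmf.MAPIMessage"
        };
        for (String name : names) {
            System.out.println((isLoadable(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

If the parser class is found but a POI class is reported missing, that would point at the transitive dependencies Tim mentions rather than at tika-parsers itself. Note that loading a class without initializing it does not resolve everything it references, so a "found" result is a hint, not a guarantee.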