Hi again,

Never mind, I managed to get the content of the msg files as well, using the following link as inspiration: https://wiki.apache.org/tika/RecursiveMetadata
But thanks again for all your help!

-----Original Message-----
From: Martin Frank Hansen (MHQ) <m...@kmd.dk>
Sent: 26 October 2018 10:14
To: solr-user@lucene.apache.org
Subject: RE: Reading data using Tika to Solr

Hi Tim,

They are msg files, and I added tika-app-1.14.jar to the build path, and now it works 😊

But how do I get it to read the attachments as well?

-----Original Message-----
From: Tim Allison <talli...@apache.org>
Sent: 25 October 2018 21:57
To: solr-user@lucene.apache.org
Subject: Re: Reading data using Tika to Solr

If you’re processing actual msg (not eml), you’ll also need poi and poi-scratchpad and their dependencies, but then those msgs could have attachments, at which point you may as well just add tika-app. :D

On Thu, Oct 25, 2018 at 2:46 PM Martin Frank Hansen (MHQ) <m...@kmd.dk> wrote:

> Hi Erick and Tim,
>
> Thanks for your answers. I can see that my mail got messed up on the
> way through the server. It looked much more readable at my end 😉 The
> attachment simply included my build path.
>
> @Erick I am compiling the program using NetBeans at the moment.
>
> I updated to tika-1.7 but that did not help, and I haven't tried maven
> yet but will probably have to give that a chance. I just find it a bit
> odd that I can see the dependencies are included in the jar files I
> added to the project, but I must be missing something?
>
> My build path looks as follows:
>
> Tika-parsers-1.4.jar
> Tika-core-1.4.jar
> Commons-io-2.5.jar
> Httpclient-4.5.3
> Httpcore-4.4.6.jar
> Httpmime-4.5.3.jar
> Slf4j-api-1.7.24.jar
> Jcl-over-slf4j-1.7.24.jar
> Solr-cell-7.5.0.jar
> Solr-core-7.5.0.jar
> Solr-solrj-7.5.0.jar
> Noggit-0.8.jar
>
> -----Original Message-----
> From: Tim Allison <talli...@apache.org>
> Sent: 25 October 2018 20:21
> To: solr-user@lucene.apache.org
> Subject: Re: Reading data using Tika to Solr
>
> To follow up on Erick’s point, there are a bunch of transitive
> dependencies from tika-parsers.
> If you aren’t using maven or a similar build system to grab the
> dependencies, it can be tricky to get it right. If you aren’t using
> maven, and you can afford the risks of jar hell, consider using
> tika-app or, better perhaps, tika-server.
>
> Stay tuned for SOLR-11721...
>
> On Thu, Oct 25, 2018 at 1:08 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> > Martin:
> >
> > The mail server is pretty aggressive about stripping attachments,
> > your png didn't come through. You might also get a more informed
> > answer on the Tika mailing list.
> >
> > That said (and remember I can't see your png so this may be a silly
> > question), how are you executing the program vs. compiling it? You
> > mentioned the "build path". I'm usually lazy and just execute it in
> > IntelliJ for development and have forgotten to set my classpath on
> > _numerous_ occasions when running it from a command line ;)
> >
> > Best,
> > Erick
> >
> > On Thu, Oct 25, 2018 at 2:55 AM Martin Frank Hansen (MHQ) <m...@kmd.dk> wrote:
> > >
> > > Hi,
> > >
> > > I am trying to read the content of msg files using Tika and index
> > > these in Solr, however I am having some problems with the
> > > OfficeParser(). I keep getting the error
> > > java.lang.NoClassDefFoundError for the OfficeParser, even though
> > > both tika-core and tika-parsers are included in the build path.
> > >
> > > I am using Java with the following code:
> > >
> > > public static void main(final String[] args) throws IOException, SAXException, TikaException {
> > >     processDocument(pathtofile);
> > > }
> > >
> > > private static void processDocument(String pathfilename) {
> > >     try {
> > >         File file = new File(pathfilename);
> > >         Metadata meta = new Metadata();
> > >         InputStream input = TikaInputStream.get(file);
> > >
> > >         // Collects the extracted body text
> > >         BodyContentHandler handler = new BodyContentHandler();
> > >         Parser parser = new OfficeParser();
> > >         ParseContext context = new ParseContext();
> > >         parser.parse(input, handler, meta, context);
> > >
> > >         String doccontent = handler.toString();
> > >
> > >         System.out.println(doccontent);
> > >         System.out.println(meta);
> > >     } catch (IOException | SAXException | TikaException e) {
> > >         e.printStackTrace();
> > >     }
> > > }
> > >
> > > In the build path I have the following dependencies:
> > >
> > > Any help is appreciated.
> > >
> > > Thanks in advance.
> > >
> > > Best regards,
> > >
> > > Martin Hansen
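
As a footnote to Erick's question about executing vs. compiling: one way to see whether the runtime classpath really contains what the build path shows is to ask the JVM directly. The following is only a minimal sketch, not something posted in the thread. The class name ClasspathCheck and the helper isLoadable are made up for illustration, and the two class names checked are assumptions about what a msg parse needs (OfficeParser from tika-parsers, MAPIMessage from poi-scratchpad). It uses only the standard library, so it should run whether or not the jars are present.

```java
// Hypothetical helper (not from the thread): reports whether classes are
// visible on the classpath the program is actually running with.
final class ClasspathCheck {

    /** Returns true if the named class can be located and loaded by this JVM. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. a missing superclass
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names: adjust to whatever the stack trace complains about.
        String[] names = {
            "org.apache.tika.parser.microsoft.OfficeParser", // tika-parsers
            "org.apache.poi.hsmf.MAPIMessage"                // poi-scratchpad
        };
        // Show the classpath the JVM was started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this the same way the real program is run (IDE run configuration or command line) and comparing the output between the two may show whether the jars are only on the compile-time build path. It does not prove that every transitive dependency is present, which is the harder problem Tim describes.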