I'm not using tika-app.jar.  I need to stick with the Tika JARs that come
with Solr 5.2 and still get Tika's full text extraction (all file types it
supports).

At first, I included Tika JARs as needed; I now have all the Tika-related
JARs that come with Solr, and it is still not working.  Here is the list:
tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar,
tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar,
kite-morphlines-tika-core-0.12.1.jar and
kite-morphlines-tika-decompress-0.12.1.jar.  My program also has the SolrJ
JARs and their dependencies: solr-solrj-5.2.1.jar, solr-core-5.2.1.jar, etc.
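
One way to narrow this down is to probe the classpath for a few classes by
name.  The sketch below assumes (this is not confirmed anywhere in this
thread) that the cause may be a third-party parser library, such as PDFBox,
that is not visible to the application even though the Tika JARs are.  The
class name ClasspathCheck is made up for illustration, and the three probed
class names are only examples from tika-core, tika-parsers and PDFBox.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    // The class is located but not initialized.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Checks each name in order and records whether it was found.
    public static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> result = check(
            "org.apache.tika.parser.AutoDetectParser", // tika-core
            "org.apache.tika.parser.pdf.PDFParser",    // tika-parsers
            "org.apache.pdfbox.pdmodel.PDDocument"     // PDFBox (third-party)
        );
        for (Map.Entry<String, Boolean> e : result.entrySet()) {
            System.out.println((e.getValue() ? "found    " : "MISSING  ") + e.getKey());
        }
    }
}
```

Run it with the same classpath as the crawler.  A "MISSING" line for a
third-party class would point at a JAR that still needs to be added; if
everything is found, the problem is somewhere else.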

You said "Might not have the parsers on your path within your Solr
framework?".  I'm using Tika outside the Solr framework.  I'm trying to use
Tika from my own crawler application, which uses SolrJ to send the raw text
to Solr for indexing.

What is it that I am missing?!

Steve

On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. <talli...@mitre.org>
wrote:

> Might not have the parsers on your path within your Solr framework?
>
> Which tika jars are on your path?
>
> If you want the functionality of all of Tika, use the standalone
> tika-app.jar, but do not use the app in the same JVM as Solr...without a
> custom class loader.  The Solr team carefully prunes the dependencies when
> integrating Tika and makes sure that the main parsers _just work_.
>
>
> -----Original Message-----
> From: Steven White [mailto:swhite4...@gmail.com]
> Sent: Tuesday, February 02, 2016 2:53 PM
> To: solr-user@lucene.apache.org
> Subject: Using Tika that comes with Solr 5.2
>
> Hi,
>
> I'm trying to use Tika that comes with Solr 5.2.  The following code is not
> working:
>
> public static void parseWithTika() throws Exception {
>     File file = new File("C:\\temp\\test.pdf");
>
>     FileInputStream in = new FileInputStream(file);
>     AutoDetectParser parser = new AutoDetectParser();
>     Metadata metadata = new Metadata();
>     metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>     BodyContentHandler contentHandler = new BodyContentHandler();
>
>     parser.parse(in, contentHandler, metadata);
>
>     String content = contentHandler.toString();   // <=== 'content' is always empty
>
>     in.close();
> }
>
> 'content' is always an empty string unless the file I pass to Tika is a
> text file.  Any idea what the issue is?
>
> I have also tried the sample code from
> https://tika.apache.org/1.8/examples.html
> with the same result.
>
>
> Thanks !!
>
> Steve
>
