Hi Tod,

On Jun 22, 2011, at 6:00 AM, Tod wrote:

>> Mattmann, Chris A (388J) <chris.a.mattmann <at> jpl.nasa.gov> writes:
>> 
>>>> 
>>>> Hi Jo,
>>>> 
>>>> You may consider checking out Tika trunk, where we recently committed a
>>>> Tika JAX-RS web service [1] as part of the tika-server module. You could
>>>> probably wire DIH into it and accomplish the same thing.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/TIKA-593
> 
> 
> Chris, could you elaborate on using Tika JAX-RS and DIH? How
> production-ready is it?

Sure. I know that Maxim Valyanskiy has done a bunch of work on the Tika
JAX-RS layer. It simply exposes Tika's metadata extraction, text extraction,
and unpacking capabilities via the JSR 311 (JAX-RS) spec, so you get REST
services like:

/meta

HTTP PUT a document to the /meta service and you get back its metadata as
"text/csv".
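For illustration only, here is a rough Python sketch of a /meta client; it is not part of tika-server. TIKA_BASE is a hypothetical URL that depends on where the WAR gets deployed, and parse_meta_csv assumes each CSV row is a metadata name followed by its value(s), so check that against what your build actually returns.

```python
import csv
import io
import urllib.request

# Hypothetical base URL: host, port and context path depend on where the
# tika-server WAR is deployed.
TIKA_BASE = "http://localhost:8080/tika-server"


def meta_request(data: bytes, base: str = TIKA_BASE) -> urllib.request.Request:
    """Build an HTTP PUT of the raw document bytes to the /meta service."""
    return urllib.request.Request(
        base + "/meta",
        data=data,
        method="PUT",
        # Without this, urllib labels any body application/x-www-form-urlencoded.
        headers={"Content-Type": "application/octet-stream"},
    )


def parse_meta_csv(body: str) -> dict:
    """Parse a text/csv metadata response into {name: [values, ...]}.

    Assumption: each row is a metadata name followed by one or more values.
    """
    meta = {}
    for row in csv.reader(io.StringIO(body)):
        if row:  # skip blank lines
            meta[row[0]] = row[1:]
    return meta


def fetch_meta(path: str, base: str = TIKA_BASE) -> dict:
    """Read a local file, PUT it to /meta, and return the parsed metadata."""
    with open(path, "rb") as f:
        req = meta_request(f.read(), base)
    with urllib.request.urlopen(req) as resp:
        # Assumes a UTF-8 response body.
        return parse_meta_csv(resp.read().decode("utf-8"))
```

Setting the Content-Type explicitly matters because urllib otherwise attaches application/x-www-form-urlencoded to any request that carries a body.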

/tika

HTTP PUT a document to the /tika service and you get back the extracted text.
An HTTP GET returns a greeting stating that the server is up.
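A hypothetical Python sketch for /tika, using the GET as a cheap "is it deployed?" check before PUTting a document for text extraction. TIKA_BASE is an assumed deployment URL, and the function names are made up for this example rather than anything tika-server provides.

```python
import urllib.request

# Hypothetical base URL; adjust to wherever the tika-server WAR is deployed.
TIKA_BASE = "http://localhost:8080/tika-server"


def tika_request(data=None, base=TIKA_BASE):
    """GET /tika (no data) for the greeting, or PUT document bytes for text."""
    if data is None:
        return urllib.request.Request(base + "/tika", method="GET")
    return urllib.request.Request(
        base + "/tika",
        data=data,
        method="PUT",
        # Avoid urllib's default form-encoded Content-Type on request bodies.
        headers={"Content-Type": "application/octet-stream"},
    )


def call(req, timeout=30.0):
    """Send the request and return the response body as a string."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Checking the deployment would then be `call(tika_request())`, and extracting text would be `call(tika_request(open("report.pdf", "rb").read()))`, where report.pdf is just a placeholder file name.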

/unpacker

HTTP PUT a document that contains embedded resources to the /unpacker service
and you get back a zip file with the extracted text for each embedded resource,
named by that resource's filename in the original document.
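A hypothetical Python sketch for /unpacker that just reads the returned zip into memory. It assumes nothing about each entry beyond its name and raw bytes, and TIKA_BASE is an assumed deployment URL.

```python
import io
import urllib.request
import zipfile

# Hypothetical base URL; adjust to wherever the tika-server WAR is deployed.
TIKA_BASE = "http://localhost:8080/tika-server"


def unpacker_request(data: bytes, base: str = TIKA_BASE) -> urllib.request.Request:
    """Build an HTTP PUT of the container document to the /unpacker service."""
    return urllib.request.Request(
        base + "/unpacker",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def read_unpacked(zip_bytes: bytes) -> dict:
    """Map each entry name in the returned zip to its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def unpack(path: str, base: str = TIKA_BASE) -> dict:
    """Read a local file, PUT it to /unpacker, and return the zip entries."""
    with open(path, "rb") as f:
        req = unpacker_request(f.read(), base)
    with urllib.request.urlopen(req) as resp:
        return read_unpacked(resp.read())
```

Keeping the zip handling in its own function means it can be exercised against a zip built locally, without a running server.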


> Could you summarize the steps necessary to get 
> it to work?  Any examples yet?

Basically, you just build the tika-server WAR file, deploy it to a servlet
container (Tomcat, Jetty, etc.), and you've got a Tika JAX-RS server.

> 
> I'd be happy to work with you to get something out to the group.

Awesome! I've created a Tika Wiki page here:

http://wiki.apache.org/tika/TikaJAXRS

Since this is also Tika-related, please feel free to join
u...@tika.apache.org or d...@tika.apache.org by sending emails to:

user-subscr...@tika.apache.org
dev-subscr...@tika.apache.org

Then you can move the Tika portions of the conversation there. For the Solr/DIH 
side, this is the right list.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
