Hi Tod,

On Jun 22, 2011, at 6:00 AM, Tod wrote:
>> Mattmann, Chris A (388J) <chris.a.mattmann <at> jpl.nasa.gov> writes:
>>
>> Hi Jo,
>>
>> You may consider checking out Tika trunk, where we recently committed a
>> Tika JAX-RS web service [1] as part of the tika-server module. You could
>> probably wire DIH into it and accomplish the same thing.
>>
>> Cheers,
>> Chris
>>
>> [1] https://issues.apache.org/jira/browse/TIKA-593
>
> Chris - could you elaborate on using Tika JAX-RS and DIH? How
> production-ready is it?

Sure. I know that Maxim Valyanskiy has done a bunch of work on the Tika
JAX-RS layer. It simply exposes Tika's metadata extraction and unpacking
capabilities via the JSR 311 spec. So you get REST services like:

/meta
  HTTP PUT a document to the /meta service and you get back its metadata
  as "text/csv".

/tika
  HTTP PUT a document to the /tika service and you get back the extracted
  text. An HTTP GET returns a greeting stating that the server is up.

/unpacker
  HTTP PUT a document that contains embedded resources to the /unpacker
  service and you get back a zip of the extracted content for each resource
  filename in the original document.

> Could you summarize the steps necessary to get
> it to work? Any examples yet?

Basically, you just build the tika-server WAR file, drop it onto a Servlet
app server (Tomcat, Jetty, etc.), and then you've got a Tika JAX-RS server.

>
> I'd be happy to work with you to get something out to the group.

Awesome! I've created a Tika wiki page here:

http://wiki.apache.org/tika/TikaJAXRS

Since this is also really Tika-related, please feel free to join
u...@tika.apache.org or d...@tika.apache.org by sending emails to:

user-subscr...@tika.apache.org
dev-subscr...@tika.apache.org

Then you can move the Tika portions of the conversation there. For the
Solr/DIH side, this is the right list.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
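To make the /meta, /tika, and /unpacker descriptions in the reply concrete, here is a minimal Python client sketch using only the standard library. It assumes the tika-server WAR is deployed at a hypothetical base URL (http://localhost:8080/tika-server); the real host, port, and context path depend on your servlet container and WAR name. The helper names (build_put, build_greeting_check, send) are illustrative, not part of Tika.

```python
import urllib.request

# Hypothetical deployment location: adjust host, port, and context path
# to wherever your servlet container serves the tika-server WAR.
BASE_URL = "http://localhost:8080/tika-server"

# The three services described in the email.
ENDPOINTS = ("/meta", "/tika", "/unpacker")


def build_put(endpoint: str, document: bytes,
              content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build an HTTP PUT that sends a raw document to one of the services.

    /meta     -> responds with the document's metadata as text/csv
    /tika     -> responds with the extracted text
    /unpacker -> responds with a zip of the embedded resources' content
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=document,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def build_greeting_check() -> urllib.request.Request:
    """Build an HTTP GET on /tika, which answers with a greeting when the server is up."""
    return urllib.request.Request(BASE_URL + "/tika", method="GET")


def send(req: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

As a usage example, once the server is running you could call `send(build_greeting_check())` to confirm it is up, then `send(build_put("/tika", open("example.pdf", "rb").read(), "application/pdf"))` to get the extracted text (example.pdf is a placeholder file). Building the requests separately from sending them keeps the endpoint wiring easy to check without a live server.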