Re: How to get raw text of a document

Jack Krupansky Fri, 17 Aug 2012 09:45:05 -0700

You need a "response writer" that returns only text. The "wt" parameter selects the response writer. You specified "json", so that's what you got. Maybe "csv" would be closer to what you want.
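A related knob, separate from "wt": Solr Cell's extracting handler documents an `extractFormat` parameter that controls how the extractOnly content is serialized. The default is `xml` (XHTML) and the alternative is `text`. Even with `text`, the content still arrives wrapped in whatever "wt" produces, so the client has to pull it out of the JSON. The minimal sketch below mirrors the WebClient code from the question. Several things are assumptions and were not confirmed in this thread: that your Solr version accepts `extractFormat=text`; that the content sits under a key named after the uploaded file, next to `responseHeader` and `<name>_metadata`; and that System.Text.Json (.NET Core 3.0 or later) is available. `SolrCellText`, `FetchText` and `GetExtractedText` are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.Json;

static class SolrCellText
{
    // Sketch only: posts a file to the extracting handler and returns plain text.
    // Assumes the Solr version supports extractFormat=text.
    public static string FetchText(string url, string path)
    {
        using var client = new WebClient();
        client.QueryString.Add("extractOnly", "true");    // return the extraction, don't index
        client.QueryString.Add("extractFormat", "text");  // plain text instead of XHTML
        client.QueryString.Add("wt", "json");
        byte[] data = client.UploadFile(url, path);
        // Solr responds in UTF-8; ASCII decoding would mangle non-ASCII characters.
        return GetExtractedText(Encoding.UTF8.GetString(data));
    }

    // Hypothetical helper: assumes the content is the string-valued top-level
    // field that is neither "responseHeader" nor a "*_metadata" entry.
    public static string GetExtractedText(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonProperty prop in doc.RootElement.EnumerateObject())
        {
            if (prop.Name == "responseHeader" || prop.Name.EndsWith("_metadata"))
                continue;
            if (prop.Value.ValueKind == JsonValueKind.String)
                return prop.Value.GetString();
        }
        return null; // no string-valued content field found
    }
}
```

Usage, with the URL and file name from the question: `string text = SolrCellText.FetchText("http://localhost:8983/solr/update/extract", "input.txt");`. Scanning for the first string-valued field avoids hard-coding the file name as the key, which is the least certain assumption here.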


-- Jack Krupansky

-----Original Message-----
From: Alexander Cougarman
Sent: Friday, August 17, 2012 12:17 PM
To: solr-user@lucene.apache.org
Subject: How to get raw text of a document

Hi. I asked this on the Tika group and the recommendation was to ask it here. I am using the following C# code to call Tika and would like it to return the raw text without any XML or JSON. So if the Word document contains "Hello World", this should return only that text and no XML or anything else to wrap it in -- just the raw text.

This code returns JSON wrapping the XML, which in turn contains the text of the document. I need it to return only the raw text, no XML. Thanks.


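using System.Net;
using System.Text;

var url = @"http://localhost:8983/solr/update/extract";

var client = new WebClient();
// extractOnly: return the extracted content instead of indexing it
client.QueryString.Add("extractOnly", "true");
client.QueryString.Add("wt", "json");
var data = client.UploadFile(url, "input.txt");
// Solr sends UTF-8; decoding as ASCII would drop non-ASCII characters
var json = Encoding.UTF8.GetString(data);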



Sincerely,

Alex
