On 10/5/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 10/5/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Oct 5, 2006, at 7:17 AM, Marcio Pinto Motta wrote:
> > <br><p> A Brasil Telecom ... </str>
> >
> > the html code was "changed".
>
> It wasn't "changed" per se... but rather it was encoded. If you use
> an XML API to read the response you would not see these encoded
> characters.

You can also use a different output syntax to verify that the internal
form is unchanged... for example, add a wt=json to the HTTP parameters
to see the results in JSON format.

See HTMLStripWhitespaceTokenizerFactory if you don't want XML/HTML tags
indexed. As Erik said, regardless of how you analyze a field, you can
always get an un-analyzed version back when you mark the field as
"stored".

-Yonik
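To illustrate the point about encoding, here is a minimal Python sketch. It is not Solr code: the field name "content", the sample HTML value, and the simplified response shape are all made up for the example. It only shows that markup escaped for an XML response comes back unchanged once an XML API parses it, and that a JSON rendering of the same value carries the same string.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical stored field value (not taken from the original message).
stored_html = "<br><p>Hello world</p>"

# A simplified, made-up XML response: the markup must be escaped
# so that it can sit inside a <str> element as plain text.
xml_response = '<doc><str name="content">' + escape(stored_html) + "</str></doc>"
print(xml_response)

# Reading the response through an XML API undoes the escaping.
doc = ET.fromstring(xml_response)
decoded_xml = doc.find("str").text
print(decoded_xml)

# The same value in a (simplified) JSON rendering round-trips as well.
json_response = json.dumps({"content": stored_html})
decoded_json = json.loads(json_response)["content"]
print(decoded_json)
```

The raw XML text shows the escaped entities, but the parsed value is the original markup, which is why reading the response with an XML parser (or asking for JSON) avoids the apparent "change".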
Hi folks,

What I want is to avoid the database server as much as possible. I don't want to allow "<>" searches, but it is vital to be able to retrieve the "text" from the HTML content. I also need the content ready to be shown as soon as possible.

Approaches like solr.HTMLStripWhitespaceTokenizerFactory and JSON output in Solr are amazing and very productive (they save a lot of code that would otherwise have to be written). The more I test, the more amazed I become, and I haven't tested replication yet (which is my main goal) :)

Thanks a lot for all the responses (very quick :)).

BR,
Marcio