Re: Solr Newbie question: doubts about how to handle html content

2006-10-05 Thread Marcio Pinto Motta
On 10/5/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 10/5/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > On Oct 5, 2006, at 7:17 AM, Marcio Pinto Motta wrote:
> > > A Brasil Telecom ...
> > > the html code was "changed".
> >
> > It wasn't "changed" per se... but rather it was encoded. If


Re: Solr Newbie question: doubts about how to handle html content

2006-10-05 Thread Yonik Seeley
On 10/5/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Oct 5, 2006, at 7:17 AM, Marcio Pinto Motta wrote:
> > A Brasil Telecom ...
> > the html code was "changed".
>
> It wasn't "changed" per se... but rather it was encoded. If you use an XML API to read the response you would not see these
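
To illustrate the point about encoding, here is a minimal sketch in Python. The response fragment and its <b> markup are assumptions made up for this example (the HTML in the original message is elided), and read_field is a hypothetical helper, not part of Solr. It only shows that an XML parser hands back the decoded markup, so the escaped form is visible only when you look at the raw response text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an XML response. The <b> markup is an assumed
# stand-in for the HTML that is elided in the original message.
raw_response = (
    '<doc><str name="text">&lt;b&gt;A Brasil Telecom&lt;/b&gt;</str></doc>'
)


def read_field(xml_text: str, field_name: str) -> str:
    """Return the decoded text of the <str> element with the given name."""
    root = ET.fromstring(xml_text)
    for node in root.iter("str"):
        if node.get("name") == field_name:
            # The parser has already turned &lt; and &gt; back into < and >.
            return node.text or ""
    raise KeyError(field_name)


decoded = read_field(raw_response, "text")
print(decoded)
```

Reading the same response as a plain string instead would show the &lt; and &gt; entities, which is what looks like "changed" HTML.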


Re: Solr Newbie question: doubts about how to handle html content

2006-10-05 Thread Panayiotis Papadopoulos
I don't think that is the best approach... There is no need to index the markup, since it gives no useful search results... Personally, I would index only the pure text and keep the code in a database together with an id, so my db would look like, let's say, id, text, text+code, and I would send to Lucene id + text
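
The suggestion above could be sketched roughly like this in Python. Everything named here is an assumption for illustration: the docs table layout, the strip_markup and store_and_prepare helpers, the sample HTML, and the use of SQLite as the database. The returned dictionary stands in for whatever would actually be sent to the indexer.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html: str) -> str:
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def store_and_prepare(conn, doc_id: str, html: str) -> dict:
    """Keep id, text and original markup in the db; return id + text to index."""
    text = strip_markup(html)
    conn.execute(
        "INSERT INTO docs (id, text, html) VALUES (?, ?, ?)",
        (doc_id, text, html),
    )
    return {"id": doc_id, "text": text}


# Hypothetical table layout: id, text, text+code (the original markup).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, text TEXT, html TEXT)")

# The sample document is made up for this sketch.
index_doc = store_and_prepare(
    conn, "doc-1", "<p><b>A Brasil Telecom</b> example</p>"
)
print(index_doc)
```

At display time, the id returned by a search would be used to look up the original markup in the database.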

Re: Solr Newbie question: doubts about how to handle html content

2006-10-05 Thread Erik Hatcher
On Oct 5, 2006, at 7:17 AM, Marcio Pinto Motta wrote:
> My "current" problem is to know the best approach to handle content that has HTML code. I have some docs that may or may not have HTML tags. In my first attempt, I defined a field "text" in my schema.xml : A Brasil Telecom … ]]

Solr Newbie question: doubts about how to handle html content

2006-10-05 Thread Marcio Pinto Motta
Solr Newbie question: doubts about html content

My "current" problem is to know the best approach to handle content that has HTML code. I have some docs that may or may not have HTML tags. In my first attempt, I defined a field "text" in my schema.xml : A Brasil Telecom … ]]> But