findbestopensource wrote:
Could you tell us the schema you use for indexing? In my opinion,
StandardAnalyzer / SnowballAnalyzer will work best; they will not break
the URLs. Add href and other related HTML tags to the stop words, and they
will be removed while indexing.
Regards
Aditya
www.findbestopensource.com
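To make the stop-word suggestion concrete, here is a minimal plain-Java sketch with no Lucene or Solr dependency. The class name `StopTagSketch`, the crude tokenizer and the stop list are hypothetical illustrations, not Lucene's `StandardAnalyzer` or `StopFilter` API. It only shows tag and attribute names being dropped while a URL survives as a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopTagSketch {
    // Hypothetical stop list: HTML tag and attribute names to drop at index time.
    static final Set<String> STOP_WORDS = Set.of("a", "href", "div", "span", "p", "html", "body");

    // Crude tokenizer (illustration only, not Lucene's StandardTokenizer):
    // splits on whitespace, angle brackets, quotes and '=', so a URL such as
    // "http://example.com/page" stays one token. Query strings containing '='
    // would be split, which a real analyzer may handle differently.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[\\s<>\"'=]+")) {
            // Drop the leading '/' of closing tags such as "/a".
            String tok = raw.startsWith("/") ? raw.substring(1) : raw;
            if (!tok.isEmpty()) {
                tokens.add(tok.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stop filter: keeps only tokens that are not in STOP_WORDS.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String tok : tokenize(text)) {
            if (!STOP_WORDS.contains(tok)) {
                kept.add(tok);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/page\">Open source</a>";
        System.out.println(analyze(html));
    }
}
```

Running `main` prints `[http://example.com/page, open, source]`. A stop list like this only removes the tag and attribute names it lists; any other markup still reaches the index.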
Lance Norskog-2 wrote:
The PatternReplace and HTMLStrip tokenizers might be the right bet.
The easiest way to go about this is to make a bunch of text fields
with different analysis stacks and investigate them in the Schema
Browser. You can paste an HTML document into the text box and see
exactly how the words & markup ge