Re: Indexing link targets in HTML fragments

2010-06-07 Thread Andrew Clegg
findbestopensource wrote: > Could you tell us the schema you use for indexing? In my opinion, StandardAnalyzer or the Snowball analyzer will do best. They will not break the URLs. Add href and other related HTML tags as stop words and they will be removed while indexing. > Thi

Re: Indexing link targets in HTML fragments

2010-06-07 Thread findbestopensource
Could you tell us the schema you use for indexing? In my opinion, StandardAnalyzer or the Snowball analyzer will do best. They will not break the URLs. Add href and other related HTML tags as stop words and they will be removed while indexing. Regards Aditya www.findbestopensource.com On

Re: Indexing link targets in HTML fragments

2010-06-06 Thread Andrew Clegg
Lance Norskog-2 wrote: > The PatternReplace and HTMLStrip tokenizers might be the right bet. The easiest way to go about this is to make a bunch of text fields with different analysis stacks and investigate them in the Schema Browser. You can paste an HTML document into the text box and s

Re: Indexing link targets in HTML fragments

2010-06-06 Thread Lance Norskog
The PatternReplace and HTMLStrip tokenizers might be the right bet. The easiest way to go about this is to make a bunch of text fields with different analysis stacks and investigate them in the Schema Browser. You can paste an HTML document into the text box and see exactly how the words & markup ge
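
For anyone who wants to see the intended result outside Solr first, here is a minimal sketch in plain Python of what the thread is after: drop the markup, keep the visible text, and keep each link target whole instead of letting it be split into pieces. This is not Solr's analysis chain and does not reproduce what the HTMLStrip or PatternReplace components actually emit; the names LinkTargetExtractor and extract are made up for illustration, and it assumes link targets only appear in the href attribute of a tags.

```python
from html.parser import HTMLParser


class LinkTargetExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and href values."""

    def __init__(self):
        super().__init__()
        self.text_parts = []    # visible text between tags
        self.link_targets = []  # href values, kept as whole strings

    def handle_starttag(self, tag, attrs):
        # Assumption: only <a href="..."> carries a link target.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.link_targets.append(value)

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract(fragment):
    """Return (visible_text, list_of_link_targets) for an HTML fragment."""
    parser = LinkTargetExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.text_parts), parser.link_targets


if __name__ == "__main__":
    text, links = extract(
        '<p>See <a href="http://example.com/a?b=1">this page</a> now</p>'
    )
    print(text)
    print(links)
```

The two return values could then go into separate fields, one analyzed as ordinary text and one left untokenized, which is one way to try the "different analysis stacks" idea on real data before committing to a schema.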