Re: Indexing link targets in HTML fragments

Andrew Clegg Mon, 07 Jun 2010 06:56:34 -0700


findbestopensource wrote:
> 
> Could you tell us the schema you use for indexing? In my opinion,
> StandardAnalyzer / Snowball analyzer will do best. They will not break
> the URLs. Add href and other related HTML tags as stop words and they
> will be removed while indexing.
>


This project's still in the planning stages -- I haven't designed the
pipeline yet.

But you're right, maybe indexing everything and just stopping out the tag
and attribute names (adding them as stop words) is the most fail-safe
approach.

Then at least, if I get something wrong, I won't miss anything. Worst-case
scenario, I just end up with some extra terms in the index.
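A minimal Python sketch of the "index everything, then stop out tag and attribute names" idea. It is not Lucene's StandardAnalyzer or Snowball analyzer. The regex tokenizer, the function name tokenize_html_fragment and the HTML_STOP_WORDS list are hypothetical and chosen only for illustration. In Solr itself the stop-word part would normally be a solr.StopFilterFactory with a word list in the field type.

```python
import re

# Hypothetical stop set: HTML tag and attribute names we don't want as terms.
# This list is an illustrative assumption, not a Lucene/Solr default.
HTML_STOP_WORDS = {"a", "href", "title", "target", "p", "div", "span",
                   "img", "src", "alt", "class"}

# URLs are tried first at each position so link targets survive as single
# tokens; anything else is split into runs of word characters.
TOKEN_RE = re.compile(r"https?://[^\s\"'<>]+|\w+")


def tokenize_html_fragment(fragment, stop_words=HTML_STOP_WORDS):
    """Tokenize the raw fragment (markup included), then drop stop words."""
    terms = []
    for token in TOKEN_RE.findall(fragment):
        if token.startswith(("http://", "https://")):
            terms.append(token)  # keep link targets whole, case preserved
        elif token.lower() not in stop_words:
            terms.append(token.lower())  # ordinary words are lowercased
    return terms


if __name__ == "__main__":
    html = ('<p>See <a href="http://example.com/docs/page.html" '
            'title="Docs">the docs</a></p>')
    print(tokenize_html_fragment(html))
    # ['see', 'http://example.com/docs/page.html', 'docs', 'the', 'docs']
```

Attribute values such as the title text still end up as terms, which is the "extra terms" trade-off. Note the reverse side effect too: real words that collide with a tag or attribute name (e.g. "title") are dropped from the text as well.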

Thanks,

Andrew.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p876343.html
Sent from the Solr - User mailing list archive at Nabble.com.
