findbestopensource wrote:
>
> Could you tell us what schema you use for indexing? In my opinion, using
> StandardAnalyzer / Snowball analyzer will work best. They will not break
> the URLs. Add href and other related HTML tags as stop words and they
> will be removed while indexing.
>
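The quoted suggestion (index the raw fragment with a tokenizer that keeps URLs whole, and treat tag and attribute names as stop words) can be sketched in plain Python to show what ends up in the index. This is not Solr's or Lucene's API: `HTML_STOP_WORDS`, `TOKEN_RE`, and `analyze` are hypothetical names, the stop list is an illustrative subset, and the regex is only a rough stand-in for whatever URL-aware tokenizer a real schema would use.

```python
import re

# Hypothetical stop list of HTML tag and attribute names (illustrative subset,
# not a Solr/Lucene default). Extend it for the markup you actually index.
HTML_STOP_WORDS = {"a", "href", "img", "src", "alt", "title",
                   "p", "div", "span", "class", "id"}

# Rough stand-in for a URL-aware tokenizer: try to match a whole URL first,
# otherwise fall back to plain alphanumeric words. Punctuation is skipped.
TOKEN_RE = re.compile(r"https?://[^\s\"'<>]+|[A-Za-z0-9]+")


def analyze(fragment: str) -> list[str]:
    """Tokenize an HTML fragment, lowercase every token, drop HTML stop words."""
    tokens = (t.lower() for t in TOKEN_RE.findall(fragment))
    return [t for t in tokens if t not in HTML_STOP_WORDS]


if __name__ == "__main__":
    print(analyze('See <a href="http://example.com/page">the docs</a>.'))
```

Any tag name missing from the stop list, such as `em`, simply survives as an extra index term; nothing from the fragment is lost, which matches the fail-safe trade-off discussed in the reply.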
This project's still in the planning stages -- I haven't designed the pipeline yet. But you're right: maybe starting with everything and just treating the tag and attribute names as stop words is the most fail-safe approach. Then at least if I get something wrong, I won't miss anything. Worst case, I just end up with some extra terms in the index.

Thanks,
Andrew

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p876343.html
Sent from the Solr - User mailing list archive at Nabble.com.