Wouldn't using NekoHTML (as an XML parser) together with XPath be safer?
I guess it all depends on the "quality" of the source document.
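
To make the parser-based idea concrete, here is a minimal sketch. It is not NekoHTML or XPath: it swaps in the tolerant HTML parser from Python's standard library as a stand-in, and the names HeadingExtractor and extract_headings are made up for illustration. The point is only that a real parser copes with mixed-case tags and nested markup inside the headings.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}


class HeadingExtractor(HTMLParser):
    """Collects the text inside <h1>, <h2>, and <h3> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []      # finished heading strings
        self._open_tag = None   # heading tag we are currently inside, if any
        self._parts = []        # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <H1> arrives as "h1".
        if self._open_tag is None and tag in HEADING_TAGS:
            self._open_tag = tag
            self._parts = []

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append(text)
            self._open_tag = None

    def handle_data(self, data):
        # Keep text only while inside a heading; nested tags such as <b> are ignored.
        if self._open_tag is not None:
            self._parts.append(data)


def extract_headings(html):
    """Return the heading texts of an HTML string, in document order."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings
```

The extracted strings could then be sent to a separate field that is searched only when the user asks for headings; that field layout is an assumption, not something settled in this thread.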

paul


On 25 Aug 2010, at 02:09, Lance Norskog wrote:

I would do this with regular expressions. There is a Pattern Analyzer
and a Tokenizer that do regular-expression-based text chopping (I'm
not sure how to make them do what you want). A more precise tool is
the RegexTransformer in the DataImportHandler.
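
As a rough illustration of the regular-expression idea, here is a sketch in plain Python. It is not a RegexTransformer configuration and does not run inside Solr; the pattern and the function name headings_by_regex are assumptions made for this example. It works on well-formed headings and silently skips ones that are never closed.

```python
import re

# Matches <h1>...</h1> through <h3>...</h3>. The backreference \1 requires the
# closing tag to have the same level as the opening tag.
HEADING_RE = re.compile(r"<h([1-3])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)

# Matches any tag, used to drop nested markup inside a heading.
TAG_RE = re.compile(r"<[^>]+>")


def headings_by_regex(html):
    """Return heading texts found by pattern matching (hypothetical helper)."""
    results = []
    for match in HEADING_RE.finditer(html):
        inner = TAG_RE.sub("", match.group(2))   # strip nested tags such as <b>
        text = " ".join(inner.split())           # collapse whitespace
        if text:
            results.append(text)
    return results
```

Character entities such as &amp; are left undecoded here, which is one of the places where a pattern-only approach falls short on messy documents.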

Lance

On Tue, Aug 24, 2010 at 7:08 AM, Andrew Cogan
<aco...@wordsearchbible.com> wrote:
I'm quite new to Solr and wondering if the following is possible: in
addition to normal full-text search, my users want the option to
search only the inner text of HTML headings, i.e. content inside
<H1>, <H2>, or <H3> tags.
