I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a new
field ('pathparts'), plus a few lines of code to generate the tokens in our
content preprocessor, which submits documents to Solr for indexing.
-Si
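
The preprocessor approach described above can be sketched roughly as follows.
This is only an illustration in Python, not Si's actual code: the function
name `url_path_parts` and the stopword set are hypothetical, and the rule of
splitting on every non-alphanumeric character is an assumption taken from the
request quoted in this thread.

```python
import re

# Hypothetical stopword set: tokens that appear in almost every URL
# and are therefore not useful as search terms.
URL_STOPWORDS = {"http", "https", "www"}


def url_path_parts(url, stopwords=URL_STOPWORDS):
    """Split a URL into lowercase tokens for a 'pathparts'-style field.

    Any run of characters that are not ASCII letters or digits is treated
    as a separator; empty tokens and stopwords are dropped.
    """
    tokens = re.split(r"[^A-Za-z0-9]+", url)
    return [t.lower() for t in tokens if t and t.lower() not in stopwords]


if __name__ == "__main__":
    url = "http://www.nabble.com/Indexing-a-word-in-url-tp16397739p16411091.html"
    # The resulting list would be sent as the values of the extra field.
    print(url_path_parts(url))
```

The resulting tokens would then be added to the document as a multi-valued
field before it is posted to Solr, so the field itself needs no special
analysis beyond what the preprocessor already did.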
: Actually, I want anything that is not a letter or digit to act as a
: separator, so that whatever lies between separators is a word (that way I can
: use the URL fragment to see what is indexed about this site). Any suggestions?
In addition to Mike's suggestion of trying out the WordDelimiterFilter,
tak
> parts generation on only (no catenation), and an additional stopwords
> list that excludes a few tokens like 'http'.
>
> -Mike
>
>
--
View this message in context:
http://www.nabble.com/Indexing-a-word-in-url-tp16397739p16411091.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 31-Mar-08, at 10:50 AM, Vinci wrote:
Hi all,
I would like to ask, if I want to index word in a URL, which data type and
parser should I use?
Depends on how you want to search it. I use WordDelimiterFilter with
parts generation on only (no catenation), and an additional stopwords
li
Hi all,
I would like to ask, if I want to index word in a URL, which data type and
parser should I use?
Thank you,
Vinci
--
View this message in context:
http://www.nabble.com/Indexing-a-word-in-url-tp16397739p16397739.html