Hello there!
We've run into a problem and are looking for solutions. We currently
use the Standard Tokenizer to split text into tokens, and we just found
out that we cannot search for "c++" in our index because the tokenizer
does not treat it as a word.
Since we need this search to work properly (including a search for
"C#"), we'd like to know what you do when people search for words that
contain symbols, like these programming language names. I thought there
might be a list of "protected words" in the Standard Tokenizer, so that
we could protect these tokens. Another possibility would be the Pattern
Tokenizer, but it seems to be rather slow when indexing a large amount
of data, which is our case.
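
To make the "protected words" idea concrete, here is a minimal sketch
in plain Python. It is not Lucene or Solr code, and the PROTECTED list
and the tokenize function are hypothetical names made up for
illustration; it only shows the behavior we are after, not how any
existing tokenizer works.

```python
import re

# Hypothetical protected list, for illustration only.
PROTECTED = ["c++", "c#"]

def tokenize(text, protected=PROTECTED):
    """Lowercase the text and split it into word tokens,
    keeping any protected term as a single token."""
    # Try longer protected terms first so they win over shorter ones.
    ordered = sorted(protected, key=len, reverse=True)
    alternatives = [re.escape(term) for term in ordered]
    # Fall back to plain runs of word characters for everything else.
    pattern = re.compile("|".join(alternatives + [r"\w+"]))
    return pattern.findall(text.lower())

if __name__ == "__main__":
    print(tokenize("I know C++ and C# well"))
    # With no protected terms the symbols are dropped.
    print(tokenize("C++", protected=[]))
```

With the protected list, "C++" and "C#" come out as whole tokens; with
an empty list only the letter survives, which is the kind of result
that breaks our search today.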
What do you think the best solution would be?
Best,
Leonardo
--