Hello there!

We've run into a problem and are looking for possible solutions. We currently use the Standard Tokenizer to split text into tokens, and we just found out that we cannot search for "c++" in our index because it is not considered a word.

Since we need this search to work properly (including searches for "C#"), we'd like to know how you handle searches for words that contain symbols, like these programming language names. I thought there might be a list of "protected words" in the Standard Tokenizer, so that we could protect these tokens. Another possibility would be to use the Pattern Tokenizer, but it seems to be rather slow when indexing a huge amount of data, which is our case.
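To make the "protected words" idea concrete, here is a rough Python sketch of the behaviour I have in mind. It is not Lucene code: simple_tokenize is only a stand-in for a tokenizer that drops symbols, and PROTECTED_WORDS and protected_tokenize are hypothetical names made up for this illustration.

```python
import re

# Hypothetical list of tokens that must survive tokenization intact.
PROTECTED_WORDS = ["c++", "c#"]


def simple_tokenize(text):
    """Stand-in for a symbol-dropping tokenizer (assumption, not Lucene):
    lowercases the text and keeps only runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def protected_tokenize(text, protected=PROTECTED_WORDS):
    """Tokenize text, emitting protected words as single tokens."""
    if not protected:
        return simple_tokenize(text)

    # Try longer entries first so overlapping entries prefer the longest one.
    alternatives = "|".join(
        re.escape(word) for word in sorted(protected, key=len, reverse=True)
    )
    # Only match a protected word that is not glued to other letters or digits.
    pattern = re.compile(r"(?<![a-z0-9])(?:%s)(?![a-z0-9])" % alternatives)

    lowered = text.lower()
    tokens = []
    position = 0
    for match in pattern.finditer(lowered):
        # Text before the protected word goes through the normal tokenizer.
        tokens.extend(simple_tokenize(lowered[position:match.start()]))
        # The protected word itself is kept as one token, symbols included.
        tokens.append(match.group(0))
        position = match.end()
    # Whatever follows the last protected word is tokenized normally.
    tokens.extend(simple_tokenize(lowered[position:]))
    return tokens
```

With this sketch, "I know C++ and C#" comes out as i, know, c++, and, c# instead of i, know, c, and, c. The same function would have to be applied to queries as well, so that index and search terms match.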

What do you think the best solution would be?

Best,

Leonardo
