Hi everyone, I'm indexing several documents that contain terms the StandardTokenizer does not keep intact as tokens. Terms like C#, .NET, and C++ are important for users to be able to search for, but they get tokenized as "C", "NET", and "C".
How can I define a list of words that should be treated as indivisible tokens? Is my only option to somehow string together a lot of PatternTokenizers? I'd love to be able to do something like <tokenizer class="StandardTokenizer" tokenwhitelist=".NET C++ C#" />. Thanks in advance!
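P.S. In case it helps clarify what I'm after, here is a rough plain-Python sketch of the behavior. It is not Solr or Lucene code, and `PROTECTED`, `naive_tokenize`, and `tokenize` are names I made up for illustration. The naive version only approximates the splitting I'm seeing; the second version tries the protected terms first and falls back to ordinary word characters.

```python
import re

# Hypothetical whitelist of terms that must never be split.
PROTECTED = [".NET", "C++", "C#"]


def naive_tokenize(text):
    # Rough stand-in for the splitting I'm seeing: keep runs of word
    # characters and drop all punctuation.
    return re.findall(r"\w+", text)


def tokenize(text, protected=PROTECTED):
    # Try the protected terms first (longest first so a longer term wins
    # over a shorter one), then fall back to plain runs of word characters.
    alternatives = [re.escape(t) for t in sorted(protected, key=len, reverse=True)]
    pattern = re.compile("|".join(alternatives + [r"\w+"]))
    return pattern.findall(text)


print(naive_tokenize("C# .NET C++"))
print(tokenize("I write C#, C++ and .NET code"))
```

The first call shows the problem (the punctuation is lost), and the second shows the result I would like the analyzer to produce, with the three terms kept whole.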