dadoonet opened a new issue, #15196: URL: https://github.com/apache/lucene/issues/15196
### Description

As I reported at https://github.com/elastic/elasticsearch/issues/133989, I'd love to have a way to support multiple delimiters in the Path Hierarchy Tokenizer.

Currently, the `delimiter` parameter accepts only a single character (default is `/`). This makes it difficult to tokenize both Windows (`\\`) and Linux (`/`) paths efficiently in the same index.

Supporting multiple delimiters (such as both `/` and `\\`) would greatly improve usability for systems dealing with cross-platform file paths. For example, a user may need to index file paths from both Windows and Linux environments and expect the analysis to work seamlessly regardless of path format. At the moment, the only workaround I have found is to preprocess the data to normalize the delimiters, which adds extra complexity.

**Feature Request**: Allow the `path_hierarchy` tokenizer to accept multiple delimiters (e.g., an array of delimiters) so that both `/` and `\\` can be handled simultaneously.

Another possible implementation would be to create a new `PathsHierarchyTokenizer` (note the `s`) which implements this behavior.

Before working on such a PR, I'd like to get your views on this proposal. Maybe I'm just wrong in trying to do so.
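
A minimal sketch of the requested behavior, in plain Java and deliberately independent of Lucene's actual `PathHierarchyTokenizer` API. The class name `MultiDelimiterPathTokens`, the `tokenize` method, and its parameters are hypothetical and exist only for illustration. The sketch assumes that every configured delimiter is rewritten to one replacement character, so Windows and Linux paths produce comparable prefix tokens; its handling of edge cases (leading, trailing, or repeated delimiters) is a choice made here and may differ from what Lucene does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not part of Lucene or Elasticsearch.
public class MultiDelimiterPathTokens {

    /**
     * Emits one token per path prefix. Every character in {@code delimiters}
     * is treated as a separator and written out as {@code replacement}.
     */
    public static List<String> tokenize(String path, Set<Character> delimiters, char replacement) {
        List<String> tokens = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        // True when the prefix gained a non-delimiter character since the last emitted token.
        boolean pending = false;

        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (delimiters.contains(c)) {
                // A delimiter closes the current component, if there is one.
                if (pending) {
                    tokens.add(prefix.toString());
                    pending = false;
                }
                prefix.append(replacement);
            } else {
                prefix.append(c);
                pending = true;
            }
        }
        // Emit the final component when the path does not end with a delimiter.
        if (pending) {
            tokens.add(prefix.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<Character> delimiters = Set.of('/', '\\');
        // Prints [/var, /var/log, /var/log/app]
        System.out.println(tokenize("/var/log/app", delimiters, '/'));
        // Prints [C:, C:/Users, C:/Users/me]
        System.out.println(tokenize("C:\\Users\\me", delimiters, '/'));
    }
}
```

Normalizing to a single replacement character keeps the emitted tokens identical for equivalent paths on both platforms, which is the same effect as the preprocessing workaround but performed inside the tokenization step.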
