Sounds like a possible application of solr.PatternTokenizerFactory:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html
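
A rough sketch of the kind of regular expression such a tokenizer might be given, prototyped in Python rather than in a Solr schema so it can be tried quickly. The pattern itself, the helper names (tokenize, is_hashtag, base_tag) and the choice of ! and ? as the only trailing punctuation are assumptions made for illustration; none of them come from Solr or from the original question.

```python
import re

# Hypothetical pattern: a hashtag with optional trailing !/? marks,
# or a plain alphanumeric word. Not taken from Solr or the thread.
TOKEN_PATTERN = re.compile(r"#[A-Za-z0-9]+[!?]*|[A-Za-z0-9]+")


def tokenize(text):
    """Return every hashtag or plain word in text, in order."""
    return TOKEN_PATTERN.findall(text)


def is_hashtag(token):
    """True for tokens that start with '#', so '#jane' differs from 'jane'."""
    return token.startswith("#")


def base_tag(token):
    """Strip trailing !/? so '#jane!?!??' and '#jane?' both map to '#jane'."""
    return token.rstrip("!?")


tokens = tokenize("jane said #jane then #jane! and #jane!?!??")
hashtags = [t for t in tokens if is_hashtag(t)]
variants_of_jane = [t for t in hashtags if base_tag(t) == "#jane"]
print(tokens)
print(variants_of_jane)
```

Keeping the punctuation inside the token is what lets #jane and #jane! stay distinct, while stripping it afterwards groups the variations. In Solr the same expression would presumably go in the tokenizer's pattern attribute with group="0" so that whole matches become tokens, but that is worth checking against the factory documentation linked above.
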
You could use copyField to copy the entire string to a separate field (or set of fields) that is processed by patterns.

JRJ

-----Original Message-----
From: Memory Makers [mailto:memmakers...@gmail.com]
Sent: Tuesday, October 25, 2011 9:27 AM
To: solr-user@lucene.apache.org
Subject: Points to processing hastags

Greetings,

I am trying to index hashtags from Twitter -- so they are tokens that start with a # symbol and can have any number of alphanumeric characters.

Examples:
1. #jane
2. #Jane
3. #Jane!

At a high level I'd like to be able to:
1. differentiate between, say, #jane and #jane!
2. differentiate between a hashtag such as #jane and a regular text token jane
3. ask for variations on #jane -- by this I mean #jane? #jane!!! #jane!?!?? are all variations of jane

I'd appreciate pointers on what my considerations should be when I attempt to do the above.

Thanks,
MM.