Greetings,

I am trying to index hashtags from Twitter -- that is, tokens that start
with a # symbol followed by any number of alphanumeric characters.

Examples:
1. #jane
2. #Jane
3. #Jane!

At a high level I'd like to be able to:
1. differentiate between, say, #jane and #jane!
2. differentiate between a hashtag such as #jane and a regular text token
jane
3. ask for variations on #jane -- by this I mean that #jane? #jane!!! #jane!?!??
are all variations of #jane
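
To make the three requirements concrete, here is a rough Python sketch of
the behaviour I have in mind. It is only an illustration: the names
(analyze, variants), the regular expressions, and the choice to lowercase
the base form are my own assumptions, not taken from any indexing library.
Each token keeps its exact form (for requirement 1), a kind (for
requirement 2), and a normalized base form (for requirement 3).

```python
import re

# Assumption: a hashtag is '#', ASCII alphanumerics, then an optional run of '!' or '?'.
TOKEN_RE = re.compile(r"#[A-Za-z0-9]+[!?]*|[A-Za-z0-9]+")
HASHTAG_RE = re.compile(r"#([A-Za-z0-9]+)([!?]*)")


def analyze(text):
    """Split text into (exact, base, kind) tuples.

    exact: the token as written, so '#jane' and '#jane!' stay distinct
    base:  lowercased form without trailing punctuation, shared by variations
    kind:  'hashtag' or 'word', so '#jane' never collides with 'jane'
    """
    tokens = []
    for match in TOKEN_RE.finditer(text):
        raw = match.group(0)
        tag = HASHTAG_RE.fullmatch(raw)
        if tag:
            tokens.append((raw, "#" + tag.group(1).lower(), "hashtag"))
        else:
            tokens.append((raw, raw.lower(), "word"))
    return tokens


def variants(tokens, base):
    """Return the exact forms of every hashtag token sharing the given base."""
    return [exact for exact, b, kind in tokens if kind == "hashtag" and b == base]


if __name__ == "__main__":
    toks = analyze("jane #jane #Jane! #jane!?!??")
    print(toks)
    print(variants(toks, "#jane"))
```

In an index this would probably mean two fields (or two terms at the same
position): one holding the exact token and one holding the base form, with
the # kept in both so that hashtags and plain words never match each other.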

I'd appreciate pointers on what I should consider when attempting the
above.

Thanks,

MM.
