Re: Query regarding URL Analysers

2014-08-28 Thread Jack Krupansky
- From: Sathyam Sent: Thursday, August 28, 2014 6:21 AM To: solr-user@lucene.apache.org Subject: Re: Query regarding URL Analysers Gentle Reminder On 21 August 2014 18:05, Sathyam wrote: Hi, I needed to generate tokens out of a URL such that I am able to get hierarchical units of the URL as

Re: Query regarding URL Analysers

2014-08-28 Thread Sathyam
Gentle Reminder On 21 August 2014 18:05, Sathyam wrote: > Hi, > > I needed to generate tokens out of a URL such that I am able to get > hierarchical units of the URL as well as each individual entity as tokens. > For example: > *Given a URL : * > > http://www.google.com/abcd/efgh/ijkl/mnop.php?

Re: Query regarding URL Analysers

2014-08-21 Thread Steve Rowe
UAX29URLEmailTokenizer recognizes URLs (among other things) - you could start with its JFlex grammar and modify it to do what you want. Steve www.lucidworks.com On Aug 21, 2014, at 8:35 AM, Sathyam wrote: > Hi, > > I needed to generate tokens out of a URL such that I am able to get > hierarc

Re: Query regarding URL Analysers

2014-08-21 Thread Aurélien MAZOYER
Hi, Maybe I am wrong, but I am not sure that you can find such a tokenizer in Solr out of the box. I suggest having a look at PatternTokenizer and PathHierarchyTokenizer. Note that you can also implement your own tokenizer and add it to Solr as a plugin. Regards, Aurélien MAZOYER On 21/08/2014 14:35
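To illustrate what a path-hierarchy style tokenizer (PathHierarchyTokenizer in Lucene) does with the path part of a URL, here is a rough sketch in plain Python. It only imitates the idea of emitting every prefix that ends on a delimiter boundary; it is not the Lucene implementation, and the function name `path_hierarchy` is made up for this example.

```python
def path_hierarchy(path, delimiter="/"):
    """Return every prefix of `path` that ends on a segment boundary.

    Hypothetical helper that imitates the idea behind a path-hierarchy
    tokenizer; it is not Lucene code.
    """
    tokens = []
    prefix = ""
    for i, part in enumerate(path.split(delimiter)):
        # Rebuild the prefix one segment at a time.
        prefix = part if i == 0 else prefix + delimiter + part
        # Empty segments (leading or trailing delimiter) produce no token.
        if part:
            tokens.append(prefix)
    return tokens


if __name__ == "__main__":
    for token in path_hierarchy("/abcd/efgh/ijkl/mnop.php"):
        print(token)
```

For the path of the URL in the original question this yields /abcd, /abcd/efgh, /abcd/efgh/ijkl and /abcd/efgh/ijkl/mnop.php. The host, query string and fragment would still need separate handling, for example with a pattern-based tokenizer or a custom one.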

Query regarding URL Analysers

2014-08-21 Thread Sathyam
Hi, I needed to generate tokens out of a URL such that I am able to get hierarchical units of the URL as well as each individual entity as tokens. For example: *Given a URL : * http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz The tokens that I need are : *Hierarchical subsets of
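The list of wanted tokens is cut off in this archive snippet, so the following is only a guess at the general shape of the requirement: one list of hierarchical prefixes (host, host plus first path segment, and so on) and one list of individual entities (scheme, host, path segments, query keys and values, fragment). It is a plain Python sketch using the standard library, not a Solr analyzer, and the name `url_tokens` as well as the exact token layout are assumptions.

```python
from urllib.parse import urlsplit, parse_qsl


def url_tokens(url):
    """Split a URL into (hierarchical prefixes, individual entities).

    Hypothetical helper; the token layout is an assumption, since the
    original message is truncated before the expected tokens are listed.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    # Hierarchical units: host, then host plus each successive path segment.
    hierarchical = [parts.netloc]
    prefix = parts.netloc
    for segment in segments:
        prefix = prefix + "/" + segment
        hierarchical.append(prefix)

    # Individual entities: scheme, host, path segments, query pairs, fragment.
    entities = [parts.scheme, parts.netloc] + segments
    for key, value in parse_qsl(parts.query):
        entities.extend([key, value])
    if parts.fragment:
        entities.append(parts.fragment)

    return hierarchical, entities


if __name__ == "__main__":
    url = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    hierarchical, entities = url_tokens(url)
    print(hierarchical)
    print(entities)
```

Inside Solr the same splitting would have to live in the analysis chain (a custom tokenizer or a combination of existing ones), so this is only meant to make the two kinds of tokens concrete.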