From: Sathyam
Sent: Thursday, August 28, 2014 6:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding URL Analysers
Gentle Reminder
On 21 August 2014 18:05, Sathyam wrote:
> Hi,
>
> I needed to generate tokens out of a URL such that I am able to get
> hierarchical units of the URL as well as each individual entity as tokens.
> For example:
> *Given a URL : *
>
> http://www.google.com/abcd/efgh/ijkl/mnop.php?
UAX29URLEmailTokenizer recognizes URLs (among other things) - you could start
with its JFlex grammar and modify it to do what you want.
Steve
www.lucidworks.com
On Aug 21, 2014, at 8:35 AM, Sathyam wrote:
> Hi,
>
> I needed to generate tokens out of a URL such that I am able to get
> hierarc
Hi,
I may be wrong, but I am not sure that you can find such a tokenizer in
Solr out of the box.
I suggest having a look at PatternTokenizer and PathHierarchyTokenizer. Note
that you can also implement your own tokenizer and add it to Solr as a
plugin.
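To make the idea concrete, here is a rough sketch in plain Python (not Solr
or Lucene code) of the two kinds of output a path-based tokenizer and a
pattern-based tokenizer could give for the URL from the question. The
function names hierarchical_tokens and entity_tokens are hypothetical, and
the sketch assumes that "hierarchical units" means the host followed by each
successively longer path prefix, which may not match the exact token list
that was asked for.

```python
import re
from urllib.parse import urlsplit


def hierarchical_tokens(url):
    """Return the host, then the host plus each successively deeper path level.

    Assumption: this is only an approximation of what a path-hierarchy
    style tokenizer would emit; it ignores the scheme, query and fragment.
    """
    parts = urlsplit(url)
    prefix = parts.netloc
    tokens = [prefix]
    for segment in parts.path.split("/"):
        if segment:  # skip the empty string before the leading "/"
            prefix = prefix + "/" + segment
            tokens.append(prefix)
    return tokens


def entity_tokens(url):
    """Split the URL on runs of non-alphanumeric characters.

    Assumption: this mimics a simple pattern-based split, so "mnop.php"
    becomes two tokens, "mnop" and "php".
    """
    return [t for t in re.split(r"[^A-Za-z0-9]+", url) if t]


url = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
print(hierarchical_tokens(url))
print(entity_tokens(url))
```

In Solr itself, the same effect would more likely come from indexing the URL
into two fields with different analyzers, or from a custom tokenizer plugin.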
Regards,
Aurélien MAZOYER
On 21/08/2014 14:35
Hi,
I needed to generate tokens out of a URL such that I am able to get
hierarchical units of the URL as well as each individual entity as tokens.
For example:
*Given a URL : *
http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz
The tokens that I need are :
*Hierarchical subsets of