Sorry for the delay. Take a look at the URL Classify update processor, which parses a URL and distributes its components to various fields:
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessorFactory.html
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html

The official doc is... pitiful, but I have doc and examples in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-----Original Message-----
From: Sathyam
Sent: Thursday, August 28, 2014 6:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding URL Analysers

Gentle Reminder


On 21 August 2014 18:05, Sathyam <sathyam.dorasw...@gmail.com> wrote:

Hi,

I needed to generate tokens out of a URL such that I am able to get
hierarchical units of the URL as well as each individual entity as tokens.
For example:
*Given a URL:*

http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

The tokens that I need are:

*Hierarchical subsets of the URL*

1 http://

2 http://www.google.com/

3 http://www.google.com/abcd/

4 http://www.google.com/abcd/efgh/

5 http://www.google.com/abcd/efgh/ijkl/

6 http://www.google.com/abcd/efgh/ijkl/mnop.php

*Individual elements in the path to the resource*

7 abcd

8 efgh

9 ijkl

10 mnop.php

*Query Terms*

11 a=10

12 b=20

13 c=30

*Fragment*
14 xyz

This comes to a total of 14 tokens for the given URL.
Basically, I need a URL analyzer that creates tokens for each of the
categories mentioned in bold, plus a separate token for the port (if present).
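
To pin down the expected output, the token list above can be sketched outside Solr. This is only a minimal Python illustration using the standard library's urllib.parse; the function name url_tokens is hypothetical, and it is neither a Solr analyzer nor the URL Classify processor. It assumes that query terms are kept as raw key=value strings and that the port, when present, is appended as the last token.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return hierarchical, path, query, fragment, and port tokens of a URL.

    Hypothetical helper for illustration only; not part of Solr.
    """
    parts = urlsplit(url)
    tokens = []

    # Hierarchical prefixes: scheme first, then scheme + authority.
    prefix = parts.scheme + "://"
    tokens.append(prefix)
    prefix += parts.netloc + "/"
    tokens.append(prefix)

    # One prefix per path level. Directory levels keep a trailing slash;
    # the final resource keeps one only if the path itself ends with "/".
    segments = [s for s in parts.path.split("/") if s]
    for i, segment in enumerate(segments):
        is_last = i == len(segments) - 1
        is_dir = (not is_last) or parts.path.endswith("/")
        prefix += segment + ("/" if is_dir else "")
        tokens.append(prefix)

    # Individual elements in the path to the resource.
    tokens.extend(segments)

    # Query terms, kept as raw key=value pairs (an assumption).
    tokens.extend(term for term in parts.query.split("&") if term)

    # Fragment, if any.
    if parts.fragment:
        tokens.append(parts.fragment)

    # Separate token for the port, only when the URL states one.
    if parts.port is not None:
        tokens.append(str(parts.port))

    return tokens


if __name__ == "__main__":
    example = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    for number, token in enumerate(url_tokens(example), start=1):
        print(number, token)
```

Run on the example URL, this yields the 14 tokens listed above in the same order. A URL with an explicit port keeps the port in the hierarchical prefixes and adds it as one extra token.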

I would like to know how this can be achieved with a single analyzer
that combines the tokenizers and filters provided by Solr.
I am also curious to know why an analyzer is restricted to only *one*
tokenizer.
I look forward to your response on the best possible way to get as close
as possible to what I need.

Thanks.
--
Sathyam Doraswamy