As Jan indicates, your users could perform regular expression queries on a URL string field, but maybe you should tell us more about your use case and how your users really want to search.

One technique is to copy the URL to a tokenized text field. Then, users can search for names and sub-sequences that occur in the URL without the need for wildcards or regular expressions.
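To make that idea concrete, here is a minimal sketch in plain Python, not Solr itself. It assumes a simplified analyzer that splits on anything that is not a letter or digit and lowercases the result; a real Solr text field type may tokenize differently depending on how it is configured. The function names `tokenize_url` and `matches_all` are hypothetical.

```python
import re

def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens (simplified analyzer)."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", url) if t]

def matches_all(url, query):
    """True if every token of the query occurs among the URL's tokens.

    This only checks presence, not adjacency or order, so it is closer
    to an AND of terms than to a phrase query.
    """
    url_tokens = set(tokenize_url(url))
    return all(t in url_tokens for t in tokenize_url(query))

print(tokenize_url("http://www.somesite.com/docs/Page.html"))
print(matches_all("http://www.somesite.com/docs/Page.html", "somesite.com"))
```

With tokens like these in a separate field, a user can type `somesite` or `somesite.com` and get a hit without any wildcard, while the original string field stays available for exact matching.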

-- Jack Krupansky

-----Original Message-----
From: Jan Høydahl
Sent: Tuesday, June 25, 2013 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: URL search and indexing

Probably a good match for Solr's regular expression queries (provided that your URL field is not tokenized),
e.g. q=url:/.*\.it$/
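One thing to watch: as far as I know, a Lucene regular expression is matched against the whole indexed term and does not support anchors such as `$`. If the full URL is stored as a single string, the pattern therefore has to account for the scheme and path, not only the host. The sketch below emulates that whole-term behavior in Python with `re.fullmatch`. The pattern `IT_DOMAIN` and the sample URLs are assumptions made for illustration, and Python regex syntax is not identical to Lucene's.

```python
import re

# Hypothetical pattern: any scheme, a host ending in ".it", optional path.
# It does not handle ports or user-info; it is only an illustration.
IT_DOMAIN = re.compile(r"[a-z]+://[^/]*\.it(/.*)?")

def matches_whole_term(pattern, term):
    """Emulate whole-term matching: the pattern must cover the entire string."""
    return pattern.fullmatch(term) is not None

urls = [
    "http://www.okkam.it/",
    "http://www.okkam.it/about",
    "http://www.somesite.com/page.it",  # ".it" appears in the path, not the host
]
it_urls = [u for u in urls if matches_whole_term(IT_DOMAIN, u)]
print(it_urls)
```

Restricting the `.it` part to the segment before the first `/` keeps a path such as `/page.it` from being mistaken for an Italian domain.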

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25 June 2013 at 12:17, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Hi everybody,
I'm quite new to Solr, so maybe my question is trivial for you.
In my use case I have to index content found at some URLs, so I use the URL as
the key of my document and treat it as a string.

However, I'd like to be able to query by domain name, like *.it or
*.somesite.com. What's the best strategy? I thought of transforming the URL
into a path and indexing it with solr.PathHierarchyTokenizerFactory, but
maybe there's a simpler solution, isn't there?
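The URL-to-path idea can be sketched in plain Python as follows. This is only an emulation, assuming the host labels are reversed and joined with `/`, and that a hierarchy tokenizer then emits every prefix of that path as its own term. The helper names `host_hierarchy`, `domain_term` and `in_domain` are hypothetical and not part of Solr.

```python
from urllib.parse import urlsplit

def host_hierarchy(url):
    """Return reversed-host prefixes, e.g. ['it', 'it/okkam', 'it/okkam/www']."""
    host = urlsplit(url).hostname or ""
    labels = [p for p in reversed(host.lower().split(".")) if p]
    return ["/".join(labels[:i]) for i in range(1, len(labels) + 1)]

def domain_term(domain):
    """Turn a domain pattern such as '*.somesite.com' into 'com/somesite'."""
    return "/".join(reversed(domain.lower().strip("*.").split(".")))

def in_domain(url, domain):
    """True if the URL's host is the given domain or one of its subdomains."""
    return domain_term(domain) in host_hierarchy(url)

print(host_hierarchy("http://www.okkam.it/about"))
print(in_domain("http://www.somesite.com/x", "*.somesite.com"))
print(in_domain("http://notsomesite.com/", "*.somesite.com"))
```

Because each prefix is a separate term, a domain query becomes an exact term lookup rather than a wildcard scan, and a host like `notsomesite.com` does not match `*.somesite.com`. Note that in this sketch `*.somesite.com` also matches the bare host `somesite.com`.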

Best,
Flavio

--

Flavio Pompermaier
Development Department
OKKAM Srl - www.okkam.it

*Phone:* +(39) 0461 283 702
*Fax:* + (39) 0461 186 6433
*Email:* f.pomperma...@okkam.it
*Headquarters:* Trento (Italy), fraz. Villazzano, Salita dei Molini 2
*Registered office:* Trento (Italy), via Segantini 23

