Sure, you can query the url field directly. Alternatively, you can split the URL into multiple components, e.g. using URLClassifyProcessor: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
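Below is a rough client-side sketch of the "split it into components" idea. It assumes the documents are prepared in Python before being sent to Solr. It does not reproduce what URLClassifyProcessor actually emits, and the field names url_host, url_domain_levels and url_path are hypothetical. If url_domain_levels were declared as a multi-valued string field (e.g. a hypothetical <field name="url_domain_levels" type="string" indexed="true" stored="true" multiValued="true"/>), you could fetch every URL under a domain, at any level, with a plain term query such as url_domain_levels:okkam.it or url_domain_levels:it.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Split a URL into extra fields to index alongside the raw url.

    The field names (url_host, url_domain_levels, url_path) are
    hypothetical; they are not the fields URLClassifyProcessor produces.
    """
    parts = urlsplit(url)
    # .hostname is already lower-cased and stripped of port/credentials.
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    # Every domain suffix, from the top-level domain up to the full host:
    # "www.okkam.it" -> ["it", "okkam.it", "www.okkam.it"]
    levels = [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]
    return {
        "url": url,
        "url_host": host,
        "url_domain_levels": levels,
        "url_path": parts.path or "/",
    }


doc = url_components("http://www.okkam.it/index.html")
print(doc["url_domain_levels"])
```

Storing every suffix of the host costs a few extra terms per document, but in exchange a 1st- or 2nd-level domain lookup becomes a cheap exact-match query instead of a regex scan over all URLs.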
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25 June 2013 at 14:10, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Sorry, but maybe I'm missing something here: could I declare url as the key
> field and query it too?
> At the moment, my schema.xml looks like:
>
> <fields>
>   <field name="url" type="string" indexed="true" stored="true"
>          required="true" multiValued="false" />
>
>   <field name="category" type="string" indexed="true" stored="true"/>
>   <field name="language" type="string" indexed="true" stored="true"/>
>   ...
>   <field name="_version_" type="long" indexed="true" stored="true"/>
>
> </fields>
> <uniqueKey>url</uniqueKey>
>
> Is that OK? Or should I add a "baseurl" field of some kind to be able to
> query all URLs coming from a certain domain (1st or 2nd level as well)?
>
> Best,
> Flavio
>
>
> On Tue, Jun 25, 2013 at 12:28 PM, Jan Høydahl <jan....@cominvent.com> wrote:
>
>> Probably a good match for the RegExp feature of Solr (given that your url
>> field is not tokenized), e.g. q=url:/.*\.it$/
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 25 June 2013 at 12:17, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> Hi everybody,
>>> I'm quite new to Solr, so maybe my question is trivial for you.
>>> In my use case I have to index content found at some URLs, so I use the
>>> URL as the key of my document and treat it as a string.
>>>
>>> However, I'd like to be able to query by domain name, like *.it or
>>> *.somesite.com. What's the best strategy? I thought of applying a
>>> URL-to-path transformation and indexing the result with
>>> solr.PathHierarchyTokenizerFactory, but maybe there's a simpler solution?
>>>
>>> Best,
>>> Flavio
>>>
>>> --
>>>
>>> Flavio Pompermaier
>>> Development Department
>>> _______________________________________________
>>> OKKAM Srl - www.okkam.it
>>>
>>> Phone: +(39) 0461 283 702
>>> Fax: + (39) 0461 186 6433
>>> Email: f.pomperma...@okkam.it
>>> Headquarters: Trento (Italy), fraz. Villazzano, Salita dei Molini 2
>>> Registered office: Trento (Italy), via Segantini 23
>>>
>>> Confidentiality notice. This e-mail transmission may contain legally
>>> privileged and/or confidential information. Please do not read it if you
>>> are not the intended recipient(s). Any use, distribution, reproduction or
>>> disclosure by any other person is strictly prohibited. If you have
>>> received this e-mail in error, please notify the sender and destroy the
>>> original transmission and its attachments without reading or saving it
>>> in any manner.
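A note on the regex route suggested in the thread: as far as I know, a Lucene/Solr regex query must match the entire indexed term and does not treat ^ or $ as anchors, so a pattern aimed at the end of the string only works if the whole stored url ends there. The sketch below uses Python's re.fullmatch purely as a stand-in for that whole-term behaviour; the two regex dialects are not identical, and the host-aware pattern is a hypothetical illustration, not a tested Solr query (in the standard query parser the forward slashes inside /.../ would, I believe, also need escaping). Also keep in mind that a pattern starting with .* may have to scan the whole term dictionary, which can be slow on a large index.

```python
import re

# re.fullmatch stands in for Lucene's whole-term regex matching here;
# the Python and Lucene regex dialects differ in other respects.


def matches_term(pattern: str, term: str) -> bool:
    """Return True only if the pattern matches the entire term."""
    return re.fullmatch(pattern, term) is not None


# In the spirit of the pattern from the thread, minus the $ anchor:
# only matches when the whole url ends in ".it".
ends_with_it = r".*\.it"

# Hypothetical host-aware pattern: the host is "it" or ends in ".it",
# optionally followed by a path.
host_is_it = r"https?://([^/]*\.)?it(/.*)?"

for u in ["http://www.okkam.it",
          "http://www.okkam.it/index.html",
          "http://www.italy.com/"]:
    print(u, matches_term(ends_with_it, u), matches_term(host_is_it, u))
```

The second URL shows the catch: once a stored url carries a path, an "ends with .it" pattern no longer matches it, which is one argument for indexing the host (or its domain levels) as a separate field.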