I am not entirely sure about this, but I think that after tokenization I get each URL as a single token with all the "interesting symbols" :) such as "/" and ":" removed from it. Is that normal, or is there a chance I misconfigured something?
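
To make clear what I am after, here is a minimal Python sketch, run outside Solr, of the kind of output I would like: URLs pulled out of the text with "/" and ":" left intact. The pattern and the name extract_urls are my own illustration and an assumption about what counts as a URL; nothing here is a Solr API.

```python
import re

# Hypothetical pattern (my assumption, not anything Solr provides):
# an optional scheme such as "http:" or "file:", then "//", then
# everything up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"(?:[A-Za-z][A-Za-z0-9+.-]*:)?//[^\s<>\"']+")


def extract_urls(text):
    """Return URL-like substrings with '/', ':' and other symbols intact."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


if __name__ == "__main__":
    sample = "See http://example.com/a?b=1, file:///tmp/x.txt and //host/share."
    for url in extract_urls(sample):
        print(url)
```

Something like this could run in custom code before the documents are pushed, with the result sent as a separate field, but I would still prefer to get the same effect from the analysis chain on the server.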

Best regards,
Bogdan

On Wed, Jan 20, 2010 at 7:03 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> I guess it depends on what you mean by "extract". There's
> nothing that I know of that, say, stores them to a file or
> separate field, or even does anything special with them.
>
> I think StandardTokenizerFactory tries to keep URLs
> together as a token in the field, but it's just another
> token... You should check though....
>
> FWIW
> Erick
>
> On Wed, Jan 20, 2010 at 9:52 AM, Bogdan Vatkov <bogdan.vat...@gmail.com> wrote:
>
> > Sorry, I meant completely server-side - even more I want that at indexing
> > time (I do not care about query-time as I am reading later the whole index
> > anyway).
> >
> > On Wed, Jan 20, 2010 at 2:40 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > Do you mean you want the URLs to be extracted on the client?
> > > If so, no. Filters/analyzers reside on the server, not the client.
> > > You'll have to do it with custom code....
> > >
> > > Erick
> > >
> > > On Tue, Jan 19, 2010 at 5:48 PM, Bogdan Vatkov <bogdan.vat...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I want to extract URLs (http://..., as well as file://... or even //.....)
> > > > while pushing documents into Solr.
> > > > Is it possible with the Filters/Analyzers available nowadays?
> > > > I looked into the doc but could not find anything related to it.
> > > >
> > > > Best regards,
> > > > Bogdan
> >
> > --
> > Best regards,
> > Bogdan