Re: URL search and indexing

2013-06-28 Thread Upayavira
q=itemid:5000 gave me the same score for those docs, while > > I > > > was expecting different term frequencies between the first and the > > second. > > > In fact, using Java to upload documents led to correct results (3 > > > occurrences of item 1000

Re: URL search and indexing

2013-06-28 Thread Flavio Pompermaier
st and the > second. > > In fact, using Java to upload documents led to correct results (3 > > occurrences of item 1000 in the first doc and 1 in the second), e.g.: > > document1.addField("itemid", "1000"); > > document1.addField("itemid",

Re: URL search and indexing

2013-06-27 Thread Erick Erickson
ght or am I missing something else? > > > On Wed, Jun 26, 2013 at 5:18 PM, Jack Krupansky >wrote: > > > If there is a bug... we should identify it. What's a sample post command > > that you issued? > > > > > > -- Jack Krupansky > > > >

Re: URL search and indexing

2013-06-26 Thread Flavio Pompermaier
>> >> lucene.apache.org >> >> apache.org >> >> apache >> >> .org >> >> org >> >> >> >> And then the user could query by any of those partial domain names. >> >> >> >> But, if you simply tokenize the URL (copy th
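
The partial domain names quoted above can be produced with a few lines of plain Java. This is only a minimal sketch, not the script referred to in the thread: the class name `PartialDomains` and the method `suffixes` are hypothetical, and the sketch emits only the dot-delimited suffixes of a host name (it does not emit the bare "apache" or ".org" variants that also appear in the quoted list). Each returned value could then be added to a multi-valued field at index time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
public class PartialDomains {

    // Returns every dot-delimited suffix of a host name, longest first.
    public static List<String> suffixes(String host) {
        List<String> out = new ArrayList<>();
        String rest = host;
        while (!rest.isEmpty()) {
            out.add(rest);
            int dot = rest.indexOf('.');
            if (dot < 0) {
                break; // no more labels to strip
            }
            rest = rest.substring(dot + 1); // drop the leftmost label
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one suffix per line for the host used in the thread.
        for (String s : suffixes("lucene.apache.org")) {
            System.out.println(s);
        }
    }
}
```

Indexing the suffixes as separate values keeps queries simple (an exact match on "apache.org"), at the cost of a somewhat larger index than a single raw string field.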

Re: URL search and indexing

2013-06-26 Thread Erick Erickson
t: Wednesday, June 26, 2013 10:53 AM > > To: solr-user@lucene.apache.org > Subject: Re: URL search and indexing > > I was doing exactly that and, thanks to the administration page and > explanation/debugging, I checked if results were those expected. > Unfortunately, results were

Re: URL search and indexing

2013-06-26 Thread Jack Krupansky
If there is a bug... we should identify it. What's a sample post command that you issued? -- Jack Krupansky -Original Message- From: Flavio Pompermaier Sent: Wednesday, June 26, 2013 10:53 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing I was doing ex

Re: URL search and indexing

2013-06-26 Thread Flavio Pompermaier
by a URL > fragment, > >> such as "apache.org", ".org", "lucene.apache.org", etc. and the > >> tokenization will strip out the punctuation. > >> > >> I'll add this script to my list of examples to add in the next rev of my >

Re: URL search and indexing

2013-06-26 Thread Erick Erickson
RL fragment, >> such as "apache.org", ".org", "lucene.apache.org", etc. and the >> tokenization will strip out the punctuation. >> >> I'll add this script to my list of examples to add in the next rev of my >> book. >> >> >> -- Ja

Re: URL search and indexing

2013-06-26 Thread Flavio Pompermaier
Jack Krupansky > > -----Original Message- From: Flavio Pompermaier > Sent: Tuesday, June 25, 2013 10:06 AM > > To: solr-user@lucene.apache.org > Subject: Re: URL search and indexing > > I bought the book and looking at the example I still don't understand if it > p

Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
book. -- Jack Krupansky -Original Message- From: Flavio Pompermaier Sent: Tuesday, June 25, 2013 10:06 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing I bought the book and looking at the example I still don't understand if it is possible to query all sub-urls of

Re: URL search and indexing

2013-06-25 Thread Erik Hatcher
search. >> >> One technique is to copy the URL to a tokenized text field. Then, users >> can search for names and sub-sequences that occur in the URL without the >> need for wildcards or regular expressions. >> >> -- Jack Krupansky >> >>

Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
users > can search for names and sub-sequences that occur in the URL without the > need for wildcards or regular expressions. > > -- Jack Krupansky > > -Original Message- From: Jan Høydahl > Sent: Tuesday, June 25, 2013 6:28 AM > > To: solr-user@lucene.apache.or

Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
-sequences that occur in the URL without the need for wildcards or regular expressions. -- Jack Krupansky -Original Message- From: Jan Høydahl Sent: Tuesday, June 25, 2013 6:28 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing Probably a good match for the RegExp

Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
-1/ebook/product-21079719.html> > > But... I still think you should use a tokenized text field as well - use > all three: raw string, tokenized text, and URL classification fields. > > -- Jack Krupansky > > -Original Message----- From: Flavio Pompermaier > Sent: Tuesda

Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
Krupansky -Original Message- From: Flavio Pompermaier Sent: Tuesday, June 25, 2013 9:02 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing That sounds like exactly what I'm looking for! However, I cannot find an example of how to use it. Could you help me please

Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
That sounds like exactly what I'm looking for! However, I cannot find an example of how to use it. Could you help me, please? Moreover, about the id field: isn't it true that the id field shouldn't be analyzed, as suggested in http://wiki.apache.org/solr/UniqueKey#Text_field_in_the_document? On Tue, Jun 25, 2013 a

Re: URL search and indexing

2013-06-25 Thread Jan Høydahl
Sure you can query the url directly. Or if you choose you can split it up in multiple components, e.g. using http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 25. ju

Re: URL search and indexing

2013-06-25 Thread Flavio Pompermaier
Sorry, but maybe I'm missing something here. Could I declare url as the key field and query it too? At the moment, my schema.xml looks like: ... url Is it ok? Or should I add a "baseurl" field of some kind to be able to query all urls coming from a certain domain (1st or 2nd leve

Re: URL search and indexing

2013-06-25 Thread Jan Høydahl
Probably a good match for the RegExp feature of Solr (given that your url is not tokenized) e.g. q=url:/.*\.it$/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25 June 2013 at 12:17, Flavio Pompermaier wrote: > Hi to everybody, > I'm quite new to Solr so maybe my q