Calling Solr requests from java code - examples?

2007-01-16 Thread maarten

Hi,

Could someone give me some code examples of how Solr requests can be  
called from Java code? I'm new to Java and I'm not very sure how URLs  
+ params can be called from Java code and how the responses can be  
captured. Or what the best practices are?


Grtz







Re: Calling Solr requests from java code - examples?

2007-01-16 Thread maarten

Thanks!

and how would you do it calling it from another web application, let's  
say from a servlet or so? I need to do some stuff in my web java code,  
then call the Solr service and do some more stuff afterwards



Quoting Bertrand Delacretaz <[EMAIL PROTECTED]>:

On 1/16/07, [EMAIL PROTECTED]   
<[EMAIL PROTECTED]> wrote:



...Could someone give me some code examples on how Solr requests can be
called by Java code...


Although our Java client landscape is still a bit fuzzy (there are
several variants floating around), you might want to look at the code
found in http://issues.apache.org/jira/browse/SOLR-20

If you're new to Java, I'd recommend playing with HttpClient first
(http://jakarta.apache.org/commons/httpclient/), see the tutorial
there for the basics.

The standard Java library classes are also usable to write HTTP
clients, but HttpClient will help a lot in getting the "details"
right, if you don't mind depending on that library.

-Bertrand
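For what it's worth, here is a minimal JDK-only sketch of the kind of thing Bertrand describes (no HttpClient dependency). The host, port, and field name are assumptions, and the actual HTTP call is left commented out so the snippet stands alone:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SolrQueryExample {

    // Build the request URL; parameter values must be URL-encoded.
    static String buildQueryUrl(String base, String query) throws Exception {
        return base + "/select?q=" + URLEncoder.encode(query, "UTF-8")
             + "&start=0&rows=10";
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical local Solr instance; adjust host/port to your setup.
        String url = buildQueryUrl("http://localhost:8983/solr",
                                   "title:solr tutorial");
        System.out.println(url);

        // Uncomment to actually issue the request and read the XML response:
        // HttpURLConnection conn =
        //         (HttpURLConnection) new URL(url).openConnection();
        // BufferedReader in = new BufferedReader(
        //         new InputStreamReader(conn.getInputStream(), "UTF-8"));
        // for (String line; (line = in.readLine()) != null; )
        //     System.out.println(line);
        // in.close();
    }
}
```

From a servlet the same calls work unchanged: build the URL, open the connection, and parse the response before rendering your own page.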






Converting Solr response back to pojo's, experiences?

2007-01-16 Thread maarten
Anyone have experience converting XML responses back to POJOs? Which  
technologies have you used?


Anyone doing json <-> pojo's?

Grtz
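One dependency-free option is JAXP/DOM from the JDK. In this hedged sketch the Item POJO, the field names, and the trimmed sample response are all made up for illustration — the XML shape loosely mimics Solr's `<doc><str name="...">` response format:

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ResponseToPojo {

    // Hypothetical POJO for one search hit.
    static class Item {
        String id;
        String name;
        Item(String id, String name) { this.id = id; this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down sample in the style of Solr's XML response format.
        String xml = "<response><result numFound='2'>"
                   + "<doc><str name='id'>1</str><str name='name'>foo</str></doc>"
                   + "<doc><str name='id'>2</str><str name='name'>bar</str></doc>"
                   + "</result></response>";

        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        // Walk every <doc>, pull out the named <str> values, build POJOs.
        List<Item> items = new ArrayList<Item>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            Element doc = (Element) docs.item(i);
            String id = null, name = null;
            NodeList strs = doc.getElementsByTagName("str");
            for (int j = 0; j < strs.getLength(); j++) {
                Element s = (Element) strs.item(j);
                if ("id".equals(s.getAttribute("name")))   id = s.getTextContent();
                if ("name".equals(s.getAttribute("name"))) name = s.getTextContent();
            }
            items.add(new Item(id, name));
        }
        for (Item it : items) System.out.println(it.id + " -> " + it.name);
    }
}
```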



Bug ? unique id

2007-03-16 Thread Maarten . De . Vilder
Hello,

we have been using Solr for a month now and we are running into a lot of 
trouble.

one of the issues is a problem with the unique id field.

can this field have analyzer, filters and tokenizers on it ??

because when we use filters or tokenizers on our unique id field, we get 
duplicate id's.

thanks in advance,
maarten

Re: Bug ? unique id

2007-03-16 Thread Maarten . De . Vilder
because we want to be able to search our unique id's :)
and we would like to use the Latin character filter and the Lowercase 
filter so our searches don't have to be case sensitive and stuff.

thanks for the quick response!

grts,m




Erik Hatcher <[EMAIL PROTECTED]> 
16/03/2007 12:09
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Bug ? unique id






Why in the world would you want to analyze your unique id?

 Erik


On Mar 16, 2007, at 6:07 AM, [EMAIL PROTECTED] wrote:

> Hello,
>
> we have been using Solr for a month now and we are running into a 
> lot of
> trouble .
>
> one of the issues is a problem with the unique id field.
>
> can this field have analyzer, filters and tokenizers on it ??
>
> because when we use filters or tokenizers on our unique id field, 
> we get
> duplicate id's.
>
> thanks in advance,
> maarten




Re: Bug ? unique id

2007-03-16 Thread Maarten . De . Vilder
yes, that is exactly what we are doing now ... copyField with the filters 
... we figured that much :)

but we are talking about a couple of million records, so the less data we 
copy the better ...

but can someone please answer my question :'(
is it illegal to put filters on the unique id?
or is it a bug that we get duplicate id's?
or is this a known issue (since everybody is using copyFields)?

thanks for all your replies!

grts,m




"Paul Borgermans" <[EMAIL PROTECTED]> 
16/03/2007 16:12
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Bug ? unique id






Hi Maarten

Why not copy your unique id into another field with the required filters 
and
use that for search?

Regards
Paul




Re: Bug ? unique id

2007-03-19 Thread Maarten . De . Vilder
thanks for your reply... it kind of solved our problem!

we were in fact using tokenizers that produce multiple tokens ... 
so i guess there is no other way for us than to use the copyField 
workaround.

it would maybe be a good idea to have Lucene check the *stored* value for 
duplicate keys ... that seems so much more logical to me!
(imho, it makes no sense to check the *indexed* value for duplicate keys, 
but maybe there is a reason?)
or maybe give us the option to choose whether Lucene should check the 
*stored* or *indexed* value for duplicate keys.

it is really confusing to get duplicate unique key *stored* values back 
from the server  (and kind of frustrating)

since we now use a copyField to perform searches on the IDs, there is no 
more reason to index our unique key field 
what would happen if I set indexed=false on my unique id field??

Maarten :-)





Chris Hostetter <[EMAIL PROTECTED]> 
16/03/2007 19:14
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Bug ? unique id







: but can someone please answer my question :'(
: is it illegal to put filters on the unique id ?
: or is it a bug that we get duplicate id's?
: or is this a know issue (since everybody is using copyfields?)

there's nothing illegal about using an Analyzer on your uniqueKey, but you
have to ensure that your Analyzer:
  1) never produces multiple tokens (ie: KeywordTokenizer is fine)
  2) never produces duplicate output for different (legal) input.

...so if your dataset can legally contain two different documents
whose keys are "foo bar" and "Foo Bar" you certainly wouldn't want
to use a Whitespace or StandardTokenizer -- but you also wouldn't ever want
to use the LowerCaseFilter.

If however you really wanted to ignore all punctuation in keys when
clients upload documents to you, and trust that doc "1234-56-7890" is the
same as doc "1234567890" then something like the pattern stripping filter
would be fine.


the thing to understand is that it's the *indexed* value of the uniqueKey
that must be unique in order for Solr to do things properly ... it has to
be able to search on that uniqueKey term to delete/replace a doc properly.


-Hoss
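The copyField approach the thread settles on could look roughly like this in schema.xml. This is a hedged sketch, not the poster's actual schema — the field and type names are made up; the key points are that the uniqueKey field stays a plain string, and searching happens on a single-token lowercased copy:

```xml
<!-- Hypothetical schema.xml fragment: keep the uniqueKey a raw string,
     and search on a lowercased copy instead. -->
<fieldType name="string" class="solr.StrField"/>
<fieldType name="id_search" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the whole value as a single token,
         so the filter chain can never split it into multiple terms -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="id"      type="string"    indexed="true" stored="true"/>
<field name="id_text" type="id_search" indexed="true" stored="false"/>
<copyField source="id" dest="id_text"/>

<uniqueKey>id</uniqueKey>
```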




Re: Bug ? unique id

2007-03-21 Thread Maarten . De . Vilder
ok, i'm starting to see the light :))

at this moment, we are running this for our uniqueID :
field :

and everything is working well ...

so i don't explicitly say indexed='true' ... i guess indexed defaults to 
true ...

i'll be sure to do some testing with stored=false and indexed=false
but that'll be for next week when i start optimizing
i'll be sure to mail you the results of the testing

thanks again,m





Chris Hostetter <[EMAIL PROTECTED]> 
19/03/2007 20:30
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Bug ? unique id







: it would maybe be a good idea to have Lucene check the *stored* value 
for
: duplicate keys ... that seems so much more logical to me !
: (imho, it makes no sense to check the *indexed* value for duplicate 
keys,
: but maybe there is a reason ?)

it's probably a terminology issue ... stored fields are nothing more than
Payloads ... Lucene (and Solr) don't do anything with them but hang on to
them for you and return them to you later.

the "indexed" value is the value that matters in the "index" ... it's the
one searches/lookups and sorting can be performed on.

: or maybe give us the option to choose whether Lucene should check the
: *stored* or *indexed* value for duplicate keys.

if Solr were to try and deal with your uniqueKey using the stored value,
it would have to do the same copyField stuff under the covers in order
for that stored value to be "indexed" in a way it can see it.

: since we now use a copyfield to perform searches on the IDs, there is no
: more reason to index our unique key field 
: what would happen if I set indexed=false on my unique id field ??

it wouldn't work at all ... as i said, the indexed value is all that Solr
really cares about -- i think you could probably mark your uniqueKey as
stored=false, but if it's indexed=false then at best you'll get a nice
error telling you it must be indexed, and at worst it will crash and burn
in a non-obvious way -- possibly silently.

(if you want to try it out, and the latter happens, please file a bug; we
should definitely have a nice error message in that case)




-Hoss




Re: How to assure a permanent index.

2007-03-21 Thread Maarten . De . Vilder
the documents are only deleted when you do a commit ...
so you should never have an empty index (or at least not for more than a 
couple of seconds)

note that you don't have to delete all documents  you can just upload 
new documents with the same uniqueID and Solr will delete the old 
documents automatically ... this way you are guaranteed not to have an empty 
index

grts,m
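The replace-by-uniqueKey behaviour described above can be exercised with Solr's standard XML update messages; the field names here are illustrative:

```xml
<!-- Posting a document whose uniqueKey already exists replaces the old
     version; no explicit delete is needed. -->
<add>
  <doc>
    <field name="id">item-42</field>          <!-- existing uniqueKey -->
    <field name="name">Updated title</field>
  </doc>
</add>

<!-- The old version only disappears (and the new one becomes visible to
     searchers) on commit: -->
<commit/>
```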





"Thierry Collogne" <[EMAIL PROTECTED]> 
21/03/2007 09:22
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: How to assure a permanent index.






Sorry. Did a send by accident. This is the next part of the mail.

I mean if I do the following.

 -  delete all documents from the index
 -  add all documents
 -  do a commit.

Will this result in a temporary empty index, or will I always have 
results?



Re: Problems with special characters

2007-03-21 Thread Maarten . De . Vilder
hey,

we had the same problem with the Solr Java Client ...

they forgot to put UTF-8 encoding on the stream ...

i posted our fix on http://issues.apache.org/jira/browse/SOLR-20 
it's this post : 
http://issues.apache.org/jira/browse/SOLR-20#action_12478810
Frederic Hennequin [07/Mar/07 08:27 AM] 

grts,m 
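The class of bug being fixed here is writing the request body with the platform default encoding instead of forcing UTF-8 on the writer. A small self-contained illustration — the document string is arbitrary and this mimics the idea of the SOLR-20 patch, not its exact code (the real fix wraps the connection's output stream the same way):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodingFix {
    public static void main(String[] args) throws Exception {
        // \u00e9 is 'é': one Java char, two bytes in UTF-8.
        String doc = "<field name=\"title\">caf\u00e9</field>";

        // Buggy pattern: whatever the platform default encoding happens
        // to be (may mangle non-ASCII on some systems).
        byte[] defaultBytes = doc.getBytes();

        // Fixed pattern: force UTF-8 on the writer explicitly.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(buf, "UTF-8");
        w.write(doc);
        w.close();
        byte[] utf8Bytes = buf.toByteArray();

        // The UTF-8 form is one byte longer than the character count,
        // because of the two-byte 'é'.
        System.out.println(doc.length() + " chars, "
                + utf8Bytes.length + " bytes");
    }
}
```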





"Bertrand Delacretaz" <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED]
21/03/2007 11:19
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Problems with special characters






On 3/21/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:

> I used the new jar file and removed -Dfile.encoding=UTF-8 from my jar 
call
> and the problem isn't there anymore...

ok, thanks for the feedback!

-Bertrand



Re: Problems with special characters

2007-03-21 Thread Maarten . De . Vilder
we didn't use it, but i took a quick look :

you need to implement the "hl=on" attribute in the getQueryString() method 
of the SolrQueryImpl

the result docs already contain highlighting, that's why you found 
processHighlighting in the ResultsParser

good luck !
m
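A tiny sketch of the parameter-building side: `hl` and `hl.fl` are Solr's standard highlighting parameters, but the query, field name, and surrounding code are examples — this is not the actual SOLR-20 client code:

```java
public class HighlightParams {
    public static void main(String[] args) {
        // Base query string as the client might build it.
        StringBuilder qs = new StringBuilder("q=connect&wt=xml");

        boolean highlight = true;
        if (highlight) {
            qs.append("&hl=on");          // turn the highlighter on
            qs.append("&hl.fl=title");    // which field(s) to highlight
        }
        System.out.println(qs.toString());
    }
}
```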




"Thierry Collogne" <[EMAIL PROTECTED]> 
21/03/2007 17:04
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Problems with special characters






Thank you. When I add the code you described, the Solr Java Client works.
One more question about the Solr Java Client.

Does it allow the use of highlighting? I found a processHighlighting method
in ResultsParser.java, but I can't find a way of enabling it.

Did you use highlighting?




Re: How to assure a permanent index.

2007-03-21 Thread Maarten . De . Vilder
well, yes indeed :) 
but i do think it is easier to put up synchronisation for deleted 
documents as well
clearing the whole index is kind of overkill

when you do this : 
* delete all documents
* submit all documents
* commit
you should also keep in mind that Solr will do an autocommit after a 
certain number of documents ...
so if the process takes a couple of minutes/hours, you might end up with 
an empty index and no results for the users !

cheers,
m




Walter Underwood <[EMAIL PROTECTED]> 
21/03/2007 17:32
Please respond to
solr-user@lucene.apache.org


To

cc

Subject
Re: How to assure a permanent index.






On 3/21/07 1:33 AM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> note that you dont have to delete all documents  you can just upload
> new documents with the same UniqueID and Solr will delete the old
> documents automaticly ... this way you are guaranteed not to have an 
empty
> index

That works if you keep track of all documents that have disappeared
since the last index run. Otherwise, you end up with orphans in
the search index, documents that exist in search, but not in the
real world, also known as "serving 404's in results".

wunder
-- 
Walter Underwood
Search Guru, Netflix





Re: Problems with special characters

2007-03-22 Thread Maarten . De . Vilder
nice one !




"Thierry Collogne" <[EMAIL PROTECTED]> 
22/03/2007 09:00
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Problems with special characters






Thanks. I made some modifications to SolrQuery.java to allow highlighting. 
I
will post the code on

http://issues.apache.org/jira/browse/SOLR-20






Re: Problems with special characters

2007-03-22 Thread Maarten . De . Vilder
No, i didn't try to use it (on account of the fact that we don't use Solr 
to display the results)
the only thing our Solr server returns is IDs ... so there is nothing to 
put highlights on

but the code doesn't look half bad :)
let's hope the Client Developers pick up on it :)




"Thierry Collogne" <[EMAIL PROTECTED]> 
22/03/2007 11:27
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Problems with special characters






Thanks. Did you also try using it?




Re: multiple indexes

2007-03-23 Thread Maarten . De . Vilder
> Why not create a multivalued field that stores the customer perms?
> add has_access:cust1 has_access:cust2, etc to the document at index
> time, and turn this into a filter query at query time?

that is what we are doing at the moment, and i must say, it works very 
well and does not slow the server down at all (because of the efficient 
indexes that Solr builds)





"Mike Klaas" <[EMAIL PROTECTED]> 
22/03/2007 19:15
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: multiple indexes






On 3/22/07, Kevin Osborn <[EMAIL PROTECTED]> wrote:
> Here is an issue that I am trying to resolve. We have a large catalog of 
documents, but our customers (several hundred) can only see a subset of 
those documents. And the subsets vary in size greatly. And some of these 
customers will be creating a lot of traffic. Also, there is no way to map 
the subsets to a query. The customer either has access to a document or 
they don't.
>
> Has anybody worked on this issue before? If I use one large index and do 
the filtering in my application, then Solr will be serving a lot of 
useless documents. The counts would also be screwed up for facet queries. 
Is the best solution to extend Solr and do the filtering there?
>
> The other potential solution is to have one index per customer. This 
would require one instance of the servlet per index, correct? It just 
seems like this would require a lot of hardware and complexity 
(configuring the memory of each servlet instance to index size and 
traffic).

Why not create a multivalued field that stores the customer perms?
add has_access:cust1 has_access:cust2, etc to the document at index
time, and turn this into a filter query at query time?

-Mike
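Mike's multivalued-permissions idea might look like this. The field name `has_access` and the customer ids come from his example; the rest is an illustrative sketch. The per-customer restriction goes into Solr's `fq` (filter query) parameter rather than the main query, so it is cached independently and keeps facet counts correct:

```xml
<!-- Index time: one has_access value per customer allowed to see the doc
     (has_access declared as an indexed, multivalued field in schema.xml) -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="has_access">cust1</field>
    <field name="has_access">cust2</field>
  </doc>
</add>
```

Query time, restricting results to one customer:

```
/select?q=some+keywords&fq=has_access:cust1
```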



Re: Reposting unABLE to match

2007-03-27 Thread Maarten . De . Vilder
what exactly is the problem ?

seems like you end up with the same term text in both query and index 
analyzer ... you should have found a match...





Shridhar Venkatraman <[EMAIL PROTECTED]> 
27/03/2007 14:08
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Reposting unABLE to match






Solr Admin (GENIE)

ShridharVAIO:8084
cwd=C:\Program Files\netbeans-5.5\enterprise3\apache-tomcat-5.5.17\bin
SolrHome=c:\Documents and Settings\Shridhar\Desktop\Public\Sana\KN\Genie\GenieConf/

Field Analysis

Field value (Index), highlight matches: "unABLE TO CONNECT"
Field value (Query):                    "unABLE TO CONNECT"

  Index Analyzer

org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory {}

  term position     1         2       3
  term text         "unABLE   TO      CONNECT"
  term type         word      word    word
  source start,end  0,7       8,10    11,19

org.apache.solr.analysis.SynonymFilterFactory
{synonyms=synonyms.txt, expand=true, ignoreCase=true}

  term position     1         2       3
  term text         "unABLE   TO      CONNECT"
  term type         word      word    word
  source start,end  0,7       8,10    11,19

org.apache.solr.analysis.StandardFilterFactory {}

  term position     1         2       3
  term text         "unABLE   TO      CONNECT"
  term type         word      word    word
  source start,end  0,7       8,10    11,19

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}

  term position     1         2
  term text         "unABLE   CONNECT"
  term type         word      word
  source start,end  0,7       11,19

org.apache.solr.analysis.WordDelimiterFilterFactory
{generateNumberParts=1, catenateWords=1, generateWordParts=1,
catenateAll=1, catenateNumbers=1}

  term position     1         2       3
  term text         un        ABLE    CONNECT
                    unABLE
  term type         word      word    word
                    word
  source start,end  1,3       3,7     11,18
                    1,7

org.apache.solr.analysis.LowerCaseFilterFactory {}

  term position     1         2       3
  term text         un        able    connect
                    unable
  term type         word      word    word
                    word
  source start,end  1,3       3,7     11,18
                    1,7

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

  term position     1         2       3
  term text         un        able    connect
                    unable
  term type         word      word    word
                    word
  source start,end  1,3       3,7     11,18
                    1,7

  Query Analyzer

org.apache.solr.analysis.HTMLStripStandardTokenizerFactory {}

  term position     1         2       3
  term text         unABLE    TO      CONNECT
  term type
  source start,end  1,7       8,10    11,18

org.apache.solr.analysis.SynonymFilterFactory
{synonyms=synonyms.txt, expand=true, ignoreCase=true}

  term position     1         2       3
  term text         unABLE    TO      CONNECT
  term type
  source start,end  1,7       8,10    11,18

org.apache.solr.analysis.StandardFilterFactory {}

  term position     1         2       3
  term text         unABLE    TO      CONNECT
  term type
  source start,end  1,7       8,10    11,18

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}

  term position     1         2
  term text         unABLE    CONNECT
  term type
  source start,end  1,7       11,18

org.apache.solr.analysis.WordDelimiterFilterFactory
{generateNumberParts=1, catenateWords=1, ge
Re: Reposting unABLE to match

2007-03-27 Thread Maarten . De . Vilder
the only thing i can think of is the fact that in the index analysis the 
term type is "word"
and in the query analysis the term type is "alphanumeric"

you should be getting a match if that doesn't matter ... you get exactly 
the same term texts ...





Shridhar Venkatraman <[EMAIL PROTECTED]> 
27/03/2007 14:08
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Reposting unABLE to match







Item Search Database

2007-03-28 Thread Maarten . De . Vilder
hi,

i have a performance question...

we need to implement a feature called 'Item Search Database', which 
basically means we have to limit the documents a user can search ...

example :
Item1 is in database1
item2 is in database2
item3 is in database1 and database2
and the client can only see the items in database1

we currently solve this by making a new Solr column for each search 
database... so it looks like this :

ITEMNAME   DB1     DB2
--------   ---     ---
Item1      true    false
Item2      false   true
Item3      true    true

and we limit the result of a search by putting "db1:true" in the 
query string

but i have been reading about another method :
we could also use just one Solr column and put the names of the databases 
in it... like so :

ITEMNAME   DB
--------   --
Item1      DB1
Item2      DB2
Item3      DB1 DB2

and limit the results by putting 'db:db1' in the query string

and now for my question :
which of these options will be more performant ?

my guess is that the first option will be the more performant, since the 
indexes will be better constructed, but i would really like a professional 
opinion on this ...

as i said, we are currently using the first option on 300,000 test records 
and it is really performant.
some search databases have only 12 records in them and it takes less than 
1 ms to get those 12 records back... so i'm guessing Solr is not searching 
the full 300,000 records, and i am kind of afraid that with the second 
option Solr will have to search more records/indexes to get the same 
result...

well, hope you understand my question and thanks in advance !
- Maarten

PS: thank you to everybody on this list for the help and thank you to all 
of the Solr/Lucene developers, great stuff !!

Re: C# API for Solr

2007-04-01 Thread Maarten . De . Vilder
Well, i think there will be a lot of people who will be very happy with 
this C# client.

grts,m 




"Jeff Rodenburg" <[EMAIL PROTECTED]> 
31/03/2007 18:00
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
C# API for Solr






We built our first search system architecture around Lucene.Net back in 
2005
and continued to make modifications through 2006.  We quickly learned that
search management is so much more than query algorithms and indexing
choices.  We were not readily prepared for the operational overhead that 
our
Lucene-based search required: always-on availability, fast response times,
batch and real-time updates, etc.

Fast forward to 2007.  Our front-end is Microsoft-based, but we needed to
support parallel development on non-Microsoft architecture, and thus 
needed
a cross-platform search system.  Hello Solr!  We've transitioned our 
search
system to Solr with a Linux/Tomcat back-end, and it's been a champ.  We 
now
use solr not only for standard keyword search, but also to drive queries 
for
lots of different content sections on our site.  Solr has moved beyond
mission critical in our operation.

As we've proceeded, we've built out a nice C# client library to abstract 
the
interaction from C# to Solr.  It's mostly generic and designed for
extensibility.  With a few modifications, this could be a stand-alone 
library that works for others.

I have clearance from the organization to contribute our library to the
community if there's interest.  I'd first like to gauge the interest of
everyone before doing so; please reply if you do.

cheers,
jeff r.



Re: Solr logo poll

2007-04-09 Thread Maarten . De . Vilder
i would use the first one, much more professional




"Yonik Seeley" <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED]
06/04/2007 19:51
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Solr logo poll






Quick poll...  Solr 2.1 release planning is underway, and a new logo
may be a part of that.
What "form" of logo do you prefer, A or B?  There may be further
tweaks to these pictures, but I'd like to get a sense of what the user
community likes.

A) 
http://issues.apache.org/jira/secure/attachment/12349897/logo-solr-d.jpg

B) 
http://issues.apache.org/jira/secure/attachment/12353535/12353535_solr-nick.gif


Just respond to this thread with your preference.

-Yonik



Leading wildcards

2007-04-18 Thread Maarten . De . Vilder
hi,

we have been trying to get the leading wildcards to work.

we have been looking around the Solr website, the Lucene website, wiki's 
and the mailing lists etc ...
but we found a lot of contradictory information.

so we have a few question : 
- is the latest version of lucene capable of handling leading wildcards ? 
- is the latest version of solr capable of handling leading wildcards ?
- do we need to make adjustments to the solr source code ?
- if we need to adjust the solr source, what do we need to change ?

thanks in advance !
Maarten


Re: AW: Leading wildcards

2007-04-20 Thread Maarten . De . Vilder
thanks, this worked like a charm !!

we built a custom "QueryParser" and integrated the *foo** workaround in it, so 
basically we can now search with leading, trailing and double-ended wildcards ...

the only crappy thing is the max Boolean clauses limit, but i'm going to look into 
that after the weekend

for the next release of Solr :
do not make this default, too many risks
but do make an option in the config to enable it, it's a very nice feature 


thanks everybody for the help and have a nice weekend,
maarten





"Burkamp, Christian" <[EMAIL PROTECTED]> 
19/04/2007 12:37
Please respond to
solr-user@lucene.apache.org


To

cc

Subject
AW: Leading wildcards






Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard 
QueryParser class without changing the defaults. You can easily change 
this by inserting the line

parser.setAllowLeadingWildcards(true);

in QueryParsing.java line 92. (This is after creating a QueryParser 
instance in QueryParsing.parseQuery(...))

and it obviously means that you have to change solr's source code. It 
would be nice to have an option in the schema to switch leading wildcards 
on or off per field. Leading wildcards really make no sense on richly 
populated fields because queries tend to result in TooManyClauses 
exceptions most of the time.

This works for leading wildcards. Unfortunately it does not enable 
searches with leading AND trailing wildcards. (E.g. searching for "*lega*" 
does not find results even if the term "elegance" is in the index. If you 
put a second asterisk at the end, the term "elegance" is found. (search 
for "*lega**" to get hits).
Can anybody explain this though it seems to be more of a lucene 
QueryParser issue?

-- Christian

-Ursprüngliche Nachricht-
Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Gesendet: Donnerstag, 19. April 2007 08:35
An: solr-user@lucene.apache.org
Betreff: Leading wildcards


hi,

we have been trying to get the leading wildcards to work.

we have been looking around the Solr website, the Lucene website, wiki's 
and the mailing lists etc ...
but we found a lot of contradictory information.

so we have a few question : 
- is the latest version of lucene capable of handling leading wildcards ? 
- is the latest version of solr capable of handling leading wildcards ?
- do we need to make adjustments to the solr source code ?
- if we need to adjust the solr source, what do we need to change ?

thanks in advance !
Maarten




Re: AW: Leading wildcards

2007-04-23 Thread Maarten . De . Vilder
hey,

i'm sorry for the confusion : our "custom query parser" is not a Lucene 
query parser 

it is something we built for the client-side of Solr ...

it basically transforms some search arguments into an Solr query URL

example : the method query(searchID, searchQuery, category, ...) returns 
http://solrhost/solr/select/?q=id%3AsearchString+OR+query%3AsearchString&version=2.2&start=0&rows=10&indent=on
(that is what i mean by "query parsing")
this method will perform a series of operations on the keywords and return 
a working Solr-query
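As a rough illustration of that kind of method, here is a minimal sketch in plain Java (class, method, and parameter names are ours, not from Maarten's actual framework) that builds the select URL from a raw search string:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryUrlBuilder {

    // Build a Solr select URL from a raw search string, searching both
    // the id and query fields, as in the example URL above.
    public static String buildUrl(String host, String searchString)
            throws UnsupportedEncodingException {
        String q = "id:" + searchString + " OR query:" + searchString;
        return "http://" + host + "/solr/select/?q="
                + URLEncoder.encode(q, "UTF-8")
                + "&version=2.2&start=0&rows=10&indent=on";
    }

    public static void main(String[] args) throws Exception {
        // URLEncoder takes care of the %3A and '+' escaping seen above
        System.out.println(buildUrl("solrhost", "searchString"));
    }
}
```

A real version would of course also escape user input and take the extra parameters (rows, start, highlighting) as arguments rather than hard-coding them.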

we are using the Java solr client and we built a framework around it to 
simplify our actions.

example for the wildcards :
we basically check if there is a keyword that starts and ends with an * 
(by using regular expressions)
and if such a keyword is found, we add a second * at the end ...
by doing this we make sure we send a working query to the Solr server

we also escape special characters and other wildcards this way
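A minimal sketch of that normalization step (our own names; the real framework presumably does more escaping than this):

```java
public class WildcardNormalizer {

    // If a keyword both starts and ends with a '*', append a second
    // trailing '*' so the "*lega*" style query actually returns hits
    // (working around the QueryParser quirk discussed earlier in the
    // thread). Already-normalized keywords are left alone.
    public static String normalize(String keyword) {
        if (keyword.matches("\\*.+\\*") && !keyword.endsWith("**")) {
            return keyword + "*";
        }
        return keyword;
    }

    public static void main(String[] args) {
        System.out.println(normalize("*lega*")); // -> *lega**
        System.out.println(normalize("plain"));  // -> plain
    }
}
```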

and we also built in highlighting for wildcard queries :
if we see the user is using wildcards, we don't use the standard 
solr-highlighting (which doesn't work with wildcards)
instead we use regular expressions to highlight the results after we get 
them back from the server
example : 
*foo*  in a solr query becomes .*foo.* in a regular expression ... ( .* means a 
series of characters in RE)
then we check if our result matches this regular expression and put some 
-tags around the matching words
and before we knew it, our wildcard searches were highlighted
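A rough sketch of that regex highlighting (the tag choice and names are ours — the original message doesn't say which tag was used — and this assumes the term contains no regex metacharacters other than '*'):

```java
public class WildcardHighlighter {

    // Turn a "*foo*" style term into a per-word regex and wrap each
    // matching word in a tag. We use <em> here; the tag in the original
    // setup is unknown.
    public static String highlight(String text, String wildcardTerm) {
        // '*' becomes "\w*" (a run of word characters within one word)
        String wordPattern = wildcardTerm.replace("*", "\\w*");
        return text.replaceAll("(?i)\\b" + wordPattern + "\\b",
                "<em>$0</em>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("The elegance of Solr", "*lega*"));
        // -> The <em>elegance</em> of Solr
    }
}
```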

whether this is a good way of handling these things is open for discussion; 
if we have more time we might actually change the Solr server code to fix 
these things.
it's just a foolproof workaround at this moment.

grts,m





"Michael Kimsal" <[EMAIL PROTECTED]> 
20/04/2007 16:30
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: AW: Leading wildcards






Maarten:

Would you mind sharing your custom query parser?


On 4/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> thanks, this worked like a charm !!
>
> we built a custom "QueryParser" and we integrated the *foo** in it, so
> basically we can now search leading, trailing and both ...
>
> only crappy thing is the max Boolean clauses, but i'm going to look into
> that after the weekend
>
> for the next release of Solr :
> do not make this default, too many risks
> but do make an option in the config to enable it, it's a very nice 
feature
>
>
> thanks everybody for the help and have a nice weekend,
> maarten
>


-- 
Michael Kimsal
http://webdevradio.com



Re: AW: Leading wildcards

2007-04-23 Thread Maarten . De . Vilder
hey,

we've stumbled on something weird while using wildcards 

we enabled leading wildcards in solr (see previous message from Christian 
Burkamp)

when we do a search on a non-existent field, we get a SolrException: 
undefined field
(this was for query "nonfield:test")

but when we use wildcards in our query, we don't get the undefined field 
exception,
so the query "nonfield:*test" works fine ... just zero results...

is this normal behaviour ? 








Collection Distribution in windows

2007-05-02 Thread Maarten . De . Vilder
i know this is a stupid question, but are there any collection 
distribution scripts for windows available ?

thanks !

Re: Collection Distribution in windows

2007-05-03 Thread Maarten . De . Vilder
damn, there goes the platform independence ...

is there anybody with a little more experience when it comes to collection 
distribution on Windows ?

thanks in advance !





"Bill Au" <[EMAIL PROTECTED]> 
02/05/2007 15:09
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Re: Collection Distribution in windows






The collection distribution scripts rely on hard links and rsync.  It
seems that both may be available on Windows:

hard links:
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_hardlink.mspx?mfr=true


rsync:
http://samba.anu.edu.au/rsync/download.html

I say maybe because I don't know if hard links on Windows work the same way
as hard links on Linux/Unix.

You will also need something like cygwin to run the bash scripts.

Bill
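To illustrate the hard-link snapshot idea the scripts rely on, here is a small sketch in Java. It uses the java.nio.file API, which is far newer than the Solr of this thread, so treat it purely as an illustration of the semantics, not of the scripts themselves:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkDemo {

    // Create a hard link to an index file inside a snapshot directory.
    // A hard link is a second name for the same on-disk data, which is
    // why the distribution scripts can "snapshot" an index without
    // copying any bytes.
    public static Path snapshot(Path indexFile, Path snapshotDir)
            throws Exception {
        Path link = snapshotDir.resolve(indexFile.getFileName());
        return Files.createLink(link, indexFile);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("solr-demo");
        Path snapDir = Files.createDirectory(dir.resolve("snapshot"));
        Path segment = dir.resolve("_0.fnm");
        Files.write(segment, "segment data".getBytes("UTF-8"));

        Path link = snapshot(segment, snapDir);
        // Both names now refer to the same file on disk
        System.out.println(Files.isSameFile(segment, link)); // -> true
    }
}
```

On Windows, `fsutil hardlink create` (documented at the Microsoft link above) provides the equivalent operation, but only on NTFS.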

On 5/2/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> i know this is a stupid question, but are there any collection
> distribution scripts for windows available ?
>
> thanks !