Strange Sorting results on a Text Field

2006-09-11 Thread Tom Weber

Hello,

  I have a strange response in a query with sorting.

  I sort on a field which is:

  <field name="testfield" ... multiValued="true"/>


  This field mostly stores 32-byte md5's, usually a single entry
but sometimes up to 5.


  When I do a search like this: "+testfield:
(fde34c51739462d9486140601dcfb7bf 63af20144c2cbae1ec4dc0bc2e9d2c2f
3cf8e32bf2b9384447d52318a72fd4b1) ;testfield asc"


  I get the following results:
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16
4302516b91b743a8972120f52d309a72
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16
c10c9bf4ef3f1bc30aedf83b96a9ce16

  I have no idea why the entry at position 6 shows up in this search;
the XML entries are correct, too.


  Any idea where I should look for the error?

  Also, does somebody have a link where the benefits of "multiValued"
are explained?


  Thanks,

  Tom




Re: Strange Sorting results on a Text Field

2006-09-11 Thread Yonik Seeley

On 9/11/06, Tom Weber <[EMAIL PROTECTED]> wrote:

> Hello,
>
>    I have a strange response in a query with sorting.
>
>    I sort on a field which is:

   


I think you probably want a type="string" instead.  Text fields have
text analysis (stemming, lowercasing, word splitting, etc) and aren't
used for exact matching or sorting.


>    This field mostly stores 32-byte md5's, usually a single entry
> but sometimes up to 5.
>
>    When I do a search like this: "+testfield:
> (fde34c51739462d9486140601dcfb7bf 63af20144c2cbae1ec4dc0bc2e9d2c2f
> 3cf8e32bf2b9384447d52318a72fd4b1) ;testfield asc"
>
>    I get the following results:
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> 4302516b91b743a8972120f52d309a72
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
> c10c9bf4ef3f1bc30aedf83b96a9ce16
>
>    I have no idea why the entry at position 6 shows up in this search;
> the XML entries are correct, too.
>
>    Any idea where I should look for the error?
>
>    Also, does somebody have a link where the benefits of "multiValued"
> are explained?


You can have multiple values for the field in a single document if
it's marked as multiValued:


 first val
 second val
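
As a rough illustration of the multiValued idea above, the sketch below builds a Solr XML update message in which the same field name is repeated inside one document. The field names `id` and `test` and the helper `build_add_xml` are assumptions made for the example; they are not taken from the poster's schema.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Build an <add><doc>...</doc></add> message from (name, value) pairs."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        # Repeating a field name is how several values end up in one document.
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


# "id" and "test" are illustrative field names, not from a real schema.
xml_payload = build_add_xml([
    ("id", "1"),
    ("test", "first val"),
    ("test", "second val"),
])
print(xml_payload)
```

Each value gets its own `field` element, which is different from packing several values into one element separated by spaces.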



-Yonik


Re: Strange Sorting results on a Text Field

2006-09-11 Thread Tom Weber

Hello Yonik,

  You are right about the string type. When I turned on debugging a
few minutes ago, I saw that it is splitting the md5 sum up into
several parts, each time we have a number after a letter or the
other way round.
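
The splitting described above can be approximated in a few lines of Python. This is only a sketch: the regular expression stands in for the text analysis chain and is an assumption, not Solr's actual analyzer. It splits each hash wherever a letter run meets a digit run, then checks which fragments the unexpected result shares with the query terms from this thread.

```python
import re


def split_on_letter_digit_boundaries(value):
    """Split a string into maximal runs of letters or of digits."""
    return re.findall(r"[a-z]+|[0-9]+", value.lower())


query_terms = [
    "fde34c51739462d9486140601dcfb7bf",
    "63af20144c2cbae1ec4dc0bc2e9d2c2f",
    "3cf8e32bf2b9384447d52318a72fd4b1",
]
unexpected_hit = "4302516b91b743a8972120f52d309a72"

# Collect every fragment produced by the three query hashes.
query_tokens = set()
for term in query_terms:
    query_tokens.update(split_on_letter_digit_boundaries(term))

# Fragments of the unexpected result that also occur in the query.
hit_tokens = set(split_on_letter_digit_boundaries(unexpected_hit))
shared = sorted(query_tokens & hit_tokens)
print(shared)
```

Under this approximation the unexpected hit shares short fragments such as `d` and `72` with the query, which is enough for an OR-style match on an analyzed field.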


  Thanks also for the "multiValued" explanation, this is useful for
my current application. But then, if I use this field and ask for
sorting, how will the sorting be done: alphanumerically on the first
entry for this field? Until now, I entered more than one entry by
separating them with a space in the same field, like
<field name="test">text1 text2 text3</field>.


  Thanks,

  tom






Re: Strange Sorting results on a Text Field

2006-09-11 Thread Yonik Seeley

On 9/11/06, Tom Weber <[EMAIL PROTECTED]> wrote:

>    Thanks also for the "multiValued" explanation, this is useful for
> my current application. But then, if I use this field and ask for
> sorting, how will the sorting be done: alphanumerically on the first
> entry for this field? Until now, I entered more than one entry by
> separating them with a space in the same field, like text1 text2 text3.


Sorting is currently only supported when there is at most one value
(or token) per document.  This is a lucene restriction.

-Yonik
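
One possible way to live with the restriction above is to derive a single sortable value per document before indexing and keep it in a separate single-valued field. The sketch below only illustrates that idea in plain Python; the field name `testfield_sort`, the helper `with_sort_key`, and the choice of the smallest value as the sort key are all assumptions, not something proposed in this thread.

```python
def with_sort_key(doc, source_field="testfield", sort_field="testfield_sort"):
    """Return a copy of doc with one sortable value derived from a multi-valued field."""
    values = doc.get(source_field, [])
    enriched = dict(doc)
    if values:
        # Assumption: the smallest value is an acceptable sort key.
        enriched[sort_field] = min(values)
    return enriched


docs = [
    {"id": "a", "testfield": ["c10c9bf4ef3f1bc30aedf83b96a9ce16",
                              "4302516b91b743a8972120f52d309a72"]},
    {"id": "b", "testfield": ["fde34c51739462d9486140601dcfb7bf"]},
    {"id": "c", "testfield": ["63af20144c2cbae1ec4dc0bc2e9d2c2f",
                              "3cf8e32bf2b9384447d52318a72fd4b1"]},
]

prepared = [with_sort_key(d) for d in docs]
# Sorting is now well defined because each document has exactly one key.
ordered_ids = [d["id"] for d in sorted(prepared, key=lambda d: d["testfield_sort"])]
print(ordered_ids)
```

With several values per document there is no single obvious key to compare, which is why a rule such as "smallest value" has to be chosen explicitly.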


Solr in production env.

2006-09-11 Thread Simon Willnauer

Hello,

I have almost convinced my boss to use Solr in production for a new
project, and hopefully for many projects to follow, but I'm a bit
confused that there is no release available for download. Is Solr still
in a beta state? Are there Solr servers in production? Is it advisable
to use it in production? I would be glad to hear about experiences and
recommendations on this topic.


best regards Simon


Re: Solr in production env.

2006-09-11 Thread Tim Archambault

CNET has been using Solr in production for quite some time. There are others
on this list who can elaborate far better than I can.




Re: Solr in production env.

2006-09-11 Thread Eivind Hasle Amundsen

I know about a number of production installations.
I even know of a company whose mere existence is based partly on Solr. :)

There is also a public list of production installations available on the 
homepage and/or Wiki.


Eivind


Re: Solr in production env.

2006-09-11 Thread Bertrand Delacretaz

Hi Simon,


> ...are there solr servers in production...


You can see a list at http://wiki.apache.org/solr/PublicServers -
there's some solid stuff running on Solr already!

-Bertrand


Re: Got it working! And some questions

2006-09-11 Thread Yonik Seeley

On 9/9/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:

> The main problem was that addIndex was sending 1 doc at a time to Solr;
> it would cause a problem after a few thousand docs because I was running
> out of resources.


Sending one doc at a time should be fine... you shouldn't run out of
resources.
There must be a bug somewhere...

-Yonik


Re: Got it working! And some questions

2006-09-11 Thread Erik Hatcher


On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
> I'm still a little disappointed that I can't change the OR/AND
> parsing by just changing some parameter (like I can do for the
> number of results returned, for example); adding an OR between each
> word in the text I want to compare sounds suboptimal, but I'll
> probably do it that way; it's a very minor nitpick, Solr is awesome,
> as I said before.


I'm the one that added support for controlling the default operator  
of Solr's query parser, and I hadn't considered the use case of  
controlling that setting from a request parameter.  It should be easy  
enough to add.  I'll take a look at adding that support and commit it  
once I have it working.


What parameter name should be used for this?  do=[AND|OR] (for
default operator)?  We have df for default field.
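
For reference, the client-side workaround mentioned in the quoted message (putting an explicit OR between each word) might look like the sketch below. The helper name `join_with_or` is made up for the example, and it assumes the words contain no query syntax characters that would need escaping.

```python
def join_with_or(text):
    """Join whitespace-separated words with an explicit OR operator."""
    words = text.split()
    return " OR ".join(words)


# Assumption: plain words only, nothing that needs query escaping.
query = join_with_or("heart  muscle cell")
print(query)
```

A request parameter for the default operator would make this string manipulation unnecessary on the client.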


Erik



Re: Got it working! And some questions

2006-09-11 Thread Michael Imbeault

Hello Erik,

Thanks for adding that feature! "do" is fine with me, if "op" is already 
used (not sure about this one).




--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Got it working! And some questions

2006-09-11 Thread Yonik Seeley

On 9/11/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:


> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
> >  I'm still a little disappointed that I can't change the OR/AND
> > parsing by just changing some parameter (like I can do for the
> > number of results returned, for example); adding a OR between each
> > word in the text i want to compare sounds suboptimal, but i'll
> > probably do it that way; its a very minor nitpick, solr is awesome,
> > as I said before.
>
> I'm the one that added support for controlling the default operator
> of Solr's query parser, and I hadn't considered the use case of
> controlling that setting from a request parameter.  It should be easy
> enough to add.  I'll take a look at adding that support and commit it
> once I have it working.
>
> What parameter name should be used for this?  do=[AND|OR] (for
> default operator)?  We have df for default field.


Maybe something like q.op or q.oper if it *only* applies to q.  Which
begs the question... what *does* it apply to?  At first blush, it
doesn't seem like it should apply to other queries like fq, facet
queries, and esp queries defined in solrconfig.xml.  I think that
would be very surprising.

-Yonik


Re: Got it working! And some questions

2006-09-11 Thread Chris Hostetter

: Maybe something like q.op or q.oper if it *only* applies to q.  Which
: begs the question... what *does* it apply to?  At first blush, it
: doesn't seem like it should apply to other queries like fq, facet
: queries, and esp queries defined in solrconfig.xml.  I think that
: would be very surprising.

agreed, note the comment i put into SolrPluginUtils.parseFilterQueries when
i added fq support to StandardRequestHandler...

/* Ignore SolrParams.DF - could have init param FQs assuming the
 * schema default with query param DF intended to only affect Q.
 * If user doesn't want schema default, they should be explicit in the FQ.
 */

... i would think a "do" or "op" or "q.op" param should *definitely* only
influence the "q" param.
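
To make the scoping concrete, here is a small sketch of how a client might build such a request. The parameter name `q.op` is only one of the names floated in this thread and is used here as an assumption, as is the helper `build_select_params`. The point is that the operator travels alongside `q`, while each `fq` spells out its own operators.

```python
from urllib.parse import urlencode


def build_select_params(q, fq_list, q_op=None):
    """Encode a main query, an optional default operator for it, and filter queries."""
    params = [("q", q)]
    if q_op is not None:
        if q_op not in ("AND", "OR"):
            raise ValueError("q_op must be 'AND' or 'OR'")
        # Hypothetical parameter name; meant to affect only the main query.
        params.append(("q.op", q_op))
    for fq in fq_list:
        # Filter queries state their operators explicitly.
        params.append(("fq", fq))
    return urlencode(params)


query_string = build_select_params(
    "solr lucene", ["type:article AND lang:en"], q_op="AND"
)
print(query_string)
```

Keeping the filter queries explicit means they behave the same no matter which default operator a given request asks for.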





-Hoss



MoreLikeThis class in Lucene within Solr?

2006-09-11 Thread Michael Imbeault

Ok, so hopefully I resolved my problems posting to this mailing list and 
this won't show up in some thread, but as a new topic!


Is it possible in any way to use the MoreLikeThis class with solr 
(http://lucene.apache.org/java/docs/api/org/apache/lucene/search/similar/MoreLikeThis.html)? 
Right now I'm determining similar docs by just querying for the whole 
body with OR between words, and it's not very efficient performance-wise. 
I have never coded in Java so I really don't know where I should start...
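
As a language-neutral illustration of the general idea, a "more like this" query can be built from only the most distinctive terms of a document instead of every word in its body. The sketch below ranks words with a simple tf-idf score over a toy corpus. It is a simplified stand-in under stated assumptions, not the Lucene MoreLikeThis implementation; the function name, the scoring formula, and the sample sentences are made up for the example.

```python
import math
from collections import Counter


def interesting_terms(doc_words, corpus, max_terms=3):
    """Rank the words of one document by a simple tf-idf score."""
    doc_count = len(corpus)
    term_freq = Counter(doc_words)
    scores = {}
    for word, freq in term_freq.items():
        # Number of documents in the corpus that contain the word.
        doc_freq = sum(1 for other in corpus if word in other)
        idf = math.log(doc_count / (1 + doc_freq)) + 1.0
        scores[word] = freq * idf
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:max_terms]


corpus = [
    "the protein binds the receptor".split(),
    "the receptor is a membrane protein".split(),
    "kinase phosphorylates the protein kinase substrate".split(),
]

top_terms = interesting_terms(corpus[2], corpus)
similar_query = " OR ".join(top_terms)
print(similar_query)
```

Words that occur in every document, such as "the" and "protein" here, score low and drop out, so the resulting OR query stays short.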


Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Solr in production env.

2006-09-11 Thread Jeff Rodenburg

Hi Simon -

We're running Solr in production, and it's rock solid.  Of course you can't
really just take an anonymous word for it, but I would honestly put this
stack up against any other system you can find, open source or commercial.
Run it for yourself and you'll be alarmed at how sound it is, out of the
box.

I'll bet Hoss & Yonik's paycheck on it.  ;-)

-- j

P.S. Hoss & Yonik - just kidding, but couldn't resist.  Many kudos to your
efforts on this.






Re: Solr in production env.

2006-09-11 Thread Chris Hostetter

: I'll bet Hoss & Yonik's paycheck on it.  ;-)

: P.S. Hoss & Yonik - just kidding, but couldn't resist.  Many kudos to your
: efforts on this.

I can't speak for Yonik, but I take no offense -- I bet my paycheck on
Solr every day :)

: > and hopefully for lots of following projects but I'm a bit confused
: > that there is no release available for download. Is Solr still in a
: > beta state, are there solr servers in production. Is it recommendable
: > to use it in production? I would be glad about some experience and
: > recommendations about this topic.

With any piece of software I'd personally recommend you rev your local
copy only after vetting that it behaves as you expect based on your usage
of the previous version (ie:  have your own Unit Tests that you run
against it on a Dev box before deploying to production).  It's also wise
to keep snapshots of the source code and documentation each time you rev
your local copy in case the project takes a drastic turn in direction and
you find yourself wanting to fork from the last version you were happy
with.
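
A minimal sketch of the kind of check described above: record the results you rely on, then compare them against a new build before upgrading. Everything here is hypothetical, including the `find_regressions` helper and the `fake_search` function that stands in for a call to a development server.

```python
def find_regressions(search, expectations):
    """Return the queries whose top results differ from what was recorded."""
    failures = []
    for query, expected_ids in expectations.items():
        # Compare only as many results as were recorded for this query.
        actual_ids = search(query)[: len(expected_ids)]
        if actual_ids != expected_ids:
            failures.append((query, expected_ids, actual_ids))
    return failures


def fake_search(query):
    """Stand-in for querying a dev box; returns document ids in rank order."""
    index = {
        "solr": ["doc1", "doc7", "doc3"],
        "lucene": ["doc2", "doc1"],
    }
    return index.get(query, [])


# Results recorded against the previous version (made-up data).
expectations = {"solr": ["doc1", "doc7"], "lucene": ["doc1"]}
regressions = find_regressions(fake_search, expectations)
print(regressions)
```

An empty list would mean the new copy still behaves as recorded for the queries you care about.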

As for Solr specifically: *I* certainly think it's suitable for production
use, and have a vested interest in making sure it doesn't change so
radically that future changes aren't backwards compatible.





-Hoss



Re: Solr in production env.

2006-09-11 Thread Chris Hostetter

: I even know of a company whose mere existence is based partly on Solr. :)

now *that* sounds like a story i'd like to hear more of



-Hoss