Re: Solr 1.3 - leading wildcard query feature?

2008-06-06 Thread Maximilian Hütter

Erik Hatcher wrote:


On Jun 5, 2008, at 10:13 AM, Maximilian Hütter wrote:
I haven't followed the discussion here closely, but I am interested in 
whether Solr 1.3 will have the feature of leading wildcards in a query. I 
remember reading a discussion about it and about making it configurable, 
which is perfectly fine with me.


Will this be in Solr 1.3, and when will 1.3 be released? :-)


It's still an unresolved issue: 



The implementation details are still being discussed.  I like the idea 
of tying this to the QueryComponent configuration, or perhaps rather to 
the QParser implementation.  Seem reasonable?


I personally have interest in this kind of flexibility within Solr, I'm 
just short on time to implement it myself at the moment.  It'd be great 
to see this make it in 1.3, methinks.


Erik


Maybe I could contribute that, but I don't really know the code well. 
I found the Lucene switch that someone described in an earlier discussion 
on this and changed it, but that doesn't seem to be the way you would 
want to handle it.
Our product needs this feature, but we would like to stick to the Solr 
releases.
Tying it to the QueryComponent configuration sounds good to me; I don't 
really understand where you would add it in the QParser implementation.
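For illustration, a minimal sketch of the QParser-side idea, written 
against (roughly) the 1.3-era plugin APIs.  The class name is hypothetical, 
and whether getDefaultSearchFieldName() is the right schema accessor should 
be checked against the actual source; setAllowLeadingWildcard() is the 
Lucene QueryParser switch mentioned above:

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

// Hypothetical plugin: a query parser that allows leading wildcards.
public class LeadingWildcardQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // qstr and req here are the fields inherited from QParser
        QueryParser qp = new QueryParser(
            req.getSchema().getDefaultSearchFieldName(),  // assumed accessor
            req.getSchema().getQueryAnalyzer());
        qp.setAllowLeadingWildcard(true);  // the switch in question
        return qp.parse(qstr);
      }
    };
  }
}

Registered via a <queryParser> element in solrconfig.xml, it could 
presumably then be selected per request.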

But this is rather a discussion for the developer list, I think.

Max

--
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich


Analytics e.g. "Top 10 searches"

2008-06-06 Thread McBride, John

 Hello,

Is anybody familiar with any SOLR-based analytical tools which would
allow us to extract "top ten searches", for example?

I imagine the query parse level, where the query is tokenized and
filtered, would be the best place to log this, due to the many
permutations possible at the user input level.

Is there an existing plugin to do this, or could you suggest how to
architect this?

Thanks,
John


Re: scaling / sharding questions

2008-06-06 Thread Marcus Herou
Cool sharding technique.

We are also thinking about how to "move" docs from one index to another,
because we need to re-balance the docs when we add new nodes to the cluster.
We only store ids in the index; otherwise we could have moved stuff
around with IndexReader.document(x) or so. Luke (http://www.getopt.org/luke/)
is able to reconstruct the indexed document data, so it should be doable.
However, I'm thinking of actually just deleting the docs from the old index
and adding new documents to the new node. It would be cool not to waste CPU
cycles by reindexing already-indexed stuff, but...
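A minimal Lucene-level sketch of that stored-field approach (only workable
when all fields are stored; the index paths and the shouldMove()
re-balancing test are hypothetical placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class MoveDocs {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open("/path/to/old-index");
    IndexWriter writer =
        new IndexWriter("/path/to/new-index", new StandardAnalyzer(), false);
    for (int i = 0; i < reader.maxDoc(); i++) {
      if (reader.isDeleted(i)) continue;
      Document doc = reader.document(i);  // only stored fields come back
      if (shouldMove(doc)) {              // hypothetical placement test
        writer.addDocument(doc);          // note: this re-analyzes the doc
        reader.deleteDocument(i);         // drop it from the old index
      }
    }
    writer.close();
    reader.close();
  }

  static boolean shouldMove(Document doc) { return true; }  // placeholder
}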

And we will also have data amounts in the range you are talking about.
Perhaps we could share ideas?

How do you plan to store where each document is located? I mean, you
probably need to store info about the document and its location somewhere,
perhaps in a clustered DB? We will probably go for HBase for this.

I think the number of documents is less important than the actual data size
(just speculating). We currently search 10M (will get much, much larger)
indexed blog entries on one machine where the JVM has a 1G heap; the index
size is 3G and response times are still quite fast. This is a read-only node,
though, and it is updated every morning with a freshly optimized index. Someone
told me that you probably need twice the RAM if you plan to both index and
search at the same time. If I were you, I would just test indexing X entries
of your data and then start searching the index, lowering the JVM memory
settings each round; when response times get too slow or you hit an
OutOfMemoryError, you have a rough estimate of the bare minimum RAM needed
for X entries.

I think we will do fine with something like 2G per 50M docs, but I will need
to test it out.

If you get an answer in this matter please let me know.

Kindly

//Marcus


On Fri, Jun 6, 2008 at 7:21 AM, Jeremy Hinegardner <[EMAIL PROTECTED]>
wrote:

> Hi all,
>
> This may be a bit rambling, but let's see how it goes.  I'm not a
> Lucene or Solr guru by any means; I have been prototyping with Solr and
> understanding how all the pieces and parts fit together.
>
> We are migrating our current document storage infrastructure to a
> decent-sized Solr cluster, using 1.3-snapshots right now.  Eventually
> this will be in the billion+ documents, with about 1M new documents
> added per day.
>
> Our main sticking point right now is that a significant number of our
> documents will be updated, at least once, but possibly more than once.
> The volatility of a document decreases over time.
>
> With this in mind, we've been considering using a cascading series of
> shard clusters.  That is:
>
>  1) a cluster of shards holding recent data (most recent week or two):
>     smaller indexes that take a small amount of time to commit updates
>     and optimise, since these will hold the most volatile documents.
>
>  2) following that, another cluster of shards holding some relatively
>     recent (3-6 months?), but not super volatile, documents; these are
>     items that could potentially receive updates, but generally do not.
>
>  3) a final set of 'archive' shards holding the final resting place for
>     documents.  These would not receive updates.  These would be online
>     for searching and analysis "forever".
>
> We are not sure if this is the best way to go, but it is the approach
> we are leaning toward right now.  I would like some feedback from the
> folks here if you think that is a reasonable approach.
>
> One of the other things I'm wondering about is how to manipulate
> indexes.  We'll need to roll documents around between indexes over
> time, or at least migrate indexes from one set of shards to another as
> the documents 'age', and merge/aggregate them with more 'stable'
> indexes.  I know about merging complete indexes together, but what
> about migrating a subset of documents from one index into another index?
>
> In addition, what is generally considered a 'manageable' index of large
> size?  I was attempting to find some information on the relationship
> between search response times, the amount of memory used for a search,
> and the number of documents in an index, but I wasn't having much luck.
>
> I'm not sure if I'm making sense here, but just thought I would throw
> this out there and see what people think.  There is the distinct
> possibility that I am not asking the right questions or considering the
> right parameters, so feel free to correct me, or ask questions as you
> see fit.
>
> And yes, I will report how we are doing things when we get this all
> figured out, and if there are items that we can contribute back to Solr
> we will.  If nothing else there will be a nice article on how we manage
> TBs of data with Solr.
>
> enjoy,
>
> -jeremy
>
> --
> 
>  Jeremy Hinegardner  [EMAIL PROTECTED]
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB

Re: An unusual question for the experts -- *term* boosting for individual documents?

2008-06-06 Thread Tricia Williams
Payloads could be the answer, but I don't think there is any crossover 
with what I've been working on with payloads. 
(https://issues.apache.org/jira/browse/SOLR-380 has what I last posted, 
which is pretty much what we're using now; I've also posted the related 
SOLR-532 and SOLR-522.)


What you would have to do is write a custom Tokenizer or TokenFilter 
which takes your input, breaks it into tokens, and then adds the numeric 
value as a payload.  Assuming your input is actually something like:

cat:0.99 dog:0.42 car:0.00

you could write a TokenFilter which builds on the WhitespaceTokenizer to 
break each token on ":", using the first part as the token value and the 
second part as the token's payload.  I think the APIs are pretty clear 
if you are looking for help.
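A minimal sketch of such a filter, written against the 2.3-era token API 
(the class name is made up, and the 4-byte float encoding is just one 
possible choice of payload format):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;

public class DelimitedPayloadFilter extends TokenFilter {
  public DelimitedPayloadFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    Token t = input.next();
    if (t == null) return null;
    String text = t.termText();            // e.g. "cat:0.99"
    int i = text.indexOf(':');
    if (i >= 0) {
      float weight = Float.parseFloat(text.substring(i + 1));
      t.setTermText(text.substring(0, i)); // keep "cat" as the indexed term
      int bits = Float.floatToIntBits(weight);
      byte[] bytes = new byte[] {          // encode the float as 4 bytes
          (byte) (bits >> 24), (byte) (bits >> 16),
          (byte) (bits >> 8), (byte) bits };
      t.setPayload(new Payload(bytes));
    }
    return t;
  }
}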


I haven't looked at all at how you can query/boost using payloads, but 
if Grant says that integrating the BoostingTermQuery isn't all that hard 
I would believe him.


Good Luck,
Tricia

Grant Ingersoll wrote:
Hmmm, if I understand your question correctly, I think Lucene's 
payloads are what you are after.


Lucene does support payloads (i.e. per-term storage in the index; see 
the BoostingTermQuery in Lucene and the Token class setPayload() 
method).  However, this doesn't do much for you in Solr as of yet 
without some work of your own.  I think Tricia Williams has been 
working on payloads and Solr, but I don't know that anything has been 
posted.  The tricky part, I believe, is how to handle indexing; 
integrating the BoostingTermQuery isn't all that hard, I don't 
think.  Also note, there isn't anything in Solr preventing the use of 
payloads, but there probably is a decent amount of work to hook them in.


HTH,
Grant



On Jun 5, 2008, at 4:52 PM, Andreas von Hessling wrote:


Hi there!
As a Solr newbie who has however worked with Lucene before, I have an 
unusual question for the experts:


Question:

Can I, and if so, how do I, perform index-time term boosting in 
documents where each boost value is not the same for all documents 
(no global boosting of a given term) but instead can be set 
per-document?  In other words: I understand there's a way to specify 
term boost values for search queries, but is that also possible for 
indexed documents?



Here's what I'm fundamentally trying to do:

I want to index and search over documents that have a special, 
associative-array-like property:
Each document has a list of unique words, and each word has a numeric 
value between 0 and 1.  These values express similarity in the 
dimension with this word/name.  For example, "cat": 0.99 is similar 
to "cat": 0.98, but not to "cat": 0.21.  All documents have the same 
set of words, and there are lots of them: about 1 million.  (If 
necessary, I can reduce the number of words to tens of thousands, 
but then the documents would not share the same set of words any 
more.)  Most of the word values for a typical document are 0.00.

Example:
Documents in the index:
d1:
cat: 0.99
dog: 0.42
car: 0.00

d2:
cat: 0.02
dog: 0.00
car: 0.00

Incoming search query (with these numeric term-boosts):
q:
cat: 0.99
dog: 0.11
car: 0.00 (not specified in query)

The ideal result would be that q matches d1 much more than d2.


Here's my analysis of my situation and potential solutions:

- Because I have so many words, I cannot use a separate field for 
each word; this would overload Solr/Lucene.  This is unfortunate, 
because I know there is index-time boosting on a per-field basis 
(reference: 
http://wiki.apache.org/solr/SolrRelevancyFAQ#head-d846ae0059c4e6b7f0d0bb2547ac336a8f18ac2f), 
and because I could have used Function Queries (reference: 
http://wiki.apache.org/solr/FunctionQuery).
- As a (stupid) workaround, I could convert my documents into pure 
text: the numeric values would be translated by repetition, e.g. 
"cat": 0.99 becomes the word "cat" repeated 99 times.  This would be 
done for all words of a particular document, and the resulting text 
would then be used for regular scoring in Solr (see the sketch after 
this list).  This approach seems doable, but inefficient and far 
from elegant.
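For what it's worth, a minimal sketch of that repetition workaround (the
names are illustrative; note that Lucene's default tf() is sqrt(frequency),
so 99 repeats buys considerably less than a 99x score):

import java.util.Map;

public class RepeatWeights {
  // {"cat" -> 0.99} becomes "cat" repeated round(0.99 * 100) = 99 times
  static String toText(Map<String, Double> weights) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      int repeats = (int) Math.round(e.getValue() * 100);
      for (int i = 0; i < repeats; i++) {
        sb.append(e.getKey()).append(' ');
      }
    }
    return sb.toString().trim();
  }
}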



Am I reinventing the wheel here, or is what I'm trying to do something 
fundamentally different from what Solr and Lucene have to offer?


Any comments highly appreciated.  What can I do about this?


Thanks,

Andreas


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Release date of SOLR 1.3

2008-06-06 Thread Martin Owens
Sounds like you need a feature-freeze branch.

As for legal issues about letting people know about unofficial builds:
it's open source, right? So the waiver at the top is pretty explicit.

On Thu, 2008-06-05 at 16:47 -0700, Ryan Grange wrote:
> It would be nice to see some kind of update to the Solr website 
> regarding what's holding up a 1.3 release.  I look at that a lot more 
> often than I look at this mailing list to see whether or not there's a 
> new version I should be looking to test out.
> 
> Ryan Grange, IT Manager
> DollarDays International, LLC
> [EMAIL PROTECTED]
> 480-922-8155 x106
> 
> 
> 
> Noble Paul wrote:
> > If a feature that is really big (say distributed search) is half
> > baked and not ready for primetime, we must hold the release till it
> > is completely fixed. That is not to say that every possible
> > enhancement to that feature must be incorporated before we can do a
> > release. If the new changes are not going to break the existing system
> > we can go ahead.
> >
> > A faster release cycle can drive the adoption of a lot of new features,
> > because users are not very confident of nightly builds and they tend
> > to stick with the latest release available. SolrJ is a very good
> > example. So many users still have their own sweet client libraries in
> > production because they think SolrJ is still in development and there
> > is no release.
> >
> > --Noble
> >
> > On Wed, May 21, 2008 at 11:46 PM, Chris Hostetter
> > <[EMAIL PROTECTED]> wrote:
> >   
> >> : One year between releases is a very long time for such a useful and
> >> : dynamic system.  Are project leaders willing to (re)consider the
> >> : development process to prioritize improvements/features scope into
> >> : chunks that can be accomplished in shorter time frames - say 90 days?
> >> : In my experience, short dev iteration cycles that fix time and vary
> >> : scope produce better results from all perspectives.
> >>
> >> I'm all in favor of shorter release cycles ... but not everything can be
> >> broken down into chunks that can be implemented in a small time frame,
> >> and even if they can, you don't always know that the solution to "chunk1"
> >> is leading down the right path.  Solr (and the Lucene community as a
> >> whole) has a long history and a deep "cultural" belief in aggressive
> >> backwards compatibility .. there is a lot of resistance to the idea of a
> >> release that includes the first "chunk" of a larger feature without
> >> strong confidence that the API provided by that chunk is something
> >> people are willing to maintain for a long time.
> >>
> >> At the end of the day, what gets people motivated to do a release is
> >> discussions on solr-dev where someone says: "i think we need to have a
> >> release, and i'm willing to be the release manager.  i think we should
> >> hold off on committing patches X, Y, and Z because they don't seem ready
> >> for prime time yet, and i think we should move forward on trying to
> >> commit patches A, B, and C because they seem close to done.  what does
> >> everybody else think?"
> >>
> >>
> >>
> >>
> >> -Hoss
> >>
> >>
> >> 
> >
> >
> >   


Re: scaling / sharding questions

2008-06-06 Thread Otis Gospodnetic
Hola,

That's a pretty big and open question, but here is some info.

Jeremy's sharding approach sounds OK.  We did something similar at Technorati,
where a document/blog timestamp was the main sharding factor.  You can't really
move individual docs without reindexing (i.e. delete docX from shard1 and index
docX to shard2), unless all your fields are stored, which you will not want to
do with the data volumes you are describing.


As for how much can be handled by a single machine: this is a FAQ, and we really
need to put it on the Lucene/Solr FAQ wiki page if it's not there already.  The
answer is that it depends on many factors (size of index, # of concurrent
searches, complexity of queries, number of searchers, type of disk, amount of
RAM, cache settings, # of CPUs...).

The questions are right, it's just that there is no single non-generic answer.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Marcus Herou <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
> Sent: Friday, June 6, 2008 9:14:10 AM
> Subject: Re: scaling / sharding questions
> 
> Cool sharding technique.
>
> [snip -- Marcus's full message appears earlier in this digest]

Re: Analytics e.g. "Top 10 searches"

2008-06-06 Thread Matthew Runo
I'm nearly certain that everyone who maintains these stats does it  
themselves in their 'front end'. It's very easy to log terms and  
whatever else just before or after sending the query off to Solr.
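For example, a minimal sketch of such a front-end tally, assuming a  
plain-text log with one raw query string per line (the file name and  
format are illustrative, not anything Solr writes itself):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;

public class TopQueries {
  public static void main(String[] args) throws Exception {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    BufferedReader in = new BufferedReader(new FileReader("queries.log"));
    String line;
    while ((line = in.readLine()) != null) {
      String q = line.trim().toLowerCase();  // crude normalization
      Integer c = counts.get(q);
      counts.put(q, c == null ? 1 : c + 1);
    }
    in.close();
    // sort the tallies descending and print the top ten
    List<Map.Entry<String, Integer>> entries =
        new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
      public int compare(Map.Entry<String, Integer> a,
                         Map.Entry<String, Integer> b) {
        return b.getValue() - a.getValue();
      }
    });
    for (int i = 0; i < Math.min(10, entries.size()); i++) {
      System.out.println(entries.get(i).getKey() + "\t" +
                         entries.get(i).getValue());
    }
  }
}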


Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Jun 6, 2008, at 3:51 AM, McBride, John wrote:



Hello,

Is anybody familiar with any SOLR-based analytical tools which would
allow us to extract "top ten searches", for example?

I imagine the query parse level, where the query is tokenized and
filtered, would be the best place to log this, due to the many
permutations possible at the user input level.

Is there an existing plugin to do this, or could you suggest how to
architect this?

Thanks,
John





Re: solr query syntax

2008-06-06 Thread Otis Gospodnetic
Hi Cam,

Unless I'm misunderstanding your question, you should be able to do that pretty 
much just like you typed it.
If you enter that on the Solr /admin page, do you get an error, or do you simply 
not get the desired results?  If you add &debugQuery=true to the URL, you will 
see what query is actually getting executed.
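For instance, assuming the example server's default port and context, the 
full request would look something like:

http://localhost:8983/solr/select?q=year:1998+AND+searchword&debugQuery=true

where the bare term is matched against the schema's default search field.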


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 5, 2008 4:14:30 PM
> Subject: solr query syntax
> 
> Hello,
> 
> how can we specify in a query that it should restrict on a certain field and
> also query in the default field?
> 
> for example can I do a
> 
> year:1998 AND searchword
> 
> Best Regards,
> -C.B.



Re: An unusual question for the experts -- *term* boosting for individual documents?

2008-06-06 Thread Andreas von Hessling

Thanks to both of you.

I understand from your replies that setting the payloads for terms 
(per-document) is easy, and that the BoostingTermQuery can be used to take 
payloads into account on the query side.  Getting this to work in Solr would 
require significant work, though.  I wish I had the time to do that, but for 
my purposes I'll go with the suboptimal workaround of repeating words.


But let me emphasize how great payloads in Solr would be: they would 
open up many new options (as Grant mentions in 
http://lucene.grantingersoll.com/2007/03/18/payloads/ ).  In particular, 
they would allow Solr to search not just over mere text documents but 
over any object that can be described by numerical feature values!  
Similar to a general-purpose classifier.  That is, indexed items become 
training examples, each of which is described by a set of features 
(words) and their corresponding numerical values.  Queries can then be 
seen as testing examples, which query the knowledge base in a case-based 
reasoning (CBR) manner.  Solr would be an extremely scalable classifier 
with easy setup, convenient interfaces and simple ways to change the 
classification function (change the similarity/ranking function).  From 
an AI perspective, this would be huge!


Just a thought.

Andreas





Tricia Williams wrote:
Payloads could be the answer, but I don't think there is any crossover
with what I've been working on with payloads [...]

[snip -- Tricia's and Grant's full messages appear earlier in this digest]

Re: highlighting fragment

2008-06-06 Thread Mike Klaas

On 5-Jun-08, at 8:31 PM, Kevin Xiao wrote:


Hi,

I have a question about highlighting fragments. I set hl.fragsize to 
100, but the returned fragment is cut off in the middle of a sentence 
(with correct search term highlighting, though). Is there a way to make 
the cutoff fall at the beginning of a sentence? Set some flag? How does 
the highlighting cutoff work, anyway?


It chops up the input text every hl.fragsize tokens, without regard to  
punctuation.


For example:
Solr returns: in the middle of a sentence
What I want: We are in the middle of a sentence


The RegexFragmenter (development branch/1.3) can achieve results 
similar to this.  You give it a regular expression for fragments to 
match, and a "slop" (the factor by which hl.fragsize can be exceeded to 
fit the regex).  The example config shows an example for matching 
sentences.
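From memory, the relevant fragmenter entry in the example solrconfig.xml 
looks roughly like this (the exact pattern and defaults may differ in your 
checkout):

<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
  <lst name="defaults">
    <!-- slightly smaller fragsizes tend to work better with slop -->
    <int name="hl.fragsize">70</int>
    <!-- allow 50% slop on fragment sizes -->
    <float name="hl.regex.slop">0.5</float>
    <!-- a basic sentence-ish pattern -->
    <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
  </lst>
</fragmenter>

Requests can then select it with hl.fragmenter=regex.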


-Mike


Re: Release date of SOLR 1.3

2008-06-06 Thread Mike Klaas
We're basically in that state already for the trunk.  I don't think  
that we need a separate branch unless there is a big movement toward  
starting a new big non-1.3 feature before 1.3 is released.  If that  
happens, we'll see what needs to be done to keep development going.


-Mike

On 6-Jun-08, at 7:25 AM, Martin Owens wrote:


Sounds like you need a feature-freeze branch.

As for legal issues about letting people know about unofficial builds:
it's open source, right? So the waiver at the top is pretty explicit.

On Thu, 2008-06-05 at 16:47 -0700, Ryan Grange wrote:

It would be nice to see some kind of update to the Solr website
regarding what's holding up a 1.3 release.  I look at that a lot more
often than I look at this mailing list to see whether or not there's a
new version I should be looking to test out.

Ryan Grange, IT Manager
DollarDays International, LLC
[EMAIL PROTECTED]
480-922-8155 x106



Noble Paul wrote:

[snip -- the quoted thread appears in full in the earlier "Re: Release date
of SOLR 1.3" message in this digest]


Re: boost ignored with wildcard queries

2008-06-06 Thread David Smiley @MITRE.org

Curious... why is ConstantScoreQuery only applied to prefix queries?  Your
rationale suggests that it is also applicable to wildcard queries and fuzzy
queries (basically any place an analyzer isn't used).

~ David Smiley


Yonik Seeley wrote:
> 
> On Tue, Feb 26, 2008 at 7:23 PM, Head <[EMAIL PROTECTED]> wrote:
>>
>>  Using the StandardRequestHandler, it appears that the index boost values
>>  are ignored when the query has a wildcard in it.  For example, if I have
>>  2 docs, one with a boost of 1.0 and another with a boost of 10.0, and I
>>  do a search for "bob*", both records will be returned with the same score
>>  of 1.0.  If I just do a normal search, then the doc that has the higher
>>  boost has the higher score, as expected.
>>
>>  Is this a bug?
> 
> A feature :-)
> Solr uses ConstantScoreRangeQuery and ConstantScorePrefixQuery to
> avoid getting exceptions from too many terms.
> 
> -Yonik
> 
> 
>>  ~Tom
>>
>>  p.s. Here's what my debug looks like:
>>
>>  1.0 = (MATCH)
>>  ConstantScoreQuery([EMAIL PROTECTED]), product of:
>>   1.0 = boost
>>   1.0 = queryNorm
>>
>>  1.0 = (MATCH)
>>  ConstantScoreQuery([EMAIL PROTECTED]), product of:
>>   1.0 = boost
>>   1.0 = queryNorm
>>  --
>>  View this message in context:
>> http://www.nabble.com/boost-ignored-with-wildcard-queries-tp15703334p15703334.html
>>  Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/boost-ignored-with-wildcard-queries-tp15703334p17701306.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: boost ignored with wildcard queries

2008-06-06 Thread Yonik Seeley
On Fri, Jun 6, 2008 at 5:16 PM, David Smiley @MITRE.org
<[EMAIL PROTECTED]> wrote:
> Curious... why is ConstantScoreQuery only applied to prefix queries?  Your
> rationale suggests that it is also applicable to wildcard queries and fuzzy
> queries (basically any place an analyzer isn't used).

I think fuzzy queries may have been fixed in Lucene to not exceed the
boolean query clause limit.
Wildcard queries: no good reason... didn't really need it, so I never
got around to it :-)
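For reference, a rough sketch of how that looks inside Solr's query parser
(paraphrased from memory of the 1.x source; treat as approximate):

// inside SolrQueryParser, which extends Lucene's QueryParser
protected Query getPrefixQuery(String field, String termStr)
    throws ParseException {
  // constant-score: the prefix is never expanded into individual terms,
  // so no TooManyClauses -- but also no per-term idf or index-time boosts
  return new ConstantScorePrefixQuery(new Term(field, termStr));
}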

-Yonik

> ~ David Smiley
>
> [snip -- the rest of the quoted message appears in full earlier in this digest]