Re: How to read values of a field efficiently

2007-07-31 Thread Martin Grotzke
On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote:
> : Is it possible to get the values from the ValueSource (or from
> : getFieldCacheCounts) sorted by its natural order (from lowest to
> : highest values)?
> 
> well, an inverted term index is already a data structure listing terms
> from lowest to highest and the associated documents -- so if you want to
> iterate from low to high between a range and find matching docs you should
> just use the TermEnum -- the whole point of the FieldCache (and
> FieldCacheSource) is to have a "reverse inverted index" so you can quickly
> fetch the indexed value if you know the docId.
Ok, I will have a look at the TermEnum and try this.

> 
> perhaps you should elaborate a little more on what it is you are trying to
> do so we can help you figure out how to do it more efficiently ...
I want to read all values of the price field of the found docs,
and calculate the mean value and the standard deviation.
Based on the min value (mean - deviation), the max value (mean +
deviation) and the number of prices I calculate price ranges.

Then I iterate over the sorted array of prices and count how many
prices go into the current range.

This sorting (Arrays.sort) takes a lot of time, that's why I asked if
it's possible to read values in sorted order.

But reading this, I think it would also be possible to skip sorting and
check for each price into which bucket it would go and increment the
counter for this bucket - this should also be a possibility for
optimization.
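
A minimal sketch of that no-sort idea, assuming equal-width buckets between
(mean - deviation) and (mean + deviation) and assuming that prices outside
that interval are counted in the first or last bucket. The class name, the
method names and the bucket layout are hypothetical; the actual range
calculation is not shown in this thread, and numBuckets is assumed to be at
least 1.

```java
import java.util.Arrays;

final class PriceBuckets {

    /** Returns {mean, population standard deviation} using two linear passes. */
    public static double[] meanAndStdDev(float[] prices) {
        double sum = 0;
        for (float p : prices) {
            sum += p;
        }
        double mean = sum / prices.length;
        double squares = 0;
        for (float p : prices) {
            squares += (p - mean) * (p - mean);
        }
        return new double[] { mean, Math.sqrt(squares / prices.length) };
    }

    /**
     * Counts prices per bucket in one pass, without sorting.
     * Assumption: numBuckets equal-width buckets between lo and hi;
     * prices below lo go to bucket 0, prices at or above hi go to the last bucket.
     */
    public static int[] countPerBucket(float[] prices, double lo, double hi, int numBuckets) {
        int[] counts = new int[numBuckets];
        double width = (hi - lo) / numBuckets;
        for (float p : prices) {
            int idx = width > 0 ? (int) Math.floor((p - lo) / width) : 0;
            idx = Math.max(0, Math.min(numBuckets - 1, idx)); // clamp outliers
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        float[] prices = { 3f, 7f, 12f, 18f, 25f, 99f };
        double[] stats = meanAndStdDev(prices);
        int[] counts = countPerBucket(prices, stats[0] - stats[1], stats[0] + stats[1], 3);
        System.out.println(Arrays.toString(counts));
    }
}
```

Each price is touched a constant number of times, so the cost is linear in
the number of found docs instead of the O(n log n) of Arrays.sort.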

> ... perhaps you shouldn't be iterating over every doc to figure out your
> ranges .. perhaps you can iterate over the terms themselves?
Are you referring to TermEnum with this?

Thanx && cheers,
Martin


> 
> 
> hang on ... rereading your first message i just noticed something i
> definitely didn't spot before...
> 
> >> Fairly long: getFieldCacheCounts for the cat field takes ~70 ms
> >> for the second request, while reading prices takes ~600 ms.
> 
> ...i clearly missed this, and fixated on your assertion that your reading
> of field values took longer than the stock methods -- but you're not just
> comparing the time needed by different methods, you're also timing
> different fields.
> 
> this actually makes a lot of sense since there are probably a lot fewer
> unique values for the cat field, so there are a lot fewer discrete values
> to deal with when computing counts.
> 
> 
> 
> 
> -Hoss
> 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/



Re: MoreLikeThis handler and field collapsing.

2007-07-31 Thread Pieter Berkel
What exactly are you trying to achieve by using the MoreLikeThis handler?  I
created a patch that adds MoreLikeThis functionality (available in the
Standard request handler) to the Dismax handler in
http://issues.apache.org/jira/browse/SOLR-295 which may be of interest
(although unfortunately not quite the same as what you requested).

As far as I'm aware, there is no real need for MoreLikeThis to be a
standalone request handler in its own right (and to be honest the current
implementation feels a bit clumsy), rather it should be incorporated as a
"plugin" or search component in the Standard and Dismax handlers (like the
way Facets, Highlighting and Collapsing currently are implemented), which is
what Ryan is trying to achieve with SOLR-281.  I'm hoping the "search
component" idea will gain traction and support soon as I'd really like to
see the Dismax request handler support MoreLikeThis functionality soon! (but
I digress...)

cheers,
Piete



On 31/07/07, Nuno Leitao <[EMAIL PROTECTED]> wrote:
>
> I will take a stab at patching the MoreLikeThis handler - but given
> that I have never touched a single line of Solr code this might fail
> miserably :)
>
> Maybe there is a kind soul who could provide a new patch for
> SOLR-236 which includes field collapse with MLT ?
>
> On 30/07/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > Nuno Leitao wrote:
> > > Hi,
> > >
> > > I have a 1.3 Solr with the field collapsing patch (SOLR-236 -
> > > http://issues.apache.org/jira/browse/SOLR-236).
> > >
> > > Collapsing works great, but only using the dismax and standard query
> > > handler - I haven't managed to get it to work using the MoreLikeThis
> > > handler though - I am going for a simplistic approach where I just run a
> > > query such as:
> > >
> > > /mlt?start=0&rows=3&collapse.field=collapsefield&collapse.type=normal
> > >
> > > Looking at the SOLR-236 patch it seems the field collapsing has only
> > > been patched into the StandardRequestHandler and the
> > > DisMaxRequestHandler, which would explain why this fails to work
> > > completely, but perhaps someone has found another way ?
> > >
> >
> > I have not tried, but field collapsing should be able to work with the
> > MoreLikeThis handler -- but it is not part of the patch.
> >
> > Given that we keep trying to add more widgets to the search chain, there
> > has been talk of "search component" based handler that can easily share
> > this sort of functionality.
> >
> > check:
> > https://issues.apache.org/jira/browse/SOLR-281
> >
> > http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274
> >
> > SOLR-281 is now just a quick/dirty brainstorm, but I think it is the
> > likely direction for how field collapsing will be integrated.
> >
> > In short, if you need something to work quickly: apply the same pattern
> > from DisMax and Standard to the MoreLikeThis handler.  If you have more
> > time (and interest) it would be great to add these features to SOLR-281.
>
> >
> >
> > ryan
> >
> >
> >
>


Re: mandatory and optional fields in the dismaxrequesthandler

2007-07-31 Thread [EMAIL PROTECTED]

Chris Hostetter wrote:

: Is it possible to specify precisely one or more mandatory fields in a
: DismaxRequestHandler?

what would the semantics of making a field mandatory be?  considering your
specific example...

: text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
:
: text^0.5 features^1.0 name^1.2 manu^1.1

if the q param is:  albino elephant  ... what would it mean that text and
features are mandatory?  do both words have to appear in text and in
features, or just one in each?


  
What I want is for the words 'albino' and 'elephant' to appear in both the
'text' and 'features' fields, and optionally in the 'name' and 'manu' fields.
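
One way to express those semantics outside of dismax is the '+' (required)
prefix of the standard Lucene query syntax. The sketch below only builds such
a query string; the class and method names are hypothetical, boosts are left
out, and it assumes plain single-word terms that need no escaping. It is not
a feature of the DisMaxRequestHandler itself.

```java
import java.util.Arrays;
import java.util.List;

final class MandatoryFieldQuery {

    /**
     * Hypothetical helper: every term becomes a required clause (+field:term)
     * for each mandatory field and an optional clause (field:term) for each
     * optional field.
     */
    public static String build(List<String> terms,
                               List<String> mandatoryFields,
                               List<String> optionalFields) {
        StringBuilder q = new StringBuilder();
        for (String field : mandatoryFields) {
            for (String term : terms) {
                q.append('+').append(field).append(':').append(term).append(' ');
            }
        }
        for (String field : optionalFields) {
            for (String term : terms) {
                q.append(field).append(':').append(term).append(' ');
            }
        }
        return q.toString().trim();
    }

    public static void main(String[] args) {
        String q = build(Arrays.asList("albino", "elephant"),
                         Arrays.asList("text", "features"),
                         Arrays.asList("name", "manu"));
        // prints: +text:albino +text:elephant +features:albino +features:elephant
        //         name:albino name:elephant manu:albino manu:elephant  (on one line)
        System.out.println(q);
    }
}
```

The optional clauses do not restrict the result set; they only contribute to
the score of documents that already satisfy the required clauses.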


Arno


Fwd: Solr 1.3 HTTP server stops responding

2007-07-31 Thread michael ravits
hello again,

This happened when I was updating the index.
This time I saved the log, and this error appears a number of times:

SEVERE: Error during auto-warming of key:[EMAIL PROTECTED]:java.lang.OutOfMemoryError: Java heap

I have increased Xmx to 850mb.
But what else can I do?

thank you for your help

michael ravits <[EMAIL PROTECTED]> wrote:
Date: Tue, 31 Jul 2007 01:34:36 -0700 (PDT)
From: michael ravits <[EMAIL PROTECTED]>
Subject: Solr 1.3 HTTP server stops responding
To: solr-user@lucene.apache.org

 hello solrs,

I am facing a problem similar to the one Kevin Holmes described in a recent thread.

 I have created a thread dump, maybe this can help trace the problem?
 I am attaching the zipped dump to this email.
 
I am using Solr 1.3 with the SOLR-236 patch on a win2003/2gb machine
with Xms=512m and Xmx=512m. I didn't save the console output to a file so I
can't tell whether there were PERFORMANCE or MEMORY exceptions, but next time
I'll have the log.

thanks for your help



Re: Please help! Solr 1.1 HTTP server stops responding

2007-07-31 Thread Jeff Rodenburg
Not sure if this would help you, but we encountered java heap OOM issues
with 1.1 earlier this year.  We patched solr with the latest bits at the
time, which included a lucene memory fix for java heap OOM issues
(http://issues.apache.org/jira/browse/LUCENE-754).

Different servlet container (Tomcat 5.5) and we're running JRE 5 v9.

After applying the update to the solr bits that included the patch mentioned
above, OOM has never re-appeared.

-- j

On 7/30/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
> On 30-Jul-07, at 11:35 AM, David Whalen wrote:
>
> > Hi Yonik!
> >
> > I'm glad to finally get to talk to you.  We're all very impressed
> > with solr and when it's running it's really great.
> >
> > We increased the heap size to 1500M and that didn't seem to help.
> > In fact, the crashes seem to occur more now than ever.  We're
> > constantly restarting solr just to get a response.
>
> How much memory is on the system, and is anything else running?  How
> large is the resulting index?
>
> If you're willing for some queries to take longer after a commit,
> reducing/eliminating the autoWarmCount for your queryCache and
> facetCache should decrease the peak memory usage (as Solr has two
> copies of the cache open at that point).  Setting it to zero could as much
> as halve the peak memory usage (at the cost of loss of performance
> after commits).
>
> As yonik suggested, check for PERFORMANCE warnings too--you may have
> more than two Searchers open at once.
>
> -Mike
>
>
>


Highlighting question

2007-07-31 Thread Daniel Alheiros
Hi

I've started using highlighting and there is something that I consider a bit
odd... It may be caused by the way I'm indexing or querying I'm sure, but
just to avoid doing a huge number of tests...

I'm querying for "butter" and only exact matches of butter are returned
highlighted; when I change my query to "butters" it returns both "butter"
and "butters" highlighted. Does it consider the word and its reductions,
but not match a word that contains the query word?

Thanks again,
Daniel





Re: Highlighting question

2007-07-31 Thread Mike Klaas


On 31-Jul-07, at 9:41 AM, Daniel Alheiros wrote:


> Hi
>
> I've started using highlighting and there is something that I consider a bit
> odd... It may be caused by the way I'm indexing or querying I'm sure, but
> just to avoid doing a huge number of tests...
>
> I'm querying for "butter" and only exact matches of butter are returned
> highlighted; when I change my query to "butters" it returns both "butter"
> and "butters" highlighted. Does it consider the word and its reductions,
> but not match a word that contains the query word?


This is because the example Solr distribution is configured to do
stemming (see the definition for "text" fieldtype in schema.xml).

Remove the Porter stemming filter (EnglishPorterFilterFactory in the example
schema) to do exact(er) searching/highlighting only.
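
To illustrate why stemming matches "butter" and "butters" with each other but
not with a longer word that merely contains "butter", here is a toy sketch.
The stem() method below is a deliberately simplified stand-in (it only strips
one trailing "s"); it is NOT the Porter algorithm and not Solr code, and all
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ToyStemHighlight {

    /** Toy stand-in for a stemmer: lower-cases and strips one trailing "s". */
    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    /** Returns the words of text whose stem equals the stem of the query word. */
    public static List<String> highlighted(String query, String text) {
        String queryStem = stem(query);
        List<String> hits = new ArrayList<String>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty() && stem(word).equals(queryStem)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [butter, butters]: "butterfly" has a different stem
        System.out.println(highlighted("butters", "butter butters butterfly"));
    }
}
```

Matching happens on whole stemmed tokens, so a word that only contains the
query word as a substring is never a hit; that would need a different kind
of analysis (for example wildcard or n-gram matching).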


-Mike


Recipe: multiple webapps in Jetty 6

2007-07-31 Thread Matt Kangas
For anyone who's been watching SOLR-215 ("Multiple Solr Cores"), or
otherwise has wanted to run multiple Solr instances in a single Jetty
instance... I've posted a new, improved recipe to
http://wiki.apache.org/solr/SolrJetty (scroll to bottom)


I've also attached a tarball with a known-good config for Solr 1.2.0  
& Jetty 6.1.3. It should be straightforward to define as many webapps  
as you need with this recipe.


Note: I'm pretty sure there is an even cleaner way to accomplish this
too, without the need to fetch additional .jars and mess with
JNDI, but I haven't fleshed out the details yet... will update the
wiki if I get it working. :)


Cheers,
--Matt

--
Matt Kangas / [EMAIL PROTECTED]




why store field will be analyzed?

2007-07-31 Thread James liu
The field "topic" is defined with indexed='false' and stored='true'.

I don't know why it is still being analyzed.

I want it to be stored only, not analyzed. How can I do that?


-- 
regards
jl