reusing docset to limit new query

2008-04-16 Thread Britske

I'm creating a custom handler where I have a base query and a resulting
DocListAndSet. I need to do some extra queries to get top results per
facet. There are two cases:

1. The sorting used for the top results of a particular facet is the same
as the sorting used for the already returned DocListAndSet. In that case I
can return a DocSlice of the DocList (contained in the DocListAndSet)
after doing some intersections. This is quick and works well.

2. The sorting is different. In this case I need to run the query again (I
think; please let me know if there's a better option) using
SolrIndexSearcher.getDocList(...).

I'm looking for a way to tell the SolrIndexSearcher that it can limit its
query (including sorting) to the DocSet I got in case 1 (the original
DocSet plus some intersections), because I figure that must be quicker (is
it?).
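
Something like the following is what I have in mind -- an untested sketch,
and I'm guessing at the exact SolrIndexSearcher signature:

import java.io.IOException;

import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class FacetTopDocs {
  // Re-run the base query with a facet-specific sort, restricted to the
  // DocSet computed in case 1 so only those documents are considered.
  DocList topDocsForFacet(SolrIndexSearcher searcher, Query baseQuery,
                          DocSet baseDocs, DocSet facetDocs,
                          Sort facetSort, int rows) throws IOException {
    DocSet restriction = baseDocs.intersection(facetDocs);
    return searcher.getDocList(baseQuery, restriction, facetSort, 0, rows);
  }
}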

I've found the method SolrIndexSearcher.cacheDocSet(...) but am not
entirely sure what it does (side effects?).

Can someone please elaborate on this? 

Britske 



Re: too many queries?

2008-04-16 Thread Jonathan Ariel
So I counted the number of distinct values for each field that I want to
facet on. In total it's around 100,000. I tried with a filterCache of
120,000, but that seems to be too much because the server went down. I will
try with less, around 75,000, and let you know.
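
For reference, I'm sizing it via the filterCache entry in solrconfig.xml,
roughly like this (the initialSize and autowarmCount values are
placeholders, not tuned numbers):

<filterCache
    class="solr.LRUCache"
    size="75000"
    initialSize="10000"
    autowarmCount="5000"/>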

How do you partition the data into a static set and a dynamic set, and then
combine them at query time? Do you have a link to read about that?



On Tue, Apr 15, 2008 at 7:21 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 15-Apr-08, at 5:38 AM, Jonathan Ariel wrote:
>
> > My index is 4GB on disk. My servers have 8 GB of RAM each (the OS is 32
> > bits).
> > It is optimized twice a day, it takes around 15 minutes to optimize.
> > The index is updated (commits) every two minutes. There are between 10
> > and
> > 100 inserts/updates every 2 minutes.
> >
>
> Caching could help--you should definitely start there.
>
> The commit every 2 minutes could end up being an insurmountable problem.
>  You may have to partition your data into a large, mostly static set and a
> small dynamic set, combining the results at query time.
>
> -Mike
>


Re: too many queries?

2008-04-16 Thread Sean Timm

Jonathan Ariel wrote:

How do you partition the data into a static set and a dynamic set, and then
combine them at query time? Do you have a link to read about that?
  
One way would be distributed search (SOLR-303), but distributed IDF is no
longer part of the current patch, so you may have some issues combining
documents from the two sets, as the collection statistics for the two are
likely to differ. It sounds like distributed IDF may be added back in the
near future, as there was some chatter about it again on the dev list.
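
With the patch applied, a query fanned out over the two indexes would look
roughly like this (host names invented, and the shards syntax is whatever
the current patch expects, so treat it as approximate):

curl 'http://static1:8983/solr/select?shards=static1:8983/solr,dynamic1:8983/solr&q=foo'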


-Sean


Re: too many queries?

2008-04-16 Thread Walter Underwood
A commit every two minutes means that the Solr caches are flushed
before they even start to stabilize. Two things to try:

* commit less often, 5 minutes or 10 minutes
* have enough RAM that your entire index can fit in OS file buffers

wunder




Re: too many queries?

2008-04-16 Thread Jonathan Ariel
In order to do that I'd have to change to a 64-bit OS so I can have more than
4 GB of RAM. Is there any way to see how long it takes Solr to warm up
the searcher?
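
(I'm guessing the warmupTime figures on the admin statistics page are the
numbers to watch, e.g.:

curl http://localhost:8983/solr/admin/stats.jsp

but I'm not sure whether they cover the full searcher warmup.)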



Re: too many queries?

2008-04-16 Thread Jonathan Ariel
Is there any way to know how much memory is being used by the caches?
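
Doing some back-of-the-envelope math (the document count below is my guess,
not a measurement): each filterCache entry holds a DocSet, and a
bitset-backed DocSet costs roughly maxDoc/8 bytes. So assuming 5 million
documents:

  5,000,000 docs / 8 bits per byte   ~  625 KB per bitset DocSet
  120,000 entries x 625 KB           ~  75 GB worst case

Small sets are stored as HashDocSets and cost far less, but that worst case
would explain why 120,000 entries took the server down.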



Re: too many queries?

2008-04-16 Thread Walter Underwood
Do it. 32-bit OSes went out of style five years ago in server-land.

I would start with 8GB of RAM. 4GB for your index, 2 for Solr, 1 for
the OS and 1 for other processes. That might be tight. 12GB would
be a lot better.

wunder




Re: Fuzzy queries in dismax specs?

2008-04-16 Thread Walter Underwood
It is working, but I disabled recursive field aliasing. Two questions:

* Is it possible to do recursive field aliasing from solrconfig.xml?
* If not, do we want to preserve this speculative feature?

I think the answers are "no" and "no", but I'd like a second opinion.
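
For reference, the spec from my earlier mail just extends the normal dismax
qf, so the handler config would look roughly like this (hypothetical -- the
~0.7 fuzzy syntax comes from my patch, not stock Solr):

<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="qf">exact~0.7^4.0 stemmed^2.0</str>
  </lst>
</requestHandler>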

wunder

On 4/15/08 10:23 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> 
> : I've started implementing something to use fuzzy queries for selected fields
> : in dismax. The request handler spec looks like this:
> : 
> :exact~0.7^4.0 stemmed^2.0
> 
> that's a pretty cool idea ... usually when people talk about adding
> support for other query types in dismax they mean the query syntax, but
> you are adding more info to the qf to specify how the field should be
> handled in general -- i like it.
> 
> i think if i had it to do over again (now that dismax supports multiple
> param values, and per field overrides) i would have made qf and pf
> multivalued params containing just the field names, and gotten the boost
> value from a per field overridable fieldBoost param, so adding a
> fuzzyDistance param would also be trivial (without needing to parse crazy
> syntax)
> 
> (hmmm... ps could be a per field overridable field too ... dismax v2.0
> maybe)
> 
> 
> -Hoss




Re: too many queries?

2008-04-16 Thread oleg_gnatovskiy

Hello. I am having a similar problem to the OP's. I see that you recommended
setting 4GB for the index and 2 for Solr. How do I allocate memory for the
index? I was under the impression that Solr did not support a RAM index.





Re: too many queries?

2008-04-16 Thread Walter Underwood
4GB for the operating system to use to buffer disk files.
That is not a Solr setting.

wunder




Re: too many queries?

2008-04-16 Thread Otis Gospodnetic
Oleg, you can't explicitly say "N GB for the index".  Wunder was just
suggesting how much RAM you might expect each piece to need.
 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch







Re: Searching for popular phrases or words

2008-04-16 Thread Edwin Koome
Thanks Chris.

I had in mind "occurs in a lot of documents". Please
point me to where I can pick up an example of using the
LukeRequestHandler and the shingle-based tokenizer.

Eric
--- Chris Hostetter <[EMAIL PROTECTED]> wrote:

> 
> it depends on your definition of "popular". if you
> mean "occurs in a lot of 
> documents" then take a look at the
> LukeRequestHandler ... it can give you 
> info on terms with high frequencies (and you can use
> a shingle-based 
> tokenizer to index "phrases" as terms)
> 
> if by popular you mean "occurs in a lot of queries"
> there isn't anything 
> in Solr that keeps track of what people search for
> ... your application 
> would need to do that.
> 
> : How can i search for popular phrases or words with
> an
> : option to include only, for example, technical
> terms
> : e.g "Oracle database" rather than common english
> 
> You'll need a better definition of your goal to get
> any meaningful answer 
> to the "an option to include only, for example,
> technical terms" part of 
> that question ... the "for example" implies that
> there are other examples 
> ... how would you (as a human person) decide when to
> classify a phrase as 
> a "technical" phrase, vs an ... "other" phrase?  if
> you can't answer that 
> question, then neither can code.
> 
> 
> -Hoss
> 
> 



  



Re: Searching for popular phrases or words

2008-04-16 Thread Otis Gospodnetic
Eric,

Look at LUCENE-400 or Lucene trunk/contrib/analyzers for the shingles stuff.
Have you checked the Wiki for info about LukeRequestHandler?  I bet it's there.
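
To make it concrete, the setup Chris described would look something like the
sketch below: a shingle analyzer in schema.xml plus a LukeRequestHandler
query for the top terms. The factory name solr.ShingleFilterFactory is my
assumption -- the shingle code lives in Lucene contrib, so depending on your
build you may need to wire the filter in yourself.

<fieldType name="shingled" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit two-word shingles ("oracle database") as single terms -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="false"/>
  </analyzer>
</fieldType>
<field name="body_shingled" type="shingled" indexed="true" stored="false"/>

Then ask Luke for the highest-frequency terms in that field:

curl 'http://localhost:8983/solr/admin/luke?fl=body_shingled&numTerms=50'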

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: too many queries?

2008-04-16 Thread oleg_gnatovskiy

Oh ok. That makes sense. Thanks.




Installation help

2008-04-16 Thread Shawn Carraway
Hi all,
I am trying to install Solr with Jetty (as part of another application)
on a server running Gentoo Linux and JDK 1.6.0_05.

When I try to start Jetty (and Solr), it doesn't open a port.

I know you will need more info, but I'm not sure what you would need as
I'm not clear on how this part works.

Thanks,
Shawn



POST interface to sending queries to SOLR?

2008-04-16 Thread Jim Adams
Folks,

I know there is a GET interface for sending queries to Solr, but is there a
POST interface? If so, can someone point me in that direction?

Thanks, Jim


Re: Installation help

2008-04-16 Thread Matt Mitchell
What does the Jetty log output say in the console after you start it? It
should mention the port # on one of the last lines. If it does, try using
curl or wget to do a local request:

curl http://localhost:8983/solr/
wget http://localhost:8983/solr/

Matt



XSLT transform before update?

2008-04-16 Thread Daniel Papasian
Hey everyone,

I'm experimenting with updating Solr from a remote XML source, using an
XSLT transform to get it into the Solr XML syntax, to let me maintain an
index (and yes, I've looked into SOLR-469, but set it aside, as I need to
do quite a bit of XSLT work to get the source into something I can index).

I'm looking at using stream.url, but I need to do the XSLT at some point
in there.  I would prefer to do the XSLT on the client (solr) side of
the transfer, for various reasons.

Is there a way to implement a custom request handler or similar to get
solr to apply an XSLT transform to the content stream before it attempts
to parse it?  If not possible OOTB, where would be the right place to
add said functionality?

Thanks much for your help,

Daniel


Re: XSLT transform before update?

2008-04-16 Thread Chris Hostetter

: Is there a way to implement a custom request handler or similar to get
: solr to apply an XSLT transform to the content stream before it attempts
: to parse it?  If not possible OOTB, where would be the right place to
: add said functionality?

take a look at SOLR-285 and SOLR-370 ... a RequestHandler is the right way 
to go, the biggest problems with the patch in SOLR-285 at the moment are:
  a) i wrote it and i don't know much about doing XSLT transformations in 
java efficiently.
  b) the existing XSLT Transformer "caching" code in Solr is really 
trivial and not suitable for any real volume ... if it were overhauled to 
take advantage of the standard SolrCache APIs it would be a lot more 
reusable by both the XsltResponseWriter and a new XsltUpdateHandler.
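
for reference, the usual JAXP pattern such a handler would want is to
compile the stylesheet once into a thread-safe Templates object and spin up
a cheap Transformer per request ... an illustrative sketch (not the
SOLR-285 code):

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltPreprocessor {
  // Templates is thread-safe, so one compiled stylesheet can be shared
  // (and cached) across requests; Transformer instances cannot.
  private final Templates templates;

  public XsltPreprocessor(File stylesheet) throws Exception {
    templates = TransformerFactory.newInstance()
        .newTemplates(new StreamSource(stylesheet));
  }

  // transform the incoming content stream before Solr parses it
  public void transform(InputStream in, OutputStream out) throws Exception {
    Transformer t = templates.newTransformer(); // cheap per-request object
    t.transform(new StreamSource(in), new StreamResult(out));
  }
}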

-Hoss



Re: POST interface to sending queries to SOLR?

2008-04-16 Thread Chris Hostetter

: I know there is a 'GET' to send queries to Solr.  But is there a POST
: interface to sending queries?  If so, can someone point me in that
: direction?

POST using the standard application/x-www-form-urlencoded 
content-type (ie: the same way you would POST using any HTML form)
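
for example, with curl (which sends application/x-www-form-urlencoded by
default):

curl --data 'q=solr&rows=10' http://localhost:8983/solr/select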



-Hoss



Re: XSLT transform before update?

2008-04-16 Thread Shalin Shekhar Mangar
Hi Daniel,

Maybe if you can give us a sample of what your XML looks like, we can suggest
how to use SOLR-469 (Data Import Handler) to index it. Most of the use cases
we have encountered so far are solvable using the XPathEntityProcessor in
DataImportHandler without XSLT; for details, look at
http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476

If you're willing to write code, you can do almost anything with
DataImportHandler. If this is a general need, I can look into adding XSLT
support in Data Import Handler.
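
As a rough sketch (the field names are made up, and the exact attribute
names are best checked against the wiki page above), a data-config.xml for
a remote XML feed looks something like this:

<dataConfig>
  <dataSource type="HttpDataSource"/>
  <document>
    <entity name="record"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.xml"
            forEach="/feed/record">
      <field column="id"    xpath="/feed/record/@id"/>
      <field column="title" xpath="/feed/record/title"/>
    </entity>
  </document>
</dataConfig>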




-- 
Regards,
Shalin Shekhar Mangar.