Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing

2013-04-17 Thread rulinma
You can use multiple threads.
For speed, you can also calculate (with a general hash algorithm) which solr server to add the docs to.
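A rough sketch of the multi-threaded part with SolrJ - the zkHost, collection name and thread count are only placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // a single CloudSolrServer instance can be shared by all indexing threads
        final CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("collection1");

        ExecutorService pool = Executors.newFixedThreadPool(8); // tune to your hardware
        for (int t = 0; t < 8; t++) {
            final int worker = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        for (int i = 0; i < 1000; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", "worker" + worker + "-" + i);
                            server.add(doc); // each thread sends its own add requests
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        server.commit(); // make the added docs visible
    }
}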





Re: Push/pull model between leader and replica in one shard

2013-04-17 Thread Furkan KAMACI
Hey Mark,

What did you use to prepare your presentation? It's really nice.

2013/4/17 Furkan KAMACI 

> Really nice presentation.
>
>
> 2013/4/17 Mark Miller 
>
>>
>> On Apr 16, 2013, at 1:36 AM, SuoNayi  wrote:
>>
>> > Hi, can someone explain more details about what model is used to sync
>> > docs between the leader and replica in the shard?
>> > The model can be push or pull. Supposing I have only one shard that has
>> > 1 leader and 2 replicas, when the leader receives an update request, does
>> > it scatter the request to each available and active replica first and then
>> > process the request locally last? In this case, if the replicas are able to
>> > catch up with the leader, can I think of this as a push model where the
>> > leader pushes updates to its replicas?
>>
>> Currently, the leader adds the doc locally and then sends it to all
>> replicas concurrently.
>>
>> >
>> >
>> > What happens if a replica is behind the leader? Will the replica pull
>> > docs from the leader and keep track of the incoming updates from the
>> > leader in a log (called the tlog)? If so, will it replay the updates in the
>> > tlog once it has finished pulling docs?
>>
>> If an update forwarded from a leader to a replica fails it's likely
>> because that replica died. Just in case, the leader will ask that replica
>> to enter "recovery".
>>
>> When a node comes up and is not a leader, it also enters "recovery".
>>
>> Recovery tries to peer sync from the leader, and if that fails (peer sync
>> only works if the replica is behind by about 100 updates), it replicates
>> the entire index.
>>
>> If you are interested in more details on the SolrCloud architecture, I've
>> given a few talks on it - two of them here:
>>
>> http://vimeo.com/43913870
>> http://www.youtube.com/watch?v=eVK0wLkLw9w
>>
>> - Mark
>>
>>
>


Re: Function Query performance in combination with filters

2013-04-17 Thread Rogalon
Rogalon wrote
> On 16 April 2013 at 14:46, "Yonik Seeley-4 [via Lucene]"
> <ml-node+s472066n4056299h21@.nabble> wrote:
> 
>> On Tue, Apr 16, 2013 at 7:51 AM, Rogalon <[hidden email]> wrote:
>>
>> > Hi,
>> > I am using pretty complex function queries to completely customize (not
>> only
>> > boost) the score of my result documents that are retrieved from an
>> index of
>> > approx 10e7 documents. To get to an acceptable level of performance I
>> > combine my query with filters in the following way (very short
>> example):
>> >
>> >
>> q=_val_:"sum(termfreq(fieldname,`word`),termfreq(fieldname2,`word2`))"&fq=fieldname:`word`&fq=fieldname2:`word2`
>> >
>> > Although I always have (because of the filter) approx 50.000 docs in
>> the
>> > result set, the search times vary (depending on the actual query)
>> between
>> > 100ms and 6000ms.
>> >
>> > My understanding was that the scoring function is only applied to the
>> result
>> > set from the filters.
>>
>> That should be the case.
>>
>> > But based on what I am seeing it seems that a lot more
>> > documents are actually put through the _val_ function.
>>
>> How did you verify this? 
> 
> Thanks for taking a look at my problem.
> 
> For now - I verified just by taking a look at the query times and doing
> some simple experiments.
> 
> If I am not using the function query at all (q=*:*&fq=...), the approx.
> 50.000 results from the filters are always returned within 200-300ms. This
> is pretty stable. If I have a (test) index of only 50.000 documents (instead of
> the 10e7 index) and I pass every document through the _val_ query
> (without any filters), this takes about 150ms which in my case would be
> ok.
> 
> Applying no filters to the function query on the 10e7 index leads to
> search times at about 6000ms which is too much.
> 
> But if I use the filters as stated above I get returned 50.000 documents
> but the query times suddenly start to vary between 100ms and 6000ms. Some
> of my filters might actually be on stop words which appear in every other
> document in the index but that seems to really hurt performance only if
> the function query is used.
> 
>  Greetings, Nico 
>>
>>
>> -Yonik
>> http://lucidworks.com
>>
>>

Any idea what else I could try to figure out the issue? Thanks in advance
;-)





RE: Document Missing from Share in Solr cloud

2013-04-17 Thread Cool Techi
Field type is string and this has happened for multiple docs over the past week.

Regards,
Ayush

> Date: Tue, 16 Apr 2013 14:06:40 -0600
> Subject: Re: Document Missing from Share in Solr cloud
> From: thelabd...@gmail.com
> To: solr-user@lucene.apache.org
> 
> btw ... what is the field type of your unique ID field?
> 
> 
> On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter wrote:
> 
> > Ok, that makes more sense and is definitely cause for concern. Do you have
> > a sense for whether this is ongoing or happened a few times unexpectedly in
> > the past? If ongoing, then will probably be easier to track down the root
> > cause.
> >
> >
> > On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi wrote:
> >
> >> That's what I am trying to say, the document is not replicated across all
> >> the replicas for a specific shard, hence the query show different results
> >> on every refresh.
> >>
> >>
> >>
> >> > Date: Tue, 16 Apr 2013 11:34:18 -0600
> >> > Subject: Re: Document Missing from Share in Solr cloud
> >> > From: thelabd...@gmail.com
> >> > To: solr-user@lucene.apache.org
> >> >
> >> > If you are using the default doc router for indexing in SolrCloud, then
> >> a
> >> > document only exists in a single shard but can be replicated in that
> >> shard
> >> > to any number of replicas.
> >> >
> >> > Can you clarify your question as it sounds like you're saying that the
> >> > document is not replicated across all the replicas for a specific
> >> shard? If
> >> > so, that's definitely a problem ...
> >> >
> >> >
> >> > On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi 
> >> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > We noticed a strange behavior in our solr cloud setup, we are using
> >> > > solr4.2  with 1:3 replication setting. We noticed that some of the
> >> > > documents were showing up in search sometimes and not at other, the
> >> reason
> >> > > being the document was not present in all the shards.
> >> > >
> >> > > We have restarted zookeeper and also entire cloud, but these
> >> documents are
> >> > > not being replicated in all the shards for some reason and hence
> >> > > inconsistent search results.
> >> > >
> >> > > Regards,
> >> > > Ayush
> >> > >
> >>
> >>
> >
> >
  

RE: Document Missing from Share in Solr cloud

2013-04-17 Thread Cool Techi
Shouldn't the number of docs across shards be the same? I can see a difference:

Shard 1
Last Modified: about 2 hours ago, Num Docs: 26236135, Max Doc: 26592164, Deleted Docs: 356029, Version: 6672183, Segment Count: 34

Shard 1 Replica
Last Modified: about 2 hours ago, Num Docs: 26236135, Max Doc: 26594887, Deleted Docs: 358752, Version: 6678209, Segment Count: 27

> From: cooltec...@outlook.com
> To: solr-user@lucene.apache.org
> Subject: RE: Document Missing from Share in Solr cloud
> Date: Wed, 17 Apr 2013 13:28:16 +0530
> 
> Field type is string and this has happened for multiple docs over the past 
> week.
> 
> Regards,
> Ayush
> 
> > Date: Tue, 16 Apr 2013 14:06:40 -0600
> > Subject: Re: Document Missing from Share in Solr cloud
> > From: thelabd...@gmail.com
> > To: solr-user@lucene.apache.org
> > 
> > btw ... what is the field type of your unique ID field?
> > 
> > 
> > On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter 
> > wrote:
> > 
> > > Ok, that makes more sense and is definitely cause for concern. Do you have
> > > a sense for whether this is ongoing or happened a few times unexpectedly 
> > > in
> > > the past? If ongoing, then will probably be easier to track down the root
> > > cause.
> > >
> > >
> > > On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi 
> > > wrote:
> > >
> > >> That's what I am trying to say, the document is not replicated across all
> > >> the replicas for a specific shard, hence the query show different results
> > >> on every refresh.
> > >>
> > >>
> > >>
> > >> > Date: Tue, 16 Apr 2013 11:34:18 -0600
> > >> > Subject: Re: Document Missing from Share in Solr cloud
> > >> > From: thelabd...@gmail.com
> > >> > To: solr-user@lucene.apache.org
> > >> >
> > >> > If you are using the default doc router for indexing in SolrCloud, then
> > >> a
> > >> > document only exists in a single shard but can be replicated in that
> > >> shard
> > >> > to any number of replicas.
> > >> >
> > >> > Can you clarify your question as it sounds like you're saying that the
> > >> > document is not replicated across all the replicas for a specific
> > >> shard? If
> > >> > so, that's definitely a problem ...
> > >> >
> > >> >
> > >> > On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi 
> > >> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > We noticed a strange behavior in our solr cloud setup, we are using
> > >> > > solr4.2  with 1:3 replication setting. We noticed that some of the
> > >> > > documents were showing up in search sometimes and not at other, the
> > >> reason
> > >> > > being the document was not present in all the shards.
> > >> > >
> > >> > > We have restarted zookeeper and also entire cloud, but these
> > >> documents are
> > >> > > not being replicated in all the shards for some reason and hence
> > >> > > inconsistent search results.
> > >> > >
> > >> > > Regards,
> > >> > > Ayush
> > >> > >
> > >>
> > >>
> > >
> > >
> 
  

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
I managed to get this done. The facet queries now facets on a multivalue
field as opposed to the dynamic field names.

Unfortunately it doesn't seem to have done much difference, if any at all.

Some more information that might help:

The JVM memory seems to be eaten up slowly. I don't think that there is one
single query that causes the problem. My test case (dumping 180 clients on
top of Solr) takes hours before it causes an OOM, often a full day. The
memory usage wobbles up and down, so the GC is at least partially doing its
job. It still works its way up to 100% eventually. When that happens it
either OOMs or it stops the world and brings the memory consumption down to
10-15 gigs.

I did try to facet on all products across all clients (about 1.4 mil docs)
and I could not make it OOM on a server with a 4 gig JVM. This was on a
dedicated test server with my test being the only traffic.

I am beginning to think that this may be related to traffic volume and not
just on the type of query that I do.

I tried to calculate the memory requirement example you gave me above based
on the change that got rid of the dynamic fields.

documents = ~1.400.000
references = 11.200.000 (we facet on two multivalue fields with 4 values
each on average, so 1.400.000 * 2 * 4 = 11.200.000)
unique values = 1.132.344 (total number of variant options across all
clients. This is what we facet on)

1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field
(we have 4 fields)?

I must be calculating this wrong.






On Mon, Apr 15, 2013 at 2:10 PM, John Nielsen  wrote:

> I did a search. I have no occurrence of "UnInverted" in the solr logs.
>
> > Another explanation for the large amount of memory presents itself if
> > you use a single index: If each of your clients facet on at least one
> > fields specific to the client ("client123_persons" or something like
> > that), then your memory usage goes through the roof.
>
> This is exactly how we facet right now! I will definetely rewrite the
> relevant parts of our product to test this out before moving further down
> the docValues path.
>
> I will let you know as soon as I know one way or the other.
>
>
> On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen 
> wrote:
>
>> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>>
>> > The FieldCache is the big culprit. We do a huge amount of faceting so
>> > it seems right.
>>
>> Yes, you wrote that earlier. The mystery is that the math does not check
>> out with the description you have given us.
>>
>> > Unfortunately I am super swamped at work so I have precious little
>> > time to work on this, which is what explains my silence.
>>
>> No problem, we've all been there.
>> >
>> [Band aid: More memory]
>>
>> > The extra memory helped a lot, but it still OOM with about 180 clients
>> > using it.
>>
>> You stated earlier that you has a "solr cluster" and your total(?) index
>> size was 35GB, with each "register" being between "15k" and "30k". I am
>> using the quotes to signify that it is unclear what you mean. Is your
>> cluster multiple machines (I'm guessing no), multiple Solr's, cores,
>> shards or maybe just a single instance prepared for later distribution?
>> Is a register a core, shard or a simply logical part (one client's data)
>> of the index?
>>
>> If each client has their own core or shard, that would mean that each
>> client uses more than 25GB/180 bytes ~= 142MB of heap to access 35GB/180
>> ~= 200MB of index. That sounds quite high and you would need a very
>> heavy facet to reach that.
>>
>> If you could grep "UnInverted" from the Solr log file and paste the
>> entries here, that would help to clarify things.
>>
>>
>> Another explanation for the large amount of memory presents itself if
>> you use a single index: If each of your clients facet on at least one
>> fields specific to the client ("client123_persons" or something like
>> that), then your memory usage goes through the roof.
>>
>> Assuming an index with 10M documents, each with 5 references to a modest
>> 10K unique values in a facet field, the simplified formula
>>   #documents*log2(#references) + #references*log2(#unique_values) bit
>> tells us that this takes at least 110MB with field cache based faceting.
>>
>> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
>> least double that. This fits neatly with your new heap of 64GB.
>>
>>
>> If my guessing is correct, you can solve your memory problems very
>> easily by sharing _all_ the facet fields between your clients.
>> This should bring your memory usage down to a few GB.
>>
>> You are probably already restricting their searches to their own data by
>> filtering, so this should not influence the returned facet values and
>> counts, as compared to separate fields.
>>
>> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>>
>> > Today I finally managed to set up a test core so I can begin to play
>> > around with docValues.
>>
>> If you are using a single index with t

Re: Document Missing from Share in Solr cloud

2013-04-17 Thread Upayavira
Well, your numdocs *is* the same. Your maxdocs isn't, which sounds right
to me.

maxdocs is the number of documents, including deleted ones. Given
deleted docs are purged by background merges, it makes sense that each
index is deciding differently when to do those merges. But the number of
undeleted docs is the same which is a good thing.

Do queries against each replica for a shard, with distrib=false, and see
whether the results are the same.
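
A quick SolrJ sketch of comparing the counts per replica, if it helps - the
core URLs are placeholders for your leader and replica:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ReplicaCountCheck {
    public static void main(String[] args) throws Exception {
        // point one HttpSolrServer at each replica's core URL
        String[] replicas = {
            "http://host1:8983/solr/collection1",
            "http://host2:8983/solr/collection1"
        };
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);               // we only care about numFound
        q.set("distrib", "false");  // query this core only, no fan-out
        for (String url : replicas) {
            HttpSolrServer server = new HttpSolrServer(url);
            long numFound = server.query(q).getResults().getNumFound();
            System.out.println(url + " -> " + numFound + " docs");
        }
    }
}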

Upayavira

On Wed, Apr 17, 2013, at 09:14 AM, Cool Techi wrote:
> Shouldnt the number of docs across shards be same, I can see a difference 
> 
> Shard 1
> Last Modified:about 2 hours agoNum Docs:26236135Max Doc:26592164Deleted
> Docs:356029Version:6672183Segment Count:34Shard1  Replica
> 
> Last Modified: about 2 hours agoNum Docs:26236135Max Doc:26594887Deleted
> Docs:358752Version:6678209Segment Count: 27
> 
> > From: cooltec...@outlook.com
> > To: solr-user@lucene.apache.org
> > Subject: RE: Document Missing from Share in Solr cloud
> > Date: Wed, 17 Apr 2013 13:28:16 +0530
> > 
> > Field type is string and this has happened for multiple docs over the past 
> > week.
> > 
> > Regards,
> > Ayush
> > 
> > > Date: Tue, 16 Apr 2013 14:06:40 -0600
> > > Subject: Re: Document Missing from Share in Solr cloud
> > > From: thelabd...@gmail.com
> > > To: solr-user@lucene.apache.org
> > > 
> > > btw ... what is the field type of your unique ID field?
> > > 
> > > 
> > > On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter 
> > > wrote:
> > > 
> > > > Ok, that makes more sense and is definitely cause for concern. Do you 
> > > > have
> > > > a sense for whether this is ongoing or happened a few times 
> > > > unexpectedly in
> > > > the past? If ongoing, then will probably be easier to track down the 
> > > > root
> > > > cause.
> > > >
> > > >
> > > > On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi 
> > > > wrote:
> > > >
> > > >> That's what I am trying to say, the document is not replicated across 
> > > >> all
> > > >> the replicas for a specific shard, hence the query show different 
> > > >> results
> > > >> on every refresh.
> > > >>
> > > >>
> > > >>
> > > >> > Date: Tue, 16 Apr 2013 11:34:18 -0600
> > > >> > Subject: Re: Document Missing from Share in Solr cloud
> > > >> > From: thelabd...@gmail.com
> > > >> > To: solr-user@lucene.apache.org
> > > >> >
> > > >> > If you are using the default doc router for indexing in SolrCloud, 
> > > >> > then
> > > >> a
> > > >> > document only exists in a single shard but can be replicated in that
> > > >> shard
> > > >> > to any number of replicas.
> > > >> >
> > > >> > Can you clarify your question as it sounds like you're saying that 
> > > >> > the
> > > >> > document is not replicated across all the replicas for a specific
> > > >> shard? If
> > > >> > so, that's definitely a problem ...
> > > >> >
> > > >> >
> > > >> > On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi 
> > > >> wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > We noticed a strange behavior in our solr cloud setup, we are using
> > > >> > > solr4.2  with 1:3 replication setting. We noticed that some of the
> > > >> > > documents were showing up in search sometimes and not at other, the
> > > >> reason
> > > >> > > being the document was not present in all the shards.
> > > >> > >
> > > >> > > We have restarted zookeeper and also entire cloud, but these
> > > >> documents are
> > > >> > > not being replicated in all the shards for some reason and hence
> > > >> > > inconsistent search results.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Ayush
> > > >> > >
> > > >>
> > > >>
> > > >
> > > >
> >   
> 


Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing

2013-04-17 Thread jmozah
Sorry, I didn't understand that.
Did you mean to configure CloudSolrServer with a general hash algorithm?

./zahoor

On 17-Apr-2013, at 1:06 PM, rulinma  wrote:

> For speed, you can also calculate (with a general hash algorithm) which solr server to add the docs to.



Re: first time with new keyword, solr take to much time to give the result

2013-04-17 Thread Duncan Irvine
tl;dr: retrieving 10,000 docs is a bad idea. Look into docValues for
storing security info

I suspect that you'll be better served by keeping the permissions
up-to-date in Solr and invalidating the caches rather than trying to return
10,000 docs.  On average, you'll be attempting to read up to 800MB of data
per query (400GB * 1/506), and that will be accessed randomly.
As Toke said earlier, on a disc this will just be a bad idea.  If
you must persist in querying like this, then I'd second the SSDs - a pair
in RAID 1 should give you good read performance, adequate write performance and
redundancy. You might be able to get that query down to something in the
region of 5-10 seconds at a guess, and that assumes you're not actually
returning the entire document in the response - an 800MB network
response, even on GbE, would be 10s just for layer 2, let alone the
serialisation overhead.

You might try looking into storing your security information in docValues
fields - set docValues=true against the field in schema.xml (needs Solr 4.2).
 That ought to give greater performance when reading that field and may
circumvent your concerns over cache invalidation although I haven't played
with them yet, so don't quote me on that.

Can you be more specific about the security model? What is being stored in
the DB? How does that get applied to the document? Can you translate that
into a query that solr could understand?  Is it too complex, or are you
really just worried about cache invalidation?

Would it be acceptable to have the security info in Solr, but lagging the
DB somewhat, then select a smaller result set and post-filter in your
business layer?
i.e. instead of running just q=foo:bar&rows=1 and then filtering, you run a
query such as q=foo:bar&fq=security_group:(2 3 19)&rows=150 and then
filter against your DB as a final double-check before presenting to your
user.  This would mean that they would immediately be prevented from seeing
something that they're no longer allowed to see, but may have to wait for the
next update to see something they've just been allowed to.
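
A rough SolrJ sketch of that pattern - the security_group field name is just
the one from the example above, and isAllowedInDb() is a stand-in for your own
DB check, so treat this as untested:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class SecurityPostFilterExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("foo:bar");
        q.addFilterQuery("security_group:(2 3 19)"); // coarse filter on data already in Solr
        q.setRows(150);                              // pull a modest page, not 10,000 docs

        List<SolrDocument> allowed = new ArrayList<SolrDocument>();
        for (SolrDocument doc : server.query(q).getResults()) {
            // final, authoritative check against the database before showing the doc
            if (isAllowedInDb((String) doc.getFieldValue("id"))) {
                allowed.add(doc);
            }
        }
        System.out.println(allowed.size() + " docs passed the DB double-check");
    }

    // placeholder for the real permission lookup in your business layer
    private static boolean isAllowedInDb(String docId) {
        return true;
    }
}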

Regards,
  Duncan.


On 16 April 2013 15:02, Montu v Boda  wrote:

> Hi
>
> The problem is that the permissions are frequently updated in our system, so
> we
> have to update the index in the same manner, otherwise it will give wrong
> results.
>
> In that case I think the cache will be affected and the performance may be
> reduced.
>
>
> Thanks & Regards
> Montu v Boda
>
>
>
>



-- 
Don't let your mind wander -- it's too little to be let out alone.


RE: Document Missing from Share in Solr cloud

2013-04-17 Thread Cool Techi
Sorry, made a copy paste mistake. The numbers are different.

My cloud has two shards, with each shard having 1 replica. One of the shards and
its replica have the same number of docs, while in the other shard there is a
mismatch.

Regards,
Ayush

> From: u...@odoko.co.uk
> To: solr-user@lucene.apache.org
> Subject: Re: Document Missing from Share in Solr cloud
> Date: Wed, 17 Apr 2013 09:48:03 +0100
> 
> Well, your numdocs *is* the same. Your maxdocs isn't, which sounds right
> to me.
> 
> maxdocs is the number of documents, including deleted ones. Given
> deleted docs are purged by background merges, it makes sense that each
> index is deciding differently when to do those merges. But the number of
> undeleted docs is the same which is a good thing.
> 
> Do queries against each replica for a shard, with distrib=false, and see
> whether the results are the same.
> 
> Upayavira
> 
> On Wed, Apr 17, 2013, at 09:14 AM, Cool Techi wrote:
> > Shouldnt the number of docs across shards be same, I can see a difference 
> > 
> > Shard 1
> > Last Modified:about 2 hours agoNum Docs:26236135Max Doc:26592164Deleted
> > Docs:356029Version:6672183Segment Count:34Shard1  Replica
> > 
> > Last Modified: about 2 hours agoNum Docs:26236135Max Doc:26594887Deleted
> > Docs:358752Version:6678209Segment Count: 27
> > 
> > > From: cooltec...@outlook.com
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: Document Missing from Share in Solr cloud
> > > Date: Wed, 17 Apr 2013 13:28:16 +0530
> > > 
> > > Field type is string and this has happened for multiple docs over the 
> > > past week.
> > > 
> > > Regards,
> > > Ayush
> > > 
> > > > Date: Tue, 16 Apr 2013 14:06:40 -0600
> > > > Subject: Re: Document Missing from Share in Solr cloud
> > > > From: thelabd...@gmail.com
> > > > To: solr-user@lucene.apache.org
> > > > 
> > > > btw ... what is the field type of your unique ID field?
> > > > 
> > > > 
> > > > On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter 
> > > > wrote:
> > > > 
> > > > > Ok, that makes more sense and is definitely cause for concern. Do you 
> > > > > have
> > > > > a sense for whether this is ongoing or happened a few times 
> > > > > unexpectedly in
> > > > > the past? If ongoing, then will probably be easier to track down the 
> > > > > root
> > > > > cause.
> > > > >
> > > > >
> > > > > On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi 
> > > > > wrote:
> > > > >
> > > > >> That's what I am trying to say, the document is not replicated 
> > > > >> across all
> > > > >> the replicas for a specific shard, hence the query show different 
> > > > >> results
> > > > >> on every refresh.
> > > > >>
> > > > >>
> > > > >>
> > > > >> > Date: Tue, 16 Apr 2013 11:34:18 -0600
> > > > >> > Subject: Re: Document Missing from Share in Solr cloud
> > > > >> > From: thelabd...@gmail.com
> > > > >> > To: solr-user@lucene.apache.org
> > > > >> >
> > > > >> > If you are using the default doc router for indexing in SolrCloud, 
> > > > >> > then
> > > > >> a
> > > > >> > document only exists in a single shard but can be replicated in 
> > > > >> > that
> > > > >> shard
> > > > >> > to any number of replicas.
> > > > >> >
> > > > >> > Can you clarify your question as it sounds like you're saying that 
> > > > >> > the
> > > > >> > document is not replicated across all the replicas for a specific
> > > > >> shard? If
> > > > >> > so, that's definitely a problem ...
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi 
> > > > >> > 
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > We noticed a strange behavior in our solr cloud setup, we are 
> > > > >> > > using
> > > > >> > > solr4.2  with 1:3 replication setting. We noticed that some of 
> > > > >> > > the
> > > > >> > > documents were showing up in search sometimes and not at other, 
> > > > >> > > the
> > > > >> reason
> > > > >> > > being the document was not present in all the shards.
> > > > >> > >
> > > > >> > > We have restarted zookeeper and also entire cloud, but these
> > > > >> documents are
> > > > >> > > not being replicated in all the shards for some reason and hence
> > > > >> > > inconsistent search results.
> > > > >> > >
> > > > >> > > Regards,
> > > > >> > > Ayush
> > > > >> > >
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > 
> >   
  

Max http connections in CloudSolrServer

2013-04-17 Thread J Mohamed Zahoor
Hi

I am pumping parallel select queries using CloudSolrServer.
It looks like it can handle only a certain number of max connections...

My question is: how many concurrent queries can a CloudSolrServer handle?


An old thread tries to answer this by suggesting that we supply our own instance of
LBHttpSolrServer...
But it looks like there is no way, from LBHttpSolrServer, to raise the maxConnections
of the httpClient it has...


Can someone let me know how to bump up the maxConnections and
maxConnectionsPerHost parameters for the httpClient used by CloudSolrServer?

./zahoor




RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk] wrote:
> I managed to get this done. The facet queries now facets on a multivalue 
> field as opposed to the dynamic field names.

> Unfortunately it doesn't seem to have done much difference, if any at all.

I am sorry to hear that.

> documents = ~1.400.000
> references 11.200.000  (we facet on two multivalue fields with each 4 values 
> on average, so 1.400.000 * 2 * 4 = 11.200.000
> unique values = 1.132.344 (total number of variant options across all clients.
> This is what we facet on)

> 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field 
> (we have 4 fields)?

> I must be calculating this wrong.

No, that sounds about right. In reality you need to multiply with 3 or 4, so 
let's round to 50MB/field: 1.4M documents with 2 fields with 5M 
references/field each is not very much and should not take a lot of memory. In 
comparison, we facet on 12M documents with 166M references and do some other 
stuff (in Lucene with a different faceting implementation, but at this level it 
is equivalent to Solr's in terms of memory). Our heap is 3GB.

I am surprised about the lack of "UnInverted" from your logs as it is logged on 
INFO level. It should also be available from the admin interface under 
collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing you got your 
numbers from that and that the list only contains the few facets you mentioned 
previously? It might be wise to sanity check by summing the memSizes though; 
they ought to take up far below 1GB.

From your description, your index is small and your faceting requirements
modest. An SSD-equipped laptop should be adequate as a server. So we are back to
"math does not check out".


You stated that you were unable to make a 4GB JVM OOM when you just performed 
faceting (I guesstimate that it will also run fine with just ½GB or at least 
with 1GB, based on the numbers above) and you have observed that the field 
cache eats the memory. This does indicate that the old caches are somehow not 
freed when the index is updated. That is strange as Solr should take care of 
that automatically.

Guessing wildly: Do you issue a high frequency of small updates with frequent 
commits? If you pause the indexing, does memory use fall back to the single GB 
level (You probably need to trigger a full GC to check that)? If that is the 
case, it might be a warmup problem with old warmups still running when new 
commits are triggered.

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Document Missing from Share in Solr cloud

2013-04-17 Thread Annette Newton
I have just experienced the same thing on 4.2.1. 4 Shards - each with 2
replicas.  Did some bulk loading and all but one Shard match up.  Small
discrepancy between the replicas, but no obvious errors either.  Will be
doing further loading shortly and will report findings.

Regards.

Netty.


On 17 April 2013 10:26, Cool Techi  wrote:

> Sorry, made a copy paste mistake. The numbers are different.
>
> My cloud has two shards with each shard having 1 replica. One of the
> shards and replica have the same number of docs, while in the other shard
> there is a mismatch.
>
> Regards,
> Ayush
>
> > From: u...@odoko.co.uk
> > To: solr-user@lucene.apache.org
> > Subject: Re: Document Missing from Share in Solr cloud
> > Date: Wed, 17 Apr 2013 09:48:03 +0100
> >
> > Well, your numdocs *is* the same. Your maxdocs isn't, which sounds right
> > to me.
> >
> > maxdocs is the number of documents, including deleted ones. Given
> > deleted docs are purged by background merges, it makes sense that each
> > index is deciding differently when to do those merges. But the number of
> > undeleted docs is the same which is a good thing.
> >
> > Do queries against each replica for a shard, with distrib=false, and see
> > whether the results are the same.
> >
> > Upayavira
> >
> > On Wed, Apr 17, 2013, at 09:14 AM, Cool Techi wrote:
> > > Shouldnt the number of docs across shards be same, I can see a
> difference
> > >
> > > Shard 1
> > > Last Modified:about 2 hours agoNum Docs:26236135Max Doc:26592164Deleted
> > > Docs:356029Version:6672183Segment Count:34Shard1  Replica
> > >
> > > Last Modified: about 2 hours agoNum Docs:26236135Max
> Doc:26594887Deleted
> > > Docs:358752Version:6678209Segment Count: 27
> > >
> > > > From: cooltec...@outlook.com
> > > > To: solr-user@lucene.apache.org
> > > > Subject: RE: Document Missing from Share in Solr cloud
> > > > Date: Wed, 17 Apr 2013 13:28:16 +0530
> > > >
> > > > Field type is string and this has happened for multiple docs over
> the past week.
> > > >
> > > > Regards,
> > > > Ayush
> > > >
> > > > > Date: Tue, 16 Apr 2013 14:06:40 -0600
> > > > > Subject: Re: Document Missing from Share in Solr cloud
> > > > > From: thelabd...@gmail.com
> > > > > To: solr-user@lucene.apache.org
> > > > >
> > > > > btw ... what is the field type of your unique ID field?
> > > > >
> > > > >
> > > > > On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter <
> thelabd...@gmail.com>wrote:
> > > > >
> > > > > > Ok, that makes more sense and is definitely cause for concern.
> Do you have
> > > > > > a sense for whether this is ongoing or happened a few times
> unexpectedly in
> > > > > > the past? If ongoing, then will probably be easier to track down
> the root
> > > > > > cause.
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi <
> cooltec...@outlook.com>wrote:
> > > > > >
> > > > > >> That's what I am trying to say, the document is not replicated
> across all
> > > > > >> the replicas for a specific shard, hence the query show
> different results
> > > > > >> on every refresh.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> > Date: Tue, 16 Apr 2013 11:34:18 -0600
> > > > > >> > Subject: Re: Document Missing from Share in Solr cloud
> > > > > >> > From: thelabd...@gmail.com
> > > > > >> > To: solr-user@lucene.apache.org
> > > > > >> >
> > > > > >> > If you are using the default doc router for indexing in
> SolrCloud, then
> > > > > >> a
> > > > > >> > document only exists in a single shard but can be replicated
> in that
> > > > > >> shard
> > > > > >> > to any number of replicas.
> > > > > >> >
> > > > > >> > Can you clarify your question as it sounds like you're saying
> that the
> > > > > >> > document is not replicated across all the replicas for a
> specific
> > > > > >> shard? If
> > > > > >> > so, that's definitely a problem ...
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi <
> cooltec...@outlook.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hi,
> > > > > >> > >
> > > > > >> > > We noticed a strange behavior in our solr cloud setup, we
> are using
> > > > > >> > > solr4.2  with 1:3 replication setting. We noticed that some
> of the
> > > > > >> > > documents were showing up in search sometimes and not at
> other, the
> > > > > >> reason
> > > > > >> > > being the document was not present in all the shards.
> > > > > >> > >
> > > > > >> > > We have restarted zookeeper and also entire cloud, but these
> > > > > >> documents are
> > > > > >> > > not being replicated in all the shards for some reason and
> hence
> > > > > >> > > inconsistent search results.
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Ayush
> > > > > >> > >
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > >
> > >
>
>



-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*

-- 
*This message is confi

RE: Scaling Solr on VMWare

2013-04-17 Thread adfel70
Hi
We are currently considering running SolrCloud on VMware.
Do you have any insights regarding the issue you encountered, and generally
regarding using virtual machines instead of physical machines for SolrCloud?


Frank Wennerdahl wrote
> Hi Otis and thanks for your response.
> 
> We are indeed suspecting that the problem with only 2 cores being used
> might
> be caused by the virtual environment. We're hoping that someone with
> experience of running Solr on VMWare might know more about this or the
> other
> issues we have.
> 
> The servlet we're running is the bundled Jetty servlet (Solr version 4.1).
> As we have seen a higher number of CPU cores utilized when sending data to
> Solr locally it seems that the servlet isn't restricting the number of
> threads used.
> 
> Frank
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:

> otis.gospodnetic@

> ] 
> Sent: den 26 mars 2013 05:09
> To: 

> solr-user@.apache

> Subject: Re: Scaling Solr on VMWare
> 
> Hi Frank,
> 
> If your servlet container had a crazy low setting for the max number of
> threads I think you would see the CPU underutilized.  But I think you
> would
> also see errors in on the client about connections being requested. 
> Sounds
> like a possibly VM issue that's not Solr-specific...
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> 
> 
> 
> 
> On Mon, Mar 25, 2013 at 1:18 PM, Frank Wennerdahl
> <

> frank.wennerdahl@

> > wrote:
>> Hi.
>>
>>
>>
>> We are currently benchmarking our Solr setup and are having trouble 
>> with scaling hardware for a single Solr instance. We want to 
>> investigate how one instance scales with hardware to find the optimal 
>> ratio of hardware vs sharding when scaling. Our main problem is that 
>> we cannot identify any hardware limitations, CPU is far from maxed 
>> out, disk I/O is not an issue as far as we can see and there is plenty of
> RAM available.
>>
>>
>>
>> In short we have a couple of questions that we hope someone here could 
>> help us with. Detailed information about our setup, use case and 
>> things we've tried is provided below the questions.
>>
>>
>>
>> Questions:
>>
>> 1.   What could cause Solr to utilize only 2 CPU cores when sending
>> multiple update requests in parallel in a VMWare environment?
>>
>> 2.   Is there a software limit on the number of CPU cores that Solr
> can
>> utilize while indexing?
>>
>> 3.   Ruling out network and disk performance, what could cause a
>> decrease in indexing speed when sending data over a network as opposed 
>> to sending it from the local machine?
>>
>>
>>
>> We are running on three cores per Solr instance, however only one core 
>> receives any non-trivial load. We are using VMWare (ESX 5.0) virtual 
>> machines for hosting Solr and a QNAP NAS containing 12 HDDs in a RAID5 
>> setup for storage. Our data consists of a huge amount of small-sized
> documents.
>> When indexing we are using Solr's javabin format (although not through 
>> Solrj, we have implemented the format in C#/.NET) and our batch size 
>> is currently 1000 documents. The actual size of the data varies, but 
>> the batches we have used range from approximately 450KB to 1050KB. 
>> We're sending these batches to Solr in parallel using a number of send
> threads.
>>
>>
>>
>> There are two issues that we've run into:
>>
>> 1.   When sending data from one VM to Solr on another VM we observed
>> that Solr did not seem to utilize CPU cores properly. The Solr VM had 
>> 8 vCPUs available and we were using 4 threads sending data in 
>> parallel. We saw a low (~29%)  CPU utilization on the Solr VM with 2 
>> cores doing almost all the work while the remaining cores remained 
>> almost idle. Increasing the number of send threads to 8 yielded the 
>> same result, capping our indexing speed to about 4.88MB per second. 
>> The client VM had 4 vCPUs which were hardly utilized as we were reading
> data from pre-generated files.
>>
>> To rule out network limitations we sent the test data to a server on 
>> the Solr VM that simply accepted the request and returned an empty 
>> response. We were able to send data at 219MB per second, so the 
>> network did not seem to be the bottleneck. We also tested sending data 
>> to Solr locally from the Solr VM to see if disk I/O was the problem. 
>> Surprisingly we were able to index significantly faster at 7.34MB per 
>> second using 4 send threads (8.4MB with 6 send threads) which 
>> indicated that the disk was not slowing us down when sending data over 
>> the network. Worth noting is that the CPU utilization was now higher 
>> (47,81% with 4 threads, 58,8% with 6) and the work was spread out over 
>> all cores. As before we used pre-generated files and the process sending
> the data used almost no CPU.
>>
>> 2.   We decided to investigate how Solr would scale with additional
>> vCPUs when indexing locally. We increased the number of vCPUs to 16 
>> and the number of send threads to 8. Sadly we n

Re: Scaling Solr on VMWare

2013-04-17 Thread Peter Sturge
Hi,

We have run solr in VM environments extensively (3.6 not Cloud, but the
issues will be similar).
There are some significant things to be aware of when running Solr in a
virtualized environment (these can be equally true with Hyper-V and Xen as
well):
If you're doing heavy indexing, the networking can be a real bottleneck,
depending on the environment.
If you're using a virtual cluster, and you have other VMs that use lots of
network and/or CPU (e.g. a SQL Server, email etc.), you will encounter
performance issues (note: it's generally a good idea to tie a Solr instance
to a physical machine in the cluster).
Using virtual switches can, in some instances, create network bottlenecks,
particularly with high input indexing. There are myriad scenarios for
vSwitches, so it's not practical to go into all the possible scenarios here
- but the general rule is - be careful!
CPU context switching can have a huge impact on Solr, so assigning CPUs,
cores and virtual cores needs some care to ensure there's enough CPU
resource to get the jobs done, but not so many that the VM is continually
waiting for cores to become free (VMWare will wait until all configured
core slots are free before proceeding with a request).

The above scratches the surface of running multi-threaded production
applications like Solr in a virtual environment, but hopefully it can
provide a starting point.

Thanks,
Peter



On Wed, Apr 17, 2013 at 11:56 AM, adfel70  wrote:

> Hi
> We are currently considering running solr cloud on vmware.
> Di you have any insights regarding the issue you encountered and generally
> regarding using virtual machines instead of physical machines for solr
> cloud?
>
>
> Frank Wennerdahl wrote
> > Hi Otis and thanks for your response.
> >
> > We are indeed suspecting that the problem with only 2 cores being used
> > might
> > be caused by the virtual environment. We're hoping that someone with
> > experience of running Solr on VMWare might know more about this or the
> > other
> > issues we have.
> >
> > The servlet we're running is the bundled Jetty servlet (Solr version
> 4.1).
> > As we have seen a higher number of CPU cores utilized when sending data
> to
> > Solr locally it seems that the servlet isn't restricting the number of
> > threads used.
> >
> > Frank
> >
> > -Original Message-
> > From: Otis Gospodnetic [mailto:
>
> > otis.gospodnetic@
>
> > ]
> > Sent: den 26 mars 2013 05:09
> > To:
>
> > solr-user@.apache
>
> > Subject: Re: Scaling Solr on VMWare
> >
> > Hi Frank,
> >
> > If your servlet container had a crazy low setting for the max number of
> > threads I think you would see the CPU underutilized.  But I think you
> > would
> > also see errors in on the client about connections being requested.
> > Sounds
> > like a possibly VM issue that's not Solr-specific...
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> >
> >
> >
> >
> > On Mon, Mar 25, 2013 at 1:18 PM, Frank Wennerdahl
> > <
>
> > frank.wennerdahl@
>
> > > wrote:
> >> Hi.
> >>
> >>
> >>
> >> We are currently benchmarking our Solr setup and are having trouble
> >> with scaling hardware for a single Solr instance. We want to
> >> investigate how one instance scales with hardware to find the optimal
> >> ratio of hardware vs sharding when scaling. Our main problem is that
> >> we cannot identify any hardware limitations, CPU is far from maxed
> >> out, disk I/O is not an issue as far as we can see and there is plenty
> of
> > RAM available.
> >>
> >>
> >>
> >> In short we have a couple of questions that we hope someone here could
> >> help us with. Detailed information about our setup, use case and
> >> things we've tried is provided below the questions.
> >>
> >>
> >>
> >> Questions:
> >>
> >> 1.   What could cause Solr to utilize only 2 CPU cores when sending
> >> multiple update requests in parallel in a VMWare environment?
> >>
> >> 2.   Is there a software limit on the number of CPU cores that Solr
> > can
> >> utilize while indexing?
> >>
> >> 3.   Ruling out network and disk performance, what could cause a
> >> decrease in indexing speed when sending data over a network as opposed
> >> to sending it from the local machine?
> >>
> >>
> >>
> >> We are running on three cores per Solr instance, however only one core
> >> receives any non-trivial load. We are using VMWare (ESX 5.0) virtual
> >> machines for hosting Solr and a QNAP NAS containing 12 HDDs in a RAID5
> >> setup for storage. Our data consists of a huge amount of small-sized
> > documents.
> >> When indexing we are using Solr's javabin format (although not through
> >> Solrj, we have implemented the format in C#/.NET) and our batch size
> >> is currently 1000 documents. The actual size of the data varies, but
> >> the batches we have used range from approximately 450KB to 1050KB.
> >> We're sending these batches to Solr in parallel using a number of send
> > threads.
> >>
> >>
> >>
> >> There are two issues that we've run into:
> >>

Re: Document adds, deletes, and commits ... a question about visibility.

2013-04-17 Thread Erick Erickson
Personally I've never heard of a 500 document limit; I routinely use
1,000 doc batches (relatively small documents). Possibly your
co-worker exceeded the packet size or some other outside-Solr
limitation?
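
For what it's worth, a minimal SolrJ sketch of the kind of batching I mean -
the URL and field names are just placeholders:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "doc " + i);
            batch.add(doc);
            if (batch.size() == 1000) { // send 1,000 docs per update request
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);          // send the final partial batch
        }
        server.commit();                // make everything visible
    }
}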

Erick

On Mon, Apr 15, 2013 at 6:06 PM, Michael McCandless
 wrote:
> At the Lucene level, you don't have to commit before doing the
> deleteByQuery, i.e. 'a' will be correctly deleted without any
> intervening commit.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Apr 15, 2013 at 3:57 PM, Shawn Heisey  wrote:
>> Simple question first: Is there anything in SolrJ that prevents indexing
>> more than 500 documents in one request? I'm not aware of anything myself,
>> but a co-worker remembers running into something, so his code is restricting
>> them to 490 docs.  The only related limit I'm aware of is the POST buffer
>> size limit, which defaults in recent Solr versions to 2MiB.
>>
>> A more complex question: If I am doing both deletes and adds in separate
>> update requests, and I want to ensure that a delete in the next request can
>> delete a document that I am adding in the current one, do I need to commit
>> between the two requests?  This is probably more of a Lucene question than
>> Solr, but Solr is what I'm using.
>>
>> To simplify:  Let's say I start with an empty index.  I add documents "a"
>> and "b" in one request ... then I send a deleteByQuery request for "a" "c"
>> and "e".  If I don't do a commit between these two requests, will "a" still
>> be in the index when I commit after the second request? If so, would there
>> be an easy fix?
>>
>> Thanks,
>> Shawn


Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Erick Erickson
How big are your transaction logs? They can be replayed on startup.
They are truncated and a new one is started when you do a hard commit
(openSearcher true or false doesn't matter).

So a quick test of this theory would be to just stop your indexing
process, issue a hard commit on all your cores and _then_ try to
restart. If it comes up immediately, you've identified your problem.
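
Something like this against each core should do it - the core URLs here are
just the ones from the log excerpt below, so adjust as needed (untested sketch):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HardCommitAll {
    public static void main(String[] args) throws Exception {
        // core URLs taken from the log excerpt below; adjust host/port/core names
        String[] cores = {
            "http://localhost:25280/solr/accessories",
            "http://localhost:25280/solr/newQueries"
        };
        for (String url : cores) {
            new HttpSolrServer(url).commit(); // hard commit; truncates the tlog and starts a new one
        }
    }
}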

Best
Erick

On Tue, Apr 16, 2013 at 8:33 AM, Umesh Prasad  wrote:
> Hi,
> We are migrating to Solr 4.2 from Solr 3.6 and Solr 4.2 is throwing
> Exception on Restart. What is More, it take a hell lot of Time ( More than
> one hour to get Up and Running)
>
>
> THE exception After Restart ...
> =
> "Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates
> update
> WARNING: Unexpected log entry or corrupt log.  Entry=11
> java.lang.ClassCastException: java.lang.Long cannot be cast to
> java.util.List
> at
> org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
> at
> org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
> at
> org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
> at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
> at
> org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
> at
> org.apache.solr.update.UpdateHandler.(UpdateHandler.java:137)
> at
> org.apache.solr.update.UpdateHandler.(UpdateHandler.java:123)
> at
> org.apache.solr.update.DirectUpdateHandler2.(DirectUpdateHandler2.java:95)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
> at
> org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
> at org.apache.solr.core.SolrCore.(SolrCore.java:806)
> at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates
> update
> WARNING: Unexpected log entry or corrupt log.  Entry=8120?785879438123
> java.lang.ClassCastException: java.lang.String cannot be cast to
> java.util.List
> at
> org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
> at
> org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
> at
> org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
> at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
> at
> org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
> at
> org.apache.solr.update.UpdateHandler.(UpdateHandler.java:137)
> at
> org.apache.solr.update.UpdateHandler.(UpdateHandler.java:123)
> at
> org.apache.solr.update.DirectUpdateHandler2.(DirectUpdateHandler2.java:95)
>
> =
>
> And Once Restarted, I start getting replication errors
>
>
> Apr 16, 2013 5:20:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
> SEVERE: Master at: http://localhost:25280/solr/accessories is not
> available. Index fetch failed. Exception:
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://localhost:25280/solr/accessories
> Apr 16, 2013 5:20:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
> SEVERE: Master at: http://localhost:25280/solr/newQueries is not available.
> Index fetch failed. Exception:
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://localhost:25280/solr/newQueries
> Apr 16, 2013 5:21:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
> SEVERE: Master at: http://l

Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread paulblyth
Sorry for not providing enough details initially. You're right, it's
difficult for me to share the real code but let me try and give you an
example.



[The db-data-config.xml example - with the myWorkingTransformer and
myBrokenTransformer script functions - was lost by the mail archive; see the
attachment in the follow-up message.]

My DIH is simply converting colours to use an uppercase first character
(it's just an example..). 

When using the 'myWorkingTransformer' function, the values are indexed
correctly as expected e.g. 'Orange', 'Red', 'Yellow' etc. 

Using the 'myBrokenTransformer' the import completes successfully but the
values are stored as 'orange', 'red', 'yellow' etc.

I hope that is clearer.

Thanks,
paul





Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
> I am surprised about the lack of "UnInverted" from your logs as it is
> logged on INFO level.

Nope, no trace of it. No mention either in Logging -> Level from the admin
interface.

> It should also be available from the admin interface under
> collection/Plugin / Stats/CACHE/fieldValueCache.

I never seriously looked at my fieldValueCache. It never seemed to get used:

http://screencast.com/t/YtKw7UQfU

> You stated that you were unable to make a 4GB JVM OOM when you just
> performed faceting (I guesstimate that it will also run fine with just ½GB
> or at least with 1GB, based on the numbers above) and you have observed
> that the field cache eats the memory.

Yep. We still do a lot of sorting on dynamic field names, so the field
cache has a lot of entries (9.411 entries as we speak; this is
considerably lower than before). You mentioned in an earlier mail that
faceting on a field shared between all facet queries would bring down the
memory needed. Does the same thing go for sorting? Do those 9.411 entries
duplicate data between them? If this is where all the memory is going, I
have a lot of coding to do.

> Guessing wildly: Do you issue a high frequency of small updates with
> frequent commits? If you pause the indexing, does memory use fall back to
> the single GB level

I do commit a bit more often than I should. I get these in my log file from
time to time: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2". The way I
understand this is that two searchers are being warmed at the same time and
that one will be discarded when it finishes its auto-warming procedure. If
the math above is correct, I would need tens of searchers auto-warming
in parallel to cause my problem. If I misunderstand how this works,
do let me know.

My indexer has a cleanup routine that deletes replay logs and other things
when it has nothing to do. This includes running a commit on the Solr
server to make sure nothing is ever in a state where something is not
written to disk anywhere. In theory it can commit once every 60 seconds,
though I doubt that ever happens. The less work the indexer has, the more
often it commits. (Yes, I know, it's on my todo list.)

Other than that, my autocommit settings look like this:

 6 6000 
false 

The control panel says that the warm up time of the last searcher is 5574.
Is that seconds or milliseconds?
http://screencast.com/t/d9oIbGLCFQwl

I would prefer not to turn off the indexer unless the numbers above
suggest that I really should try this. Waiting for a full GC would take a
long time. Unfortunately I don't know of a way to provoke a full GC on
command.


On Wed, Apr 17, 2013 at 11:48 AM, Toke Eskildsen 
wrote:

> John Nielsen [j...@mcb.dk] wrote:
> > I managed to get this done. The facet queries now facets on a multivalue
> field as opposed to the dynamic field names.
>
> > Unfortunately it doesn't seem to have done much difference, if any at
> all.
>
> I am sorry to hear that.
>
> > documents = ~1.400.000
> > references 11.200.000  (we facet on two multivalue fields with each 4
> values
> > on average, so 1.400.000 * 2 * 4 = 11.200.000
> > unique values = 1.132.344 (total number of variant options across all
> clients.
> > This is what we facet on)
>
> > 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per
> field (we have 4 fields)?
>
> > I must be calculating this wrong.
>
> No, that sounds about right. In reality you need to multiply with 3 or 4,
> so let's round to 50MB/field: 1.4M documents with 2 fields with 5M
> references/field each is not very much and should not take a lot of memory.
> In comparison, we facet on 12M documents with 166M references and do some
> other stuff (in Lucene with a different faceting implementation, but at
> this level it is equivalent to Solr's in terms of memory). Our heap is 3GB.
>
> I am surprised about the lack of "UnInverted" from your logs as it is
> logged on INFO level. It should also be available from the admin interface
> under collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing
> you got your numbers from that and that the list only contains the few
> facets you mentioned previously? It might be wise to sanity check by
> summing the memSizes though; they ought to take up far below 1GB.
>
> From your description, your index is small and your faceting requirements
> modest. A SSD-equipped laptop should be adequate as server. So we are back
> to "math does not check out".
>
>
> You stated that you were unable to make a 4GB JVM OOM when you just
> performed faceting (I guesstimate that it will also run fine with just ½GB
> or at least with 1GB, based on the numbers above) and you have observed
> that the field cache eats the memory. This does indicate that the old
> caches are somehow not freed when the index is updated. That is strange as
> Solr should take care of that automatically.
>
> Guessing wildly: Do you issue a high frequency small updates with frequent
> commits? If you pause the indexing, does memory use fall back

Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread paulblyth
That post lost a lot of formatting. Please find attached instead.
db-data-config.xml
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JavaScript-transform-switch-statement-during-Data-Import-tp4056340p4056649.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Example, Multi Word Search issue

2013-04-17 Thread zeroeffect
Version 4.2.0
collection1 example

I currently have indexed over 1.5 million html files, with more to come. 

Here is an issue I am running into, if I search the word mayor I get a great
list of results. 

Now if I search the word bing I get results. Searching the words together,
"mayor bing" without quotes, I get zero results returned; in fact nothing
happens, the page just spins waiting for a response and never gets one.

If I leave the quotes in, I get a response showing no results, and that is
true; those have not been archived.

I am lost as to why it hangs when searching those two terms together.
Shouldn't I get results for both words even if they are not both in a
document?

Thanks for your guidance.

ZeroEffect



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Example-Multi-Word-Search-issue-tp4056651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Pattern Tokenizer Factory not working with negation regular expression

2013-04-17 Thread meghana
Hi, 

I need my tokenizer factory to split on everything except numbers,
letters, '&', ':' and the single quote character.

I use 'PatternTokenizerFactory' as below,

<tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&-:]" />

but it's splitting tokens by space only; not sure what I am doing wrong in
this.

can anybody help me on this??
Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pattern-Tokenizer-Factory-not-working-with-negation-regular-expression-tp4056653.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pattern Tokenizer Factory not working with negation regular expression

2013-04-17 Thread Jack Krupansky
Hyphen indicates a character range (as in "a-z"), so if you want to include 
a hyphen as a character, escape it with a single backslash.
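
As a quick standalone illustration (not from the original thread; the sample
text and class name below are made up), the unescaped "&-:" is read as a
character range, while escaping the hyphen keeps it literal:

import java.util.Arrays;
import java.util.regex.Pattern;

public class HyphenRangeDemo {
    public static void main(String[] args) {
        // In "[^a-zA-Z0-9&-:]" the "&-:" part is parsed as the range '&'..':',
        // which happens to include '-', '.', '/' and the digits.
        Pattern unescaped = Pattern.compile("[^a-zA-Z0-9&-:]");
        // Escaping the hyphen ("\\-") makes it a literal character, so only
        // letters, digits, '&', '-', ':' and the single quote survive inside tokens.
        Pattern escaped = Pattern.compile("[^a-zA-Z0-9&\\-:']");
        String text = "AT&T 12:30 cost-effective o'clock (test)";
        System.out.println(Arrays.toString(unescaped.split(text)));
        System.out.println(Arrays.toString(escaped.split(text)));
    }
}

The same java.util.regex semantics apply to the pattern attribute of
PatternTokenizerFactory.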


-- Jack Krupansky

-Original Message- 
From: meghana

Sent: Wednesday, April 17, 2013 7:58 AM
To: solr-user@lucene.apache.org
Subject: Pattern Tokenizer Factory not working with negation regular 
expression


Hi,

I need my tokenizer factory , to split on everything expect numbers ,
letters , '&' , ':' and single quote character.

I use 'PatternTokenizerFactory' as below,



but, its spiting tokens by space only . not sure what I am doing wrong in
this.

can anybody help me on this??
Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pattern-Tokenizer-Factory-not-working-with-negation-regular-expression-tp4056653.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread Gora Mohanty
On 17 April 2013 17:10, paulblyth  wrote:
> That post lost a lot of formatting. Please find attached instead.
> db-data-config.xml
> 

I do not see how this could be working in either case.
Your select statement is "SELECT COLOURS FROM CLRS"
but the "column" attribute in your DIH field is "COLOUR"
(please note the missing "S" at the end). Otherwise, I do not
see an issue with the transformers.

You also do not need a transformer for this, but could use the
SELECT statement to capitalise the value returned from the
database. The exact SQL is probably database-dependent, but
for mysql, this would work:
SELECT CONCAT( UPPER( LEFT( `COLOURS`, 1 ) ), SUBSTRING( `COLOURS`, 2
)  ) AS COLOUR from CLRS

Regards,
Gora


Re: Pattern Tokenizer Factory not working with negation regular expression

2013-04-17 Thread meghana
Jack Krupansky-2 wrote
> Hyphen indicates as character range (as in "a-z"), so if you want to
> include 
> a hyphen as a character, escape it with a single backslash.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: meghana
> Sent: Wednesday, April 17, 2013 7:58 AM
> To: 

> solr-user@.apache

> Subject: Pattern Tokenizer Factory not working with negation regular 
> expression
> 
> Hi,
> 
> I need my tokenizer factory , to split on everything expect numbers ,
> letters , '&' , ':' and single quote character.
> 
> I use 'PatternTokenizerFactory' as below,
> <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&-:]" />
> but, its spiting tokens by space only . not sure what I am doing wrong in
> this.
> 
> can anybody help me on this??
> Thanks
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Pattern-Tokenizer-Factory-not-working-with-negation-regular-expression-tp4056653.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Thanks Jack Krupansky, it was such a stupid mistake. You saved me!! :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pattern-Tokenizer-Factory-not-working-with-negation-regular-expression-tp4056653p4056659.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Yonik Seeley
On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla  wrote:
> Is there some profound reason why the defType is not passed onto the filter
> query?

defType is a convenience so that the main query parameter "q" can
directly be the user query (without specifying it's type like
"edismax").
Filter queries are normally machine generated.
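
(If a particular filter does need a specific parser, it can be selected inline
with local params, e.g. fq={!term f=category}books or fq={!edismax qf=title}laptop bag;
the field names here are only illustrations.)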

-Yonik
http://lucidworks.com


RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk]:
> I never seriously looked at my fieldValueCache. It never seemed to get used:

> http://screencast.com/t/YtKw7UQfU

That was strange. As you are using a multi-valued field with the new setup, 
they should appear there. Can you find the facet fields in any of the other 
caches?

...I hope you are not calling the facets with facet.method=enum? Could you 
paste a typical facet-enabled search request?

> Yep. We still do a lot of sorting on dynamic field names, so the field cache
> has a lot of entries. (9.411 entries as we speak. This is considerably lower
> than before.). You mentioned in an earlier mail that faceting on a field
> shared between all facet queries would bring down the memory needed.
> Does the same thing go for sorting?

More or less. Sorting stores the raw string representations (utf-8) in memory 
so the number of unique values has more to say than it does for faceting. Just 
as with faceting, a list of pointers from documents to values (1 value/document 
as we are sorting) is maintained, so the overhead is something like

#documents*log2(#unique_terms*average_term_length) + 
#unique_terms*average_term_length
(where average_term_length is in bits)

Caveat: This is with the index-wide sorting structure. I am fairly confident 
that this is what Solr uses, but I have not looked at it lately so it is 
possible that some memory-saving segment-based trickery has been implemented.

> Does those 9411 entries duplicate data between them?

Sorry, I do not know. SOLR- discusses the problems with the field cache and 
duplication of data, but I cannot infer if it is has been solved or not. I am 
not familiar with the stat breakdown of the fieldCache, but it _seems_ to me 
that there are 2 or 3 entries for each segment for each sort field. 
Guesstimating further, let's say you have 30 segments in your index. Going with 
the guesswork, that would bring the number of sort fields to 9411/3/30 ~= 100. 
Looks like you use a custom sort field for each client?

Extrapolating from 1.4M documents and 180 clients, let's say that there are 
1.4M/180/5 unique terms for each sort-field and that their average length is 
10. We thus have
1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB 
per sort field or about 4GB for all the 180 fields.

With this few unique values, the doc->value structure is by far the biggest, 
just as with facets. As opposed to the faceting structure, this is fairly close 
to the actual memory usage. Switching to a single sort field would reduce the 
memory usage from 4GB to about 55MB.

> I do commit a bit more often than i should. I get these in my log file from
> time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

So 1 active searcher and 2 warming searchers. Ignoring that one of the warming 
searchers is highly likely to finish well ahead of the other one, that means 
that your heap must hold 3 times the structures for a single searcher. With the 
old heap size of 25GB that left "only" 8GB for a full dataset. Subtract the 4GB 
for sorting and a similar amount for faceting and you have your OOM.

Tweaking your ingest to avoid 3 overlapping searchers will lower your memory 
requirements by 1/3. Fixing the facet & sorting logic will bring it down to 
laptop size.

> The control panel says that the warm up time of the last searcher is 5574. Is 
> that seconds or milliseconds?
> http://screencast.com/t/d9oIbGLCFQwl

milliseconds, I am fairly sure. It is much faster than I anticipated. Are you 
warming all the sort- and facet-fields?

> Waiting for a full GC would take a long time.

Until you have fixed the core memory issue, you might consider doing an 
explicit GC every night to clean up and hope that it does not occur 
automatically at daytime (or whenever your clients uses it).

> Unfortunately I don't know of a way to provoke a full GC on command.

VisualVM, which is delivered with the Oracle JDK (look somewhere in the bin 
folder), is your friend. Just start it on the server and click on the relevant 
process.
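
(Alternatively, assuming a Sun/Oracle JDK, running "jmap -histo:live <pid>" from
the command line forces a full collection as a side effect, and on Java 7 there
is also "jcmd <pid> GC.run".)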

Regards,
Toke Eskildsen

Re: updateLog in Solr 4.2

2013-04-17 Thread vicky desai
If the updateLog tag is mandatory, then why is it given as a parameter in
solrconfig.xml? I mean, by default it should always be writing update logs
in my data directory even if I don't use the updateLog parameter in the config
file. Also, the same config file works for Solr 4.0 but not Solr 4.2.

I will be logging a bug for the same.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/updateLog-in-Solr-4-2-tp4055548p4056665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread paulblyth
Hi Gora,

Please forgive the typo. This is merely a simplified example to illustrate
the scenario (if/else and switch) we're trying to achieve; although the
values have been changed, the if/else and switch statements remain as is.

The fact that the switch statement should work is the problem - it just
doesn't. The log doesn't record anything useful that I can tell (from my
limited experience); it simply doesn't seem to execute the switch
statement at all.

Thanks,
Paul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JavaScript-transform-switch-statement-during-Data-Import-tp4056340p4056669.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
Whopps. I made some mistakes in the previous post. 

Toke Eskildsen [t...@statsbiblioteket.dk]:

> Extrapolating from 1.4M documents and 180 clients, let's say that
> there are 1.4M/180/5 unique terms for each sort-field and that their
> average length is 10. We thus have
> 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
> per sort field or about 4GB for all the 180 fields.

That would be 10 bytes and thus 80 bits. The results were correct though.

> So 1 active searcher and 2 warming searchers. Ignoring that one of
> the warming searchers is highly likely to finish well ahead of the other
> one, that means that your heap must hold 3 times the structures for
> a single searcher.

This should be taken with a grain of salt as it depends on whether or not there 
is any re-use of segments. There might be for sorting.

Apologies for any confusion,
Toke Eskildsen


Doubts about solr stats component

2013-04-17 Thread kannan rbk
Hi Team,

I am using solr for indexing data.  I need some statistics information like
"max , min , stddev" from indexed data. I read about `SolrStatsComponent`
and I used this too.

I read this line on `apache_solr_4_cookbook.pdf`

"Please be careful when using this component on the multivalued fields as
it can be a
performance bottleneck."

**My Solr Query**


http://localhost:8080/solr/daycore/select?q=*:*&stats=true&stats.field=login_attempts&rows=0

Can I use this query?

Is it affect solr performance?

Regards ,

Bharathikannan R


Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-17 Thread Dmitry Kan
Hi,

If you are not afraid of looking into the code, you could trace and
possibly fix this. Remember to commit a patch :)

Another (easier?) way is to compile a repeatable test and file a Jira.

Dmitry


On Tue, Apr 16, 2013 at 4:12 PM, juancesarvillalba <
juancesarvilla...@gmail.com> wrote:

>
>
> Hi,
>
> At the moment, I am not considering storing synonyms in the index, although
> it is something that I will have to do at some point.
>
> It is "strange" that something as "common" as multi-word synonyms has a bug
> with highlighting, but I couldn't find any solution.
>
> Thanks for your help.
>
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988p4056305.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: updateLog in Solr 4.2

2013-04-17 Thread Mark Miller

On Apr 17, 2013, at 9:17 AM, vicky desai  wrote:

> If updateLog tag is manadatory than why is it given as a parameter in
> solrconfig.xml 

Because it's not mandatory.

- Mark


Re: first time with new keyword, solr take to much time to give the result

2013-04-17 Thread Montu v Boda
Hi

Thanks for your reply.

We will try to index the permissions in Solr, add the filter query, and try
to get an optimal number of rows (100 or 150) in the result from Solr.

And in the future we will try with an SSD as well.


Thanks to all for such a great response.

Thanks & Regards
Montu v Boda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056685.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: using maven to deploy solr on tomcat

2013-04-17 Thread jnduan
hi Adeel,
I have used Solr with Maven since 2011, and my dependency is not solr (the war)
but solr-core and some other dependencies.
Therefore, my project structure is just like an unpacked solr.war file without
the dir 'WEB-INF/lib'.

So I can write some code that works with Solr, e.g. a listener that sets the
system property 'solr.solr.home' to the right path before Solr is launched, and
even sets some properties for ZooKeeper in Solr 4.2.1.

You said your solr home's source path is 'src/main/resources/solr-dev/', which
means the solr home is 'WEB-INF/classes/solr-dev' at runtime. Then just use
ClassLoader.getResource() in a ServletContextListener to get the absolute path
of the solr home, just like this:

private static final String SOLR_HOME_PROP_NAME = "solr.solr.home";

.

try {
URL url = 
this.getClass().getClassLoader().getResource("/solr-dev");
if (url != null) {
File file = new File(url.toURI());
if (file.exists()) {

System.setProperty(SOLR_HOME_PROP_NAME,file.getCanonicalPath());
logger.info(" Set /solr/home to system 
properties,key:{}, value:{}",SOLR_HOME_PROP_NAME,file.getCanonicalPath());
}
else{
logger.error("Resouce url '/solr' is not exists");
}
}
else{
logger.error("Can not locate resource url '/solr'");
}
} catch (Exception e) {
logger.error(e.getMessage(), e);
}

I wish this could help,best regards.

Duan Jienan
On 2013-4-16, at 4:33 AM, Adeel Qureshi wrote:

> I am trying to embed solr war in an empty application to be able to use
> maven to deploy the generated war file (that includes solr war) to tomcat.
> My pom file is simple
> 
> <dependencies>
>   <dependency>
>     <groupId>org.apache.solr</groupId>
>     <artifactId>solr</artifactId>
>     <version>${solr.version}</version>
>     <type>war</type>
>   </dependency>
> </dependencies>
> 
> this brings in the solr war but now I need to specify solr home to choose
> between a dev and prod directory as the solr home. So I have set up an env
> variable
> 
>  value="src/main/resources/solr-dev"/>
> 
> but this leads to absolute path of
> 
> INFO: Using JNDI solr.home: src/main/resources/solr-dev
> INFO: looking for solr.xml:
> C:\springsource\sts-2.8.1.RELEASE\src\main\resources\solr-dev\solr.xml
> 
> notice that it uses the given path as an absolute path and obviously doesn't
> end up looking inside the application directory for the config directory.
>
> Any help would be appreciated.
> Thanks
> Adeel



Re: Rejecting document already existing in different shard.

2013-04-17 Thread Dmitry Kan
Hi,

Although we use logical sharding, there are cases in our environment as you
described. We handle them manually:

0. prepare new version of a document
1. remove the old version of the document
2. post it and commit

With logical sharding it is relatively easy, but we do need to store
location metadata in a DB.

In your case, have you had a look onto this:

http://wiki.apache.org/solr/Deduplication

Other things that come to mind: store the hashing parameters and then
find a link between the new document and the parameters of the "same" existing document.
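
For reference, the dedupe update chain on that Deduplication page looks roughly
like this (the signature field and the list of fields to hash are illustrative
and would need to match your schema):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>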

Dmitry


On Wed, Mar 13, 2013 at 11:34 PM, Marcin Rzewucki wrote:

> Hi there,
>
> Let's say we use custom hashing algorithm and there is a document already
> indexed in "shard1". After some time the same document has changed and
> should be indexed to "shard2" (because of routing rules used in indexing
> program). It has been indexed without issues and as a result 2 "almost" the
> same documents are in different shards. In my case, they are duplicates for
> the end user. Is it possible to reject a document if it already exists in
> different shard ? It would be even easier to handle such cases prior to
> adding new with the same ID.
>
> Regards.
>


Re: Solr Example, Multi Word Search issue

2013-04-17 Thread Alexandre Rafalovitch
How are you searching? From WebUI Admin or from a client? If from a
client, check number of rows being returned. For example SolrNet asks
for 2 rows unless overruled (to force you being explicit about
your paging), so you could be stuck on results
serialization/deserialization. Try searching in WebUI and enable Debug
flag for more details.

In terms of actual searches, there are several things that could play together.
1) The default search I believe is OR rather than AND, so you might be
getting documents with _either_ of the terms
2) You don't say what type of search parser you are using. Have a read
of eDisMax and its parameters; it has a lot of tuning options
3) TextField types need to be explicitly enabled to support phrase
searches. This might be part of an issue. I believe example schema
demonstrates that.

Hopefully this helps with the next step.

Regards,
   Alex

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 17, 2013 at 7:43 AM, zeroeffect  wrote:
> Version 4.2.0
> collection1 example
>
> I currently have indexed over 1.5 million html files, with more to come.
>
> Here is an issue I am running into, if I search the word mayor I get a great
> list of results.
>
> Now if I search the word bing I get results. Searching the words together
> "mayor bing" with out quotes I get zero results returned, in fact nothing
> happens the pages just spins waiting for a response, then never gets one.
>
> If I leave the quotes I get a response showing no results and that is true
> those have not been archived.
>
> I am lost as to why it hang when searching those two terms together.
> Shouldn't I get results for both words even if they are not both in a
> document?
>
> Thanks for your guidance.
>
> ZeroEffect
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Example-Multi-Word-Search-issue-tp4056651.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Push/pull model between leader and replica in one shard

2013-04-17 Thread Mark Miller
Thanks, the earlier presentation was done with Keynote and the later one (more 
animation) was done with Tumult Hype.

- Mark

On Apr 17, 2013, at 3:43 AM, Furkan KAMACI  wrote:

> Hej Mark;
> 
> What did you use to prepare your presentation, its really nice.
> 
> 2013/4/17 Furkan KAMACI 
> 
>> Really nice presentation.
>> 
>> 
>> 2013/4/17 Mark Miller 
>> 
>>> 
>>> On Apr 16, 2013, at 1:36 AM, SuoNayi  wrote:
>>> 
 Hi, can someone explain more details about what model is used to sync
>>> docs between the lead and
 replica in the shard?
 The model can be push or pull.Supposing I have only one shard that has
>>> 1 leader and 2 replicas,
 when the leader receives a update request, does it will scatter the
>>> request to each available and active
 replica at first and then processes the request locally at last?In this
>>> case if the replicas are able to catch
 up with the leader can I think this is a push model that the leader
>>> pushes updates to it's replicas?
>>> 
>>> Currently, the leader adds the doc locally and then sends it to all
>>> replicas concurrently.
>>> 
 
 
 What happens if a replica is behind the leader?Will the replica pull
>>> docs from the leader and keep
 a track of the coming updates from the lead in a log(called tlog)?If so
>>> when it complete pulling docs
 it will replay updates in the tlog at last?
>>> 
>>> If an update forwarded from a leader to a replica fails it's likely
>>> because that replica died. Just in case, the leader will ask that replica
>>> to enter "recovery".
>>> 
>>> When a node comes up and is not a leader, it also enters "recovery".
>>> 
>>> Recovery tries to peersync from the leader, and if that fails (works if
>>> off by about 100 updates), it replicates the entire index.
>>> 
>>> If you are interested in more details on the SolrCloud architecture, I've
>>> given a few talks on it - two of them here:
>>> 
>>> http://vimeo.com/43913870
>>> http://www.youtube.com/watch?v=eVK0wLkLw9w
>>> 
>>> - Mark
>>> 
>>> 
>> 



Re: updateLog in Solr 4.2

2013-04-17 Thread Jack Krupansky
updateLog is not mandatory in general for Solr, but it is mandatory for 
"cloud mode", right?


Solrconfig mentions "solr cloud replica recovery", but doesn't explicitly 
say that's a required part of "cloud mode". Maybe just a little 
clarification in Solrconfig would help, like "solr cloud replica recovery 
(which is a required component of SolrCloud)". Or, a separate comment, like 
"The transaction log is mandatory for running SolrCloud."


-- Jack Krupansky

-Original Message- 
From: Mark Miller

Sent: Wednesday, April 17, 2013 10:01 AM
To: solr-user@lucene.apache.org
Subject: Re: updateLog in Solr 4.2


On Apr 17, 2013, at 9:17 AM, vicky desai  wrote:


If updateLog tag is manadatory than why is it given as a parameter in
solrconfig.xml


Because its not mandatory.

- Mark 



Re: using maven to deploy solr on tomcat

2013-04-17 Thread Adeel Qureshi
okay this looks promising. I will give it a try and let you know how it
goes. Thanks


On Wed, Apr 17, 2013 at 9:19 AM, jnduan  wrote:

> hi Adeel,
> I have use solr with maven since 2011,and my dependency is not solr but
> solr-core and some other dependencies .
> therefore,my project structure is just like unpack the solr.war file with
> out the dir 'WEB-INF/lib'.
>
> So I can write some code work with solr ,e.g. a listener set up system
> properties 'solr.solr.home' to the right path before solr is lunch.Even set
> up some properties for zookeeper in solr 4.2.1.
>
> You said your solr home's source path is
> 'src/main/resources/solr-dev/',that means the solr home is
> 'WEB-INF/classes/solr-dev' at the runtime.Then just use
> ClassLoader.getResource() in a ServletContextListener to get the absolute
> path of solr home ,just like this:
>
> private static final String SOLR_HOME_PROP_NAME = "solr.solr.home";
>
> .
>
> try {
> URL url =
> this.getClass().getClassLoader().getResource("/solr-dev");
> if (url != null) {
> File file = new File(url.toURI());
> if (file.exists()) {
>
> System.setProperty(SOLR_HOME_PROP_NAME,file.getCanonicalPath());
> logger.info(" Set /solr/home to system
> properties,key:{}, value:{}",SOLR_HOME_PROP_NAME,file.getCanonicalPath());
> }
> else{
> logger.error("Resouce url '/solr' is not exists");
> }
> }
> else{
> logger.error("Can not locate resource url '/solr'");
> }
> } catch (Exception e) {
> logger.error(e.getMessage(), e);
> }
>
> I wish this could help,best regards.
>
> Duan Jienan
> On 2013-4-16, at 4:33 AM, Adeel Qureshi wrote:
>
> > I am trying to embed solr war in an empty application to be able to use
> > maven to deploy the generated war file (that includes solr war) to
> tomcat.
> > My pom file is simple
> >
> > 
> > 
> > org.apache.solr
> > solr
> > ${solr.version}
> > war
> > 
> > 
> >
> > this brings in the solr war but now i need to specify solr home to choose
> > between a dev and prod directory as the solr home. So I have setup a env
> > variable
> >
> >  > value="src/main/resources/solr-dev"/>
> >
> > but this leads to absolute path of
> >
> > INFO: Using JNDI solr.home: src/main/resources/solr-dev
> > INFO: looking for solr.xml:
> > C:\springsource\sts-2.8.1.RELEASE\src\main\resources\solr-dev\solr.xml
> >
> > notice that its uses the given path as absolute path and obviously doesnt
> > ends up looking inside the application directory for these config
> directory.
> >
> > Any help would be appreciate.
> > Thanks
> > Adeel
>
>


Re: dataimporter.last_index_time SolrCloud

2013-04-17 Thread jimtronic
Is this a bug? I can create the ticket in Jira if it is, but it's not clear
to me what should be happening.

I noticed that it is using the value set in the home directory, but that
value does not get updated, so my imports get slower and slower.

I guess I could create a cron job to update that time, but this seems kind
of wonky.

Thanks!
Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimporter-last-index-time-SolrCloud-tp4055679p4056718.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Umesh Prasad
Thanks Erick.

Couple of Questions :
Our transaction logs are huge as we have disabled auto commit. The biggest
one is 6.1 GB.

567M    autosuggest/data/tlog
22M     avmediaCore/data/tlog
388M    booksCore/data/tlog
4.9G    books/data/tlog
6.1G    mp3-downloads/data/tlog  (150% of index size)
1.5G    next-5/data/tlog
690M    queries/data/tlog  (25% of index size)
207M    queryProduct/data/tlog  (100% of index size)

Btw, I am surprised by the size of the transaction logs, because they are a
significant fraction of the index size itself.

2.6G    autosuggest/data/index
992M    avmediaCore/data/index
12G     booksCore/data/index
4.2G    mp3-downloads-new/data/index
45G     next-5/data/index
2.9G    queries/data/index
220M    queryProduct/data/index


We use DIH and have turned off the Auto commit because we have to sometimes
build index from Scratch (clean=true) and we not want to
Our master server sees a lot of restarts, sometimes 2-3 times a day. It
polls other Data Sources for updates which are quite a few. Master
maintains a version of last committed version and can handle uncommitted
changes.

Given the frequent restarts, we can't really afford a huge startup time at
this point.
In the worst case, does Solr allow for disabling the transaction log?

Here is our Index Config


   
false

10
32
2147483647
5
1000
1

single


false
32
5


2147483647
5

false


  
  5
  
  0
 2HOUR





Thanks & Regards
Umesh Prasad



On Wed, Apr 17, 2013 at 4:57 PM, Erick Erickson wrote:

> How big are you transaction logs? They can be replayed on startup.
> They are truncated and a new one started when you do a hard commit
> (openSearcher true or false doesn't matter).
>
> So a quick test of this theory would be to just stop your indexing
> process, issue a hard commit on all your cores and _then_ try to
> restart. If it comes up immediately, you've identified your problem.
>
> Best
> Erick
>
> On Tue, Apr 16, 2013 at 8:33 AM, Umesh Prasad 
> wrote:
> > Hi,
> > We are migrating to Solr 4.2 from Solr 3.6 and Solr 4.2 is throwing
> > Exception on Restart. What is More, it take a hell lot of Time ( More
> than
> > one hour to get Up and Running)
> >
> >
> > THE exception After Restart ...
> > =
> > "Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates
> > update
> > WARNING: Unexpected log entry or corrupt log.  Entry=11
> > java.lang.ClassCastException: java.lang.Long cannot be cast to
> > java.util.List
> > at
> > org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
> > at
> >
> org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
> > at
> > org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
> > at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
> > at
> > org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
> > at
> > org.apache.solr.update.UpdateHandler.(UpdateHandler.java:137)
> > at
> > org.apache.solr.update.UpdateHandler.(UpdateHandler.java:123)
> > at
> >
> org.apache.solr.update.DirectUpdateHandler2.(DirectUpdateHandler2.java:95)
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> > at
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > at
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > at
> java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > at
> org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
> > at
> > org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:806)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:618)
> > at
> >
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> > at
> > org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> > at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> > at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> > at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> > at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.

Re: Solr Example, Multi Word Search issue

2013-04-17 Thread Otis Gospodnetic
Hi

You probably AND them by default.  Look at your mm value or default boolean
operator setting in solrconfig.xml.

http://search-lucene.com/?q=mm+default+boolean+operator&fc_project=Solr
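
(For example, with edismax you could relax mm, e.g. mm=1, or pass q.op=OR on the
request, or set <solrQueryParser defaultOperator="OR"/> in schema.xml; these are
the standard parameter names, so pick whichever matches your handler setup.)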

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Apr 17, 2013 7:43 AM, "zeroeffect"  wrote:

> Version 4.2.0
> collection1 example
>
> I currently have indexed over 1.5 million html files, with more to come.
>
> Here is an issue I am running into, if I search the word mayor I get a
> great
> list of results.
>
> Now if I search the word bing I get results. Searching the words together
> "mayor bing" with out quotes I get zero results returned, in fact nothing
> happens the pages just spins waiting for a response, then never gets one.
>
> If I leave the quotes I get a response showing no results and that is true
> those have not been archived.
>
> I am lost as to why it hang when searching those two terms together.
> Shouldn't I get results for both words even if they are not both in a
> document?
>
> Thanks for your guidance.
>
> ZeroEffect
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Example-Multi-Word-Search-issue-tp4056651.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Roman Chyla
Makes sense, thanks. One more question. Shouldn't there be a mechanism to
define a default query parser?

something like (inside QParserPlugin):

public static String DEFAULT_QTYPE = "default"; // now it
is LuceneQParserPlugin.NAME;

public static final Object[] standardPlugins = {
DEFAULT_QTYPE, LuceneQParserPlugin.class,
LuceneQParserPlugin.NAME, LuceneQParserPlugin.class,
   ...
}

in this way we can use solrconfig.xml to override the default qparser

Or does that break some assumptions?

roman



On Wed, Apr 17, 2013 at 8:34 AM, Yonik Seeley  wrote:

> On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla 
> wrote:
> > Is there some profound reason why the defType is not passed onto the
> filter
> > query?
>
> defType is a convenience so that the main query parameter "q" can
> directly be the user query (without specifying it's type like
> "edismax").
> Filter queries are normally machine generated.
>
> -Yonik
> http://lucidworks.com
>


Re: Master slave replication with digest authentication

2013-04-17 Thread Shawn Heisey

On 4/17/2013 1:20 AM, Maciej Pestka wrote:

Hi,

I've configured basic authentication on tomcat & my slave solr instance and it 
works.

Any idea how to configure slave to replicate properly with digest 
authentication?
on Solr WIKI I could find only basic authentication example:
http://wiki.apache.org/solr/SolrSecurity

solrconfig.xml (slave config):
<str name="httpBasicAuthUser">username</str>
<str name="httpBasicAuthPassword">password</str>


I'm pretty sure that basic auth is all that's supported.  Right now, 
replication is the only internal solr communication that supports 
authentication.  There is an issue to add it for all internal solr 
communication.  The hope is to get it into 4.3.  I don't know how likely 
that is to happen.


https://issues.apache.org/jira/browse/SOLR-4470

As implemented, this will only support basic authentication, but the 
issue notes say that adding digest wouldn't be a huge change.  When this 
makes it into Solr (4.3 or 4.4) I wouldn't count on seeing digest until 
the next release, and only if there is enough desire in the community 
for the feature.


Thanks,
Shawn



Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Upayavira
You specify it as a default parameter for a requestHandler in your
solrconfig.xml, giving a default value for defType. Not sure that you
can set a default that will cover filter queries too.

Upayavira

On Wed, Apr 17, 2013, at 05:46 PM, Roman Chyla wrote:
> Makes sense, thanks. One more question. Shouldn't there be a mechanism to
> define a default query parser?
> 
> something like (inside QParserPlugin):
> 
> public static String DEFAULT_QTYPE = "default"; // now it
> is LuceneQParserPlugin.NAME;
> 
> public static final Object[] standardPlugins = {
> DEFAULT_QTYPE, LuceneQParserPlugin.class,
> LuceneQParserPlugin.NAME, LuceneQParserPlugin.class,
>...
> }
> 
> in this way we can use solrconfig.xml to override the default qparser
> 
> Or does that break some assumptions?
> 
> roman
> 
> 
> 
> On Wed, Apr 17, 2013 at 8:34 AM, Yonik Seeley 
> wrote:
> 
> > On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla 
> > wrote:
> > > Is there some profound reason why the defType is not passed onto the
> > filter
> > > query?
> >
> > defType is a convenience so that the main query parameter "q" can
> > directly be the user query (without specifying it's type like
> > "edismax").
> > Filter queries are normally machine generated.
> >
> > -Yonik
> > http://lucidworks.com
> >


Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Erik Hatcher
True, you cannot currently specify a default (other than the trick Roman showed 
earlier) query parser for fq parameters.  I think of the bulk of my fq's in the 
form of fq={!term f=facet_field}value so setting a default term query parser 
for fq's wouldn't really help me exactly, as it needs an f(ield) parameter 
specified uniquely for every fq.  And then there's the key/excl stuff that I 
increasingly see folks use for faceting and filtering so that fq's on average 
are usually pretty complex entities.  I'm not sure what a default fq query 
parser provides as a benefit to projects, so I'd love some examples.  If there 
were a "field_term" qparser that took field:value syntax and didn't require any 
other per-instance parameterization such that it split by first colon and 
created a TermQuery that'd be handy.  But with multiselect faceting, you're 
specifying an OR'd list of selections anyway and thus want a query parser that 
can do that too.  

Erik



On Apr 17, 2013, at 13:25 , Upayavira wrote:

> You specify it as a default parameter for a requestHandler in your
> solrconfig.xml, giving a default value for defType. Not sure that you
> can set a default that will cover filter queries too.
> 
> Upayavira
> 
> On Wed, Apr 17, 2013, at 05:46 PM, Roman Chyla wrote:
>> Makes sense, thanks. One more question. Shouldn't there be a mechanism to
>> define a default query parser?
>> 
>> something like (inside QParserPlugin):
>> 
>> public static String DEFAULT_QTYPE = "default"; // now it
>> is LuceneQParserPlugin.NAME;
>> 
>> public static final Object[] standardPlugins = {
>>DEFAULT_QTYPE, LuceneQParserPlugin.class,
>>LuceneQParserPlugin.NAME, LuceneQParserPlugin.class,
>>   ...
>> }
>> 
>> in this way we can use solrconfig.xml to override the default qparser
>> 
>> Or does that break some assumptions?
>> 
>> roman
>> 
>> 
>> 
>> On Wed, Apr 17, 2013 at 8:34 AM, Yonik Seeley 
>> wrote:
>> 
>>> On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla 
>>> wrote:
 Is there some profound reason why the defType is not passed onto the
>>> filter
 query?
>>> 
>>> defType is a convenience so that the main query parameter "q" can
>>> directly be the user query (without specifying it's type like
>>> "edismax").
>>> Filter queries are normally machine generated.
>>> 
>>> -Yonik
>>> http://lucidworks.com
>>> 



Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Shawn Heisey

On 4/17/2013 10:29 AM, Umesh Prasad wrote:

We use DIH and have turned off the Auto commit because we have to sometimes
build index from Scratch (clean=true) and we not want to
Our master server sees a lot of restarts, sometimes 2-3 times a day. It
polls other Data Sources for updates which are quite a few. Master
maintains a version of last committed version and can handle uncommitted
changes.

Given the frequent restarts, We can't really afford a huge start up at this
point.
  In the worst case, does Solr allow for disabling transactional log ?


Unless you are using SolrCloud, you can disable the updateLog. 
SolrCloud requires it.


There is one additional caveat - when you disable the updateLog, you 
have to switch to MMapDirectoryFactory instead of 
NRTCachingDirectoryFactory.  The NRT directory implementation will cache 
a portion of a commit (including hard commits) into RAM instead of onto 
disk.  On the next commit, the previous one is persisted completely to 
disk.  Without a transaction log, you can lose data.


My advice - keep the updateLog on, and use autoCommit with 
openSearcher=false.  It is the best way to avoid large transaction logs. 
 It sounds like you do not want the auto commits to affect query 
results, which is a reasonable goal.  You can have that even with 
autoCommit - just set openSearcher to false.  Here's an example, no need 
to stick with the numbers that I have included:



  
<autoCommit>
  <maxDocs>25000</maxDocs>
  <maxTime>30</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


Thanks,
Shawn



Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Mark Miller

On Apr 17, 2013, at 1:42 PM, Shawn Heisey  wrote:

> On 4/17/2013 10:29 AM, Umesh Prasad wrote:
>> We use DIH and have turned off the Auto commit because we have to sometimes
>> build index from Scratch (clean=true) and we not want to
>> Our master server sees a lot of restarts, sometimes 2-3 times a day. It
>> polls other Data Sources for updates which are quite a few. Master
>> maintains a version of last committed version and can handle uncommitted
>> changes.
>> 
>> Given the frequent restarts, We can't really afford a huge start up at this
>> point.
>>  In the worst case, does Solr allow for disabling transactional log ?
> 
> Unless you are using SolrCloud, you can disable the updateLog. SolrCloud 
> requires it.
> 
> There is one additional caveat - when you disable the updateLog, you have to 
> switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory.  The 
> NRT directory implementation will cache a portion of a commit (including hard 
> commits) into RAM instead of onto disk.  On the next commit, the previous one 
> is persisted completely to disk.  Without a transaction log, you can lose 
> data.

I don't think this is true? NRTCachingDirectoryFactory should not cache hard 
commits and should be as safe as MMapDirectoryFactory is - neither of which is 
as safe as using a tran log.

- Mark

> 
> My advice - keep the updateLog on, and use autoCommit with 
> openSearcher=false.  It is the best way to avoid large transaction logs.  It 
> sounds like you do not want the auto commits to affect query results, which 
> is a reasonable goal.  You can have that even with autoCommit - just set 
> openSearcher to false.  Here's an example, no need to stick with the numbers 
> that I have included:
> 
> 
>  
>25000
>30
>false
>  
>  
> 
> 
> Thanks,
> Shawn
> 



facet.method enum vs fc

2013-04-17 Thread Mingfeng Yang
I am doing faceting on an index of 120M documents, on the field of url,
using the following two queries.  Note that the only difference of the two
queries is that first one uses default facet.method, and the second one
uses facet.method=enum.   ( each document in the index contains a review we
extracted from internet with multiple fields, and url field stands for the
link to the original web pages.  The matching document size is like 5.3
million. )

http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0

http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0&facet.method=enum

The first method gives me outofmemory error( ERROR 500: Java heap space
 java.lang.OutOfMemoryError: Java heap space), but the second one runs fine
though very slow (163 seconds)

According to the wiki and solr documentation, the default facet.method=fc
uses less memory than facet.method=enum, isn't it?

Thanks,
Ming


Re: Doubts about solr stats component

2013-04-17 Thread Gopal Patwa
Please post the field definition from your schema.xml for
stats.field=login_attempts;
it depends on how you have defined the stats field.


Re: Max http connections in CloudSolrServer

2013-04-17 Thread Shawn Heisey

On 4/17/2013 3:46 AM, J Mohamed Zahoor wrote:

Hi

I am pumping parallel select queries using CloudSolrServer.
It looks like it can handle only certain no of  max connections...

my Question is,
How many concurrent queries does a CloudSolrServer can handle?


Looking into the code for 4.x versions, I found that the default max 
number of connections is 128, and the default max number of connections 
per host is 32.  This is set in the HttpSolrServer constructor.



An old thread tries to answer this by asking to give our own instance of 
LBHttpSolrServer...
But it looks like there is no way from LBHttpSolrServer to up the maxConnection 
of the httpClient it has...

Can someone let me know how to bump up the maxConnections and 
maxConnectionsPerHost parameter for the httpCLient used by cloudSolrServer?


You should be able to create an instance of LBHttpSolrServer, which in 
turn lets you use a custom HttpClient, where you could set the 
connection limits. Then you could use the LBHttpSolrServer object to 
create an instance of CloudSolrServer.


The following code MIGHT work (probably in a try block), but I haven't 
tried to actually use it, so it might be horribly broken.


  // Build an HttpClient with higher connection limits.
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 1000);
  params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 200);
  HttpClient client = HttpClientUtil.createClient(params);
  // Wrap it in an LBHttpSolrServer; the placeholder URL is added and then removed.
  LBHttpSolrServer lbServer = new LBHttpSolrServer(client, "http://localhost/solr");
  lbServer.removeSolrServer("http://localhost/solr");
  // Hand the load-balanced server to CloudSolrServer.
  SolrServer server = new CloudSolrServer(zkHost, lbServer);

I would argue that CloudSolrServer (and therefore LBHttpSolrServer) 
should have many of the setters available on HttpSolrServer, including 
setDefaultMaxConnectionsPerHost and setMaxTotalConnections.  Perhaps not 
all of them, some of them are things that most people would never really 
need, and some of them might not make sense for a clustered server object.


Side issue: shouldn't that be setMaxConnectionsPerHost instead of 
including the word Default?  If there's no objection, I would plan on 
adding the renamed method and using a typical deprecation procedure for 
the old one.


Thanks,
Shawn



Re: Spellchecker not working for Solr 4.1

2013-04-17 Thread davers
When I set distrib=false the spellchecker works perfectly. So I take it
spellchecker doesn't work in solr 4.1 in cloud mode. Does anybody know if it
works in 4.2.1?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecker-not-working-for-Solr-4-1-tp4055450p4056768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Shawn Heisey

On 4/17/2013 11:56 AM, Mark Miller wrote:

There is one additional caveat - when you disable the updateLog, you have to 
switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory.  The NRT 
directory implementation will cache a portion of a commit (including hard 
commits) into RAM instead of onto disk.  On the next commit, the previous one 
is persisted completely to disk.  Without a transaction log, you can lose data.


I don't think this is true? NRTCachingDirectoryFactory should not cache hard 
commits and should be as safe as MMapDirectoryFactory is - neither of which is 
as safe as using a tran log.


This is based on observations of what happens with my segment files when 
I do a full-import, using autoCommit with openSearcher disabled.  I see 
that each autoCommit results in a full segment being written, plus part 
of another segment.  On the next autoCommit, the rest of the files for 
that partial segment are written, another full segment is written, and I get 
another partial segment.  I asked about this on the list some time ago, 
and what I just told Umesh is a rehash of what I understood from Yonik's 
response.


If I'm wrong, I hope someone who knows for sure can correct me.

Thanks,
Shawn



RE: Spellchecker not working for Solr 4.1

2013-04-17 Thread Dyer, James
Spellcheck is broken when using both distributed and grouping.  The fix is 
here: https://issues.apache.org/jira/browse/SOLR-3758 .  This will be part of 
4.3, which likely will be released within the next few weeks.  In the mean time 
you can apply the patch to 4.2 or as a workaround, re-issue a spellcheck-only 
request without grouping when queries return with 0 hits.
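
(For instance, re-send the user's original query with spellcheck=true and rows=0
and without the group parameters; the exact parameter set depends on how the
spellcheck component is wired into your request handler.)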

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: davers [mailto:dboych...@improvementdirect.com] 
Sent: Wednesday, April 17, 2013 1:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecker not working for Solr 4.1

When I set distrib=false the spellchecker works perfectly. So I take it
spellchecker doesn't work in solr 4.1 in cloud mode. Does anybody know if it
works in 4.2.1?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecker-not-working-for-Solr-4-1-tp4055450p4056768.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Spellchecker not working for Solr 4.1

2013-04-17 Thread davers
Thank you for the response



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecker-not-working-for-Solr-4-1-tp4055450p4056776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.method enum vs fc

2013-04-17 Thread Timothy Potter
What are your results when using facet.method=fcs?


On Wed, Apr 17, 2013 at 12:06 PM, Mingfeng Yang wrote:

> I am doing faceting on an index of 120M documents, on the field of url,
> using the following two queries.  Note that the only difference of the two
> queries is that first one uses default facet.method, and the second one
> uses face.method=enum.   ( each document in the index contains a review we
> extracted from internet with multiple fields, and url field stands for the
> link to the original web pages.  The matching document size is like 5.3
> million. )
>
>
> http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0
>
>
> http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0&facet.method=enum
>
> The first method gives me outofmemory error( ERROR 500: Java heap space
>  java.lang.OutOfMemoryError: Java heap space), but the second one runs fine
> though very slow (163 seconds)
>
> According to the wiki and solr documentation, the default facet.method=fc
> uses less memory than facet.method=enum, isn't it?
>
> Thanks,
> Ming
>


Re: Using multiple text files for Suggestor dictionarys

2013-04-17 Thread Chris Hostetter

: Is it possible to use multiple text files? I tried the following:
...
: But the second list, the cities, are apparently undetected, after
: restarting the tomcat and rebuilding the dictionary. Can this be done? If
: not, how would you recommend managing different dictionaries?

Skimming the code, it does not appear to currently be possible.  I think 
as things stand, your only recourse is to cat your multiple files 
together as part of whatever process you currently use to generate them.
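
(For example, something along the lines of "cat names.txt cities.txt >
combined_dictionary.txt" as a build step, with whatever file names your process
produces, and then point the dictionary's sourceLocation at the combined file.)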

If you'd like to open a feature request in Jira, it looks like it would be 
semi-straight forward to support specifying multiple file names. 

-Hoss


Re: facet.method enum vs fc

2013-04-17 Thread Mingfeng Yang
Does Solr 3.6 has facet.method=fcs?   I tried anyway, and got

ERROR 500: GC overhead limit exceeded  java.lang.OutOfMemoryError: GC
overhead limit exceeded.


On Wed, Apr 17, 2013 at 12:38 PM, Timothy Potter wrote:

> What are your results when using facet.method=fcs?
>
>
> On Wed, Apr 17, 2013 at 12:06 PM, Mingfeng Yang  >wrote:
>
> > I am doing faceting on an index of 120M documents, on the field of url,
> > using the following two queries.  Note that the only difference of the
> two
> > queries is that first one uses default facet.method, and the second one
> > uses face.method=enum.   ( each document in the index contains a review
> we
> > extracted from internet with multiple fields, and url field stands for
> the
> > link to the original web pages.  The matching document size is like 5.3
> > million. )
> >
> >
> >
> http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0
> >
> >
> >
> http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0&facet.method=enum
> >
> > The first method gives me outofmemory error( ERROR 500: Java heap space
> >  java.lang.OutOfMemoryError: Java heap space), but the second one runs
> fine
> > though very slow (163 seconds)
> >
> > According to the wiki and solr documentation, the default facet.method=fc
> > uses less memory than facet.method=enum, isn't it?
> >
> > Thanks,
> > Ming
> >
>


Re: Max http connections in CloudSolrServer

2013-04-17 Thread Chris Hostetter

: Side issue: shouldn't that be setMaxConnectionsPerHost instead of including
: the word Default?  If there's no objection, I would plan on adding the renamed
: method and using a typical deprecation procedure for the old one.

I think the name comes from the effect it has on the underlying HttpClient 
code ... it's possible to configure a HttpConnectionManager such that it 
has different number of max connections per host -- ie: host1 has max 
connections of 23, host2 has max connections of 45, etc  i believe 
that method just changes the "default" when there isn't something 
specifically set for an individual host..


-Hoss


Solr Caching

2013-04-17 Thread Furkan KAMACI
I've just started to read about Solr caching. I want to learn one thing.
Let's assume that I have given 4 GB of RAM to my Solr application and I have
10 GB of RAM in total. When the Solr caching mechanism starts to work, does it
use memory from that 4 GB part, or does it let the operating system cache from
the 6 GB of RAM that remains outside the Solr application?


Re: Solr Caching

2013-04-17 Thread Walter Underwood
On Apr 17, 2013, at 3:09 PM, Furkan KAMACI wrote:

> I've just started to read about Solr caching. I want to learn one thing.
> Let's assume that I have given 4 GB RAM into my Solr application and I have
> 10 GB RAM. When Solr caching mechanism starts to work, does it use memory
> from that 4 GB part or lets operating system to cache it from 6 GB part of
> RAM that is remaining from Solr application?

Both.

Solr manages caches of Java objects. These are stored in the Java heap.

The OS manages caches of files. These are stored in file buffers managed by the 
OS.

All are in RAM.
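
For instance, the heap-side caches are the ones declared in solrconfig.xml (the
sizes below are just the stock example values, not recommendations):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

The index files themselves are cached by the OS in whatever RAM is left over
outside the Java heap.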

wunder
--
Walter Underwood
wun...@wunderwood.org





Select Queris While Merging Indexes

2013-04-17 Thread Furkan KAMACI
I see that while merging indexes (I mean optimizing via the admin GUI), my Solr
instance can still respond to select queries (as well). How does that querying
mechanism work (merging is not finished yet, but my Solr instance
can still return a consistent response)?


Re: solr 3.5 core rename issue

2013-04-17 Thread Shawn Heisey

On 4/16/2013 2:39 PM, Jie Sun wrote:



   
 
 
 
...
   


the command I ran was to rename from '413' to '413a'.

when I debug through solr CoreAdminHandler, I notice the persistent flag
only controls if the new data will be persisted to solr.xml or not; thus as
you can see, it did change my solr.xml, there is no problem here.

But the index dir ends up with no change at all (still '413').


I filed an issue for this.

https://issues.apache.org/jira/browse/SOLR-4732

Thanks,
Shawn



Re: Max http connections in CloudSolrServer

2013-04-17 Thread Shawn Heisey

On 4/17/2013 3:21 PM, Chris Hostetter wrote:

I think the name comes from the effect it has on the underlying HttpClient
code ... it's possible to configure a HttpConnectionManager such that it
has different number of max connections per host -- ie: host1 has max
connections of 23, host2 has max connections of 45, etc  i believe
that method just changes the "default" when there isn't something
specifically set for an individual host..


That puts it into complete perspective, so changing the name is a bad 
idea.  I do think it might be a good idea to include HttpSolrServer's 
convenience methods in the other classes, even if there is a working 
HttpClient -> LBHttpSolrServer -> CloudSolrServer way to change things.
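
As a rough, untested sketch of that HttpClient -> LBHttpSolrServer -> 
CloudSolrServer chain (SolrJ 4.x; the ZooKeeper hosts, collection name, and 
limits are made up):

  import org.apache.http.client.HttpClient;
  import org.apache.http.impl.client.DefaultHttpClient;
  import org.apache.http.impl.conn.PoolingClientConnectionManager;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

  PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
  cm.setDefaultMaxPerRoute(100);   // max connections per Solr host
  cm.setMaxTotal(500);             // max connections overall
  HttpClient httpClient = new DefaultHttpClient(cm);
  // the LBHttpSolrServer constructors declare MalformedURLException
  LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
  CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181", lb);
  cloud.setDefaultCollection("collection1");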


Thanks,
Shawn



Re: Select Queries While Merging Indexes

2013-04-17 Thread Shawn Heisey

On 4/17/2013 4:28 PM, Furkan KAMACI wrote:

I see that while merging indexes (I mean optimizing via the admin GUI), my Solr
instance can still respond to select queries. How does that querying
mechanism work (the merge is not finished yet, but my Solr instance
can still return a consistent response)?


Lucene index segments never change after they are fully committed.  When 
you merge or optimize (explicit fullMerge), *new* segments are created, 
and the old ones are not deleted until the entire operation is complete.


The existing Searcher object that serves queries uses the old segments, 
which never change.  A new Searcher object is created that uses the new 
segments, and if the operation is not a full merge, also uses some of 
the old segments.


Any old segments that are no longer required are deleted when the old 
Searcher closes.
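
A rough sketch of the same lifecycle at the raw Lucene level (4.x API; 
untested, path made up, IOException handling omitted):

  import java.io.File;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  Directory dir = FSDirectory.open(new File("/path/to/index"));
  DirectoryReader oldReader = DirectoryReader.open(dir);
  IndexSearcher searcher = new IndexSearcher(oldReader);  // only sees committed segments

  // ... a merge/optimize runs and commits elsewhere;
  //     oldReader keeps serving queries against the old segments, unchanged ...

  DirectoryReader newReader = DirectoryReader.openIfChanged(oldReader);  // null if nothing changed
  if (newReader != null) {
    searcher = new IndexSearcher(newReader);  // switch over to the merged segments
    oldReader.close();  // only now can the superseded segment files actually be deleted
  }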


Thanks,
Shawn



Re: Select Queries While Merging Indexes

2013-04-17 Thread Jack Krupansky

"merging indexes"

The proper terminology is "merging segments".

Until the new, merged segment is complete, the existing segments remain 
untouched and readable.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Wednesday, April 17, 2013 6:28 PM
To: solr-user@lucene.apache.org
Subject: Select Queries While Merging Indexes

I see that while merging indexes (I mean optimizing via the admin GUI), my Solr
instance can still respond to select queries. How does that querying
mechanism work (the merge is not finished yet, but my Solr instance
can still return a consistent response)?



Query Elevation Component

2013-04-17 Thread davers
I would like to use the Query Elevation Component. As I understand it, it only
elevates based on the query term. I would also like it to consider the list of
fq parameters -- well, really just one fq parameter, e.g. fq=siteid:4, since I
use the same Solr index for many sites. Is something like this available
already? If not, where would I start looking to code this feature myself? Any
help is appreciated.
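
For reference, the stock elevate.xml that the component reads keys only on the
query text, along these lines (doc ids taken from the Solr example config):

  <elevate>
    <query text="ipod">
      <doc id="MA147LL/A" />
      <doc id="IW-02" exclude="true" />
    </query>
  </elevate>

There is nowhere in that file to express a filter/site restriction, which is
presumably the part that would need extending.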



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-Elevation-Component-tp4056856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter.last_index_time SolrCloud

2013-04-17 Thread Chris Hostetter

: Is this a bug? I can create the ticket in Jira if it is, but it's not clear
: to me what should be happening.

It certainly sounds like it, but I too am not certain what is actually 
supposed to be happening here, or why it changed.

Please open a Jira with the details of your DIH requestHandler config, as 
well as your data-config.xml, and what you see in your instance dir and 
ZooKeeper before and after running an import (i.e.: do you already have a 
dataimport.properties on disk or in ZK? Does it get created automatically 
on disk or in ZK? What do the contents look like before and after each 
run? etc.)


-Hoss


Re: solr 3.5 core rename issue

2013-04-17 Thread Jie Sun
thanks Shawn for filing the issue.

By the way, my solrconfig.xml has:

<dataDir>${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}</dataDir>

For now I will have to shut down Solr and write a script to modify the
solr.xml manually and rename the core data directory to the new one.

By the way, when I try to remove a core using unload (I am using Solr 3.5):

.../solr/admin/cores?action=UNLOAD&core=4130&deleteIndex=true

it removes the core from solr.xml, but it leaves the data directory '413';
the index subfolder under 413 is removed, however spellchecker1 and
spellchecker2 still remain.

Do you know why?
thanks
Jie




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425p4056865.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query Elevation Component

2013-04-17 Thread Upayavira
Perhaps you should describe the problem you are trying to solve. There
may be other ways to solve it.

Upayavira

On Thu, Apr 18, 2013, at 01:08 AM, davers wrote:
> I would like to use the Query Elevation Component. As I understand it, it
> only elevates based on the query term. I would also like it to consider
> the list of fq parameters -- well, really just one fq parameter, e.g.
> fq=siteid:4, since I use the same Solr index for many sites. Is something
> like this available already? If not, where would I start looking to code
> this feature myself? Any help is appreciated.
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Query-Elevation-Component-tp4056856.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 3.5 core rename issue

2013-04-17 Thread Shawn Heisey
On 4/17/2013 7:07 PM, Jie Sun wrote:
> thanks Shawn for filing the issue.
> 
> By the way, my solrconfig.xml has:
> 
> <dataDir>${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}</dataDir>
> 
> For now I will have to shut down Solr and write a script to modify the
> solr.xml manually and rename the core data directory to the new one.
> 
> By the way, when I try to remove a core using unload (I am using Solr 3.5):
> 
> .../solr/admin/cores?action=UNLOAD&core=4130&deleteIndex=true
> 
> it removes the core from solr.xml, but it leaves the data directory '413';
> the index subfolder under 413 is removed, however spellchecker1 and
> spellchecker2 still remain.

The dataDir option you have in solrconfig.xml completely explains why
this is happening.  One detail regarding RENAME and SWAP is that the
'solr.core.name' property is never updated unless you completely restart
Solr.  I have closed SOLR-4732 as invalid.  This is not a bug, it's
a side-effect of how the CoreAdmin API works.

For RENAME and SWAP to work correctly through a Solr restart, you will
need to include the dataDir option in all your core definitions in
solr.xml and remove it from solrconfig.xml.  If dataDir is specified in
solr.xml, then the correct dataDir will be associated with the correct
core after restart.
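
A hypothetical example of what that looks like in legacy solr.xml (names and
paths made up to match this thread):

  <cores adminPath="/admin/cores">
    <core name="413"  instanceDir="413"  dataDir="/mysolrroot/messages/solr/data/413" />
    <core name="413a" instanceDir="413a" dataDir="/mysolrroot/messages/solr/data/413a" />
  </cores>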

You won't be able to use solr.core.name in your dataDir tags, because
that doesn't exist at the solr.xml level, and even if it did exist, it
would be wrong after a restart, and you would have the same problem
you're having now.

When you rename a core, it will always retain the old dataDir.  I am
pretty sure there is no way to fix this, but if I'm wrong, I'm sure that
someone will let me know.

I ran into the problem with solr.core.name in 3.5.0, because my
PingRequestHandler enable/disable functionality included this property
in the enable filename, and it would always use the old one until Solr
was restarted.  Solr 4.x solved this problem for me by moving the enable
file from the servlet container's current working directory to dataDir.

When you use the deleteIndex parameter on a core UNLOAD, the index is
all it will delete.  Solr 4.0 added deleteDataDir and deleteInstanceDir
parameters.

http://wiki.apache.org/solr/CoreAdmin#UNLOAD
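
For example, something like this (untested; core name made up) would remove the
whole data directory, the spellchecker directories included:

  .../solr/admin/cores?action=UNLOAD&core=413a&deleteDataDir=true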

Thanks,
Shawn



Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there.

Yes, the new field we use for faceting is a multi-valued field.

> Can you find the facet fields in any of the other caches?

Yes, here it is, in the field cache:

http://screencast.com/t/mAwEnA21yL

> I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?

Here is a typical example (I added newlines for readability):

http://172.22.51.111:8000/solr/default1_Danish/search
?defType=edismax
&q=*%3a*
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_7+key%3ditemvariantoptions_int_mv_7%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_9+key%3ditemvariantoptions_int_mv_9%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_8+key%3ditemvariantoptions_int_mv_8%7ditemvariantoptions_int_mv
&facet.field=%7b!ex%3dtagitemvariantoptions_int_mv_2+key%3ditemvariantoptions_int_mv_2%7ditemvariantoptions_int_mv
&fq=site_guid%3a(10217)
&fq=item_type%3a(PRODUCT)
&fq=language_guid%3a(1)
&fq=item_group_1522_combination%3a(*)
&fq=is_searchable%3a(True)
&sort=item_group_1522_name_int+asc, variant_of_item_guid+asc
&querytype=Technical
&fl=feed_item_serialized
&facet=true
&group=true
&group.facet=true
&group.ngroups=true
&group.field=groupby_variant_of_item_guid
&group.sort=name+asc
&rows=0

> Are you warming all the sort- and facet-fields?

I'm sorry, I don't know. I have the field value cache commented out in my
config, so... whatever the default is?

Removing the custom sort fields is unfortunately quite a bit more difficult
than my other facet modification.

The problem is that each item can have several sort orders. The sort order
to use is defined by a group number which is known ahead of time. The group
number is included in the sort order field name. To solve it in the same
way I solved the facet problem, I would need to be able to sort on a
multi-valued field, and unless I'm wrong, I don't think that's possible.

I am quite stumped on how to fix this.




On Wed, Apr 17, 2013 at 3:06 PM, Toke Eskildsen wrote:

> John Nielsen [j...@mcb.dk]:
> > I never seriously looked at my fieldValueCache. It never seemed to get
> used:
>
> > http://screencast.com/t/YtKw7UQfU
>
> That was strange. As you are using a multi-valued field with the new
> setup, they should appear there. Can you find the facet fields in any of
> the other caches?
>
> ...I hope you are not calling the facets with facet.method=enum? Could you
> paste a typical facet-enabled search request?
>
> > Yep. We still do a lot of sorting on dynamic field names, so the field
> cache
> > has a lot of entries. (9.411 entries as we speak. This is considerably
> lower
> > than before.). You mentioned in an earlier mail that faceting on a field
> > shared between all facet queries would bring down the memory needed.
> > Does the same thing go for sorting?
>
> More or less. Sorting stores the raw string representations (utf-8) in
> memory so the number of unique values has more to say than it does for
> faceting. Just as with faceting, a list of pointers from documents to
> values (1 value/document as we are sorting) is maintained, so the overhead
> is something like
>
> #documents*log2(#unique_terms*average_term_length) +
> #unique_terms*average_term_length
> (where average_term_length is in bits)
>
> Caveat: This is with the index-wide sorting structure. I am fairly
> confident that this is what Solr uses, but I have not looked at it lately
> so it is possible that some memory-saving segment-based trickery has been
> implemented.
>
> > Does those 9411 entries duplicate data between them?
>
> Sorry, I do not know. SOLR- discusses the problems with the field
> cache and duplication of data, but I cannot infer if it is has been solved
> or not. I am not familiar with the stat breakdown of the fieldCache, but it
> _seems_ to me that there are 2 or 3 entries for each segment for each sort
> field. Guesstimating further, let's say you have 30 segments in your index.
> Going with the guesswork, that would bring the number of sort fields to
> 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?
>
> Extrapolating from 1.4M documents and 180 clients, let's say that there
> are 1.4M/180/5 unique terms for each sort-field and that their average
> length is 10. We thus have
> 1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB
> per sort field or about 4GB for all the 180 fields.
>
> With this few unique values, the doc->value structure is by far the
> biggest, just as with facets. As opposed to the faceting structure, this is
> fairly close to the actual memory usage. Switching to a single sort field
> would reduce the memory usage from 4GB to about 55MB.
>
> > I do commit a bit more often than i should. I get these in my log file
> from
> > time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> So 1 active searcher and 2 warming searcher

Solr 4.2 fl issue

2013-04-17 Thread William Bell
We are getting an issue when using a GUID for a field in Solr 4.2. Solr 3.6
is fine. Something like:

fl=098765-765-788558-7654_userid (the field is a stored string).

The issue is when the GUID begins with a number and then a minus.

This is a bug.

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Lucene Sorting

2013-04-17 Thread pankaj.pandey4
Hi,

We are facing a sorting issue with data indexed using Solr. Below is the sample
code. The problem is that the data returned by the code below is not properly
sorted, i.e. there is no ordering of the data. Can anyone assist me with this?

TopDocs topDocs = null;
Directory directory = FSDirectory.open(indexDir);
IndexSearcher searcher = new IndexSearcher(IndexReader.open(directory));
Sort column = new Sort(new SortField(sortColumn, SortField.STRING, reverse));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
queryParser = new QueryParser(Version.LUCENE_36, fieldName, analyzer);
queryParser.setAllowLeadingWildcard(true);
queryParser.setDefaultOperator(Operator.AND);
topDocs = searcher.search(queryParser.parse(queryStr), filter, maxHits, column);

Thanks!

Regards,
Pankaj



Re: facet.method enum vs fc

2013-04-17 Thread Toke Eskildsen
On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote:
> I am doing faceting on an index of 120M documents, 
> on the field of url[...]

I would guess that you would need 3-4GB for that.
How much memory do you allocate to Solr?

- Toke Eskildsen