Re: Configure SolrCloud for Loadbalance for .net client

2016-06-03 Thread shivendra.tiwari

Hi Mikhail,

We are using SolrNet to communicate with Solr from our .NET app. You can get this fork 
from the following location:

https://github.com/mausch/SolrNet

I hope it will help

Warm Regards!
Shivendra Kumar Tiwari

-Original Message- 
From: Mikhail Khludnev

Sent: Friday, June 03, 2016 12:23 PM
To: solr-user
Subject: Re: Configure SolrCloud for Loadbalance for .net client

Hello,

How does it work now? Do you have a list of slaves configured on a client
app? Btw what do you use to call Solr from .net?
On 1 June 2016 at 14:08, "shivendra.tiwari" <
shivendra.tiw...@arcscorp.net> wrote:


Hi,

I have to configure SolrCloud for load balancing with a .NET application. Please
suggest what we need and how to configure it. We are currently
working on an older version of Solr with the master/slave concept.

Please suggest.


Warm Regards!
Shivendra Kumar Tiwari 





RE: Small setFacetLimit() terminates Solr

2016-06-03 Thread Markus Jelsma
I'll have a look at it!
Thanks guys!

Markus


 
-Original message-
> From:Toke Eskildsen 
> Sent: Thursday 2nd June 2016 15:49
> To: solr-user@lucene.apache.org
> Subject: Re: Small setFacetLimit() terminates Solr
> 
> On Thu, 2016-06-02 at 09:26 -0400, Yonik Seeley wrote:
> > My guess would be that the smaller limit causes large facet refinement
> > requests to be sent out on the second phase.
> > It's not clear what's happening after that though (i.e. why that
> > causes things to crash)
> 
> The facet refinement can be a lot heavier than the initial call. For
> some of our queries (with unpatched Solr), we observed that it took 10
> times as long.
> 
> 
> Markus: You are hitting Solr in a way that scales very poorly. Maybe you
> can use export instead?
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> 
> 
> If you really need the faceting with full counts & everything, consider
> switching to a single-shard (and multiple replicas) setup as that
> removes the need for the refinement phase.
> 
> 
> - Toke Eskildsen, State and University Library, Denmark
> 
> 
> 


Solr 4.10.3 goes into recovery for unknown reason

2016-06-03 Thread Mark Christiaens
We have a SolrCloud setup (within a Cloudera deployment) using Solr
4.10.3.  The Solr cluster consists of 2 nodes.  Both have their backing store
on HDFS (not on the local file system).  Once every 1-2 weeks, the system
goes into recovery for no apparent reason.  Digging through the logs it
looks something like this:

From time to time, when the load is high, the leader/replica election goes
wrong. We see:

May 31, 10:31:46.524 AM ERROR org.apache.solr.core.SolrCore
org.apache.solr.common.SolrException: ClusterState says we are the leader (
http://grbbd1nodp06.core.local:8983/solr/lily_entity_CUSTOMER_shard1_replica2),
but locally we don't think so. Request came from null

When that happens, the cluster starts recovering:

May 31, 10:33:04.621 AM INFO org.apache.solr.cloud.RecoveryStrategy
Publishing state of core lily_entity_CUSTOMER_shard2_replica2 as
recovering, leader is
http://grbbd1nodp05.core.local:8983/solr/lily_entity_CUSTOMER_shard2_replica1/
and I am
http://grbbd1nodp06.core.local:8983/solr/lily_entity_CUSTOMER_shard2_replica2/

Apparently, that doesn't go too smoothly either. First it tries something
called "PeerSync", which fails, and then Solr falls back to "replication". I suspect
that "PeerSync" is a recovery strategy where the last N updates are
transferred from one node to another in order for that other node to catch
up, and that "replication" is copying the entire contents of a node to
another.

PeerSync Recovery was not successful - trying replication.
core=lily_models_shard2_replica1
grbbd1nodp06.core.local INFO May 31, 2016 10:32 AM RecoveryStrategy
Starting Replication Recovery. core=lily_models_shard2_replica1

Apparently the recovery of a Solr node happens piecemeal. For some pieces,
the replication or PeerSync seems to fail; for others it does not.

Wait 2.0 seconds before trying to recover again (1)
grbbd1nodp06.core.local ERROR May 31, 2016 10:32 AM RecoveryStrategy
Error while trying to recover:org.apache.solr.common.SolrException:
Replication for recovery failed.
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:168)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:448)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:237)
grbbd1nodp06.core.local ERROR May 31, 2016 10:32 AM RecoveryStrategy
Recovery failed - trying again... (0)
core=lily_entity_CUSTOMER_shard2_replica2
grbbd1nodp06.core.local ERROR May 31, 2016 10:32 AM ReplicationHandler
SnapPull failed :org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:573)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:310)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:349)
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:165)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:448)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:237)
Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://grbbd1clup01-ns/solr/lily_entity_CUSTOMER/core_node2/data/index.20160429093321915/segments_8hp9
at
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1218)
at
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
at
org.apache.solr.store.hdfs.HdfsDirectory$HdfsIndexInput.<init>(HdfsDirectory.java:205)
at
org.apache.solr.store.hdfs.HdfsDirectory.openInput(HdfsDirectory.java:136)
at
org.apache.solr.store.blockcache.BlockDirectory.openInput(BlockDirectory.java:124)
at
org.apache.solr.store.blockcache.BlockDirectory.openInput(BlockDirectory.java:144)
at
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:198)
at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:341)
at org.apache.solr.handler.SnapPuller.hasUnusedFiles(SnapPuller.java:623)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:456)
... 5 more

After 10 minutes or so, the system does seem to recover.

I dug through the Solr bugs but could not find this exact issue.  I'm
particularly intrigued by the reason why the leader/replica election seems
to go wrong.  Any suggestions?


clustering in solr

2016-06-03 Thread Mugeesh Husain
Hello everyone,

I am looking to assign a predefined set of categories in a job search application, and I
looked at clustering in Solr.

Please suggest an approach for clustering-based categories, or share any good links.


Can I achieve this using Apache Mahout?


Thanks








distinct date format on parameter lastModified in LukeRequestHandler

2016-06-03 Thread Miguel Valencia Zurera

hi everybody

I have two installations of Apache Solr 3.5.0, and when I consult the web page 
"/admin/luke" I see that the lastModified parameter has a different format in 
each.

The first shows: <date name="lastModified">2016-05-20T13:03:03Z</date>
and the second shows: <date name="lastModified">2016-05-20T13:03:03.593Z</date>


Why does the second Solr show milliseconds in the lastModified parameter? Is it 
possible to configure the format of this parameter?


Thanks


Re: Indexing date types

2016-06-03 Thread Emir Arnautovic

Hi Steve,
The best way to make sure everything works is to test, but without 
testing on the target version, my answers would be:
1. If Solr accepts a date without a time, it will be treated the same as time 00:00:00; 
if it does not accept it, you can always append the time part yourself (see the sketch below).
2. It will work; just expect that the sum of facet counts can be larger than 
the total number of docs, since the same doc will count in more than one bucket.
3. Doc values work only on Str and Trie fields, and the question is why you 
need DateRangeField - are you indexing ranges or points in time? If it 
is just multiple points, you can use TrieDateField with default 
precision to enable fast range queries.
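
For illustration, a minimal SolrJ sketch of point 1 (the core URL here is a 
placeholder, and the multiValued "other_dates" field is borrowed from Steve's 
question below): date-only values are simply padded to the full format before 
indexing.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DateIndexingSketch {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // Date-only input: append the midnight time part so it matches
        // the YYYY-MM-DDThh:mm:ssZ format Solr expects.
        doc.addField("other_dates", "2016-06-03" + "T00:00:00Z");
        doc.addField("other_dates", "2016-05-20" + "T00:00:00Z");
        client.add(doc);
        client.commit();
        client.close();
    }
}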


HTH,
Emir

On 02.06.2016 18:10, Steven White wrote:

I forgot to mention another issue I run into.  Looks like "docValues" is
not supported with DateRangeField, is this true?

If I have:

  <fieldType name="dateRange" class="solr.DateRangeField" docValues="true"/>
  <field name="other_dates" type="dateRange" indexed="true" stored="true" multiValued="true"/>

Solr will fail to start, reporting the following error:

 org.apache.solr.core.CoreContainer; Error creating core [openpages]:
Could not load conf for core openpages: Field type
dateRange{class=org.apache.solr.schema.DateRangeField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={class=solr.DateRangeField}}
does not support doc values.

I have to remove "docValues" to fix this.  Is this the case or have I
missed something?

Thanks.

Steve

On Thu, Jun 2, 2016 at 11:46 AM, Steven White  wrote:


Hi everyone,

This is two part question about date in Solr.

Question #1:

My understanding is, in order for me to index date types, the date data
must be formatted and indexed as such:

 YYYY-MM-DDThh:mm:ssZ

What if I do not have the time part, should I be indexing it as such and
still get all the features of facet search on date (obviously, excluding
time):

 YYYY-MM-DD

I have set up my Solr schema as such to index dates:

  <fieldType name="dateRange" class="solr.DateRangeField"/>
  <field name="other_dates" type="dateRange" indexed="true" stored="true" multiValued="true"/>

Question #2:

Per the above schema design, I will be indexing my date type as
"multiValued" which, as you know, more than 1 date data will be indexed
into the field "other_dates".  Will this be a problem when I facet search
on this field?  That is, will all the date facet capability still work,
such as range and math per
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
(obviously, excluding time)?

Thanks in advance.

Steve



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Toke Eskildsen
On Thu, 2016-06-02 at 18:14 -0700, Erick Erickson wrote:
> But memory is an ongoing struggle I'm afraid.

With fear of going too far into devel-territory...


There are several places in Solr where memory usage is far from optimal
with high-cardinality data and where improvements can be made without
better GC or off-heap.

Some places it is due to "clean object oriented" programming, for
example with priority queues filled with objects, which gets very GC
expensive for 100K+ entries. Some of this can be remedied by less clean
coding and bit-hacking, but often results in less-manageable code.

https://sbdevel.wordpress.com/2015/11/13/the-ones-that-got-away/


Other places it is large arrays that are hard to avoid, for example with
docID-bitmaps and counter-arrays for String faceting. These put quite a
strain on GC as they are being allocated and released all the time.
Unless the index is constantly updated, DocValues does not help much
with GC as the counters are the same, DocValues or not.

The layout of these structures is well-defined: As long as the Searcher
has not been re-opened, each new instance of an array is of the exact
same size as the previous one. When the searcher is re-opened, all the
sizes change. Putting those structures off-heap is one solution,
another is to re-use the structures.

Our experiments with re-using faceting counter structures have been very
promising (far less GC, lower response times). I would think that the
same would be true for a similar docID-bitmap re-use scheme.
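
Not Solr's actual code - just a minimal Java sketch of the re-use idea, assuming
the counter size is fixed for the lifetime of a searcher: a pool hands out int[]
counter arrays and takes them back, cleared, after each request instead of
allocating a fresh array every time.

import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;

final class FacetCounterPool {
    private final ConcurrentLinkedQueue<int[]> pool = new ConcurrentLinkedQueue<>();
    private final int ordinalCount; // cardinality of the facet field for the current searcher

    FacetCounterPool(int ordinalCount) { this.ordinalCount = ordinalCount; }

    int[] acquire() {
        int[] counters = pool.poll();
        return counters != null ? counters : new int[ordinalCount];
    }

    void release(int[] counters) {
        Arrays.fill(counters, 0); // clear before the next request re-uses it
        pool.offer(counters);
    }
}

When the searcher is re-opened, the pool is simply discarded and a new one
created for the new cardinality.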


So yes, very much an on-going struggle, but one where there are multiple
known remedies. Not necessarily easy to implement though.

- Toke Eskildsen, State and University Library, Denmark




Re: SOLR cloud sharding

2016-06-03 Thread Susheel Kumar
Also, I'm not sure about your domain, but you may want to double-check whether you
really need 350 fields for searching & storing. Many times, when you
weigh this against the higher cost of hardware, you may be able to
reduce the number of searchable/stored fields.

Thanks,
Susheel

On Thu, Jun 2, 2016 at 9:21 AM, Shawn Heisey  wrote:

> On 6/2/2016 1:28 AM, Selvam wrote:
> > We need to run a heavy SOLR with 300 million documents, with each
> > document having around 350 fields. The average length of the fields
> > will be around 100 characters, it may have date and integers fields as
> > well. Now we are not sure whether to have single server or run
> > multiple servers (for each node/shards?). We are using Solr 5.5 and
> > want best performance. We are new to SolrCloud, I would like to
> > request your inputs on how many nodes/shards we need to have and how
> > many servers for best performance. We primarily use geo-statial search.
>
> The really fast answer, which I know isn't really an answer, is this:
>
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> This is *also* the answer if I take time to really think about it ...
> and I do realize that none of this actually helps you.  You will need to
> prototype.  Ideally, your prototype should be the entire index.
> Performance will generally not scale linearly, so if you make decisions
> based on a small-scale prototype, you might find that you don't have
> enough hardware.
>
> The answer will be *heavily* influenced by how many of those 350 fields
> will be used for searching, sorting, faceting, etc.  It will also be
> influenced by the complexity of the queries, how fast the queries must
> complete, and how many queries per second the cluster must handle.
>
> With the information you have supplied, your whole index is likely to be
> in the 10-20TB range.  Performance on an index that large, even with
> plenty of hardware and good tuning, is probably not going to be
> stellar.  You are likely to need several terabytes of total RAM (across
> all servers) to achieve reasonable performance *on a single copy*.  If
> you want two copies of the index for high availability, your RAM
> requirements will double.  Handling an index this size is not going to
> be inexpensive.
>
> An unavoidable fact about Solr performance:  For best results, Solr must
> be able to read critical data entirely from RAM for queries.  If it must
> go to disk, then performance will not be optimal -- disks are REALLY
> slow.  Putting the data on SSD will help, but even SSD storage is quite
> a lot slower than RAM.
>
> For *perfect* performance, the index data on a server must fit entirely
> into unallocated memory -- which means memory beyond the Java heap and
> the basic operating system requirements.  The operating system (not
> Java) will automatically handle caching the index in this available
> memory.  This perfect situation is usually not required in practice,
> though -- the *entire* index is not needed when you do a query.
>
> Here's something I wrote about the topic of Solr performance.  It is not
> as comprehensive as I would like it to be, because I have tried to make
> it relatively concise and useful:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>
>


Re: Configure SolrCloud for Loadbalance for .net client

2016-06-03 Thread Shawn Heisey
On 6/2/2016 11:39 PM, shivendra.tiwari wrote:
> Actually, I am using a fork of SolrNet for SolrCloud from here:
> https://github.com/vladen/SolrNet, but I am unable to communicate via
> ZooKeeper. Do you have any idea whether it is stable for SolrCloud? I am using
> SolrNet for a simple master/slave setup and it works fine, but for cloud
> mode I can't work out what I need to use.

SolrNet is a third-party software product that was not written by the
Solr project.  We have no idea how well it works or how stable it is. 
Since the software you are using is a fork, it might be more stable than
the original, or it might be less stable.

I think that Solr clients for languages other than Java probably
*should* be maintained by the Solr project, but we are already very busy
maintaining and improving the existing software.  Interested
contributors are always welcome.

Thanks,
Shawn



find stores with sales of > $x in last 2 months ?

2016-06-03 Thread Allison, Timothy B.
All,
  This is a toy example, but is there a way to search for, say, stores with 
sales of > $x in the last 2 months with Solr?
  $x and the time frame are selected by the user at query time.  

If the queries could be constrained (this is still tbd), I could see updating 
"stats" fields within each store document on a daily basis (sales_last_1_month, 
sales_last_2_months, sales_last_3_months...etc).  The dataset is fairly small 
and daily updates of this nature would not be prohibitive.
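
For illustration only - a rough SolrJ sketch of the query side under that
assumption (the field naming follows the daily-updated stats fields above;
the core URL and threshold are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StoreSalesQuerySketch {
    public static void main(String[] args) throws Exception {
        double x = 50000.0;   // user-supplied threshold at query time
        int months = 2;       // user-supplied time frame (1, 2, 3, ...)
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/stores");
        SolrQuery q = new SolrQuery("*:*");
        // Builds e.g. sales_last_2_months:[50000.0 TO *]
        q.addFilterQuery("sales_last_" + months + "_months:[" + x + " TO *]");
        QueryResponse rsp = client.query(q);
        System.out.println("Matching stores: " + rsp.getResults().getNumFound());
        client.close();
    }
}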

   Or, is this trying to use a screwdriver where a hammer is required?
 
   Thank you.

   Best,

 Tim


Re: distinct date format on parameter lastModified in LukeRequestHandler

2016-06-03 Thread Shawn Heisey
On 6/3/2016 5:03 AM, Miguel Valencia Zurera wrote:
> I have two installations of Apache Solr 3.5.0, and when I consult the web
> page "/admin/luke" I see that the lastModified parameter has a different
> format in each.
> The first shows: <date name="lastModified">2016-05-20T13:03:03Z</date>
> and the second shows: <date name="lastModified">2016-05-20T13:03:03.593Z</date>
>
> Why does the second Solr show milliseconds in the lastModified parameter? Is it
> possible to configure the format of this parameter? 

On the first one, the number of milliseconds is zero, so Solr removes it
from the display.  This is not unusual.  They are both correctly formatted.

One thing that I have seen Solr do is display ".01" for 10 milliseconds
or ".1" for 100 milliseconds ... which confuses people and software,
even though it is technically correct.  This might have been fixed in a
later release, but I am not sure.

Thanks,
Shawn



RE: [E] Re: Question(s) about Highlighting

2016-06-03 Thread Jamal, Sarfaraz
Good Morning Alessandro,

I verified it through the analysis tool (thanks for pointing it out), and it 
appears to be working correctly, as I see all of them listed as synonyms of 
each other for this entry:

sasjamal, sarfaraz, sas

- When I do it only at indexing time and disable it during query time (by editing 
the synonyms.txt file in Solr 6), it does not treat them equally.

When I do it at both indexing and query time, it seems to work - but the highlight 
snippets stop working.

I believe it is working, MINUS the highlighting/snippets if that makes sense?

Thanks

Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
sarfaraz.ja...@verizonwireless.com

-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, June 2, 2016 5:41 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Question(s) about Highlighting

Hi Jamal,
I assume you are using the Synonym token filter.
From the observation I can assume you are using it only at indexing time.
This means that when you index you are  :

1) given a row in the synonym.txt you index all the terms per row in place of 
any of the term in the row .

2) given any of the term in the left side of the expression, you index the term 
in the right side of the expression

You can verify this easily with the analysis tool in the Solr UI .



On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> I am having some difficulty understanding how to do something and if 
> it is even possible
>
> I have tried the following sets of Synonyms:
>
> 1.  sarfaraz, sas, sasjamal
> 2.  sasjamal,sas => Sarfaraz
>
> In the second instance, any searches with the world 'sasjamal' do not 
> appear in the results, as it has been converted to Sarfaraz (I 
> believe) -
>

This means you don't use the same synonyms.txt at query time; indeed, sasjamal is 
not in the index at all.


> In the first instance it works better - I believe all instances of any 
> of those words  appear in the results. However the highlighted 
> snippets also stop working when any of those words are Matched. Is 
> there any documentation, insights or help about this issue?
>

I should verify that; it could be related to the term offsets.
Please take a look at the analysis tool as well, to understand better how the 
offsets are assigned.
I remember that a long time ago there was a discussion about it, and a bug or 
something similar was raised.

Cheers

>
> Thanks in advance,
>
> Sas
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, June 2, 2016 2:43 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: MongoDB and Solr - Massive re-indexing
>
> On 6/2/2016 11:56 AM, Robert Brown wrote:
> > My question is whether sending batches of 1,000 documents to Solr is 
> > still beneficial (thinking about docs that may not change), or if I 
> > should look at the MongoDB connector for Solr, based on the volume 
> > of incoming data we see.
> >
> > Would the connector still see all docs updating if I re-insert them 
> > blindly, and thus still send all 50m documents back to Solr everyday 
> > anyway?
> >
> > Is my setup quite typical for the MongoDB connector?
>
> Sending update requests to Solr containing batches of 1000 docs is a 
> good idea.  Depending on how large they are, you may be able to send 
> even more than 1000.  If you can avoid sending documents that haven't 
> changed, Solr will likely perform better and relevance scoring will be 
> better, because you won't have as many deleted docs.
>
> The mongo connector is not software from the Solr project, or even 
> from Apache.  We don't know anything about it.  If you have questions 
> about that software, please contact the people who maintain it.  If 
> their answers lead to questions about Solr itself, then you can bring those 
> back here.
>
> Thanks,
> Shawn
>
>


--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


AtomicUpdateDocumentMerger Unknown operation for the an atomic update, operation ignored

2016-06-03 Thread Markus Jelsma
Hi,

Just now I indexed ~15k docs to a newly made core and schema, running 6.0 locally 
this time. It was just regular indexing, nothing fancy and very small 
documents. Then the following popped up in the logs:

2496200 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 3932885930
2496201 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 3688877580
2496201 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 1424679833
2496202 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 2688901972
2496203 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 3932885930
2496204 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 1424679833
2496204 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 3932885930
2496205 WARN  (qtp97730845-17) [   x:documents] 
o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an atomic 
update, operation ignored: 795889345
...

There were a total of 24,001 warnings while I only indexed ~15k documents. To verify, I 
cleared the core and tried again, and the warnings popped up again. To check whether it 
only happens on an empty core, I reindexed the same set of documents to the 
already filled core and it happened again. To make sure this is not suddenly 
happening to all cores in this instance, I indexed, via the same process, a 
bunch of different documents to another core. That core is not affected.

The documents in the index seem fine; their contents are as expected. The same 
Solr instance has many more cores, usually with similar schemas. 
The SolrInputDocument construction and indexing process is identical for each 
core/schema. The solrconfig is mostly the same; the Solr-specific parts are 
identical.

Any hints on where to look?

Many thanks!
Markus


Re: Configure SolrCloud for Loadbalance for .net client

2016-06-03 Thread Mikhail Khludnev
I briefly skimmed through the SolrNet docs. I see that it can be pointed at a
Solr HTTP endpoint. You can do the same with SolrCloud nodes: you can start
by hitting any node for indexing and searching, since all cloud nodes are
equivalent.

On Fri, Jun 3, 2016 at 11:24 AM, shivendra.tiwari <
shivendra.tiw...@arcscorp.net> wrote:

> Hi Mikhail,
>
> We are using SolrNet to communicate with Solr from our .NET app. You can get this
> fork from the following location:
> https://github.com/mausch/SolrNet
>
> I hope it will help
>
> Warm Regards!
> Shivendra Kumar Tiwari
>
> -Original Message- From: Mikhail Khludnev
> Sent: Friday, June 03, 2016 12:23 PM
> To: solr-user
> Subject: Re: Configure SolrCloud for Loadbalance for .net client
>
> Hello,
>
> How does it work now? Do you have a list of slaves configured on a client
> app? Btw what do you use to call Solr from .net?
> On 1 June 2016 at 14:08, "shivendra.tiwari" <
> shivendra.tiw...@arcscorp.net> wrote:
>
> Hi,
>>
>> I have to configure SolrCloud for load balancing with a .NET application. Please
>> suggest what we need and how to configure it. We are currently
>> working on an older version of Solr with the master/slave concept.
>>
>> Please suggest.
>>
>>
>> Warm Regards!
>> Shivendra Kumar Tiwari
>>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Stemming and Managed Schema

2016-06-03 Thread Jamal, Sarfaraz
Hi Guys,

I found the following article:
http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/

And I want to do stemming on one of our fields.

However, I am using a Managed Schema and I am unsure how to add these two 
blocks to it -

I know there is an API for managed schemas, would that support these additions?

Thanks!

Sas


Re: Stemming and Managed Schema

2016-06-03 Thread Andrea Gazzarini
Sure, this is the API reference [1] where, as you can see, you can add types 
and fields.


Andrea

[1] https://cwiki.apache.org/confluence/display/solr/Schema+API


On 03/06/16 17:07, Jamal, Sarfaraz wrote:

Hi Guys,

I found the following article:
http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/

And I want to do stemming on one of our fields.

However, I am using a Managed Schema and I am unsure how to add these two 
blocks to it -

I know there is an API for managed schemas, would that support these additions?

Thanks!

Sas




Re: Stemming and Managed Schema

2016-06-03 Thread Shawn Heisey
On 6/3/2016 9:07 AM, Jamal, Sarfaraz wrote:
> I found the following article:
> http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
>
> And I want to do stemming on one of our fields.
>
> However, I am using a Managed Schema and I am unsure how to add these two 
> blocks to it -
>
> I know there is an API for managed schemas, would that support these 
> additions?

You can't edit an existing fieldType with the Schema API.  You can
entirely replace it, but you have to include the whole definition.

https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ReplaceaFieldType
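
For illustration, here is a rough SolrJ sketch of such a whole-definition
replacement (the type name, analyzer chain, and core URL are made up for this
example; check the SchemaRequest classes against your SolrJ version):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.AnalyzerDefinition;
import org.apache.solr.client.solrj.request.schema.FieldTypeDefinition;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class ReplaceFieldTypeSketch {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

        // Top-level attributes of the replacement type (the whole definition is required).
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("name", "text_stemmed");
        attrs.put("class", "solr.TextField");
        attrs.put("positionIncrementGap", "100");

        // Analyzer chain including a stemming filter.
        Map<String, Object> tokenizer = new LinkedHashMap<>();
        tokenizer.put("class", "solr.StandardTokenizerFactory");
        Map<String, Object> lower = new LinkedHashMap<>();
        lower.put("class", "solr.LowerCaseFilterFactory");
        Map<String, Object> stem = new LinkedHashMap<>();
        stem.put("class", "solr.PorterStemFilterFactory");
        List<Map<String, Object>> filters = new ArrayList<>();
        filters.add(lower);
        filters.add(stem);
        AnalyzerDefinition analyzer = new AnalyzerDefinition();
        analyzer.setTokenizer(tokenizer);
        analyzer.setFilters(filters);

        FieldTypeDefinition definition = new FieldTypeDefinition();
        definition.setAttributes(attrs);
        definition.setAnalyzer(analyzer);

        // Issues a replace-field-type command against the managed schema.
        new SchemaRequest.ReplaceFieldType(definition).process(client);
        client.close();
    }
}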

I'm aware that the managed-schema file says to not make manual edits --
but you *can* edit it manually, as long as you are absolutely sure that
nobody is using the Schema API until after you complete your edits and
reload the core/collection.

Thanks,
Shawn



RE: [E] Re: Stemming and Managed Schema

2016-06-03 Thread Jamal, Sarfaraz
Awesome,

So just to make sure I got it right:

I would edit the managed-schema, make my changes, shut down Solr, and start it 
back up and verify it is still there?

Or is there another way to reload the core/collection?

Thanks!

Sas



-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, June 3, 2016 11:17 AM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming and Managed Schema

On 6/3/2016 9:07 AM, Jamal, Sarfaraz wrote:
> I found the following article:
> http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
>
> And I want to do stemming on one of our fields.
>
> However, I am using a Managed Schema and I am unsure how to add these 
> two blocks to it -
>
> I know there is an API for managed schemas, would that support these 
> additions?

You can't edit an existing fieldType with the Schema API.  You can entirely 
replace it, but you have to include the whole definition.

https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ReplaceaFieldType

I'm aware that the managed-schema file says to not make manual edits -- but you 
*can* edit it manually, as long as you are absolutely sure that nobody is using 
the Schema API until after you complete your edits and reload the 
core/collection.

Thanks,
Shawn



Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Jeff Wartes

For what it’s worth, I’d suggest you go into a conversation with Azul with a 
more explicit “I’m looking to buy” approach. I reached out to them with a more 
“I’m exploring my options” attitude, and never even got a trial. I get the 
impression their business model involves a fairly expensive (to them) trial 
process, so they’re looking for more urgency on the part of the client than I 
was expressing.

Instead, I spent a few weeks analyzing how my specific index allocated memory. 
This turned out to be quite worthwhile. Armed with that information, I was able 
to file a few patches (coming in 6.1, perhaps?) that reduced allocations by a 
pretty decent amount on large indexes. (SOLR-8922, particularly) It also 
straight-up ruled out certain things Solr supports, because the allocations 
were just too heavy. (SOLR-9125)

I suppose the next thing I’m considering is using multiple JVMs per host, 
essentially one per shard. This wouldn’t change the allocation rate, but does 
serve to reduce the worst-case GC pause, since each JVM can have a smaller 
heap. I’d be trading a little p50 latency for some p90 latency reduction, I’d 
expect. Of course, that adds a bunch of headache to managing replica locations 
too.


On 6/2/16, 6:30 PM, "Phillip Peleshok"  wrote:

>Fantastic! I'm sorry I couldn't find that JIRA before and for getting you
>to track it down.
>
>Yup, I noticed that for the docvalues with the ordinal map and I'm
>definitely leveraging all that but I'm hitting the terms limit now and that
>ends up pushing me over.  I'll see about giving Zing/Azul a try.  From all
>my readings using theUnsafe seemed a little sketchy (
>http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) so
>I'm glad that seemed to be the point of contention bringing it in and not
>anything else.
>
>Thank you very much for the info,
>Phil
>
>On Thu, Jun 2, 2016 at 6:14 PM, Erick Erickson 
>wrote:
>
>> Basically it never reached consensus, see the discussion at:
>> https://issues.apache.org/jira/browse/SOLR-6638
>>
>> If you can afford it I've seen people with very good results
>> using Zing/Azul, but that can be expensive.
>>
>> DocValues can help for fields you facet and sort on,
>> those essentially move memory into the OS
>> cache.
>>
>> But memory is an ongoing struggle I'm afraid.
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok 
>> wrote:
>> > Hey everyone,
>> >
>> > I've been using Solr for some time now and running into GC issues as most
>> > others have.  Now I've exhausted all the traditional GC settings
>> > recommended by various individuals (ie Shawn Heisey, etc) but neither
>> > proved sufficient.  The one solution that I've seen that proved useful is
>> > Heliosearch and the off-heap implementation.
>> >
>> > My question is this, why wasn't the off-heap FieldCache implementation (
>> > http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever rolled
>> into
>> > Solr when the other HelioSearch improvement were merged? Was there a
>> > fundamental design problem or just a matter of time/testing that would be
>> > incurred by the move?
>> >
>> > Thanks,
>> > Phil
>>



Re: [E] Re: Stemming and Managed Schema

2016-06-03 Thread Shawn Heisey
On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
> I would edit the managed-schema, make my changes, shutdown solr? And
> start it back up and verify it is still there? 

That's the sledgehammer approach.  Simple and effective, but Solr does
go offline for a short time.

> Or is there another way to reload the core/collection?

For SolrCloud:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2

For non-cloud mode:
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD
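
For a small SolrJ illustration of the non-cloud case (the core name and URL are
placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCoreSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the Solr root (not at a specific core) for core admin calls.
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr");
        CoreAdminRequest.reloadCore("mycore", client);
        client.close();
    }
}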

Thanks,
Shawn



Re: AtomicUpdateDocumentMerger Unknown operation for the an atomic update, operation ignored

2016-06-03 Thread Shawn Heisey
On 6/3/2016 7:54 AM, Markus Jelsma wrote:
> Just now i indexed ~15k doc to a newly made core and shema, running
> 6.0 local this time. It was just regular indexing, nothing fancy and
> very small documents. Then the following popped up in the logs:
> 2496200 WARN (qtp97730845-17) [ x:documents]
> o.a.s.u.p.AtomicUpdateDocumentMerger Unknown operation for the an
> atomic update, operation ignored: 3932885930

This happens when you add a document where the value of a field is a
key/value construct, but isn't actually an atomic update.  In JSON, this
is represented with curly braces.  In Java, it is a Map object.  This
kind of construct is only used for Atomic Updates -- the key must be one
of the atomic operations: set, add, remove, inc.  The warning message
that I quoted indicates that the key in the key/value construct for
that document was 3932885930.

If you are using SolrJ, then the value of one or more fields is being
set to a Map object, which isn't right unless you intend to do an Atomic
Update.  If you are using JSON formatted updates, there are probably
curly braces where they don't belong.
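
To illustrate with a small SolrJ sketch (the field names here are invented):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateSketch {
    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");

        // This triggers the warning: the value is a Map, so Solr treats it as an
        // atomic update, but "3932885930" is not a recognised operation and is ignored.
        Map<String, Object> accidental = new HashMap<>();
        accidental.put("3932885930", "some value");
        doc.addField("signature", accidental);

        // An intentional atomic update uses one of the supported operations as the key.
        Map<String, Object> atomic = new HashMap<>();
        atomic.put("set", "new value");
        doc.addField("title", atomic);
    }
}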

Thanks,
Shawn



Stemming Help

2016-06-03 Thread Jamal, Sarfaraz
Hi Guys,

I am following this tutorial:
http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/

My (Managed) Schema file looks like this: (in the appropriate places)


  [the fieldType and field definitions were stripped by the mailing list archive]

I have re-indexed everything -

It is not affecting my search at all -

- from what I can tell from the analysis tool, nothing is happening.

Is there something else I am missing or should take a look at, or is it 
possible to debug this? Or is there some other documentation I can search through?

Thanks!

Sas

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, June 3, 2016 2:02 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Stemming and Managed Schema

On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
> I would edit the managed-schema, make my changes, shutdown solr? And 
> start it back up and verify it is still there?

That's the sledgehammer approach.  Simple and effective, but Solr does go 
offline for a short time.

> Or is there another way to reload the core/collection?

For SolrCloud:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2

For non-cloud mode:
https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD

Thanks,
Shawn



Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-03 Thread MaryJo Sminkey
Okay so big thanks for the help with getting the hon_lucene_synonyms plugin
working. That is a big load off to finally have a solution in place for all
our multi-term synonyms. We did find that the information in Step 8 about
the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser does
not seem to be correct, we only ever get "ExtendedDismaxQParser" but the
synonym expansion is definitely working.

In implementing it though, the one thing I'm still having an issue with is
trying to figure out how I can get results on the original term to appear
first in our results and matches on the synonyms lower in the results. The
plugin includes settings for an originalboost and synonymboost, but that
doesn't seem to be working along with all the other edismax boosts I'm
doing. We search across a number of fields, each with their own boost and
then do phrase searches with boosts as well. My params look like this:

params["defType"] = 'synonym_edismax';
params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
prodnumbertext^20.0';
params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
params["ps"] = 1;
params["tie"] = 0.1;
params["synonyms"] = true;
params["synonyms.originalBoost"] = 2.0;
params["synonyms.synonymBoost"] = 0.5;

And here's an example of what the plugin gives me for a search on "sbc",
which includes synonyms for "sb" and "small block". I don't really know
enough about this to figure out what exactly it's doing, but since all of
the results I am getting first are ones with "small block" in the name, and
the ones with "sbc" in the prodname field which should be first are buried
about 1000 documents in, I know the originalboost and synonymboost aren't
working with all this other stuff. Ideas how to fix this? With the normal
synonym filter we just set up copies of the fields that could have synonyms
to use with that filter applied and had a lower boost on those. Not sure
how to make it work with this custom query parser though.

+((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
(((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
| productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
body:block^0.5 | productinfo:block | keywords:block^2.0 |
prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
| keywords:"small block"~1^10.0 | prodname:"small
block"~1^50.0)~0.1))^0.5)) ()


Mary Jo


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-03 Thread MaryJo Sminkey
On some additional tests, it looks like it's the phrase matching in
particular that is the issue; if I take that out I do seem to be getting
better results. I definitely don't want to get rid of the phrase boosts, so I
need to find a way to make them work together.




On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey  wrote:

> Okay so big thanks for the help with getting the hon_lucene_synonyms
> plugin working. That is a big load off to finally have a solution in place
> for all our multi-term synonyms. We did find that the information in Step 8
> about the plugin showing "SynonymExpandingExtendedDismaxQParser" for
> QParser does not seem to be correct, we only ever get
> "ExtendedDismaxQParser" but the synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0
> prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc"
> which includes synonyms for "sb" and "small block" I don't really know
> enough about this to figure out what exactly it's doing but since all of
> the results I am getting first are ones with "small block" in the name, and
> the ones with "sbc" in the prodname field which should be first are buried
> about 1000 documents in, I know the originalboost and synonymboost aren't
> working with all this other stuff. Ideas how to fix this? With the normal
> synonym filter we just set up copies of the fields that could have synonyms
> to use with that filter applied and had a lower boost on those. Not sure
> how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1))^0.5)) ()
>
>
> Mary Jo
>
>


Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Phillip Peleshok
Thank you for the info on this.  Yeah, I should've raised this in the dev
lists; sorry about that.  Funny you mention that since I was trending in
that direction as well.  Then saw the off-heap stuff and thought it might
have had an easy way out.  I'd like to focus on the re-use scheme to be
honest.  Already looking at that approach for the ordinal maps.

Thanks again,
Phil

On Fri, Jun 3, 2016 at 4:33 AM, Toke Eskildsen 
wrote:

> On Thu, 2016-06-02 at 18:14 -0700, Erick Erickson wrote:
> > But memory is an ongoing struggle I'm afraid.
>
> With fear of going too far into devel-territory...
>
>
> There are several places in Solr where memory usage if far from optimal
> with high-cardinality data and where improvements can be made without
> better GC or off-heap.
>
> Some places it is due to "clean object oriented" programming, for
> example with priority queues filled with objects, which gets very GC
> expensive for 100K+ entries. Some of this can be remedied by less clean
> coding and bit-hacking, but often results in less-manageable code.
>
> https://sbdevel.wordpress.com/2015/11/13/the-ones-that-got-away/
>
>
> Other places it is large arrays that are hard to avoid, for example with
> docID-bitmaps and counter-arrays for String faceting. These put quite a
> strain on GC as they are being allocated and released all the time.
> Unless the index is constantly updated, DocValues does not help much
> with GC as the counters are the same, DocValues or not.
>
> The layout of these structures is well-defined: As long as the Searcher
> has not been re-opened, each new instance of an array is of the exact
> same size as the previous one. When the searcher is re-opened, all the
> sizes changes. Putting those structures off-heap is one solution,
> another is to re-use the structures.
>
> Our experiments with re-using faceting counter structures has been very
> promising (far less GC, lower response times). I would think that the
> same would be true for a similar docID-bitmap re-use scheme.
>
>
> So yes, very much an on-going struggle, but one where there are multiple
> known remedies. Not necessarily easy to implement though.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: [E] Re: Faceting Question(s)

2016-06-03 Thread MaryJo Sminkey
Just a follow-up on this: I found that the method below using URL params
doesn't work when using the REST API; if you try to set the field in your
facet object to something like "{!ex=dt}doctype", it throws an error. Here's
the documentation on the correct method to use with the API.

http://yonik.com/multi-select-faceting/
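
For reference, a rough SolrJ sketch of the approach from that page (the field
and tag names follow the example quoted below; the collection URL is a
placeholder):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class MultiSelectFacetSketch {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
        SolrQuery q = new SolrQuery("mainquery");
        q.addFilterQuery("status:public");
        q.addFilterQuery("{!tag=dt}doctype:pdf");
        // JSON Facet API equivalent of facet.field={!ex=dt}doctype:
        // the domain excludes filters tagged "dt" before computing the counts.
        q.add("json.facet", "{ doctype: { type: terms, field: doctype, domain: { excludeTags: dt } } }");
        System.out.println(client.query(q).getResponse());
        client.close();
    }
}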

MJ


On Thu, Jun 2, 2016 at 2:17 PM, Andrew Chillrud 
wrote:

> To return counts for doctype values that are currently not selected, tag
> filters that directly constrain doctype, and exclude those filters when
> faceting on doctype.
>
>
> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype
>
> Filter exclusion is supported for all types of facets. Both the tag and ex
> local parameters may specify multiple values by separating them with commas.
>







Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Phillip Peleshok
Funny you say that, as that's exactly what happened.  Tried them a couple
weeks ago and nothing.  Going at them again and will see what happens.

Yeah, we're in the same boat.  We started with the profilers (Yourkit) to
track down the causes.  Mainly got hit in the field cache and ordinal maps
(and all the objects just to build them).  Since we transitioned from
classic facets to json facets, unfortunately SOLR-8922 doesn't lend much
but it looks really good.  We were looking at cutting out the ordinal cache
depending on the cardinality but that's still a PoC at this point, but does
allow us to cap the memory usage.  Then given the (
http://stackoverflow.com/questions/214362/java-very-large-heap-sizes) we
stumbled across the off-heap and were giving that a go to see if it's worth
the avenue.  But after reading the UnSafe, started getting cold feet and
that's why I was trying to dig up a little more history.

Was actually thinking about the isolation of JVM per shard too.  Going
through the whiteboarding, decided against since it didn't lend itself to
our scenarios, but would be interested in how it turns out for you.

Thanks!
Phil

On Fri, Jun 3, 2016 at 8:33 AM, Jeff Wartes  wrote:

>
> For what it’s worth, I’d suggest you go into a conversation with Azul with
> a more explicit “I’m looking to buy” approach. I reached out to them with a
> more “I’m exploring my options” attitude, and never even got a trial. I get
> the impression their business model involves a fairly expensive (to them)
> trial process, so they’re looking for more urgency on the part of the
> client than I was expressing.
>
> Instead, I spent a few weeks analyzing how my specific index allocated
> memory. This turned out to be quite worthwhile. Armed with that
> information, I was able to file a few patches (coming in 6.1, perhaps?)
> that reduced allocations by a pretty decent amount on large indexes.
> (SOLR-8922, particularly) It also straight-up ruled out certain things Solr
> supports, because the allocations were just too heavy. (SOLR-9125)
>
> I suppose the next thing I’m considering is using multiple JVMs per host,
> essentially one per shard. This wouldn’t change the allocation rate, but
> does serve to reduce the worst-case GC pause, since each JVM can have a
> smaller heap. I’d be trading a little p50 latency for some p90 latency
> reduction, I’d expect. Of course, that adds a bunch of headache to managing
> replica locations too.
>
>
> On 6/2/16, 6:30 PM, "Phillip Peleshok"  wrote:
>
> >Fantastic! I'm sorry I couldn't find that JIRA before and for getting you
> >to track it down.
> >
> >Yup, I noticed that for the docvalues with the ordinal map and I'm
> >definitely leveraging all that but I'm hitting the terms limit now and
> that
> >ends up pushing me over.  I'll see about giving Zing/Azul a try.  From all
> >my readings using theUnsafe seemed a little sketchy (
> >http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/) so
> >I'm glad that seemed to be the point of contention bringing it in and not
> >anything else.
> >
> >Thank you very much for the info,
> >Phil
> >
> >On Thu, Jun 2, 2016 at 6:14 PM, Erick Erickson 
> >wrote:
> >
> >> Basically it never reached consensus, see the discussion at:
> >> https://issues.apache.org/jira/browse/SOLR-6638
> >>
> >> If you can afford it I've seen people with very good results
> >> using Zing/Azul, but that can be expensive.
> >>
> >> DocValues can help for fields you facet and sort on,
> >> those essentially move memory into the OS
> >> cache.
> >>
> >> But memory is an ongoing struggle I'm afraid.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Jun 1, 2016 at 12:34 PM, Phillip Peleshok 
> >> wrote:
> >> > Hey everyone,
> >> >
> >> > I've been using Solr for some time now and running into GC issues as
> most
> >> > others have.  Now I've exhausted all the traditional GC settings
> >> > recommended by various individuals (ie Shawn Heisey, etc) but neither
> >> > proved sufficient.  The one solution that I've seen that proved
> useful is
> >> > Heliosearch and the off-heap implementation.
> >> >
> >> > My question is this, why wasn't the off-heap FieldCache
> implementation (
> >> > http://yonik.com/hs-solr-off-heap-fieldcache-performance/) ever
> rolled
> >> into
> >> > Solr when the other HelioSearch improvement were merged? Was there a
> >> > fundamental design problem or just a matter of time/testing that
> would be
> >> > incurred by the move?
> >> >
> >> > Thanks,
> >> > Phil
> >>
>
>


Re: [E] Re: Stemming and Managed Schema

2016-06-03 Thread Erick Erickson
Actually, I prefer to do it the other way:

1> shut down Solr
2> edit managed_schema
3> start Solr.

that eliminates any possibility of inadvertently overwriting your
changes by issuing a managed schema call.

that's a nit though, either will work.


FWIW,
Erick

On Fri, Jun 3, 2016 at 11:02 AM, Shawn Heisey  wrote:
> On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
>> I would edit the managed-schema, make my changes, shutdown solr? And
>> start it back up and verify it is still there?
>
> That's the sledgehammer approach.  Simple and effective, but Solr does
> go offline for a short time.
>
>> Or is there another way to reload the core/collection?
>
> For SolrCloud:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
>
> For non-cloud mode:
> https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD
>
> Thanks,
> Shawn
>


How to control solr 4.8 recovery download file speed

2016-06-03 Thread Peter Song
Hey everyone,
  I've been using Solr 4.8 for some time now. My SolrCloud has three
replicas, and the shard has more than 64 GB of index files. But I have a
problem when SolrCloud executes recovery: it consumes a lot of bandwidth.
When a recovering replica downloads files from the leader,
my other network programs can't get any bandwidth.

  My question is this: how can I control the bandwidth consumed by SolrCloud
recovery? Is there some method to control the replication download speed?
  For now I have added a patch to "SnapPuller.java": in the "DirectoryFileFetcher"
class I modified the "fetchPackets(FastInputStream)" method.
if (bytesCount >= bytesLimit) {
  // We have fetched bytesLimit bytes inside the current 50 ms window:
  // sleep until the window ends, then start a new window.
  long currentTime = System.currentTimeMillis();
  if (currentTime <= endTime) {
    Thread.sleep(endTime - currentTime);
  }
  endTime = System.currentTimeMillis() + 50;
  bytesCount = 0;
}

bytesCount += packetSize;
  
  I limit the download speed to 40 MB/s, but sometimes it fails to work.
  Has anyone else met the same issue?
  
 Thanks.
 Peter Song.