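Data migration from Solr 1.3 to Solr 8.4

2020-10-12 Thread 阿部真也
I need to rebuild an old system that uses Solr as a new system
and migrate its data.

The current system runs on Solr 1.3, and for the new system
I am planning to upgrade Solr to 8.4.
I would like to know whether it will work to simply move the data in
/var/solr/{system_name}/data from the old system to the new one.

I already know that most of the settings in solrconfig.xml cannot be migrated as-is,
but the error log suggests replacements for the setting names I use,
so I think I may be able to handle that part.

Thank you in advance.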


Re: Any solr api to force leader on a specified node

2020-10-12 Thread Erick Erickson
First, I totally agree with Walter. See: 
https://lucidworks.com/post/indexing-with-solrj/

Second, DIH is being deprecated. It is being moved to
a package that will be supported if, and only if, there is
enough community support for it. “Community support” 
means people who use it need to step up and maintain
it.

Third, there’s nothing in Solr that requires DIH to
be run on a leader so your premise is wrong. You need
to look at your logs to see what’s happening there. It
should be perfectly fine to run it on a replica.

Best,
Erick

> On Oct 11, 2020, at 11:53 PM, Walter Underwood  wrote:
> 
> Don’t use DIH. DIH has a lot of limitations and problems, as you are 
> discovering.
> 
> Write a simple program that fetches from the database and sends documents
> in batches to Solr. I did this before DIH was invented (Solr 1.3) and I’m
> doing it now.
>
> You can send the updates to the load balancer for the SolrCloud cluster. The
> updates will be automatically routed to the right leader. It is very fast.
>
> My loader is written in Python and I don’t even bother with a special Solr
> library. It just sends JSON to the update handler with the right options.
>
> We do this for all of our clusters. Our biggest one is 48 hosts with 55
> million documents.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Oct 11, 2020, at 8:40 PM, yaswanth kumar  wrote:
>> 
>> Hi wunder 
>> 
>> Thanks for replying on this.
>> 
>> I set up SolrCloud with 4 nodes, with one node having DIH configured to
>> pull data from MS SQL every minute. If I install DIH on the rest of the
>> nodes, it causes connection issues on the source DB, which I don’t want, so
>> I manage with only one server polling the DB while the rest are used as
>> replicas for search.
>> 
>> Everything works fine, but when the servers are rebooted for maintenance
>> and come back, if the leader is not the node that has DIH, it stops pulling
>> data from SQL. That is why I want to always force one node to be the leader.
>> 
>> Sent from my iPhone
>> 
>>> On Oct 11, 2020, at 11:05 PM, Walter Underwood  wrote:
>>> 
>>> That requirement is not necessary. Let Solr choose a leader.
>>> 
>>> Why is someone making this bad requirement?
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Oct 11, 2020, at 8:01 PM, yaswanth kumar  wrote:
>>>> 
>>>> Can someone please tell me if there is any Solr API or config that
>>>> makes sure a particular Solr node is always chosen as leader in
>>>> SolrCloud?
>>>> 
>>>> Using Solr 8.2 and ZooKeeper 3.4.
>>>> 
>>>> I have four nodes, and my requirement is to always make a particular
>>>> node the leader.
>>>> 
>>>> Sent from my iPhone
>>> 
> 
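Walter's approach (a small program that reads from the database and posts JSON batches to the update handler, with no special Solr library) can be sketched with only the Python standard library. This is a minimal sketch, not his actual loader. The base URL (in his setup, the cluster's load balancer), the collection name, the batch size, and the `commitWithin` value are all assumptions. The database query that produces the document dicts is left out.

```python
import json
import urllib.parse
import urllib.request
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield lists of at most `size` documents from any iterable of dicts."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(base_url: str, collection: str, docs: List[Doc],
                         commit_within_ms: int = 10000) -> urllib.request.Request:
    """Build a POST that sends a JSON array of documents to /update.

    commitWithin asks Solr to make the documents visible within that many
    milliseconds, so the loader does not need to send explicit commits.
    """
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms, "wt": "json"})
    url = f"{base_url.rstrip('/')}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_all(base_url: str, collection: str, docs: Iterable[Doc],
             batch_size: int = 500) -> None:
    """Send every document in batches; urlopen raises HTTPError on a 4xx/5xx reply."""
    for chunk in batches(docs, batch_size):
        req = build_update_request(base_url, collection, chunk)
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

A call might look like `send_all("http://solr-lb.example:8983/solr", "mycollection", rows_as_dicts())`. Here the host, the collection name, and `rows_as_dicts()` (a generator over the SQL results) are hypothetical. Any node, or the load balancer, can receive the batch, because SolrCloud routes each document to its shard leader.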



Memory line in status output

2020-10-12 Thread Ryan W
Hi all,

What is the meaning of the "memory" line in the output when I run the solr
status command?  What controls whether that memory gets exhausted?  At
times if I run "solr status" over and over, that memory number creeps up
and up and up.  Presumably it is not a good thing if it moves all the way
up to my 31GB capacity.  What controls whether that happens?  How do I
prevent that?  Or does Solr manage this automatically?


$ /opt/solr/bin/solr status

Found 1 Solr nodes:

Solr process 101530 running on port 8983
{
  "solr_home":"/opt/solr/server/solr",
  "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48",
  "startTime":"2020-10-12T12:04:57.379Z",
  "uptime":"0 days, 1 hours, 46 minutes, 41 seconds",
  "memory":"3.3 GB (%10.7) of 31 GB"}
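To follow how the "memory" value moves across repeated runs of `solr status`, as described above, one option is to parse the value itself. This is a minimal sketch. It assumes only the `used unit (%percent) of capacity unit` layout shown in this output, and it does not claim exactly which JVM heap figure the status tool prints after "of". Erick's reply explains that a value rising between runs is normally just the heap filling up until garbage collection kicks in.

```python
import re
from typing import NamedTuple


class HeapReading(NamedTuple):
    used: float          # e.g. 3.3
    used_unit: str       # e.g. "GB"
    percent: float       # e.g. 10.7, as printed after "%"
    capacity: float      # the figure after "of", e.g. 31
    capacity_unit: str   # e.g. "GB"


# Matches strings such as "3.3 GB (%10.7) of 31 GB".
_MEMORY_RE = re.compile(
    r"^\s*([\d.]+)\s*([A-Za-z]+)\s*\(%([\d.]+)\)\s*of\s*([\d.]+)\s*([A-Za-z]+)\s*$"
)


def parse_memory_line(text: str) -> HeapReading:
    """Parse the value of the "memory" field printed by `bin/solr status`."""
    m = _MEMORY_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised memory value: {text!r}")
    used, used_unit, pct, cap, cap_unit = m.groups()
    return HeapReading(float(used), used_unit, float(pct), float(cap), cap_unit)
```

Logging `parse_memory_line(...).percent` with a timestamp on each run gives a simple series to compare against GC activity.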


Re: Memory line in status output

2020-10-12 Thread Erick Erickson
Solr doesn’t manage this at all; it’s the JVM’s garbage collection
that occasionally kicks in. In general, memory creeps up until
it reaches the GC threshold (and there are about a zillion
parameters you can set to control that), and then GC kicks in.

Generally, the recommendation is to use the G1GC collector
and just leave the default settings as they are.

It’s usually a mistake, BTW, to over-allocate memory. You should shrink the
heap as far as you can and still maintain a reasonable safety margin. See:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

What’s a “reasonable safety margin”? Unfortunately you have to experiment.

Best,
Erick

> On Oct 12, 2020, at 10:33 AM, Ryan W  wrote:



[CVE-2020-13957] The checks added to unauthenticated configset uploads in Apache Solr can be circumvented

2020-10-12 Thread Tomas Fernandez Lobbe
Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
6.6.0 to 6.6.5
7.0.0 to 7.7.3
8.0.0 to 8.6.2

Description:
Solr prevents some features considered dangerous (which could be used for
remote code execution) from being configured in a ConfigSet that's uploaded via
the API without authentication/authorization. The checks in place to prevent
such features can be circumvented by using a combination of UPLOAD/CREATE
actions.

Mitigation:
Any of the following is enough to prevent this vulnerability:
* Disable the UPLOAD command in the ConfigSets API, if it is not used, by setting
the system property "configset.upload.enabled" to "false" [1]
* Use Authentication/Authorization and make sure unknown requests aren't
allowed [2]
* Upgrade to Solr 8.6.3 or greater.
* If upgrading is not an option, consider applying the patch in SOLR-14663
([3])
* No Solr API, including the Admin UI, is designed to be exposed to
non-trusted parties. Tune your firewall so that only trusted computers and
people are allowed access.

Credit:
Tomás Fernández Löbbe, András Salamon

References:
[1] https://lucene.apache.org/solr/guide/8_6/configsets-api.html
[2] https://lucene.apache.org/solr/guide/8_6/authentication-and-authorization-plugins.html
[3] https://issues.apache.org/jira/browse/SOLR-14663
[4] https://issues.apache.org/jira/browse/SOLR-14925
[5] https://wiki.apache.org/solr/SolrSecurity
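The "Versions Affected" ranges in this advisory can be checked mechanically, for example against the version string that `bin/solr status` prints. This is a minimal sketch that assumes plain `major.minor.patch` version numbers. It encodes only the three ranges listed above and says nothing about whether one of the listed mitigations is already in place.

```python
from typing import List, Tuple

Version = Tuple[int, ...]

# Inclusive ranges copied from the "Versions Affected" section of the advisory.
AFFECTED_RANGES: List[Tuple[Version, Version]] = [
    ((6, 6, 0), (6, 6, 5)),
    ((7, 0, 0), (7, 7, 3)),
    ((8, 0, 0), (8, 6, 2)),
]


def parse_version(version: str) -> Version:
    """Turn '7.7.2' (or the first token of a longer string) into (7, 7, 2)."""
    core = version.split()[0]
    parts = [int(p) for p in core.split(".")[:3]]
    while len(parts) < 3:  # treat "8.4" as 8.4.0
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version falls inside any affected range."""
    v = parse_version(version)
    return any(lo <= v <= hi for lo, hi in AFFECTED_RANGES)
```

For instance, the "version" field from the status output in the earlier thread, `7.7.2 d4c30fc... - janhoy - 2019-05-28 23:37:48`, falls in the 7.x range, while 6.6.6 and 8.6.3 do not.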


Re: Memory line in status output

2020-10-12 Thread Ryan W
Thanks.  How do I activate the G1GC collector?  Do I do this by editing a
config file, or by adding a parameter when I start solr?

Oracle's docs are pointing me to a file that supposedly is at
instance-dir/OUD/config/java.properties, but I don't have that path.  I am
not sure what is meant by instance-dir here, but perhaps it means my JRE
install, which is at
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64/jre -- but
there is no "OUD" directory in this location.



On Mon, Oct 12, 2020 at 11:15 AM Erick Erickson  wrote:



Re: Memory line in status output

2020-10-12 Thread Shawn Heisey

On 10/12/2020 5:11 PM, Ryan W wrote:

> Thanks.  How do I activate the G1GC collector?  Do I do this by editing a
> config file, or by adding a parameter when I start solr?
>
> Oracle's docs are pointing me to a file that supposedly is at
> instance-dir/OUD/config/java.properties, but I don't have that path.  I am
> not sure what is meant by instance-dir here, but perhaps it means my JRE
> install, which is at
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64/jre -- but
> there is no "OUD" directory in this location.


The collector is chosen by the startup options given to Java, in this 
case by the start script for Solr.  I've never heard of it being set by 
a config in the JRE.


In Solr 7, the start script defaults to the CMS collector.  We have 
updated that to G1 in the latest Solr 8.x versions, because CMS has been 
deprecated by Oracle.


Adding the following lines to the correct solr.in.sh would change the 
garbage collector to G1.  I got this from the "bin/solr" script in Solr 
8.5.1:


  GC_TUNE=('-XX:+UseG1GC' \
    '-XX:+PerfDisableSharedMem' \
    '-XX:+ParallelRefProcEnabled' \
    '-XX:MaxGCPauseMillis=250' \
    '-XX:+UseLargePages' \
    '-XX:+AlwaysPreTouch')

If you used the service installer script to install Solr, then the 
correct file to add this to is usually /etc/default/solr.in.sh ... but 
if you did the install manually, it may be in the same bin directory 
that contains the solr script itself.  Your initial message says the 
solr home is /opt/solr/server/solr so I am assuming it's not running on 
Windows.


Thanks,
Shawn
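After changing solr.in.sh and restarting, one way to confirm which collector is actually in use is to look for the collector flag among the JVM arguments of the running process. The Admin UI dashboard lists them, and so does the command line shown by `ps`. This is a minimal sketch that assumes the arguments are available either as a list or as one command-line string. The flag table covers only the common HotSpot collectors of the Java 8 era and is not exhaustive.

```python
import shlex
from typing import Iterable, List

# JVM flags that explicitly select a garbage collector (HotSpot / OpenJDK 8).
COLLECTOR_FLAGS = {
    "-XX:+UseG1GC": "G1",
    "-XX:+UseConcMarkSweepGC": "CMS",
    "-XX:+UseParallelGC": "Parallel",
    "-XX:+UseSerialGC": "Serial",
}


def collectors_in(args: Iterable[str]) -> List[str]:
    """Return the collector names selected by the given JVM arguments, in order."""
    return [COLLECTOR_FLAGS[a] for a in args if a in COLLECTOR_FLAGS]


def collectors_in_cmdline(cmdline: str) -> List[str]:
    """Same check for a whole command line, e.g. as copied from `ps` output."""
    return collectors_in(shlex.split(cmdline))
```

If the result is empty, no collector was chosen explicitly and the JVM's own default applies. With the GC_TUNE settings above in effect, the expected result is `["G1"]`.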


Re: [CVE-2020-13957] The checks added to unauthenticated configset uploads in Apache Solr can be circumvented

2020-10-12 Thread Bernd Fehling
Good to know that Version 6.6.6 is not affected, so I am safe ;-)

Regards
Bernd

On 12.10.20 at 20:38, Tomas Fernandez Lobbe wrote: