[ https://issues.apache.org/jira/browse/SOLR-13089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009441#comment-17009441 ]
ASF subversion and git services commented on SOLR-13089:
--------------------------------------------------------

Commit ac777a5352224b2c8f46836f0e078809308fc2d8 in lucene-solr's branch refs/heads/gradle-master from Martijn Koster
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ac777a5 ]

SOLR-13089: Fix lsof edge cases in the solr CLI script

> bin/solr's use of lsof has some issues
> --------------------------------------
>
> Key: SOLR-13089
> URL: https://issues.apache.org/jira/browse/SOLR-13089
> Project: Solr
> Issue Type: Bug
> Components: SolrCLI
> Reporter: Martijn Koster
> Assignee: Jan Høydahl
> Priority: Minor
> Fix For: 8.5
>
> Attachments: 0001-SOLR-13089-lsof-fixes.patch, SOLR-13089.patch
>
>
> The {{bin/solr}} script uses this {{lsof}} invocation to check if the Solr
> port is being listened on:
> {noformat}
> running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
> if [ -z "$running" ]; then
> {noformat}
> The code is
> [here|https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2147].
> There are a few issues with this.
>
> h2. 1. False negatives when port is occupied by different user
>
> When {{lsof}} runs as non-root, it only shows sockets for processes with your
> effective uid.
> For example:
> {noformat}
> $ id -u && nc -l 7788 &
> [1] 26576
> 1000
> #### works: nc ran as my user
> $ lsof -PniTCP:7788 -sTCP:LISTEN
> COMMAND   PID USER FD TYPE  DEVICE SIZE/OFF NODE NAME
> nc      26580  mak 3u IPv4 2818104      0t0  TCP *:7788 (LISTEN)
> #### fails: ssh is running as root
> $ lsof -PniTCP:22 -sTCP:LISTEN
> #### works if we are root
> $ sudo lsof -PniTCP:22 -sTCP:LISTEN
> COMMAND  PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> sshd    2524 root 3u IPv4  18426      0t0  TCP *:22 (LISTEN)
> sshd    2524 root 4u IPv6  18428      0t0  TCP *:22 (LISTEN)
> {noformat}
> Solr runs as non-root.
> So if some other process owned by a different user occupies that port, you
> will get a false negative (the check reports the port as free even though
> it is occupied).
> I can't think of a good way to fix or work around that (short of not using
> {{lsof}} in the first place).
> This is perhaps an uncommon scenario that we need not worry too much about.
>
> h2. 2. lsof can complain about lack of /etc/passwd entries
>
> If {{lsof}} runs without the current effective user having an entry in
> {{/etc/passwd}}, it produces a warning on stderr:
> {noformat}
> $ docker run -d -u 0 solr:7.6.0 bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
> 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
> $ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
> lsof: no pwd entry for UID 8888
> COMMAND PID USER   FD TYPE  DEVICE SIZE/OFF NODE NAME
> lsof: no pwd entry for UID 8888
> java      9 8888 115u IPv4 2813503      0t0  TCP *:8983 (LISTEN)
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
> lsof: no pwd entry for UID 8888
> lsof: no pwd entry for UID 8888
> {noformat}
> You can avoid this by using the {{-t}} option, which specifies that lsof
> should produce terse output with process identifiers only and no header:
> {noformat}
> I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
> 9
> {noformat}
> This is a rare circumstance, but one I encountered and worked around.
>
> h2. 3. On Alpine, lsof is implemented by busybox, but with incompatible arguments
>
> On Alpine, {{busybox}} implements {{lsof}}, but does not support the
> arguments, so you get:
> {noformat}
> $ docker run -it alpine sh
> / # lsof -t -PniTCP:8983 -sTCP:LISTEN
> 1 /bin/busybox /dev/pts/0
> 1 /bin/busybox /dev/pts/0
> 1 /bin/busybox /dev/pts/0
> 1 /bin/busybox /dev/tty
> {noformat}
> So if you ran Solr in the background and it failed to start, this code
> would produce a false positive.
> For example:
> {noformat}
> docker volume create mysol
> docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
> docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
> apk add procps bash
> tar xvzf /solr-7.6.0.tgz
> chown -R 8983:8983 .
> {noformat}
> Then, in a separate terminal:
> {noformat}
> $ docker exec -it -u 8983 serene_saha sh
> /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
> whoami: unknown uid 8983
> Waiting up to 180 seconds to see Solr running on port 8983 [|]
> Started Solr server on port 8983 (pid=101). Happy searching!
> /mysol $
> {noformat}
> And in another separate terminal:
> {noformat}
> $ docker exec -it thirsty_liskov bash
> bash-4.4$ cat server/logs/solr-8983-console.log
> Unrecognized option: --invalid
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {noformat}
> So the script says Solr is running when it isn't.
> Now, all this can be avoided by just installing the real {{lsof}} with
> {{apk add lsof}}, which works properly. So should we detect and warn? Or
> even refuse to run rather than invoke a tool that does not implement the
> contract we expect?
>
> h2. 4. Shellcheck dislikes backticks
>
> Shellcheck says {{SC2006: Use $(..) instead of legacy `..`.}}
> Now, shellcheck complains about 130 other issues too, so it's a drop in
> the bucket, but if we're changing things, we might as well fix that.
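As a rough illustration of how points 2 to 4 of the issue could be handled together, here is a minimal sketch. It is not the code from the committed patch: the function name `listener_pids` and its return codes are hypothetical, chosen only for this example. It assumes that a real `lsof -t` prints one numeric PID per line, as shown in the issue, and treats any other output (such as the busybox listing) as a sign that the arguments were not honoured. It does nothing about point 1, where the port is held by another user.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; not taken from bin/solr.
#
# Prints the PIDs listening on the given TCP port, one per line.
# Returns 0 if at least one listener was reported,
#         1 if lsof printed nothing (no listener, or lsof missing/failing),
#         2 if the output is not a list of bare PIDs, which suggests an
#           lsof that ignores -t (for example the busybox applet).
listener_pids() {
  local port="$1" out line

  # $(..) instead of backticks (SC2006); -t gives terse PID-only output;
  # stderr is dropped so "no pwd entry for UID" warnings do not leak.
  out=$(lsof -t -PniTCP:"$port" -sTCP:LISTEN 2>/dev/null) || true
  [ -n "$out" ] || return 1

  # Reject any line that is empty or contains a non-digit character.
  while IFS= read -r line; do
    case "$line" in
      ''|*[!0-9]*) return 2 ;;
    esac
  done <<EOF
$out
EOF

  printf '%s\n' "$out"
}
```

A caller could then branch on the status, for instance `if listener_pids "$SOLR_PORT" >/dev/null; then ...`, and print a warning suggesting `apk add lsof` when the status is 2. Whether to warn or to refuse to run in that case is the open question raised in the issue; this sketch does not settle it.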
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org