As an additional bit of information, here's the tcpdump output from starting Solr in the Docker container, captured after logging into the container and running "bin/solr start -f -c" (the same CMD my Dockerfile executes):
root@91e3883fb675:/opt/solr-8.2.0# tcpdump -nvvv -i any -c 100 host 172.20.60.138
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
21:54:49.426019 IP (tos 0x0, ttl 64, id 44803, offset 0, flags [DF], proto TCP (6), length 60)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [S], cksum 0x94e0 (incorrect -> 0x19d3), seq 2175798173, win 29200, options [mss 1460,sackOK,TS val 6792350 ecr 0,nop,wscale 7], length 0
21:54:49.472340 IP (tos 0x0, ttl 37, id 37699, offset 0, flags [none], proto TCP (6), length 48)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [S.], cksum 0xd892 (correct), seq 452884582, ack 2175798174, win 65535, options [mss 1460,wscale 2,eol], length 0
21:54:49.472428 IP (tos 0x0, ttl 64, id 44804, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [.], cksum 0x94cc (incorrect -> 0x0472), seq 1, ack 1, win 229, length 0
21:54:49.472950 IP (tos 0x0, ttl 64, id 44805, offset 0, flags [DF], proto TCP (6), length 89)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [P.], cksum 0x94fd (incorrect -> 0x8ecb), seq 1:50, ack 1, win 229, length 49
21:54:49.473400 IP (tos 0x0, ttl 37, id 33425, offset 0, flags [none], proto TCP (6), length 40)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [.], cksum 0x0526 (correct), seq 1, ack 50, win 65535, length 0
21:54:59.448636 IP (tos 0x0, ttl 64, id 44806, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [F.], cksum 0x94cc (incorrect -> 0x0440), seq 50, ack 1, win 229, length 0
21:54:59.449070 IP (tos 0x0, ttl 37, id 3430, offset 0, flags [none], proto TCP (6), length 40)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [.], cksum 0x0525 (correct), seq 1, ack 51, win 65535, length 0
21:55:21.518447 IP (tos 0x0, ttl 37, id 2259, offset 0, flags [none], proto TCP (6), length 40)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [F.], cksum 0x0524 (correct), seq 1, ack 51, win 65535, length 0
21:55:21.518513 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [.], cksum 0x043f (correct), seq 51, ack 2, win 229, length 0

172.17.0.2 is my Solr Docker container, and 172.20.60.138 is my zk1 Docker container running out in AWS. From this, it looks like communication is happening, but the connection is being finished and closed instead of held open. Am I interpreting this correctly?

--
Drew(i...@gmail.com)
http://wyntermute.dyndns.org/blog/

-- I Drive Way Too Fast To Worry About Cholesterol.


On Fri, Oct 18, 2019 at 1:18 PM Drew Kidder <dre...@gmail.com> wrote:

> Again, thank you all for the suggestions.
>
> My ZK ensemble members are talking to each other and to the outside world:
>
> solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk1.zookeeper.internal 2181
> Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
> Latency min/avg/max: 0/0/0
> Received: 53
> Sent: 33
> Connections: 1
> Outstanding: 19
> Zxid: 0x0
> Mode: follower
> Node count: 5
>
> solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk2.zookeeper.internal 2181
> Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
> Latency min/avg/max: 0/0/0
> Received: 37
> Sent: 17
> Connections: 1
> Outstanding: 19
> Zxid: 0x200000000
> Mode: leader
> Node count: 5
> Proposal sizes last/min/max: 32/32/36
>
> solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk3.zookeeper.internal 2181
> Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
> Latency min/avg/max: 0/0/0
> Received: 7
> Sent: 3
> Connections: 1
> Outstanding: 3
> Zxid: 0x200000000
> Mode: follower
> Node count: 5
>
> All of these commands can be executed on the solr container as either the
> root user or the solr user (see the command prompt in each command). Note
> that zk2 is the leader, and zk1 and zk3 are followers.
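The `srvr` checks above can also be scripted, which makes them easy to repeat from the Solr container after each change. This is a minimal Python sketch, not a ZooKeeper client: it assumes the hostnames zk1/zk2/zk3.zookeeper.internal from this thread and that `srvr` is allowed by each server's four-letter-word whitelist, and the function names are hypothetical. It sends the same four-letter command that `echo srvr | nc host 2181` sends and reports which servers claim to be leader.

```python
import socket


# Assumption: "srvr" is permitted by 4lw.commands.whitelist on every server (ZK 3.5).
def four_letter_word(host, cmd="srvr", port=2181, timeout=5.0):
    """Send a ZooKeeper four-letter-word command and return the text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text):
    """Turn the 'Key: value' lines of a srvr reply into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def check_ensemble(replies):
    """Map each host to its reported Mode and list the hosts claiming to be leader."""
    modes = {host: parse_srvr(text).get("Mode") for host, text in replies.items()}
    leaders = [host for host, mode in modes.items() if mode == "leader"]
    return modes, leaders
```

For example, `check_ensemble({h: four_letter_word(h) for h in ["zk1.zookeeper.internal", "zk2.zookeeper.internal", "zk3.zookeeper.internal"]})` should list exactly one leader for a healthy ensemble. Note that `srvr` only shows that each server answers on port 2181; it does not exercise the session handshake that Solr performs.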
> The configuration files (including the ZOO_MY_ID and ZOO_SERVERS environment
> variables) are all set up correctly, and to all intents and purposes ZK
> appears to be set up correctly and functioning.
>
> Jorne Franke: I tried implementing your suggestion of providing "/" as the
> root node by appending "/" to the end of the ZK_HOST connection string, and
> it still did not work (e.g. ENV ZK_HOST
> zk1.zookeeper.internal:2181,zk2.zookeeper.internal:2181,zk3.zookeeper.internal:2181/
> in the Dockerfile). Was this what you meant? Or were you suggesting setting
> ZK_ROOT in the Solr configs/environment instead?
>
> --
> Drew(i...@gmail.com)
> http://wyntermute.dyndns.org/blog/
>
> -- I Drive Way Too Fast To Worry About Cholesterol.
>
>
> On Fri, Oct 18, 2019 at 12:11 PM Ahmed Adel <aa.0...@gmail.com> wrote:
>
>> This could be because the Zookeeper ensemble is not properly configured.
>> With a very similar setup, consisting of a ZK cluster of three hosts and one
>> Solr Cloud node (all containers), the system ran successfully. Each ZK host
>> has the ZOO_MY_ID and ZOO_SERVERS environment variables set before running
>> ZK. In this case, the former would be 1 to 3 (a different value on each
>> host) and the latter would be "server.1=z1:2888:3888;2181
>> server.2=z2:2888:3888;2181 server.3=z3:2888:3888;2181", the same on all
>> hosts (the double quotes may be needed for proper parsing). This
>> ZOO_SERVERS syntax is for ZK version 3.5; 3.4 is slightly different.
>>
>> http://aadel.io
>>
>> On Fri, Oct 18, 2019 at 5:28 PM Drew Kidder <dre...@gmail.com> wrote:
>>
>> > Thank you all for your suggestions! I appreciate the fast turnaround.
>> >
>> > My setup uses Amazon ECS for our solr cloud installation. Each ZK is in
>> > its own container, using Route53 Service Discovery to provide the DNS name.
>> > The ZK nodes can all talk to each other, and I can communicate with each
>> > one of those nodes from my local machine and from within the solr container.
>> > Solr is one node per container, as Martijn correctly assumed. I am not
>> > using a zkRoot at present because my intention is to use ZK solely for
>> > Solr Cloud and nothing else.
>> >
>> > I have tried removing the "-z" option from the Dockerfile CMD and using the
>> > ZK_HOST environment variable (see below). I have also modified
>> > solr.in.sh and set the ZK_HOST variable there, all to no avail. I have
>> > tried both the Dockerfile CMD route and logging into the solr container
>> > and running the CMD manually, to see if there was a problem with the way I
>> > was using the CMD entry. All of those methods give me the same output,
>> > captured in the gist below.
>> >
>> > The gist for my solr.log output is here:
>> > https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087
>> >
>> > My Dockerfile for the solr container looks like this:
>> >
>> >
>> > FROM solr:8.2
>> >
>> > EXPOSE 8983 8999 2181
>> >
>> > VOLUME /app/logs
>> > VOLUME /app/data
>> > VOLUME /app/conf
>> >
>> > ## add our jetty configuration (increased request size!)
>> > COPY jetty.xml /opt/solr/server/etc/
>> >
>> > ## SolrCloud configuration
>> > ENV ZK_HOST zk1:2181,zk2:2181,zk3:2181
>> > ENV ZK_CLIENT_TIMEOUT 30000
>> >
>> > USER root
>> > RUN apt-get update
>> > RUN apt-get install -y netcat net-tools vim procps
>> > USER solr
>> >
>> > # Copy over custom solr plugins
>> > COPY myplugins/src/resources/* /opt/solr/server/solr/my-resources/
>> > COPY lib/*.jar /opt/solr/my-lib/
>> >
>> > # Copy over my configs
>> > COPY conf/ /app/conf
>> >
>> > #Start solr in cloud mode, connecting to zookeeper
>> > CMD ["solr","start","-f","-c"]
>> >
>> > The docker command I use to run the image built from this Dockerfile is
>> > `docker run -p 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`
>> >
>> > Output of `ps -eflww` from within the solr container (as root):
>> >
>> > root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
>> > F S UID    PID  PPID C PRI  NI ADDR      SZ WCHAN STIME TTY       TIME CMD
>> > 4 S solr     1     0 9  80   0 -    1043842 -     14:36 ?     00:00:07 /usr/local/openjdk-11/bin/java
>> > -server -Xms512m -Xmx512m -XX:+UseG1GC
>> > -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
>> > -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
>> > -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
>> > -Dcom.sun.management.jmxremote
>> > -Dcom.sun.management.jmxremote.local.only=false
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > -Dcom.sun.management.jmxremote.port=18983
>> > -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000
>> > -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
>> > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
>> > -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
>> > -Dsolr.data.home= -Dsolr.install.dir=/opt/solr
>> > -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
>> > -Dlog4j.configurationFile=file:/var/solr/log4j2.xml
>> > -Xss256k -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
>> > 4 S root    90     0 0  80   0 -       4988 -     14:37 pts/0 00:00:00 /bin/bash
>> > 0 R root    95    90 0  80   0 -       9595 -     14:37 pts/0 00:00:00 ps -eflww
>> >
>> > Output of netstat from within the solr container (as root):
>> >
>> > root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
>> > Active Internet connections (w/o servers)
>> > Proto Recv-Q Send-Q Local Address           Foreign Address         State
>> > tcp        0      0 fe0ad5b40b42:43678      172.20.28.179:2181      TIME_WAIT
>> > tcp        0      0 fe0ad5b40b42:60164      172.20.155.241:2181     TIME_WAIT
>> > tcp        0      0 fe0ad5b40b42:60500      172.20.60.138:2181      TIME_WAIT
>> > Active UNIX domain sockets (w/o servers)
>> > Proto RefCnt Flags       Type       State         I-Node   Path
>> > unix  2      [ ]         STREAM     CONNECTED     129252
>> > unix  2      [ ]         STREAM     CONNECTED     129270
>> >
>> > I'm beginning to think that ZK is not set up correctly. I haven't uploaded
>> > any configuration files to ZK yet; my understanding was that I could start
>> > up a solr cloud node with no collections and upload the configuration from
>> > there. I was under the impression that it would try to connect to ZK and,
>> > if it couldn't get config files from there, it would use local config
>> > files. Do I need to upload the solr cloud configuration files to ZK before
>> > starting up the cluster? The netstat output makes it look like the solr
>> > container is indeed connected to the ZK containers, but I can't see any
>> > indication of why it cannot connect to Zookeeper.
>> >
>> > --
>> > Drew(i...@gmail.com)
>> > http://wyntermute.dyndns.org/blog/
>> >
>> > -- I Drive Way Too Fast To Worry About Cholesterol.
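In `netstat` output, TIME_WAIT marks a TCP connection that has already been closed, on the side that sent the first FIN; a ZooKeeper session that is being held open would appear as ESTABLISHED instead. A minimal Python sketch that tallies connection states toward port 2181 (the function name is hypothetical, and it assumes the six-column Linux `netstat` layout shown above):

```python
from collections import Counter


def tcp_states(netstat_text, remote_port=None):
    """Count TCP connection states in netstat output, optionally filtered by remote port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local:port foreign:port STATE
        if len(fields) == 6 and fields[0] in ("tcp", "tcp6"):
            foreign = fields[4]
            if remote_port is not None and not foreign.endswith(":" + str(remote_port)):
                continue
            counts[fields[5]] += 1
    return counts
```

Inside the container this could be fed with `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`; `-n` keeps addresses and ports numeric, so the port filter matches. For the output above it finds three TIME_WAIT entries and no ESTABLISHED ones.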
>> >
>> >
>> > On Fri, Oct 18, 2019 at 3:11 AM Martijn Koster <mak-luc...@greenhills.co.uk>
>> > wrote:
>> >
>> > >
>> > >
>> > > > On 18 Oct 2019, at 00:25, Drew Kidder <dre...@gmail.com> wrote:
>> > >
>> > > > * I'm using the following command line to start a basic solr cloud instance
>> > > > as per the documentation: `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`
>> > >
>> > > I assume you’re just looking to run a single Solr node in a single
>> > > container, right?
>> > >
>> > > Just set the ZK_HOST environment variable, and remove the command-line
>> > > arguments. And you don’t need to specify the port number unless you
>> > > deviate from the default.
>> > > Have a look at this example:
>> > > https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml
>> > > <https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61>
>> > >
>> > > The “start” command starts Solr in the background, which is typically not
>> > > what you want when running Solr under docker.
>> > >
>> > >
>> > > It’s not clear why your command isn’t working as is. When you say you’re
>> > > using that command-line, how do you actually do that? In a full docker
>> > > command line, a compose file, a “docker exec”, or from some orchestrator?
>> > > Share the exact thing you’re doing; perhaps there is a mistake there.
>> > > Also, run `ps -eflww` in the container to see what command-line arguments
>> > > the JVM actually got started with.
>> > > And share the full startup log somewhere (in a GitHub gist perhaps); there
>> > > might be something of interest earlier on.
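Following the `ps -eflww` suggestion, the connect string the JVM actually received can be pulled out of its command line and split the way a ZooKeeper client reads it: comma-separated host:port pairs, an optional chroot starting at the first "/", and port 2181 when none is given. As far as I can tell from ZooKeeper's ConnectStringParser, a chroot of just "/" is treated the same as no chroot, which would make appending "/" to ZK_HOST a no-op. This is a hypothetical Python helper, not Solr or ZooKeeper code, and it ignores IPv6 literals.

```python
DEFAULT_ZK_PORT = 2181


def zk_host_from_cmdline(cmdline):
    """Return the value of -DzkHost=... from a JVM command line, or None."""
    prefix = "-DzkHost="
    for arg in cmdline.split():
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


def parse_connect_string(zk_host):
    """Split 'h1:p1,h2:p2/chroot' into [(host, port), ...] and a chroot path.

    Assumption: mirrors ZooKeeper's ConnectStringParser, where a bare "/" chroot
    means no chroot and a missing port defaults to 2181. IPv6 is not handled.
    """
    servers_part, slash, rest = zk_host.partition("/")
    chroot = "/" + rest if slash else None
    if chroot == "/":
        chroot = None  # a bare "/" is treated like no chroot at all
    servers = []
    for entry in servers_part.split(","):
        host, colon, port = entry.strip().partition(":")
        servers.append((host, int(port) if colon else DEFAULT_ZK_PORT))
    return servers, chroot
```

Run against the `ps` output in this thread, it yields the short hostnames zk1, zk2 and zk3 on port 2181 with no chroot, which is worth comparing with the fully qualified *.zookeeper.internal names used in the `srvr` tests.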
>> > >
>> > > >> (running `echo ruok | nc zk1 2181` returns the expected "imok" response
>> > > >> from ZK within the docker container where Solr is located)
>> > > >> * The netcat command mentioned above shows up in the ZK logs, but the
>> > > >> Solr attempts to connect do not (it's like the request isn't even
>> > > >> getting to ZK)
>> > >
>> > > Then it doesn’t sound like an environmental firewall/security-group/routing
>> > > issue.
>> > > The next debugging step could then be to check whether you actually see
>> > > Solr make tcp connections to port 2181 in the Solr container, using
>> > > tcpdump/sysdig/netstat or some such.
>> > > If that gives a negative result, then you know it’s an issue in your Solr
>> > > invocation config, or name resolution.
>> > > If that gives a positive result, then it’s environmental after all, and
>> > > you can dig further.
>> > >
>> > >
>> > > But try the ZK_HOST thing first; it may just fix it.
>> > >
>> > > — Martijn
>> >
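Martijn's tcpdump suggestion is what produced the capture at the top of this thread, and that capture can be read a little more closely. After the three-way handshake, Solr sends a single 49-byte segment (`seq 1:50 ... length 49`), the ZooKeeper server ACKs it without sending any data back (`ack 50 ... length 0`), and about ten seconds later Solr sends FIN. The minimal Python sketch below assumes the ZooKeeper 3.5 Java client wire format as I understand it (a 4-byte frame length, the serialized ConnectRequest fields, then a trailing readOnly byte); under that assumption a new-session ConnectRequest is exactly 49 bytes. It also assumes, from my reading of the client code, that the per-server connect timeout is the session timeout divided by the number of servers, which gives 10000 ms for zkClientTimeout=30000 and three servers. The helper names are hypothetical.

```python
import struct


# Assumption: ZooKeeper 3.5 Java client wire format for a brand-new session.
def connect_request(session_timeout_ms, last_zxid=0, session_id=0,
                    passwd=bytes(16), read_only=False):
    """Build the length-prefixed ConnectRequest a client sends first (sketch)."""
    body = struct.pack(
        ">iqiqi",
        0,                   # protocolVersion
        last_zxid,           # lastZxidSeen
        session_timeout_ms,  # timeOut requested by the client
        session_id,          # 0 asks the server for a new session
        len(passwd),         # length prefix of the password buffer
    )
    body += passwd                        # 16 zero bytes for a new session
    body += struct.pack(">?", read_only)  # trailing 1-byte readOnly flag
    return struct.pack(">i", len(body)) + body  # 4-byte frame length


def connect_timeout_ms(session_timeout_ms, n_servers):
    """Assumed per-server connect timeout: session timeout split across servers."""
    return session_timeout_ms // n_servers


def seconds(ts):
    """Convert a tcpdump 'HH:MM:SS.ffffff' timestamp to seconds since midnight."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


print(len(connect_request(30000)))   # 49, matching "length 49" in the capture
print(connect_timeout_ms(30000, 3))  # 10000 (zkClientTimeout=30000, three servers)
print(round(seconds("21:54:59.448636") - seconds("21:54:49.472950"), 3))  # 9.976
```

If that reading is right, the capture would be consistent with the server accepting the TCP connection but never replying to the session handshake, which is a different failure from Solr never reaching ZooKeeper at all.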