I wanted to follow up and share that I was able to get all of the compression logic working.
See https://github.com/apache/solr/pull/2298/commits/ee8657438799c07baadd8efa0f272bb57380d772

I *believe* that we could make the compression logic an option on SolrZkClient.Builder(), maybe SolrZkClient.Builder().withStateFileCompression(minStateByteLenForCompression, compressor). And that would simplify the code…?

> On Mar 14, 2024, at 2:44 PM, Eric Pugh <ep...@opensourceconnections.com> wrote:
>
> Justin, I went back and crawled through with fresh eyes based on what you
> shared ;-). Now I understand that SOLRHOME variable in ZkCLI.
>
> Okay, so…. It appears that the logic to know if compression is enabled
> REQUIRES you to have a solr_home, which means the compressed put only works
> if you are on the same box as the solr_home?
>
> Unless, can I get it from ZK? So that we don’t have to be on the same box?
> What if we assume that this property is defined in solr.xml in ZK, and just
> look there? Oh wait, Ref Guide says:
>
> Loading solr.xml from Zookeeper is deprecated, and will not be supported in a
> future version. Being the node config of Solr, this file must be available at
> early startup and also be allowed to differ between nodes.
>
> Do we have an API that provides access to Solr’s solr.xml settings?
>
> I *think* this is hard to solve cleanly because the zk sub commands don’t
> interact with Solr, instead they go directly to ZooKeeper.. If they were
> mediated by Solr, then the logic about compression choices could remain
> purely on the server side and not touch the client….
>
> Eric
>
>> On Mar 5, 2024, at 9:39 AM, Eric Pugh <ep...@opensourceconnections.com> wrote:
>>
>> Thanks for sharing this…..
>>
>> So, maybe the least bad is to just copy the logic, maybe into SolrCLI.java?
>>
>>> On Mar 5, 2024, at 9:29 AM, Justin Sweeney <justin.sweene...@gmail.com> wrote:
>>>
>>> The tricky part of setData as compared to getData is that in getData we can
>>> tell the data is compressed based on the initial bytes read.
>>> For setData the only way to know if we should compress the provided data is
>>> if we read the solr.xml to know if compression is enabled or by adding an
>>> argument to classes like ZkCpTool.
>>>
>>> Putting an uncompressed state.json into a cluster where compression is used
>>> should still work fine so at this point I've erred on people using these
>>> tools outside of the Solr cluster having an understanding of how they have
>>> set up compression or not for when adding data into the cluster. Not
>>> opposed to adding more arguments for some classes, but we don't have a way
>>> to just handle compression in setData in the same way we do with getData.
>>>
>>> On Sat, Mar 2, 2024 at 11:23 AM Eric Pugh <ep...@opensourceconnections.com> wrote:
>>>
>>>> Looking at this, when we use the ZkCpTool to upload content, I’m not sure
>>>> it goes through the compression step?
>>>>
>>>> Looks like eventually we get to SolrZkClient.setData() and I don’t see
>>>> anything about compression.. Unlike the SolrZkClient.getData()…
>>>>
>>>> Am I reading this right? Shouldn’t setData mimic getData in handling
>>>> compression?
>>>>
>>>> Thanks for looking at this!
>>>>
>>>>> On Feb 29, 2024, at 12:29 PM, Justin Sweeney <justin.sweene...@gmail.com> wrote:
>>>>>
>>>>> I actually think that use case should just work since the SolrZkClient can
>>>>> already handle compressed state.json, assuming you are just using the
>>>>> default ZLib implementation of compression. When getting data it looks like
>>>>> the ZkCpTool calls SolrZkClient.getData() which is able to check if the
>>>>> data is compressed.
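
(Inline note on the getData point above.) A rough sketch of why the read side needs no configuration: a zlib stream announces itself in its first two bytes, so a reader can sniff the header and decide on its own. This is only an illustration using java.util.zip; the class and method names (ZlibSniff, looksZlibCompressed, readMaybeCompressed) are made up for this mail, and the header check follows RFC 1950 rather than whatever SolrZkClient or the ZLib compressor actually do internally.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative only: hypothetical helper, not Solr's compressor classes. */
final class ZlibSniff {

  /** RFC 1950 header check: method nibble is 8 (deflate) and CMF*256+FLG is a multiple of 31. */
  static boolean looksZlibCompressed(byte[] data) {
    if (data == null || data.length < 2) {
      return false;
    }
    int cmf = data[0] & 0xFF;
    int flg = data[1] & 0xFF;
    return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
  }

  /** Compress with the JDK's default zlib settings. */
  static byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Inflate a complete zlib stream; fails loudly on truncated or invalid input. */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported zlib stream");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("not a valid zlib stream", e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  /** Read path: the stored bytes describe themselves, so no solr.xml lookup is needed. */
  static byte[] readMaybeCompressed(byte[] stored) {
    return looksZlibCompressed(stored) ? decompress(stored) : stored;
  }
}
```

A plain state.json starts with `{` (0x7B), which fails the header check, so uncompressed data passes through untouched. The write side has no such hint to go on, which is the asymmetry being discussed.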
>>>>>
>>>>> On Thu, Feb 29, 2024 at 8:58 AM Eric Pugh <ep...@opensourceconnections.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am poking around ZkCLI.java, and noticed that the compression for a
>>>>>> “state.json” file logic is in this file. I’m realizing that the existing
>>>>>> bin/solr zk cp command knows nothing about a “state.json” file being
>>>>>> compressed or not, and so if you do
>>>>>>
>>>>>>     bin/solr zk cp my_local_state.json zk:/state.json -z localhost:9983
>>>>>>
>>>>>> Then I think you don’t get the compression aspect kicking in.
>>>>>>
>>>>>> I could copy that logic into the ZkCpTool.java, but wondering if there is
>>>>>> a better refactoring? Could this logic live in either SolrZkClient.java
>>>>>> or ZkMaintenanceUtils.java ?
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Eric
>>>>>> _______________________
>>>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
>>>>>> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>>>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>>>>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>>>>>
>>>>>> This e-mail and all contents, including attachments, is considered to be
>>>>>> Company Confidential unless explicitly stated otherwise, regardless of
>>>>>> whether attachments are marked as such.
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>

This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless of
whether attachments are marked as such.
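
P.S. To make the builder idea at the top of this mail a bit more concrete, here is a minimal sketch of what an opt-in setting could look like. It is not the code in the PR. Only withStateFileCompression and minStateByteLenForCompression come from the proposal; SketchZkClient, rawBytes, zlib, the in-memory map standing in for ZooKeeper, the "path ends with /state.json" test, and the "negative threshold disables compression" rule are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical stand-in for a ZK client; not the real SolrZkClient. */
final class SketchZkClient {
  private final Map<String, byte[]> store = new HashMap<>(); // pretend ZooKeeper
  private final int minStateByteLenForCompression; // assumption: negative means "off"
  private final UnaryOperator<byte[]> compressor;  // null when compression is not configured

  private SketchZkClient(Builder b) {
    this.minStateByteLenForCompression = b.minLen;
    this.compressor = b.compressor;
  }

  /** Write path: the client, not the caller, decides whether to compress. */
  void setData(String path, byte[] data) {
    boolean compress =
        compressor != null
            && minStateByteLenForCompression >= 0
            && path.endsWith("/state.json")
            && data.length > minStateByteLenForCompression;
    store.put(path, compress ? compressor.apply(data) : data);
  }

  /** Exposes the stored bytes so the effect of the setting can be inspected. */
  byte[] rawBytes(String path) {
    return store.get(path);
  }

  static final class Builder {
    private int minLen = -1;
    private UnaryOperator<byte[]> compressor;

    Builder withStateFileCompression(int minStateByteLenForCompression,
                                     UnaryOperator<byte[]> compressor) {
      this.minLen = minStateByteLenForCompression;
      this.compressor = compressor;
      return this;
    }

    SketchZkClient build() {
      return new SketchZkClient(this);
    }
  }

  /** One possible compressor: zlib via the JDK. */
  static byte[] zlib(byte[] raw) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
      dos.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

With something shaped like this, a tool such as ZkCpTool would only have to pass the two values when it builds its client, for example `new SketchZkClient.Builder().withStateFileCompression(16, SketchZkClient::zlib).build()`, and every setData call would then pick up the behavior. Where a tool running off-box gets those two values from is still the open question.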