[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050677#comment-17050677 ]
Shalin Shekhar Mangar commented on SOLR-13942:
----------------------------------------------

As someone who runs a managed search service and has to troubleshoot Solr issues, I want to add my 2 cents.

There's plenty of information that is required for troubleshooting but is not available in clusterstatus or any other documented/public API. Sure there's the undocumented /admin/zookeeper which has a weird output format meant for I don't know who. But even that does not have a few things that I've found necessary to troubleshoot Solr.

Here's a non-exhaustive list of things you need to troubleshoot Solr:
# Length of overseer queues (available in overseerstatus API)
# Contents of overseer queue (mildly useful, available in /admin/zookeeper)
# Overseer election queue and current leader (former is available in /admin/zookeeper and latter in overseer status)
# Cluster state (cluster status API)
# Solr.xml (no API regardless of whether it is in ZK or filesystem)
# Leader election queue and current leader for each shard (available in /admin/zookeeper)
# Shard terms for each shard/replica (not available in any API)
# Metrics/stats (metrics API)
# Solr Logs (log API? unless it is rolled over)
# GC logs (no API)

The overseerstatus API cannot be hit if there is no overseer so there's that too.

We run ZK and Solr inside kubernetes and we do not expose zookeeper publicly. So, to use a tool like zkcli means we have to port forward directly to the zk node which needs explicit privileges. Ideally we want to hit everything over http and never allow port forward privileges to anyone.
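
To make the "everything over http" point concrete, here is a minimal Python sketch that collects the HTTP-reachable items from the list above into one set of URLs. It is only an illustration under stated assumptions: the base URL, the collection and shard names, the helper names, and the ZK paths passed to /admin/zookeeper (including its `path` and `detail` parameters, which are undocumented) are assumptions and may differ between Solr versions. Solr.xml, shard terms and GC logs are left out because, as noted above, no API exposes them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base, handler, **params):
    """Join base URL, handler path and query parameters (wt=json by default)."""
    params.setdefault("wt", "json")
    return f"{base.rstrip('/')}{handler}?{urlencode(params)}"


def troubleshooting_urls(base, collection, shard):
    """Hypothetical helper: map each troubleshooting item to an HTTP endpoint.

    The ZK paths below are assumptions about the usual SolrCloud layout.
    """
    def zk(path):
        # Undocumented handler; parameters assumed, not guaranteed.
        return build_url(base, "/admin/zookeeper", path=path, detail="true")

    def coll(action):
        return build_url(base, "/admin/collections", action=action)

    return {
        # Queue lengths and current overseer; fails if there is no overseer.
        "overseer_status": coll("OVERSEERSTATUS"),
        "overseer_queue": zk("/overseer/queue"),
        "overseer_election": zk("/overseer_elect/election"),
        "cluster_status": coll("CLUSTERSTATUS"),
        "shard_leader_election": zk(
            f"/collections/{collection}/leader_elect/{shard}/election"
        ),
        "metrics": build_url(base, "/admin/metrics"),
    }


def fetch_json(url, timeout=10):
    """Fetch one URL and decode the JSON body (needs a reachable Solr node)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URLs only; call fetch_json(url) against a live cluster.
    urls = troubleshooting_urls("http://localhost:8983/solr", "gettingstarted", "shard1")
    for name, url in urls.items():
        print(f"{name}: {url}")
```

Even in this sketch the data is spread over three different handlers with different output formats, which is the gap a single clean endpoint would close.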

So I see the following options:
# Add missing information that is inside ZK (shard terms) to /admin/zookeeper and continue to live with its horrible output
# Immediately change /admin/zookeeper to a better output format and change the UI to consume this new format
# Deprecate /admin/zookeeper, introduce a clean API, migrate UI to this new endpoint or a better alternative and remove /admin/zookeeper in 9.0
# Not do anything and force people to use zkcli and existing solr apis for troubleshooting as we've been doing till now

My vote is to go with #3 and we can debate what we want to call the API and whether it should be a public, documented, supported API or an undocumented API like /admin/zookeeper. My preference is to keep this undocumented and unsupported just like /admin/zookeeper. The other question is how we can secure it -- is it enough to be the same as /admin/zookeeper from a security perspective?

> /api/cluster/zk/* to fetch raw ZK data
> --------------------------------------
>
>                 Key: SOLR-13942
>                 URL: https://issues.apache.org/jira/browse/SOLR-13942
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> example
> download the {{state.json}} of
> {code}
> GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json
> {code}
> get a list of all children under {{/live_nodes}}
> {code}
> GET http://localhost:8983/api/cluster/zk/live_nodes
> {code}
> If the requested path is a node with children show the list of child nodes and their meta data