[
https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050677#comment-17050677
]
Shalin Shekhar Mangar commented on SOLR-13942:
----------------------------------------------
As someone who runs a managed search service and has to troubleshoot Solr
issues, I want to add my 2 cents.
There's plenty of information that is required for troubleshooting but is not
available in clusterstatus or any other documented/public API. Sure, there's the
undocumented /admin/zookeeper, whose odd output format has no obvious intended
consumer. But even that lacks a few things that I've found necessary to
troubleshoot Solr.
Here's a non-exhaustive list of things you need to troubleshoot Solr:
# Length of overseer queues (available in overseerstatus API)
# Contents of overseer queue (mildly useful, available in /admin/zookeeper)
# Overseer election queue and current leader (former is available in
/admin/zookeeper and latter in overseer status)
# Cluster state (cluster status API)
# Solr.xml (no API regardless of whether it is in ZK or filesystem)
# Leader election queue and current leader for each shard (available in
/admin/zookeeper)
# Shard terms for each shard/replica (not available in any API)
# Metrics/stats (metrics API)
# Solr Logs (log API? unless it is rolled over)
# GC logs (no API)
The overseerstatus API cannot be hit if there is no overseer so there's that
too.
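For reference, the existing HTTP endpoints named in the list can be reached
along these lines. This is only a minimal sketch: the base URL, the helper
names, and the use of wt=json are assumptions for illustration, and because
/admin/zookeeper is undocumented, its path and detail parameters may behave
differently between versions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a single Solr node; adjust for your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, base=SOLR_BASE):
    """Build a Collections API URL, e.g. for CLUSTERSTATUS or OVERSEERSTATUS."""
    return f"{base}/admin/collections?" + urlencode({"action": action, "wt": "json"})


def zookeeper_url(zk_path, base=SOLR_BASE):
    """Build a URL for the undocumented /admin/zookeeper handler."""
    params = {"path": zk_path, "detail": "true", "wt": "json"}
    return f"{base}/admin/zookeeper?" + urlencode(params)


def fetch_json(url, timeout=10):
    """Fetch a URL and parse the response body as JSON (needs a running Solr)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URLs only; calling fetch_json() requires a reachable Solr node.
    print(collections_api_url("CLUSTERSTATUS"))
    print(collections_api_url("OVERSEERSTATUS"))
    print(zookeeper_url("/overseer/queue"))
```

Note that nothing here covers solr.xml, shard terms, or GC logs, which is the
gap described above.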
We run ZK and Solr inside Kubernetes and we do not expose ZooKeeper publicly.
So, using a tool like zkcli means we have to port-forward directly to the ZK
node, which needs explicit privileges. Ideally we want to hit everything over
HTTP and never grant port-forward privileges to anyone.
So I see the following options:
# Add missing information that is inside ZK (shard terms) to /admin/zookeeper
and continue to live with its horrible output
# Immediately change /admin/zookeeper to a better output format and change the
UI to consume this new format
# Deprecate /admin/zookeeper, introduce a clean API, migrate UI to this new
endpoint or a better alternative and remove /admin/zookeeper in 9.0
# Do nothing and force people to use zkcli and the existing Solr APIs for
troubleshooting, as we've been doing until now
My vote is to go with #3 and we can debate what we want to call the API and
whether it should be a public, documented, supported API or an undocumented API
like /admin/zookeeper. My preference is to keep this undocumented and
unsupported just like /admin/zookeeper. The other question is how we can secure
it -- is it enough to be the same as /admin/zookeeper from a security
perspective?
> /api/cluster/zk/* to fetch raw ZK data
> --------------------------------------
>
> Key: SOLR-13942
> URL: https://issues.apache.org/jira/browse/SOLR-13942
> Project: Solr
> Issue Type: Bug
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> example
> download the {{state.json}} of
> {code}
> GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json
> {code}
> get a list of all children under {{/live_nodes}}
> {code}
> GET http://localhost:8983/api/cluster/zk/live_nodes
> {code}
> If the requested path is a node with children, show the list of child nodes
> and their metadata
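
The two example requests in the issue description follow a simple mapping from
a ZooKeeper path to a URL under /api/cluster/zk. A small sketch of that
mapping, assuming the endpoint behaves as proposed there (the helper name is
hypothetical, and the response format is not specified in the description):

```python
from urllib.parse import quote

# Host, port, and prefix are taken from the examples in the issue description.
ZK_API_BASE = "http://localhost:8983/api/cluster/zk"


def zk_api_url(zk_path, base=ZK_API_BASE):
    """Map a ZooKeeper path such as /live_nodes to the proposed endpoint URL."""
    # Percent-encode unusual characters but keep "/" as the path separator.
    return base + "/" + quote(zk_path.strip("/"), safe="/")


if __name__ == "__main__":
    print(zk_api_url("/collections/gettingstarted/state.json"))
    print(zk_api_url("/live_nodes"))
```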