[ https://issues.apache.org/jira/browse/SOLR-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050677#comment-17050677 ]

Shalin Shekhar Mangar commented on SOLR-13942:
----------------------------------------------

As someone who runs a managed search service and has to troubleshoot Solr 
issues, I want to add my 2 cents.

There's plenty of information that is required for troubleshooting but is not
available in clusterstatus or any other documented/public API. Sure, there's the
undocumented /admin/zookeeper, but its output format is odd and it's unclear who
it is meant for. And even that lacks a few things that I've found necessary to
troubleshoot Solr.

Here's a non-exhaustive list of things you need to troubleshoot Solr:
# Length of overseer queues (available in the overseerstatus API)
# Contents of overseer queue (mildly useful, available in /admin/zookeeper)
# Overseer election queue and current leader (the former is available in
/admin/zookeeper and the latter in overseerstatus)
# Cluster state (clusterstatus API)
# solr.xml (no API, regardless of whether it is in ZK or on the filesystem)
# Leader election queue and current leader for each shard (available in
/admin/zookeeper)
# Shard terms for each shard/replica (not available in any API)
# Metrics/stats (metrics API)
# Solr logs (possibly the logging API, unless the logs have rolled over)
# GC logs (no API)
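For the items above that are already reachable over HTTP, the calls can be gathered in one place. Below is a minimal Python sketch, not a supported tool. It assumes a hypothetical base URL such as `http://localhost:8983/solr`, the Collections API actions `CLUSTERSTATUS` and `OVERSEERSTATUS`, the `/admin/metrics` handler, and `path`/`detail` parameters on the undocumented `/admin/zookeeper` handler. The ZooKeeper paths (`/overseer/queue`, `/overseer_elect/election`) reflect our understanding of Solr's layout and should be checked against your version. Items with no API (solr.xml, GC logs, shard terms) are deliberately absent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def diagnostic_urls(base_url: str) -> dict:
    """Map troubleshooting items to Solr HTTP endpoints that already expose them."""
    base = base_url.rstrip("/")

    def collections(action: str) -> str:
        return f"{base}/admin/collections?" + urlencode({"action": action, "wt": "json"})

    def zk(path: str) -> str:
        # /admin/zookeeper is undocumented; 'path' and 'detail' are assumed parameters.
        return f"{base}/admin/zookeeper?" + urlencode({"path": path, "detail": "true", "wt": "json"})

    return {
        "cluster_state": collections("CLUSTERSTATUS"),
        "overseer_status": collections("OVERSEERSTATUS"),  # includes overseer queue lengths
        "metrics": f"{base}/admin/metrics?" + urlencode({"wt": "json"}),
        "overseer_queue": zk("/overseer/queue"),  # assumed ZK path
        "overseer_election": zk("/overseer_elect/election"),  # assumed ZK path
    }


def fetch_all(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch every endpoint, recording per-endpoint failures instead of aborting."""
    results = {}
    for name, url in diagnostic_urls(base_url).items():
        try:
            with urlopen(url, timeout=timeout) as resp:
                results[name] = json.load(resp)
        except (OSError, ValueError) as exc:  # network/HTTP errors, or a non-JSON body
            results[name] = {"error": str(exc)}
    return results
```

`fetch_all` keeps going when a single endpoint fails, because an unhealthy cluster is exactly when some of these calls are likely to break.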

Also, the overseerstatus API cannot be called at all when there is no overseer.
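One way a troubleshooting script might cope with a missing overseer is to fall back to the election queue in ZooKeeper. Here is a minimal Python sketch of that fallback. The injected `fetch` callable is hypothetical: it performs a GET and raises `OSError` on failure, as urllib does. The ZK path `/overseer_elect/election` and the `path`/`detail` parameters of `/admin/zookeeper` are assumptions about Solr's internals, not a documented contract.

```python
from typing import Any, Callable, Dict

# Hypothetical fetch function: GET a URL, return parsed JSON, raise OSError on
# failure (urllib's URLError/HTTPError are OSError subclasses).
Fetch = Callable[[str], Dict[str, Any]]


def overseer_info(base_url: str, fetch: Fetch) -> Dict[str, Any]:
    """Try OVERSEERSTATUS first; if it fails, read the overseer election queue from ZK."""
    base = base_url.rstrip("/")
    try:
        data = fetch(f"{base}/admin/collections?action=OVERSEERSTATUS&wt=json")
        return {"source": "overseerstatus", "data": data}
    except OSError as exc:
        # No overseer (or it is unreachable): fall back to the undocumented handler.
        # The ZK path and the 'path'/'detail' parameters are assumptions.
        data = fetch(f"{base}/admin/zookeeper?path=%2Foverseer_elect%2Felection&detail=true&wt=json")
        return {"source": "admin/zookeeper", "data": data, "overseerstatus_error": str(exc)}
```

Injecting `fetch` keeps the fallback logic testable without a live cluster.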

We run ZK and Solr inside Kubernetes and do not expose ZooKeeper publicly. So
using a tool like zkcli means port-forwarding directly to a ZK node, which
requires explicit privileges. Ideally we want to do everything over HTTP and
never grant port-forward privileges to anyone.

So I see the following options:
# Add missing information that is inside ZK (shard terms) to /admin/zookeeper 
and continue to live with its horrible output
# Immediately change /admin/zookeeper to a better output format and change the 
UI to consume this new format
# Deprecate /admin/zookeeper, introduce a clean API, migrate UI to this new 
endpoint or a better alternative and remove /admin/zookeeper in 9.0
# Do nothing and force people to use zkcli and the existing Solr APIs for
troubleshooting, as we have been doing until now

My vote is for #3. We can debate what to call the API and whether it should be
a public, documented, supported API or an undocumented one like
/admin/zookeeper. My preference is to keep it undocumented and unsupported,
just like /admin/zookeeper. The other question is how to secure it: is it
enough for it to have the same security profile as /admin/zookeeper?

> /api/cluster/zk/* to fetch raw ZK data
> --------------------------------------
>
>                 Key: SOLR-13942
>                 URL: https://issues.apache.org/jira/browse/SOLR-13942
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Examples:
> download the {{state.json}} of the {{gettingstarted}} collection
> {code}
> GET http://localhost:8983/api/cluster/zk/collections/gettingstarted/state.json
> {code}
> get a list of all children under {{/live_nodes}}
> {code}
> GET http://localhost:8983/api/cluster/zk/live_nodes
> {code}
> If the requested path is a node with children, show the list of child nodes
> and their metadata.
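The behaviour described in the issue can be modelled with a small in-memory tree. The following is a minimal Python sketch, not Solr's implementation. `ZNode`, `zk_url`, `lookup`, and `handle_zk_get` are hypothetical names. The response shape (a `children` map with per-child metadata, otherwise raw `data`) is an assumption, since the issue does not fix a format. The metadata fields borrow names from ZooKeeper's `Stat` (`version`, `dataLength`, `numChildren`).

```python
from dataclasses import dataclass, field
from typing import Any, Dict
from urllib.parse import quote


@dataclass
class ZNode:
    data: bytes = b""
    version: int = 0
    children: Dict[str, "ZNode"] = field(default_factory=dict)


def zk_url(base: str, zk_path: str) -> str:
    """Build a GET URL for the proposed endpoint, e.g. /live_nodes -> .../api/cluster/zk/live_nodes."""
    return base.rstrip("/") + "/api/cluster/zk/" + quote(zk_path.strip("/"))  # '/' left unescaped


def lookup(root: ZNode, zk_path: str) -> ZNode:
    node = root
    for part in filter(None, zk_path.split("/")):
        node = node.children[part]  # KeyError here would map to a 404
    return node


def handle_zk_get(root: ZNode, zk_path: str) -> Dict[str, Any]:
    """Children plus metadata for an inner node; raw data for a leaf."""
    node = lookup(root, zk_path)
    if node.children:
        return {"children": {
            name: {"version": c.version, "dataLength": len(c.data), "numChildren": len(c.children)}
            for name in sorted(node.children)
            for c in [node.children[name]]
        }}
    return {"data": node.data.decode("utf-8")}
```

ZooKeeper allows a node to carry both data and children. This model follows the issue's rule and returns only the child listing in that case. If shard terms are stored under `/collections/<collection>/terms/<shard>` (our assumption about Solr's layout), an endpoint like this would make them reachable over HTTP without port-forwarding.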



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
