nickva opened a new pull request, #5014:
URL: https://github.com/apache/couchdb/pull/5014
[WIP] Everything is in place except docs and tests
The app scans all the dbs and docs. It has a plugin system to allow
gathering various things from a cluster. The first use is to scan all the
JavaScript design docs and run them through the new QuickJS JavaScript engine.
Other possible uses:
- Gather total db and view sizes
- Scan for document features (docs of certain sizes, or containing certain
fields and values).
The plugins are managed as individual processes by the couch_scanner_server
with the start_link/1 and stop/1 functions. After a plugin runner is spawned,
the only thing couch_scanner_server does is wait for it to exit.
The plugin runner process may exit normally, crash, or exit with {shutdown,
{reschedule, TSec}} if it wants to be rescheduled to run again at some point in
the future (next day, a week later, etc).
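As an illustration of the three exit cases, here is a minimal sketch of how a waiting process could classify a runner's exit reason. The module name, the `classify/1` function and the atoms it returns are hypothetical and not part of this PR; how `TSec` is interpreted is also left open here.

```erlang
-module(exit_reason_sketch).
-export([classify/1, demo/0]).

%% Map a plugin runner's exit reason to a follow-up action.
%% The returned terms are hypothetical, chosen only for illustration.
classify(normal) ->
    done;
classify({shutdown, {reschedule, TSec}}) when is_integer(TSec) ->
    {run_again, TSec};
classify(_CrashReason) ->
    crashed.

%% Spawn a stand-in runner that asks to be rescheduled, then wait
%% for it to exit, which is all the server side has to do.
demo() ->
    Runner = fun() -> exit({shutdown, {reschedule, 1711249693}}) end,
    {Pid, Ref} = spawn_monitor(Runner),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> classify(Reason)
    end.
```

Calling `exit_reason_sketch:demo()` exercises the reschedule path; `classify(normal)` and `classify(badarg)` cover the other two cases.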
After the plugin runner process starts, it loads and validates the plugin
module. Then it starts scanning all the dbs and docs on the local node. Each
shard range is scanned on only one of the cluster nodes to avoid duplicating
work. For instance, if there are 2 shard ranges, 0-7 and 8-f, with copies on
nodes n1, n2, and n3, then 0-7 might be scanned on n1 only, and 8-f on n3 only.
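One way every node could agree on a single scanner per shard range without coordination is to hash the range over the sorted list of nodes holding a copy. The sketch below assumes that approach purely for illustration; it is not necessarily how this PR picks the node, and the module and function names are hypothetical.

```erlang
-module(range_owner_sketch).
-export([owner/2, scan_locally/3]).

%% Pick one node for a shard range. Every node computes the same
%% answer because the node list is sorted before hashing.
%% Nodes must be a non-empty list.
owner(Range, Nodes) ->
    Sorted = lists:usort(Nodes),
    %% erlang:phash2/2 returns an integer in 0..length(Sorted)-1
    Idx = erlang:phash2(Range, length(Sorted)),
    lists:nth(Idx + 1, Sorted).

%% True only on the node chosen for this range.
scan_locally(Range, Nodes, Self) ->
    owner(Range, Nodes) =:= Self.
```

With ranges `{0, 7}` and `{8, 15}` and nodes `[n1, n2, n3]`, each range maps to exactly one of the three nodes, so the other two skip it.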
The plugin API is the following (as OTP callback definitions):
```erlang
-callback start(ScanId :: binary(), EJson :: #{}) ->
{ok, St :: term()} | skip.
-callback resume(ScanId :: binary(), EJson :: #{}) ->
{ok, St :: term()} | skip.
-callback stop(St :: term()) ->
{ok, EJson :: #{}}.
-callback checkpoint(St :: term()) ->
{ok, EJson :: #{}}.
-callback db(St :: term(), DbName :: binary()) ->
{ok | skip | stop, St1 :: term()}.
-callback ddoc(St :: term(), DbName :: binary(), #doc{}) ->
{ok | stop, St1 :: term()}.
-callback shards(St :: term(), [#shard{}]) ->
{[#shard{}], St1 :: term()}.
-callback db_opened(St :: term(), Db :: term()) ->
{ok, St :: term()}.
-callback doc_id(St :: term(), DocId :: binary(), Db :: term()) ->
{ok | skip | stop, St1 :: term()}.
-callback doc(St :: term(), Db :: term(), #doc{}) ->
{ok | stop, St1 :: term()}.
-callback db_closing(St :: term(), Db :: term()) ->
{ok, St1 :: term()}.
```
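To make the callback flow concrete, here is a minimal sketch of a plugin that counts dbs and docs. It is not part of this PR. It assumes the EJson passed to `resume/2` is the map last returned by `checkpoint/1` with binary keys, and it treats docs and db handles as opaque terms so that it compiles without the CouchDB record definitions. The `-behaviour` attribute is omitted because the behaviour module name is not given above.

```erlang
-module(doc_counter_sketch).
-export([start/2, resume/2, stop/1, checkpoint/1]).
-export([db/2, ddoc/3, shards/2, db_opened/2, doc_id/3, doc/3, db_closing/2]).

%% Fresh scan: start both counters at zero.
start(_ScanId, _EJson) ->
    {ok, #{dbs => 0, docs => 0}}.

%% Resumed scan: restore counters from the checkpointed EJson
%% (assumed to be what checkpoint/1 returned, with binary keys).
resume(_ScanId, #{<<"dbs">> := Dbs, <<"docs">> := Docs}) ->
    {ok, #{dbs => Dbs, docs => Docs}};
resume(ScanId, EJson) ->
    start(ScanId, EJson).

stop(St) -> {ok, to_ejson(St)}.

checkpoint(St) -> {ok, to_ejson(St)}.

%% Count every db and keep scanning it.
db(#{dbs := N} = St, _DbName) ->
    {ok, St#{dbs := N + 1}}.

%% Design docs are not of interest to this plugin.
ddoc(St, _DbName, _DDoc) -> {ok, St}.

%% Scan all the shards that were offered.
shards(St, Shards) -> {Shards, St}.

db_opened(St, _Db) -> {ok, St}.

%% Ask for every doc body to be opened.
doc_id(St, _DocId, _Db) -> {ok, St}.

%% Count every doc.
doc(#{docs := N} = St, _Db, _Doc) ->
    {ok, St#{docs := N + 1}}.

db_closing(St, _Db) -> {ok, St}.

to_ejson(#{dbs := Dbs, docs := Docs}) ->
    #{<<"dbs">> => Dbs, <<"docs">> => Docs}.
```

Returning `skip` from `db/2` or `doc_id/3`, or `stop` from any callback that allows it, would cut the traversal short; this sketch always continues.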
A simple plugin, `couch_scanner_plugin_ddoc_features`, is included as a first
example implementation. It traverses the design docs on a cluster and reports
when it finds features deprecated in Apache CouchDB 4.x (lists, shows, etc).
Plugin modules are enabled by `$plugin_mod = true` entries in the
`[couch_scanner_plugins]` section. For example, to enable
`couch_scanner_plugin_ddoc_features`:
```
[couch_scanner_plugins]
couch_scanner_plugin_ddoc_features = true
```
Plugins may configure their scheduling using `after` and `repeat` config
values. For example, to start after Unix timestamp 1711249693 and then run
every 3 days:
```
[couch_scanner_plugin_ddoc_features]
after = 1711249693
repeat = 3_days
```
The default value for both `after` and `repeat` is `restart`, meaning the
plugin runs once after the node starts up.
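For a sense of what a value like `3_days` amounts to, the sketch below converts a `repeat` setting into seconds. It is an illustration only: the function name and return shapes are hypothetical, and `days` is the only unit handled because it is the only one shown above.

```erlang
-module(repeat_sketch).
-export([parse_repeat/1]).

%% Turn a `repeat` config value into `restart` or a period in seconds.
parse_repeat(<<"restart">>) ->
    restart;
parse_repeat(Bin) when is_binary(Bin) ->
    case binary:split(Bin, <<"_">>) of
        [NumBin, Unit] ->
            try binary_to_integer(NumBin) of
                N when N > 0 -> to_seconds(N, Unit);
                _ -> {error, {invalid_repeat, Bin}}
            catch
                error:badarg -> {error, {invalid_repeat, Bin}}
            end;
        _ ->
            {error, {invalid_repeat, Bin}}
    end.

%% Only `days` is attested in the text; other units are rejected here.
to_seconds(N, <<"days">>) -> {ok, N * 86400};
to_seconds(_N, Unit) -> {error, {unknown_unit, Unit}}.
```

`repeat_sketch:parse_repeat(<<"3_days">>)` returns `{ok, 259200}`, that is 3 × 86400 seconds.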