nickva commented on code in PR #5014:
URL: https://github.com/apache/couchdb/pull/5014#discussion_r1556404799


##########
src/couch_scanner/src/couch_scanner_plugin.erl:
##########
@@ -0,0 +1,662 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+% Scanner plugin runner process
+%
+% This is the process which is spawned and run for each enabled plugin.
+%
+% A number of these processes are managed by the couch_scanner_server via
+% start_link/1 and complete/1 functions. After a plugin runner is spawned, the only
+% thing couch_scanner_server does is wait for it to exit.
+%
+% The plugin runner process may exit normally, crash, or exit with {shutdown,
+% {reschedule, TSec}} if it wants to reschedule to run again at some point in
+% the future (next day, a week later, etc).
+%
+% After the process starts, it will load and validate the plugin module. Then,
+% it will start scanning all the dbs and docs on the local node. Shard ranges
+% will be scanned only on one of the cluster nodes to avoid duplicating work.
+% For instance, if there are 2 shard ranges, 0-7, 8-f, with copies on nodes n1,
+% n2, n3. Then, 0-7 might be scanned on n1 only, and 8-f on n3.
+%
+% The plugin API is defined in the behavior definition section.
+%
+% The start/2 function is called when the plugin starts running. It returns
+% some context (St), which can be any Erlang term. All subsequent function
+% calls will be called with the same St object, and may return an updated
+% version of it.
+%
+% If the plugin hasn't finished running and has resumed running after the node
+% was restarted or an error happened, the resume/2 function will be called.
+% That's the difference between start and resume: start/2 is called when the
+% scan starts from the beginning (first db, first shard, ...), and resume/2 is
+% called when the scanning hasn't finished and has to continue.
+%
+% If start/2 or resume/2 returns `reset` then the checkpoint will be reset and
+% the plugin will be restarted. This may be useful in cases when the plugin
+% detects configuration changes since last scanning session had already
+% started, or when the plugin module was updated and the checkpoint version is
+% stale.
+%
+% The checkpoint/1 callback is periodically called to checkpoint the scanning
+% progress. start/2 and resume/2 functions will be called with the last saved
+% checkpoint map value.
+%
+% The complete/1 callback is called when the scan has finished. The complete
+% callback should return final checkpoint map object. The last checkpoint will
+% be written and then.

Review Comment:
   Oh good catch. I updated it to add that the last checkpointed data will then 
be passed to `start/2` if the plugin runs again in the future. This allows the 
plugin to store some data from one scanning session and pass it to another.



