On Mon, Sep 21, 2015 at 6:08 PM, Junio C Hamano <[email protected]> wrote:
> Stefan Beller <[email protected]> writes:
>
>> +void default_start_failure(void *data,
>> + struct child_process *cp,
>> + struct strbuf *err)
>> +{
>> + int i;
>> + struct strbuf sb = STRBUF_INIT;
>> +
>> + for (i = 0; cp->argv[i]; i++)
>> + strbuf_addf(&sb, "%s ", cp->argv[i]);
>> + die_errno("Starting a child failed:\n%s", sb.buf);
>
> Do we want that trailing SP after the last element of argv[]?
> Same question applies to the one in "return-value".
done
>
>> +static void run_processes_parallel_init(struct parallel_processes *pp,
>> + int n, void *data,
>> + get_next_task_fn get_next_task,
>> + start_failure_fn start_failure,
>> + return_value_fn return_value)
>> +{
>> + int i;
>> +
>> + if (n < 1)
>> + n = online_cpus();
>> +
>> + pp->max_processes = n;
>> + pp->data = data;
>> + if (!get_next_task)
>> + die("BUG: you need to specify a get_next_task function");
>> + pp->get_next_task = get_next_task;
>> +
>> + pp->start_failure = start_failure ? start_failure :
>> default_start_failure;
>> + pp->return_value = return_value ? return_value : default_return_value;
>
> I would actually have expected that leaving these to NULL will just
> skip pp->fn calls, instead of a "default implementation", but a pair
> of very simple default implementation would not hrtut.
Ok, I think the default implementation provided is a reasonable default, as
it provides enough information in case of an error.
>
>> +static void run_processes_parallel_cleanup(struct parallel_processes *pp)
>> +{
>> + int i;
>
> Have a blank between the decl block and the first stmt here (and
> elsewhere, too---which you got correct in the function above)?
done
>
>> + for (i = 0; i < pp->max_processes; i++)
>> + strbuf_release(&pp->children[i].err);
>
>> +static void run_processes_parallel_start_one(struct parallel_processes *pp)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < pp->max_processes; i++)
>> + if (!pp->children[i].in_use)
>> + break;
>> + if (i == pp->max_processes)
>> + die("BUG: bookkeeping is hard");
>
> Mental note: the caller is responsible for not calling this when all
> slots are taken.
>
>> + if (!pp->get_next_task(pp->data,
>> + &pp->children[i].process,
>> + &pp->children[i].err)) {
>> + pp->all_tasks_started = 1;
>> + return;
>> + }
>
> Mental note: but it is OK to call this if get_next_task() previously
> said "no more task".
>
> The above two shows a slight discrepancy (nothing earth-breaking).
I see. Maybe this can be improved by having the
run_processes_parallel_start_as_needed call get_next_task
and pass the information into the run_processes_parallel_start_one
or as we had it before, combine these two functions again.
>
> I have this suspicion that the all-tasks-started bit may turn out to
> be a big mistake that we may later regret. Don't we want to allow
> pp->more_task() to say "no more task to run at this moment" implying
> "but please do ask me later, because I may have found more to do by
> the time you ask me again"?
And this task would arise because the current running children produce
more work to be done?
So you would have a
more_tasks() question. If that returns true
get_next_task() must provide that next task?
In case we had more work to do, which is based on the outcome of the
children, we could just wait in get_next_task for a semaphore/condition
variable from the return_value. Though that would stop progress reporting
end maybe lock up the whole program due to pipe clogging.
It seems to be a better design as we come back to the main loop fast
which does the polling. Although I feel like it is over engineered for now.
So how would you find out when we are done?
* more_tasks() could have different return values in an enum
(YES_THERE_ARE, NO_BUT_ASK_LATER, NO_NEVER_ASK_AGAIN)
* There could be yet another callback more_tasks_available() and
parallel_processing_should_stop()
* Hand back a callback ourselfs [Call signal_parallel_processing_done(void*)
when more_tasks will never return true again, with a void* we provide to
more_tasks()]
* ...
>
> That is one of the reasons why I do not think the "very top level is
> a bulleted list" organization is a good idea in general. A good
> scheduling decision can seldom be made in isolation without taking
> global picture into account.
>
>> +static void run_processes_parallel_collect_finished(struct
>> parallel_processes *pp)
>> +{
>> + int i = 0;
>> + pid_t pid;
>> + int wait_status, code;
>> + int n = pp->max_processes;
>> +
>> + while (pp->nr_processes > 0) {
>> + pid = waitpid(-1, &wait_status, WNOHANG);
>> + if (pid == 0)
>> + return;
>> +
>> + if (pid < 0)
>> + die_errno("wait");
>> +
>> + for (i = 0; i < pp->max_processes; i++)
>> + if (pp->children[i].in_use &&
>> + pid == pp->children[i].process.pid)
>> + break;
>> + if (i == pp->max_processes)
>> + /*
>> + * waitpid returned another process id
>> + * which we are not waiting for.
>> + */
>> + return;
>
> If we culled a child process that this machinery is not in charge
> of, waitpid() in other places that wants to see that child will not
> see it. Perhaps such a situation might even warrant an error() or
> BUG()? Do we want a "NEEDSWORK: Is this a bug?" comment here at
> least?
>
>> + if (strbuf_read_once(&pp->children[i].err,
>> + pp->children[i].process.err, 0) < 0 &&
>> + errno != EAGAIN)
>> + die_errno("strbuf_read_once");
>
> Don't we want to read thru to the end here? The reason read_once()
> did not read thru to the end may not have anything to do with
> NONBLOCK (e.g. xread_nonblock() caps len, and it does not loop).
right.
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html