> -----Original Message-----
> From: Xen-devel [mailto:[email protected]] On Behalf Of
> Paul Durrant
> Sent: 06 March 2019 11:24
> To: 'Jan Beulich' <[email protected]>
> Cc: Stefano Stabellini <[email protected]>; Wei Liu
> <[email protected]>; Konrad Rzeszutek Wilk
> <[email protected]>; Andrew Cooper <[email protected]>; Tim
> (Xen.org) <[email protected]>;
> George Dunlap <[email protected]>; Julien Grall
> <[email protected]>; xen-devel <xen-
> [email protected]>; Ian Jackson <[email protected]>; Roger Pau
> Monne
> <[email protected]>
> Subject: Re: [Xen-devel] [PATCH v3 8/9] viridian: add implementation of
> synthetic timers
>
> > -----Original Message-----
> > From: Jan Beulich [mailto:[email protected]]
> > Sent: 25 February 2019 14:54
> > To: Paul Durrant <[email protected]>
> > Cc: Julien Grall <[email protected]>; Andrew Cooper
> > <[email protected]>; Roger Pau Monne
> > <[email protected]>; Wei Liu <[email protected]>; George Dunlap
> > <[email protected]>; Ian
> > Jackson <[email protected]>; Stefano Stabellini
> > <[email protected]>; xen-devel <xen-
> > [email protected]>; Konrad Rzeszutek Wilk
> > <[email protected]>; Tim (Xen.org)
> > <[email protected]>
> > Subject: Re: [PATCH v3 8/9] viridian: add implementation of synthetic timers
> >
> > >>> On 31.01.19 at 11:47, <[email protected]> wrote:
> > > --- a/xen/arch/x86/hvm/viridian/synic.c
> > > +++ b/xen/arch/x86/hvm/viridian/synic.c
> > > @@ -329,7 +329,53 @@ void viridian_synic_domain_deinit(struct domain *d)
> > >
> > > void viridian_synic_poll_messages(struct vcpu *v)
> > > {
> > > - /* There are currently no message sources */
> > > + viridian_time_poll_timers(v);
> > > +}
> > > +
> > > +bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx,
> > > + unsigned int index,
> > > + int64_t expiration, int64_t
> > > delivery)
> > > +{
> > > + const union viridian_sint_msr *vs =
> > > &v->arch.hvm.viridian->sint[sintx];
> > > + HV_MESSAGE *msg = v->arch.hvm.viridian->simp.ptr;
> > > + struct {
> > > + uint32_t TimerIndex;
> > > + uint32_t Reserved;
> > > + uint64_t ExpirationTime;
> > > + uint64_t DeliveryTime;
> > > + } payload = {
> > > + .TimerIndex = index,
> > > + .ExpirationTime = expiration,
> > > + .DeliveryTime = delivery,
> > > + };
> > > +
> > > + if ( test_bit(sintx, &v->arch.hvm.viridian->msg_pending) )
> > > + return false;
> > > +
> > > + BUILD_BUG_ON(sizeof(*msg) != HV_MESSAGE_SIZE);
> > > + msg += sintx;
> > > +
> > > + /*
> > > + * To avoid using an atomic test-and-set this function must be called
> > > + * in context of the vcpu receiving the message.
> > > + */
> > > + ASSERT(v == current);
> > > + if ( msg->Header.MessageType != HvMessageTypeNone )
> > > + {
> > > + msg->Header.MessageFlags.MessagePending = 1;
> > > + set_bit(sintx, &v->arch.hvm.viridian->msg_pending);
> >
> > As per the comment above this is always in context of the subject
> > vCPU. It looks to me as if this was also the case for the two
> > clear_bit() on the field in the prior patch. If so, all three could be
> > the non-atomic variants instead.
>
> The only slight subtlety I think is the one in the wrmsr function, which can
> be called in context of a
> domain restore. I think it's still ok for it to be non-atomic in this case
> but I'll assert (v =
> current || !v->running), which I think should cover it.
>
> >
> > > + return false;
> > > + }
> > > +
> > > + msg->Header.MessageType = HvMessageTimerExpired;
> > > + msg->Header.MessageFlags.MessagePending = 0;
> > > + msg->Header.PayloadSize = sizeof(payload);
> > > + memcpy(msg->Payload, &payload, sizeof(payload));
> > > +
> > > + if ( !vs->fields.mask )
> > > + vlapic_set_irq(vcpu_vlapic(v), vs->fields.vector, 0);
> >
> > If this wasn't with v == current, I think you'd also need a barrier
> > here. Could you extend the comment above to also mention this
> > aspect?
>
> Ok.
>
> >
> > > @@ -118,14 +119,237 @@ static int64_t time_ref_count(struct domain *d)
> > > return raw_trc_val(d) + trc->off;
> > > }
> > >
> > > +static int64_t time_now(struct domain *d)
> >
> > Why would this return a signed value? And can't the function
> > parameter be const?
>
> The function parameter can be const, but I think the result needs to be
> signed for the missed ticks
> logic in start_timer() to work correctly.
>
> >
> > > +{
> > > + const struct viridian_page *rt =
> > > &d->arch.hvm.viridian->reference_tsc;
> > > + HV_REFERENCE_TSC_PAGE *p = rt->ptr;
> > > + uint32_t start, end;
> > > + __int128_t tsc;
> > > + __int128_t scale;
> >
> > I don't think you need both of them be 128 bits wide. I also don't
> > see why either would want to be of a signed type.
>
> The spec says (as in the comment below):
>
> "The partition reference time is computed by the following formula:
>
> ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
>
> The multiplication is a 64 bit multiplication, which results in a 128 bit
> number which is then shifted
> 64 times to the right to obtain the high 64 bits.TscScale"
>
> Again, I'm using signed arithmetic as I think it's necessary for the missed
> ticks logic to work
> correctly in the event of an overflow.
FAOD the code that I am concerned about is:
/*
* The timer is already started, so we're re-scheduling.
* Hence advance the timer expiration by one tick.
*/
next = vs->expiration + vs->count;
/* Now check to see if any expirations have been missed */
if ( now - next > 0 )
missed = (now - next) / vs->count;
If now and next were unsigned then next may overflow such that (now - next)
ends up being very large, rather than negative, so I'd end up calculating a
completely bogus value for missed.
Paul
>
> >
> > > + int64_t offset;
> > > +
> > > + /*
> > > + * If the reference TSC page is not enabled, or has been invalidated
> > > + * fall back to the partition reference counter.
> > > + */
> > > + if ( !p || !p->TscSequence )
> > > + return time_ref_count(d);
> > > +
> > > + /*
> > > + * The following sampling algorithm for tsc, scale and offset is
> > > + * documented in the specifiction.
> > > + */
> > > + start = p->TscSequence;
> > > +
> > > + do {
> > > + tsc = rdtsc();
> > > + scale = p->TscScale;
> > > + offset = p->TscOffset;
> > > +
> > > + smp_mb();
> > > + end = p->TscSequence;
> >
> > Why is this a full barrier, rather than just a read one? And don't you need
> > to add a counterpart in update_reference_tsc()?
>
> Yes, a read barrier is enough with the counterpart write barrier added.
>
> >
> > > + } while (end != start);
> >
> > update_reference_tsc() increments TscSequence. If end doesn't match
> > start at this point, you're entering a near infinite loop here as long as
> > you don't update start inside the loop. I also think that there's a
> > second read barrier needed between this initial reading of the sequence
> > number and the reading of the actual values.
>
> Yes, the start value should be inside the loop of course.
>
> >
> > > + /*
> > > + * The specification says: "The partition reference time is computed
> > > + * by the following formula:
> > > + *
> > > + * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
> > > + *
> > > + * The multiplication is a 64 bit multiplication, which results in a
> > > + * 128 bit number which is then shifted 64 times to the right to
> > > obtain
> > > + * the high 64 bits."
> > > + */
> > > + return ((tsc * scale) >> 64) + offset;
> > > +}
> > > +
> > > +static void stop_stimer(struct viridian_stimer *vs)
> > > +{
> > > + struct vcpu *v = vs->v;
> >
> > const?
>
> Ok.
>
> >
> > > + unsigned int stimerx = vs - &v->arch.hvm.viridian->stimer[0];
> > > +
> > > + if ( !vs->started )
> > > + return;
> > > +
> > > + stop_timer(&vs->timer);
> > > + clear_bit(stimerx, &v->arch.hvm.viridian->stimer_pending);
> > > + vs->started = false;
> > > +}
> > > +
> > > +static void stimer_expire(void *data)
> > > +{
> > > + struct viridian_stimer *vs = data;
> >
> > const?
>
> Ok.
>
> >
> > > + struct vcpu *v = vs->v;
> > > + unsigned int stimerx = vs - &v->arch.hvm.viridian->stimer[0];
> > > +
> > > + if ( !vs->config.fields.enabled )
> > > + return;
> > > +
> > > + set_bit(stimerx, &v->arch.hvm.viridian->stimer_pending);
> > > + vcpu_kick(v);
> > > +}
> > > +
> > > +static void start_stimer(struct viridian_stimer *vs)
> > > +{
> > > + struct vcpu *v = vs->v;
> > > + unsigned int stimerx = vs - &v->arch.hvm.viridian->stimer[0];
> > > + int64_t now = time_now(v->domain);
> > > + s_time_t timeout;
> > > +
> > > + if ( !test_and_set_bit(stimerx,
> > > &v->arch.hvm.viridian->stimer_enabled) )
> > > + printk(XENLOG_G_INFO "%pv: VIRIDIAN STIMER%u: enabled\n", v,
> > > + stimerx);
> > > +
> > > + if ( vs->config.fields.periodic )
> > > + {
> > > + unsigned int missed = 0;
> > > + int64_t next;
> > > +
> > > + /*
> > > + * The specification says that if the timer is lazy then we
> > > + * skip over any missed expirations so we can treat this case
> > > + * as the same as if the timer is currently stopped, i.e. we
> > > + * just schedule expiration to be 'count' ticks from now.
> > > + */
> > > + if ( !vs->started || vs->config.fields.lazy )
> > > + {
> > > + next = now + vs->count;
> > > + }
> >
> > Unnecessary braces.
>
> Yes.
>
> >
> > > @@ -149,6 +373,57 @@ int viridian_time_wrmsr(struct vcpu *v, uint32_t
> > > idx, uint64_t val)
> > > }
> > > break;
> > >
> > > + case HV_X64_MSR_TIME_REF_COUNT:
> > > + return X86EMUL_EXCEPTION;
> >
> > Isn't this an unrelated change?
>
> It is. I'll call it out in the comment comment.
>
> >
> > > + case HV_X64_MSR_STIMER0_CONFIG:
> > > + case HV_X64_MSR_STIMER1_CONFIG:
> > > + case HV_X64_MSR_STIMER2_CONFIG:
> > > + case HV_X64_MSR_STIMER3_CONFIG:
> > > + {
> > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_CONFIG) / 2;
> > > + struct viridian_stimer *vs =
> > > &v->arch.hvm.viridian->stimer[stimerx];
> > > +
> > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) )
> > > + return X86EMUL_EXCEPTION;
> > > +
> > > + stop_stimer(vs);
> > > +
> > > + vs->config.raw = val;
> > > +
> > > + if ( !vs->config.fields.sintx )
> > > + vs->config.fields.enabled = 0;
> > > +
> > > + if ( vs->config.fields.enabled )
> > > + start_stimer(vs);
> > > +
> > > + break;
> > > + }
> > > + case HV_X64_MSR_STIMER0_COUNT:
> >
> > Missing blank line again (and also further down here as well as in the
> > rdmsr code).
> >
>
> Ok. TBH I've always thought the normal style was to omit the blank line if
> the case statement has
> braces.
>
> > > + case HV_X64_MSR_STIMER1_COUNT:
> > > + case HV_X64_MSR_STIMER2_COUNT:
> > > + case HV_X64_MSR_STIMER3_COUNT:
> > > + {
> > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_COUNT) / 2;
> > > + struct viridian_stimer *vs =
> > > &v->arch.hvm.viridian->stimer[stimerx];
> > > +
> > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) )
> > > + return X86EMUL_EXCEPTION;
> > > +
> > > + stop_stimer(vs);
> > > +
> > > + vs->count = val;
> > > +
> > > + if ( !vs->count )
> >
> > Any reason you don't use val here (which the compiler likely will do
> > anyway)?
>
> Not particularly, I just think it reads better and is more consistent with
> other code.
>
> >
> > > @@ -201,6 +476,32 @@ int viridian_time_rdmsr(const struct vcpu *v,
> > > uint32_t idx, uint64_t *val)
> > > break;
> > > }
> > >
> > > + case HV_X64_MSR_STIMER0_CONFIG:
> > > + case HV_X64_MSR_STIMER1_CONFIG:
> > > + case HV_X64_MSR_STIMER2_CONFIG:
> > > + case HV_X64_MSR_STIMER3_CONFIG:
> > > + {
> > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_CONFIG) / 2;
> > > +
> > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) )
> > > + return X86EMUL_EXCEPTION;
> > > +
> > > + *val = v->arch.hvm.viridian->stimer[stimerx].config.raw;
> >
> > While more noticeable here and ...
> >
> > > + break;
> > > + }
> > > + case HV_X64_MSR_STIMER0_COUNT:
> > > + case HV_X64_MSR_STIMER1_COUNT:
> > > + case HV_X64_MSR_STIMER2_COUNT:
> > > + case HV_X64_MSR_STIMER3_COUNT:
> > > + {
> > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_COUNT) / 2;
> > > +
> > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) )
> > > + return X86EMUL_EXCEPTION;
> > > +
> > > + *val = v->arch.hvm.viridian->stimer[stimerx].count;
> >
> > ... here, array_access_nospec() are probably needed not just here,
> > but also in the wrmsr logic.
>
> Really? stimerx is calculated based on hitting the case statement in the
> first place.
>
> >
> > > --- a/xen/include/asm-x86/hvm/viridian.h
> > > +++ b/xen/include/asm-x86/hvm/viridian.h
> > > @@ -40,6 +40,33 @@ union viridian_sint_msr
> > > } fields;
> > > };
> > >
> > > +union viridian_stimer_config_msr
> > > +{
> > > + uint64_t raw;
> > > + struct
> > > + {
> > > + uint64_t enabled:1;
> > > + uint64_t periodic:1;
> > > + uint64_t lazy:1;
> > > + uint64_t auto_enable:1;
> > > + uint64_t vector:8;
> > > + uint64_t direct_mode:1;
> > > + uint64_t reserved_zero1:3;
> > > + uint64_t sintx:4;
> > > + uint64_t reserved_zero2:44;
> > > + } fields;
> > > +};
> > > +
> > > +struct viridian_stimer {
> > > + struct vcpu *v;
> >
> > Isn't a full 8-byte pointer a little too much overhead here? You could
> > instead store the timer index ...
>
> I think I need it in stimer_expire() which can be called in any vcpu context
> IIUC.
>
> >
> > > + struct timer timer;
> > > + union viridian_stimer_config_msr config;
> > > + uint64_t count;
> > > + int64_t expiration;
> > > + s_time_t timeout;
> > > + bool started;
> >
> > ... in a field using the 7-byte padding here, and use container_of()
> > to get at the outer structure.
>
> That would get me as far as viridian_vcpu, but there's no pointer to struct
> vcpu in there, and I need
> one to call vcpu_kick().
>
> Paul
>
> >
> > Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
[email protected]
https://lists.xenproject.org/mailman/listinfo/xen-devel