On 14.05.19 11:23, Christian Borntraeger wrote:
>
>
> On 14.05.19 11:20, David Hildenbrand wrote:
>> On 14.05.19 11:10, Christian Borntraeger wrote:
>>>
>>>
>>> On 14.05.19 10:59, David Hildenbrand wrote:
>>>> On 14.05.19 10:49, Cornelia Huck wrote:
>>>>> On Tue, 14 May 2019 10:37:32 +0200
>>>>> Christian Borntraeger <[email protected]> wrote:
>>>>>
>>>>>> On 14.05.19 09:28, David Hildenbrand wrote:
>>>>>>>>>> But that can be tested using the runability information if I am not
>>>>>>>>>> wrong.
>>>>>>>>>
>>>>>>>>> You mean the cpu level information, right?
>>>>>>>
>>>>>>> Yes, query-cpu-definitions includes for each model runability information
>>>>>>> via "unavailable-features" (valid under the started QEMU machine).
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> and others that we have today.
>>>>>>>>>>>
>>>>>>>>>>> So yes, I think this would be acceptable.
>>>>>>>>>>
>>>>>>>>>> I guess it is acceptable yes. I doubt anybody uses that many CPUs in
>>>>>>>>>> production either way. But you never know.
>>>>>>>>>
>>>>>>>>> I think that using that many cpus is a more uncommon setup, but I still
>>>>>>>>> think that having to wait for actual failure
>>>>>>>>
>>>>>>>> That can happen all the time today. You can easily say z14 in the xml when
>>>>>>>> on a zEC12. Only at startup you get the error. The question is really:
>>>>>>>>
>>>>>>>
>>>>>>> "-smp 248 -cpu host" will no longer work, while e.g. "-smp 248 -cpu z12"
>>>>>>> will work. Actually, even "-smp 248" will no longer work on affected
>>>>>>> machines.
>>>>>>>
>>>>>>> That is why I wonder if it is better to disable the feature and print a
>>>>>>> warning. Similar to CMMA, where we want to tolerate when CMMA is not
>>>>>>> possible in the current environment (huge pages).
>>>>>>>
>>>>>>> "Diag318 will not be enabled because it is not compatible with more than
>>>>>>> 240 CPUs".
>>>>>>>
>>>>>>> However, I still think that implementing support for more than one SCLP
>>>>>>> response page is the best solution. Guests will need adaptations for > 240
>>>>>>> CPUs with Diag318, but who cares? Existing setups will continue to work.
>>>>>>>
>>>>>>> Implementing that SCLP thingy will avoid any warnings and any errors. It
>>>>>>> just works from the QEMU perspective.
>>>>>>>
>>>>>>> Is implementing this realistic?
>>>>>>
>>>>>> Yes, it is, but it will take time. I will try to get this rolling. To make
>>>>>> progress on the diag318 thing, can we error on startup now and simply
>>>>>> remove that check when we have implemented a larger sccb? If we would
>>>>>> now do all kinds of "change the max number games", it would be harder to
>>>>>> "fix".
>>>>>
>>>>> So, the idea right now is:
>>>>>
>>>>> - fail to start if you try to specify a diag318 device and more than
>>>>> 240 cpus (do we need a knob to turn off the device?)
>>>>> - in the future, support more than one SCLP response page
>>>>>
>>>>> I'm getting a bit lost in the discussion; but the above sounds
>>>>> reasonable to me.
>>>>>
>>>>
>>>> We can
>>>>
>>>> 1. Fail to start with #cpus > 240 when diag318=on
>>>> 2. Remove the error once we support more than one SCLP response page
>>>>
>>>> Or
>>>>
>>>> 1. Allow to start with #cpus > 240 when diag318=on, but indicate only
>>>> 240 CPUs via SCLP
>>>> 2. Print a warning
>>>> 3. Remove the restriction and the warning once we support more than one
>>>> SCLP response page
>>>>
>>>> While I prefer the second approach (similar to defining zPCI devices
>>>> without zpci=on), I could also live with the first approach.
>>>
>>> I prefer approach 1.
>>>
>>
>> Isn't approach #2 what we discussed (limiting sclp, but of course to 247
>> CPUs), but with an additional warning? I'm confused.
>
> Different numbering interpretation. I was talking about 1 = "Allow to start
> with #cpus > 240 when diag318=on, but indicate only 240 CPUs via SCLP"
So yes, variant 2 when I use your numbering. The only question is: do we need
a warning? It probably does not hurt.
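
To make sure we mean the same thing by the two variants, here is a rough sketch of both policies. This is Python for illustration only, not QEMU code: the function name, the constant name, the `strict` switch, and the message texts are all made up. Only the 240-CPU limit and the two behaviours (fail at startup vs. start, cap what SCLP reports, and warn) are taken from the discussion above.

```python
import warnings

# Limit quoted in the thread for diag318=on; the constant name is hypothetical.
DIAG318_MAX_CPUS = 240


def cpus_reported_via_sclp(n_cpus, diag318, strict):
    """Return how many CPUs would be indicated to the guest via SCLP.

    strict=True  models "fail to start with #cpus > 240 when diag318=on".
    strict=False models "allow to start, indicate only 240 CPUs, warn".
    """
    if not diag318 or n_cpus <= DIAG318_MAX_CPUS:
        # Nothing to restrict: feature off, or CPU count within the limit.
        return n_cpus
    if strict:
        # Variant "error at startup": refuse the configuration outright.
        raise ValueError(
            f"diag318=on is not compatible with more than {DIAG318_MAX_CPUS} CPUs"
        )
    # Variant "cap and warn": start anyway, but report fewer CPUs.
    warnings.warn(
        f"diag318=on: indicating only {DIAG318_MAX_CPUS} of {n_cpus} CPUs via SCLP"
    )
    return DIAG318_MAX_CPUS
```

In both variants the check becomes dead code once more than one SCLP response page is supported, so removing it later is a local change either way. The difference is only whether "-smp 248" with diag318=on refuses to start or starts with a reduced CPU count.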