Any progress on this?

On Fri, Apr 21, 2023 at 3:40 PM Jim Idle <[email protected]> wrote:

> So, I can no longer find much documentation on AIX 6.1 because it is
> end-of-lifed. But I think that you can determine if this is the cause of
> your problem by:
>
> Edit the file /sbin/rc.boot using sudo (it may be /etc/rc.boot on your
> version of AIX), find where it starts the syncd daemon, and change the
> interval it is started with. Start out with 5 seconds - in some cases you
> can make this shorter. It looks like the -i option is what you need:
>
> start /usr/sbin/syncd -i 5
>
> You can change it for the current system without making it permanent by
> killing the syncd and restarting it with a new seconds value. That way you
> can try different values until you get one that suits your system.
>
> The AIX 7.1 documentation also recommends turning on the random
> write-behind function using the ioo command, but I suspect that it is
> not available on AIX 6.1.
>
> I suspect that things hang while this is happening because by default, the
> process causes locks to be held against the inodes (jBASE files) that have
> dirty writes outstanding. In AIX 7.1, you can prevent syncd from locking
> the inode with ioo -o sync_release_ilock=1. See if that is also an option
> in 6.1.
>
> Please let us know if that helps.
>
> However, if it does help, then you need to work out why this is now an
> issue when it was not before. I can only think that there has been a change
> to your application software, but that is speculation of course.
>
> Jim
>
> On Thu, Apr 20, 2023 at 11:25 AM Jim Idle <[email protected]> wrote:
>
>> Ah, right. I think you can rule out the network then, as you are seeing
>> intermittent stutter in actual program response time here. That's kicked
>> out a lot of issues. To be honest, I should have recognized this as an XY
>> problem from the start - my apologies.
>>
>> Now, I used to be a dab hand at tuning AIX, especially with jBASE of
>> course, but it's been a "number of years" ;)
>>
>> So, something has changed, but unless you changed some of your
>> application software, then it is something that has happened over time that
>> has now hit a bottleneck. There have been a few good suggestions here
>> already, so I will assume that you have looked at those by now.
>>
>> So, the main thing I remember was that there is a kernel tuning parameter
>> for memory flushing of dirty memory buffers, which was (and I assume still
>> is) controlled by a flush daemon, which I think used to be syncd or flushd.
>> The parameter controls how often this daemon runs.
>>
>> Right now, this is my first guess as to what is happening as this was
>> always the answer back in the day. And, guess what? The default time for
>> this daemon to run is either 30 or 40 seconds, which seems to fit the bill.
>>
>> The scenario is as follows:
>>
>>    - Someone thinks that this kernel parameter should be high and
>>    changes it such that the system doesn't try to flush dirty memory to disk
>>    until it gets a lot of dirty buffers
>>    - Nothing seems to change right away, but one day your workload
>>    changes slightly and...
>>    - The syncd (or whatever it is these days) wakes up every 30 seconds
>>    or so, sees that 70% of your memory is in need of being written and it
>>    tries to do that all at once in one massive glob of writes - everything
>>    else has to pause and wait.
>>    - The actual setting should be that the flush cycle runs more often,
>>    not less often so that you get a smooth, averaged out performance.
>>    - The setting, especially on a write busy system, should be about 5
>>    seconds
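
A rough way to see why a shorter flush interval smooths things out is a toy model. This is only a sketch under invented assumptions: the write rate and disk throughput are made-up numbers, and flush_burst is a hypothetical helper, not anything from AIX or jBASE. It assumes dirty data builds up at a constant rate and is drained at full disk speed on each flush.

```python
# Toy model only: the write rate and disk throughput below are invented
# numbers for illustration, not measurements from any AIX system.

def flush_burst(write_rate_mb_s: float, interval_s: float, disk_mb_s: float):
    """Return (MB written per flush, seconds the disk is saturated per flush)."""
    burst_mb = write_rate_mb_s * interval_s  # dirty data accumulated between flushes
    stall_s = burst_mb / disk_mb_s           # time to drain it at full disk speed
    return burst_mb, stall_s


if __name__ == "__main__":
    for interval in (30, 5):
        burst, stall = flush_burst(write_rate_mb_s=20.0, interval_s=interval, disk_mb_s=100.0)
        print(f"flush every {interval:>2}s: {burst:6.1f} MB per burst, "
              f"~{stall:.1f}s of saturated disk")
```

In this model the total amount written is the same either way; only the size of each burst changes, which is the point of running the flush more often.
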
>>
>> This was performance problem #1 with jBASE on AIX. AIX is generally a
>> great system, but tuning it is a bit of a nightmare sometimes. I used to
>> have a whole instruction set for people in the field to do this, but I
>> don't have access to that and haven't for a long time. I don't know if
>> maybe someone like *Bruce Decker* has a copy of that email - he might.
>> If not, then we will need to find out what I used to do starting from first
>> principles.
>>
>> The daemon is either flushd or syncd (it is called different things on
>> different systems). As I say, the default is 30 seconds or
>> something similar. You want this to run MORE often, not LESS often. Also,
>> have a think about whether the system workload has changed in terms of
>> writes. More users? Extra business? Someone changed the background tasks to
>> do more writes?
>>
>> I will try and find my notes etc about this, but while I cannot guarantee
>> that this is your issue, I would be willing to bet a pint on it. We would
>> need to run some vmstat and related commands to put this together, but I
>> bet if you ran that command at the same time as your script that measured
>> above, that you will find that the delay corresponds to a massive spike in
>> disk writes.
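
To line stalls up against vmstat output captured in another session, a timing probe needs little more than a wall-clock timestamp and an elapsed time for each slow iteration. The following is a minimal sketch of that idea, not the script used in this thread: probe, its parameters, and the sleep used as a stand-in workload are all hypothetical.

```python
import time
from datetime import datetime
from typing import Callable, List, Tuple


def probe(action: Callable[[], None], iterations: int, threshold_s: float = 1.0,
          clock: Callable[[], float] = time.monotonic) -> List[Tuple[str, float]]:
    """Run `action` repeatedly; return (wall-clock start, elapsed seconds) for slow runs."""
    slow = []
    for _ in range(iterations):
        started_at = datetime.now().strftime("%H:%M:%S")  # for matching against vmstat
        t0 = clock()
        action()
        elapsed = clock() - t0
        if elapsed > threshold_s:
            slow.append((started_at, elapsed))
    return slow


if __name__ == "__main__":
    # Hypothetical stand-in for the real work (for example, running a jBASE command).
    for started_at, elapsed in probe(lambda: time.sleep(0.01), iterations=5,
                                     threshold_s=0.005):
        print(f"{started_at}  took {elapsed:.3f}s")
```

The elapsed time uses a monotonic clock so that system clock adjustments do not distort the measurement; the wall-clock timestamp is recorded only for correlation.
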
>>
>> BTW, your system is quite a bit out of date; AIX 6.1 has basically been
>> end-of-lifed and we are on AIX 7.1 now I think. I would recommend
>> upgrading, and probably moving to AWS rather than physical hardware. Also,
>> upgrade jBASE and switch to the file type that does not need any sizing
>> maintenance. My own tests show those files to be the fastest we ever had. I
>> don't know how many users you have, but even if you wanted local hardware,
>> I think it would be a trivial cost to move to a decent rack-based modern
>> system with Linux. You would probably save the money on power costs!
>>
>> There is no work out there in the world right now, so if this is a big
>> issue for you, then I am available for hire on a no win no fee basis ;)
>>
>> Jim
>>
>>
>>
>>
>> On Wed, Apr 19, 2023 at 2:58 AM Alan Metz <[email protected]> wrote:
>>
>>> Well...
>>> I did some more testing.
>>> btw AIX 6.1, no changes to AIX, and yes, using telnet
>>>
>>> I assume that you have ruled out network configuration changes?  (We did
>>> increase bandwidth across the entire network recently)
>>>
>>>
>>> I removed the SD-WAN network from the equation over the weekend.
>>> I attached my laptop to a switch and the server ONLY to the same switch
>>> - I did notice the delay.
>>>
>>> Tried same setup with a different switch - noticed delay
>>> Tried different Ethernet cable from server to switch - noticed delay
>>> (I wrote a program to track the frequency by executing a LISTPEQS and
>>> recording the time it took to render the results; if it took longer than
>>> 1 second, I tracked the time. Most iterations are less than 1 second. What
>>> I found was that a ~40 second delay occurs approximately 5 to 6 minutes
>>> apart. There were a few 2 to 3 second pauses in between that I excluded.)
>>> Event Date  Start Time    End Time    Delay Seconds
>>> 04/18/2023  11:42:10AM  11:42:53AM    43
>>> 04/18/2023  11:47:56AM  11:48:39AM    43
>>> 04/18/2023  11:53:33AM  11:54:16AM    43
>>> 04/18/2023  11:59:28AM  12:00:12PM    44
>>> 04/18/2023  12:05:13PM  12:05:54PM    41
>>> 04/18/2023  12:11:03PM  12:11:39PM    36
>>> 04/18/2023  12:16:53PM  12:17:34PM    41
>>> 04/18/2023  12:22:34PM  12:23:17PM    43
>>> 04/18/2023  12:28:34PM  12:29:16PM    42
>>> 04/18/2023  12:34:05PM  12:34:48PM    43
>>> 04/18/2023  12:39:50PM  12:40:34PM    44
>>> 04/18/2023  12:45:43PM  12:46:26PM    43
>>> 04/18/2023  12:51:26PM  12:52:09PM    43
>>> 04/18/2023  12:57:05PM  12:57:49PM    44
>>> 04/18/2023  01:02:33PM  01:03:16PM    43
>>> 04/18/2023  01:08:26PM  01:09:09PM    43
>>> 04/18/2023  01:14:22PM  01:15:05PM    43
>>> 04/18/2023  01:19:58PM  01:20:42PM    44
>>> 04/18/2023  01:25:42PM  01:26:27PM    45
>>> 04/18/2023  01:31:41PM  01:32:25PM    44
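
The figures in the table can be summarised with a few lines of Python. This is just a sketch that re-reads the start and end times listed above; the variable names are arbitrary and nothing here comes from the original tracking program.

```python
from datetime import datetime
from statistics import mean

# Start and end times copied from the table above (all on 04/18/2023).
EVENTS = """\
11:42:10AM 11:42:53AM
11:47:56AM 11:48:39AM
11:53:33AM 11:54:16AM
11:59:28AM 12:00:12PM
12:05:13PM 12:05:54PM
12:11:03PM 12:11:39PM
12:16:53PM 12:17:34PM
12:22:34PM 12:23:17PM
12:28:34PM 12:29:16PM
12:34:05PM 12:34:48PM
12:39:50PM 12:40:34PM
12:45:43PM 12:46:26PM
12:51:26PM 12:52:09PM
12:57:05PM 12:57:49PM
01:02:33PM 01:03:16PM
01:08:26PM 01:09:09PM
01:14:22PM 01:15:05PM
01:19:58PM 01:20:42PM
01:25:42PM 01:26:27PM
01:31:41PM 01:32:25PM
"""


def parse(clock_time: str) -> datetime:
    """Parse a 12-hour clock time from the table into a datetime."""
    return datetime.strptime("04/18/2023 " + clock_time, "%m/%d/%Y %I:%M:%S%p")


rows = [tuple(parse(t) for t in line.split()) for line in EVENTS.splitlines()]
delays = [(end - start).total_seconds() for start, end in rows]
# Gap between the start of one stall and the start of the next.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(rows, rows[1:])]

print(f"stalls: {len(rows)}")
print(f"delay  min/mean/max: {min(delays):.0f} / {mean(delays):.2f} / {max(delays):.0f} s")
print(f"period min/mean/max: {min(gaps):.0f} / {mean(gaps):.1f} / {max(gaps):.0f} s")
```

Looking at the start-to-start period as well as the delay length makes it easier to compare the stalls against anything on the system that runs on a fixed schedule.
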
>>>
>>> My question is: can I somehow determine if a background process is
>>> causing the hangs?  I do have Phantom jobs running in Jbase; however, the
>>> code has not changed in years and no new Phantoms have been added.
>>>
>>> I have added more users on  the network over time, but removing the
>>> network as mentioned above was tested.  Unfortunately, I didn't write the
>>> tracking program until Monday, after the "removing the network" test.
>>> (I will say that the delays didn't appear to be as frequent with just me
>>> and the server test - I suppose I could test that this weekend...)
>>>
>>> I wish I could provide more information, but I don't know what else to
>>> test??
>>> Thanks,
>>> Al
>>>
>>>
>>> On Sat, Apr 15, 2023 at 11:42 AM Kannan Seshadri <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> Is it possible for you to execute whatever you are executing directly
>>>> on the AIX console with a telnet session?   This will clearly tell you
>>>> whether you have a network issue or not?
>>>>
>>>> Thanks and Regards
>>>>
>>>> On Sat, Apr 15, 2023 at 6:42 PM Bruce Decker <[email protected]>
>>>> wrote:
>>>>
>>>>> Is the delay between the login prompt and the password prompt?  As
>>>>> jimi asked, more details.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 15, 2023, at 8:45 AM, Jim Idle <[email protected]> wrote:
>>>>>
>>>>> 
>>>>> I think more details are needed Alan.
>>>>>
>>>>> What version of AIX are you running?
>>>>> Are you really using telnet and not ssh? Telnet is unlikely to be
>>>>> maintained.
>>>>> I assume that you have ruled out network configuration changes?
>>>>> Any upgrades to AIX lately?
>>>>> Any change to the network load? New devices?
>>>>>
>>>>> In the absence of any changes, then I would definitely be looking at
>>>>> network problems. When you say you tried with just one user, do you mean
>>>>> literally one device and the server only on the network? If there is a
>>>>> faulty system somewhere, or malware, then that would still eat your
>>>>> network bandwidth.
>>>>>
>>>>> Finally, I presume you have done the obvious and rebooted the server
>>>>> and all the network gear? You’ll probably have to start from first
>>>>> principles with no devices on the network and gradually add them in.
>>>>>
>>>>> On Fri, Apr 14, 2023 at 21:12 Alan Metz <[email protected]> wrote:
>>>>>
>>>>>> All,
>>>>>>    I have recently been experiencing sporadic response delays when
>>>>>> accessing Jbase, (version 5.6.0.2), from telnet sessions with all users
>>>>>> in my company. At first I thought it was a network issue; however, I have
>>>>>> tested this with only one user and the Jbase server plugged into a switch
>>>>>> and was able to duplicate the hesitation. I am not logging any errors on
>>>>>> my AIX server that would indicate a hardware issue. I am not sure how to
>>>>>> further trouble-shoot this issue and am asking for suggestions.  This
>>>>>> system has been rock solid since 2018.
>>>>>> Thanks,
>>>>>> Al
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> IMPORTANT: T24/Globus posts are no longer accepted on this forum.
>>>>>>
>>>>>> To post, send email to [email protected]
>>>>>> To unsubscribe, send email to [email protected]
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/jBASE?hl=en
>>>>>>
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "jBASE" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/jbase/CAPPLyKCej9SMfqOPoBnnQLSUVSDWMEsP-1CFsCgMSZya0yS0NQ%40mail.gmail.com
>>>>>> <https://groups.google.com/d/msgid/jbase/CAPPLyKCej9SMfqOPoBnnQLSUVSDWMEsP-1CFsCgMSZya0yS0NQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>
