Any progress on this?

On Fri, Apr 21, 2023 at 3:40 PM Jim Idle <[email protected]> wrote:
> So, I can no longer find much documentation on AIX 6.1 because it is end-of-lifed. But I think that you can determine if this is the cause of your problem as follows:
>
> Edit the file /sbin/rc.boot using sudo (it may be /etc/rc.boot on your version of AIX), and find where it starts the syncd daemon. Start out with 5 seconds - in some cases you can make this shorter. It looks like the -i option is what you need:
>
> start /usr/sbin/syncd -i 5
>
> You can change it for the current system without making it permanent by killing the syncd and restarting it with a new seconds value. That way you can try different values until you get one that suits your system.
>
> The AIX 7.1 documentation also recommends turning on the random write behind function using the ioo command, but I suspect that that is not there on AIX 6.1.
>
> I suspect that things hang while this is happening because by default, the process causes locks to be held against the inodes (jBASE files) that have dirty writes outstanding. In AIX 7.1, you can prevent syncd from locking the inode with ioo -o sync_release_ilock=1. See if that is also an option in 6.1.
>
> Please let us know if that helps.
>
> However, if it does help, then you need to work out why this is now an issue when it was not before. I can only think that there has been a change to your application software, but that is speculation of course.
>
> Jim
>
> On Thu, Apr 20, 2023 at 11:25 AM Jim Idle <[email protected]> wrote:
>
>> Ah, right. I think you can rule out the network then, as you are seeing intermittent stutter in actual program response time here. That's kicked out a lot of issues. To be honest, I should have recognized this as an XY problem from the start - my apologies.
>>
>> Now, I used to be a dab hand at tuning AIX, especially with jBASE of course, but it's been a "number of years" ;)
>>
>> So, something has changed, but unless you changed some of your application software, then it is something that has happened over time that has now hit a bottleneck. There have been a few good suggestions here already, so I will assume that you have looked at those by now.
>>
>> So, the main thing I remember was that there is a kernel tuning parameter for memory flushing of dirty memory buffers, which was (and I assume still is) controlled by a flush daemon, which I think used to be syncd or flushd. The parameter controls how often this daemon runs.
>>
>> Right now, this is my first guess as to what is happening as this was always the answer back in the day. And, guess what? The default time for this daemon to run is either 30 or 40 seconds, which seems to fit the bill.
>>
>> The scenario is as follows:
>>
>> - Someone thinks that this kernel parameter should be high and changes it such that the system doesn't try to flush dirty memory to disk until it gets a lot of dirty buffers
>> - Nothing seems to change right away, but one day your workload changes slightly and...
>> - The syncd (or whatever it is these days) wakes up every 30 seconds or so, sees that 70% of your memory is in need of being written and it tries to do that all at once in one massive glob of writes - everything else has to pause and wait.
>> - The actual setting should be that the flush cycle runs more often, not less often, so that you get a smooth, averaged out performance.
>> - The setting, especially on a write busy system, should be about 5 seconds
>>
>> This was performance problem #1 with jBASE on AIX. AIX is generally a great system, but tuning it is a bit of a nightmare sometimes.
>> I used to have a whole instruction set for people in the field to do this, but I don't have access to that and haven't for a long time. I don't know if maybe someone like *Bruce Decker* has a copy of that email - he might. If not, then we will need to find out what I used to do starting from first principles.
>>
>> The daemon is either flushd or syncd (it is called different things on different systems). As I say, the default is 30 seconds or something similar. You want this to run MORE often, not LESS often. Also, have a think about whether the system workload has changed in terms of writes. More users? Extra business? Someone changed the background tasks to do more writes?
>>
>> I will try and find my notes etc about this, but while I cannot guarantee that this is your issue, I would be willing to bet a pint on it. We would need to run some vmstat and related commands to put this together, but I bet that if you ran that command at the same time as the script you used for the measurements above, you would find that the delay corresponds to a massive spike in disk writes.
>>
>> BTW, your system is quite a bit out of date; AIX has been basically end-of-lifed and we are on AIX 7.1 now I think. I would recommend upgrading, and probably moving to AWS rather than physical hardware. Also, upgrade jBASE and switch to the file type that does not need any sizing maintenance. My own tests show those files to be the fastest we ever had. I don't know how many users you have, but even if you wanted local hardware, I think it would be a trivial cost to move to a decent rack based modern system with Linux. You would probably save the money on power costs!
>>
>> There is no work out there in the world right now, so if this is a big issue for you, then I am available for hire on a no win no fee basis ;)
>>
>> Jim
>>
>> On Wed, Apr 19, 2023 at 2:58 AM Alan Metz <[email protected]> wrote:
>>
>>> Well...
>>> I did some more testing.
>>> btw AIX 6.1, no changes to AIX, and yes, using telnet
>>>
>>> I assume that you have ruled out network configuration changes? (We did increase bandwidth across the entire network recently)
>>>
>>> I removed the SD-WAN network from the equation over the weekend. I attached my laptop to a switch and the server ONLY to the same switch - I did notice the delay.
>>>
>>> Tried same setup with a different switch - noticed delay
>>> Tried different Ethernet cable from server to switch - noticed delay
>>> (I wrote a program to track the frequency by executing a LISTPEQS and recording the time it took to render the results; if greater than 1 second I tracked the time - most iterations are less than 1 second. What I found out was that it appears a ~40 second delay occurs approximately 5 to 6 minutes apart. (There were a few 2 to 3 second pauses in between that I excluded.))
>>>
>>> Event Date   Start Time   End Time     Delay Seconds
>>> 04/18/2023   11:42:10AM   11:42:53AM   43
>>> 04/18/2023   11:47:56AM   11:48:39AM   43
>>> 04/18/2023   11:53:33AM   11:54:16AM   43
>>> 04/18/2023   11:59:28AM   12:00:12PM   44
>>> 04/18/2023   12:05:13PM   12:05:54PM   41
>>> 04/18/2023   12:11:03PM   12:11:39PM   36
>>> 04/18/2023   12:16:53PM   12:17:34PM   41
>>> 04/18/2023   12:22:34PM   12:23:17PM   43
>>> 04/18/2023   12:28:34PM   12:29:16PM   42
>>> 04/18/2023   12:34:05PM   12:34:48PM   43
>>> 04/18/2023   12:39:50PM   12:40:34PM   44
>>> 04/18/2023   12:45:43PM   12:46:26PM   43
>>> 04/18/2023   12:51:26PM   12:52:09PM   43
>>> 04/18/2023   12:57:05PM   12:57:49PM   44
>>> 04/18/2023   01:02:33PM   01:03:16PM   43
>>> 04/18/2023   01:08:26PM   01:09:09PM   43
>>> 04/18/2023   01:14:22PM   01:15:05PM   43
>>> 04/18/2023   01:19:58PM   01:20:42PM   44
>>> 04/18/2023   01:25:42PM   01:26:27PM   45
>>> 04/18/2023   01:31:41PM   01:32:25PM   44
>>>
>>> My question is: can I somehow determine if a background process is causing the hangs? I do have Phantom jobs in Jbase running; however, the code has not changed in years and no new Phantoms have been added.
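For anyone wanting to repeat the measurement described above, here is a minimal sketch in Python of that kind of tracker. It is not Al's actual program, which was not posted. The names `track_delays` and `gaps_between_starts` are made up for illustration, and the command that runs LISTPEQS is left as a caller-supplied action because the exact invocation on a given jBASE install is not stated in the thread.

```python
import time
from datetime import datetime


def track_delays(action, iterations, threshold_s=1.0, clock=time.monotonic):
    """Call `action` repeatedly and return (start, end, seconds) for slow runs.

    `action` is any zero-argument callable, for example a function that
    launches the LISTPEQS command with subprocess.run (hypothetical; the
    command line is an assumption and is not given in the thread).
    Only runs that take longer than `threshold_s` seconds are recorded.
    """
    slow = []
    for _ in range(iterations):
        start = clock()
        action()
        end = clock()
        if end - start > threshold_s:
            slow.append((start, end, end - start))
    return slow


def gaps_between_starts(start_times):
    """Return the seconds between consecutive event start times."""
    return [(b - a).total_seconds()
            for a, b in zip(start_times, start_times[1:])]


# The first four start times from the table above.
starts = [
    datetime(2023, 4, 18, 11, 42, 10),
    datetime(2023, 4, 18, 11, 47, 56),
    datetime(2023, 4, 18, 11, 53, 33),
    datetime(2023, 4, 18, 11, 59, 28),
]
print(gaps_between_starts(starts))  # [346.0, 337.0, 355.0]
```

The gaps for those first rows come out a little under six minutes, which matches the "5 to 6 minutes apart" observation. Using a monotonic clock for the timing avoids false readings if the system clock is adjusted while the test is running.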
>>>
>>> I have added more users on the network over time, but removing the network as mentioned above was tested. Unfortunately, I didn't write the tracking program until Monday, after the "removing the network" test.
>>> (I will say that the delays didn't appear to be as frequent with just me and the server test - I suppose I could test that this weekend...)
>>>
>>> I wish I could provide more information, but I don't know what else to test??
>>> Thanks,
>>> Al
>>>
>>> On Sat, Apr 15, 2023 at 11:42 AM Kannan Seshadri <[email protected]> wrote:
>>>
>>>> Hi,
>>>> Is it possible for you to execute whatever you are executing directly on the AIX console with a telnet session? This will clearly tell you whether you have a network issue or not.
>>>>
>>>> Thanks and Regards
>>>>
>>>> On Sat, Apr 15, 2023 at 6:42 PM Bruce Decker <[email protected]> wrote:
>>>>
>>>>> Is the delay between the login prompt and the password prompt? As jimi asked, more details.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 15, 2023, at 8:45 AM, Jim Idle <[email protected]> wrote:
>>>>>
>>>>> I think more details are needed Alan.
>>>>>
>>>>> What version of AIX are you running?
>>>>> Are you really using telnet and not ssh? Telnet is unlikely to be maintained.
>>>>> I assume that you have ruled out network configuration changes?
>>>>> Any upgrades to AIX lately?
>>>>> Any change to the network load? New devices?
>>>>>
>>>>> In the absence of any changes, then I would definitely be looking at network problems. When you say you tried with just one user, do you mean literally one device and the server only on the network? If there is a faulty system somewhere, or malware, then that would still eat your network bandwidth.
>>>>>
>>>>> Finally, I presume you have done the obvious and rebooted the server and all the network gear?
>>>>> You’ll probably have to start from first principles with no devices on the network and gradually add them in.
>>>>>
>>>>> On Fri, Apr 14, 2023 at 21:12 Alan Metz <[email protected]> wrote:
>>>>>
>>>>>> All,
>>>>>> I have recently been experiencing sporadic response delays when accessing Jbase (version 5.6.0.2) from telnet sessions with all users in my company. At first I thought it was a network issue; however, I have tested this with only one user and the Jbase server plugged into a switch and was able to duplicate the hesitation. I am not logging any errors on my AIX server that would indicate a hardware issue. I am not sure how to further trouble-shoot this issue and am asking for suggestions. This system has been rock solid since 2018.
>>>>>> Thanks,
>>>>>> Al
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> IMPORTANT: T24/Globus posts are no longer accepted on this forum.
>>>>>>
>>>>>> To post, send email to [email protected]
>>>>>> To unsubscribe, send email to [email protected]
>>>>>> For more options, visit this group at http://groups.google.com/group/jBASE?hl=en
>>>>>>
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google Groups "jBASE" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/jbase/CAPPLyKCej9SMfqOPoBnnQLSUVSDWMEsP-1CFsCgMSZya0yS0NQ%40mail.gmail.com
