Hi
  To Patrick: Never mind. Thank you for your suggestion all the same.
  To Otis: We do not use SPM. We monitor the JVM with jstat only, because my
system ran well before, so we did not need other tools.
But SPM is really awesome.

  Still looking for help...

Best Regards

2016-03-18 6:01 GMT+08:00 Patrick Plaatje <pplaa...@gmail.com>:

> Yeah, I didn’t pay attention to the cached memory at all, my bad!
>
> I remember running into a similar situation a couple of years ago. One of
> the things we did to investigate our memory profile was to produce a full heap
> dump and manually analyse it using a tool like MAT.
>
> Cheers,
> -patrick
>
>
>
>
> On 17/03/2016, 21:58, "Otis Gospodnetić" <otis.gospodne...@gmail.com>
> wrote:
>
> >Hi,
> >
> >On Wed, Mar 16, 2016 at 10:59 AM, Patrick Plaatje <pplaa...@gmail.com>
> >wrote:
> >
> >> Hi,
> >>
> >> From the sar output you supplied, it looks like you might have a memory
> >> issue on your hosts. The memory usage just before your crash seems to be
> >> *very* close to 100%. Even the slightest increase (by Solr itself, or
> >> possibly by a system service) could have caused the system to crash. What are
> >> the specifications of your hosts and how much memory are you allocating?
> >
> >
> >That's normal actually - http://www.linuxatemyram.com/
> >
> >You *want* Linux to be using all your memory - you paid for it :)
> >
> >Otis
> >--
> >Monitoring - Log Management - Alerting - Anomaly Detection
> >Solr & Elasticsearch Consulting Support Training - http://sematext.com/
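
To put a number on the point above, here is a rough sketch that recomputes memory use without buffers and page cache from the sar -r columns. It assumes, as the figures quoted further down suggest, that kbmemused includes kbbuffers and kbcached on this sysstat version; the function name is made up for illustration. The input is the 03:42:40 PM sample from the original report.

```shell
#!/bin/bash
# Sketch only: mem_used_excluding_cache is a hypothetical helper name.
# Assumes kbmemused already includes kbbuffers and kbcached.

# Arguments are the sar -r columns: kbmemfree kbmemused kbbuffers kbcached
# Prints the percentage of RAM in use once buffers and page cache are excluded.
mem_used_excluding_cache() {
    awk -v free="$1" -v used="$2" -v buf="$3" -v cached="$4" \
        'BEGIN { printf "%.2f\n", (used - buf - cached) * 100 / (free + used) }'
}

# 03:42:40 PM sample, the last one recorded before the restart
mem_used_excluding_cache 1481108 196671436 361676 75718320
```

On that sample about 61% of RAM is in use outside buffers and page cache, rather than the 99.25% shown in %memused. That alone does not explain the crash.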
> >
> >
> >
> >
> >>
> >
> >
> >>
> >>
> >> On 16/03/2016, 14:52, "YouPeng Yang" <yypvsxf19870...@gmail.com> wrote:
> >>
> >> >Hi
> >> > It happened again, and worse, my system crashed. We could not even
> >> >connect to it with ssh.
> >> > I used the sar command to capture statistics about it. Here are the
> >> >details:
> >> >
> >> >
> >> >[1] CPU (using sar -u). We had to restart the system, as shown by the
> >> >LINUX RESTART line (in red font) in the logs.
> >>
> >>
> >--------------------------------------------------------------------------------------------------
> >> >03:00:01 PM     all      7.61      0.00      0.92      0.07      0.00     91.40
> >> >03:10:01 PM     all      7.71      0.00      1.29      0.06      0.00     90.94
> >> >03:20:01 PM     all      7.62      0.00      1.98      0.06      0.00     90.34
> >> >03:30:35 PM     all      5.65      0.00     31.08      0.04      0.00     63.23
> >> >03:42:40 PM     all     47.58      0.00     52.25      0.00      0.00      0.16
> >> >Average:        all      8.21      0.00      1.57      0.05      0.00     90.17
> >> >
> >> >04:42:04 PM       LINUX RESTART
> >> >
> >> >04:50:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
> >> >05:00:01 PM     all      3.49      0.00      0.62      0.15      0.00     95.75
> >> >05:10:01 PM     all      9.03      0.00      0.92      0.28      0.00     89.77
> >> >05:20:01 PM     all      7.06      0.00      0.78      0.05      0.00     92.11
> >> >05:30:01 PM     all      6.67      0.00      0.79      0.06      0.00     92.48
> >> >05:40:01 PM     all      6.26      0.00      0.76      0.05      0.00     92.93
> >> >05:50:01 PM     all      5.49      0.00      0.71      0.05      0.00     93.75
> >>
> >>
> >--------------------------------------------------------------------------------------------------
> >> >
> >> >[2]mem(by using sar -r)
> >>
> >>
> >--------------------------------------------------------------------------------------------------
> >> >03:00:01 PM   1519272 196633272     99.23    361112  76364340 143574212     47.77
> >> >03:10:01 PM   1451764 196700780     99.27    361196  76336340 143581608     47.77
> >> >03:20:01 PM   1453400 196699144     99.27    361448  76248584 143551128     47.76
> >> >03:30:35 PM   1513844 196638700     99.24    361648  76022016 143828244     47.85
> >> >03:42:40 PM   1481108 196671436     99.25    361676  75718320 144478784     48.07
> >> >Average:      5051607 193100937     97.45    362421  81775777 142758861     47.50
> >> >
> >> >04:42:04 PM       LINUX RESTART
> >> >
> >> >04:50:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
> >> >05:00:01 PM 154357132  43795412     22.10     92012  18648644 134950460     44.90
> >> >05:10:01 PM 136468244  61684300     31.13    219572  31709216 134966548     44.91
> >> >05:20:01 PM 135092452  63060092     31.82    221488  32162324 134949788     44.90
> >> >05:30:01 PM 133410464  64742080     32.67    233848  32793848 134976828     44.91
> >> >05:40:01 PM 132022052  66130492     33.37    235812  33278908 135007268     44.92
> >> >05:50:01 PM 130630408  67522136     34.08    237140  33900912 135099764     44.95
> >> >Average:    136996792  61155752     30.86    206645  30415642 134991776     44.91
> >>
> >>
> >--------------------------------------------------------------------------------------------------
> >> >
> >> >
> >> >As the parts in blue font show, my host crashed at 03:30:35 and stayed
> >> >hung until I restarted it manually at 04:42:04.
> >> >All the above information is just a snapshot of the performance when it
> >> >crashed; nothing in it reveals the reason. I have also checked
> >> >/var/log/messages and found nothing useful.
> >> >
> >> >Note that I also ran sar -v. It shows something abnormal:
> >>
> >>
> >------------------------------------------------------------------------------------------------
> >> >02:50:01 PM  11542262      9216     76446       258
> >> >03:00:01 PM  11645526      9536     76421       258
> >> >03:10:01 PM  11748690      9216     76451       258
> >> >03:20:01 PM  11850191      9152     76331       258
> >> >03:30:35 PM  11972313     10112    132625       258
> >> >03:42:40 PM  12177319     13760    340227       258
> >> >Average:      8293601      8950     68187       161
> >> >
> >> >04:42:04 PM       LINUX RESTART
> >> >
> >> >04:50:01 PM dentunusd   file-nr  inode-nr    pty-nr
> >> >05:00:01 PM     35410      7616     35223         4
> >> >05:10:01 PM    137320      7296     42632         6
> >> >05:20:01 PM    247010      7296     42839         9
> >> >05:30:01 PM    358434      7360     42697         9
> >> >05:40:01 PM    471543      7040     42929        10
> >> >05:50:01 PM    583787      7296     42837        13
> >>
> >>
> >------------------------------------------------------------------------------------------------
> >> >
> >> >I checked the man page for the -v option:
> >>
> >>
> >------------------------------------------------------------------------------------------------
> >> >*-v*  Report status of inode, file and other kernel tables.  The following
> >> >values are displayed:
> >> >       *dentunusd*
> >> >Number of unused cache entries in the directory cache.
> >> >*file-nr*
> >> >Number of file handles used by the system.
> >> >*inode-nr*
> >> >Number of inode handlers used by the system.
> >> >*pty-nr*
> >> >Number of pseudo-terminals used by the system.
> >>
> >>
> >------------------------------------------------------------------------------------------------
> >> >
> >> >Is there any clue about the crash? Would you please give me some
> >> >suggestions?
> >> >
> >> >
> >> >Best Regards.
> >> >
> >> >
> >> >2016-03-16 14:01 GMT+08:00 YouPeng Yang <yypvsxf19870...@gmail.com>:
> >> >
> >> >> Hello
> >> >>    The problem has appeared several times; however, I could not capture the
> >> >> top output. My script is shown below.
> >> >> It checks whether the sys CPU usage exceeds 30%. The other metrics
> >> >> are dumped successfully, but not the top output.
> >> >> Would you check my script? I am not able to figure out what is wrong.
> >> >>
> >> >>
> >> >>
> >>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >> >> #!/bin/bash
> >> >>
> >> >> while :
> >> >>   do
> >> >>     sysusage=$(mpstat 2 1 | grep -A 1 "%sys" | tail -n 1 | awk '{if($6 < 30) print 1; else print 0;}' )
> >> >>
> >> >>     if [ $sysusage -eq 0 ];then
> >> >>         #echo $sysusage
> >> >>         #perf record -o perf$(date +%Y%m%d%H%M%S).data  -a -g -F 1000 sleep 30
> >> >>         file=$(date +%Y%m%d%H%M%S)
> >> >>         top -n 2 >> top$file.data
> >> >>         iotop -b -n 2  >> iotop$file.data
> >> >>         iostat >> iostat$file.data
> >> >>         netstat -an | awk '/^tcp/ {++state[$NF]} END {for(i in state) print i,"\t",state[i]}' >> netstat$file.data
> >> >>     fi
> >> >>     sleep 5
> >> >>   done
> >> >>
> >> >>
> >> >>
> >> >>
> >>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
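
One thing that may explain the missing top output: when top is not attached to a terminal, or its output is redirected, it generally needs batch mode (-b) to write plain text. Below is a minimal sketch, not a tested replacement for the script above. The function names and the threshold argument are illustrative assumptions. It also looks up the %sys column by its header name instead of using a fixed field number, since the position depends on the sysstat version and time format.

```shell
#!/bin/bash
# Sketch only: function names and the threshold argument are illustrative.

# Read mpstat-style text on stdin; print 1 if %sys in the first sample after
# the header is above the limit given as $1, otherwise print 0.
sys_exceeds() {
    awk -v limit="$1" '
        col && !seen { val = $col; seen = 1 }   # first line after the header
        !col { for (i = 1; i <= NF; i++) if ($i == "%sys") col = i }
        END { if (val + 0 > limit + 0) print 1; else print 0 }
    '
}

# Write one set of snapshots; -b puts top in batch mode so it can be redirected.
capture_snapshot() {
    local stamp
    stamp=$(date +%Y%m%d%H%M%S)
    top -b -n 2 > "top$stamp.data"
    iostat > "iostat$stamp.data"
}
```

A monitoring loop could then run capture_snapshot whenever "mpstat 2 1 | sys_exceeds 30" prints 1, and sleep a few seconds between checks as the original script does.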
> >> >>
> >> >> 2016-03-08 21:39 GMT+08:00 YouPeng Yang <yypvsxf19870...@gmail.com>:
> >> >>
> >> >>> Hi all
> >> >>>   Thanks for your replies. I have been investigating for a long time, and I
> >> >>> will post some logs from 'top' and IO in a few days when the crash happens
> >> >>> again.
> >> >>>
> >> >>> 2016-03-08 10:45 GMT+08:00 Shawn Heisey <apa...@elyograg.org>:
> >> >>>
> >> >>>> On 3/7/2016 2:23 AM, Toke Eskildsen wrote:
> >> >>>> > How does this relate to YouPeng reporting that the CPU usage increases?
> >> >>>> >
> >> >>>> > This is not a snark. YouPeng mentions kernel issues. It might very well
> >> >>>> > be that IO is the real problem, but that it manifests in a non-intuitive
> >> >>>> > way. Before memory-mapping it was easy: Just look at IO-Wait. Now I am
> >> >>>> > not so sure. Can high kernel load (Sy% in *nix top) indicate that the IO
> >> >>>> > system is struggling, even if IO-Wait is low?
> >> >>>>
> >> >>>> It might turn out to be not directly related to memory, you're right
> >> >>>> about that.  A very high query rate or particularly CPU-heavy queries or
> >> >>>> analysis could cause high CPU usage even when memory is plentiful, but
> >> >>>> in that situation I would expect high user percentage, not kernel.  I'm
> >> >>>> not completely sure what might cause high kernel usage if iowait is low,
> >> >>>> but no specific information was given about iowait.  I've seen iowait
> >> >>>> percentages of 10% or less with problems clearly caused by iowait.
> >> >>>>
> >> >>>> With the available information (especially seeing 700GB of index data),
> >> >>>> I believe that the "not enough memory" scenario is more likely than
> >> >>>> anything else.  If the OP replies and says they have plenty of memory,
> >> >>>> then we can move on to the less common (IMHO) reasons for high CPU with
> >> >>>> a large index.
> >> >>>>
> >> >>>> If the OS is one that reports load average, I am curious what the 5
> >> >>>> minute average is, and how many real (non-HT) CPU cores there are.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Shawn
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> >>
>
>
