Friday, March 13, 2015

Solaris 10 Performance Tuning - Analyzing CPU

Tools :
psrinfo
uptime
vmstat
sar -u
sar -q  (cpu queue, same as vmstat kthr:r)
mpstat
dtrace



$ psrinfo -vp
The physical processor has 8 virtual processors (32-39)
  SPARC64-VII (portid 1056 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (40-47)
  SPARC64-VII (portid 1064 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (48-55)
  SPARC64-VII (portid 1072 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (56-63)
  SPARC64-VII (portid 1080 impl 0x7 ver 0x91 clock 2400 MHz)



$ uptime
  1:04pm  up 114 day(s), 14:15,  6 users,  load average: 4.41, 4.39, 3.76

the intervals are 1,5,15 minutes



$ vmstat 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap    free     re  mf pi po fr de sr s0 s1 sd sd   in   sy     cs    us sy id
 0 0 0 107752976 11913152 74 296 901 2 2  0  0  2  2 -0 -349 5960  727992 5910  9  2 89
 0 0 0 106811152 8983688  8  31  2   0 0  0  0  0  0  0  2   8818  615705 13764 13 2 85
 0 0 0 106701904 8884808  0  14  0   0 0  0  0  0  0  0  2   14250 584633 24692 12 3 85
 0 0 0 106535552 8730784  9  87  0   0 0  0  0  9  9  0  0   9247  379414 14642 12 2 87
 0 0 0 106403648 8603144  0  10  0   0 0  0  0  0  0  0  0   6668  550100 9664  12 2 87
 0 0 0 106296328 8504712  0  5   0   0 0  0  0  0  0  0  0   10490 471906 17677 12 2 85
 0 0 0 106277472 8510232  1  3   6   0 0  0  0  0  0  0  0   12011 535986 20477 11 2 87
 0 0 0 106277472 8527712  3  37  0   0 0  0  0  0  0  0  0   13685 668057 23689 9  2 89


The first line is the summary since boot. vmstat reads its statistic from kstat,
which maintains CPU utilizztion statistics for each CPU.

cpu:id : for percent idle.
kthr:r : the total number of threads on the ready to run queues, CPU saturation.

faults:in : number of interrupts/sec
faults:sys: number of system calls/sec
faults:cs : number of context switches/sec

cpu:us  : percent user time/user-mode threads
cpu:sy  : percent system time
cpu:id  : percent idle


sar : turn on sar if it's not on.

$ svcs -a | grep sar
online         Jan_22   svc:/system/sar:default
$ svcs -l svc:/system/sar:default
fmri         svc:/system/sar:default
name         system activity reporting package
enabled      true
state        online
next_state   none
state_time   Fri Jan 22 22:08:22 2010
logfile      /var/svc/log/system-sar:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/minimal (online)



$ sar 5 10

SunOS betapm 5.10 Generic_142900-03 sun4u    05/17/2010

13:07:48    %usr    %sys    %wio   %idle
13:07:53      13       4       0      84
13:07:58      11       4       0      85
13:08:03       9       3       0      88
13:08:08       9       3       0      88
13:08:13      11       4       0      85
13:08:18      11       3       0      86
13:08:23      10       3       0      86
13:08:28       9       3       0      88
13:08:33       9       3       0      88
13:08:38      11       3       0      86

Average       10       3       0      86


%usr vs %sys : 70% vs 30%

%wio is always 0 in solaris 10 (disabled)


$ sar -q 2 10

SunOS betapm 5.10 Generic_142900-03 sun4u    05/17/2010

13:11:34 runq-sz %runocc swpq-sz %swpocc
13:11:36     1.0      50     0.0       0
13:11:38     0.0       0     0.0       0
13:11:40     0.0       0     0.0       0
13:11:42     0.0       0     0.0       0
13:11:44     0.0       0     0.0       0
13:11:46     0.0       0     0.0       0
13:11:48     0.0       0     0.0       0
13:11:50     1.0      50     0.0       0
13:11:52     0.0       0     0.0       0
13:11:54     0.0       0     0.0       0

Average      1.0      10     0.0       0


runq-sz: run queue size , same as vmstat kthr:r

%runocc: identify short bursts of run queue activity occurred

swpq-sz: number of swapped-out threads

%swpocc: swapped out occupancy, percentage of time there were swapped out threads


$ mpstat 3

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 32    0   0   18   460  126  257    0    3   10    0  2858    0   0   0  99
 33    0   0   14   131   34  180    1    7   49    0   740    5   1   0  94
 34    0   0    5   134   45  161    1    6    3    0   879    0   0   0 100
 35    0   0   60    78   33   93    0    7   54    0   457    0   0   0 100
 36    0   0    3   199   91  204    0    4    2    0  1125    0   0   0 100
 37    0   0  105    49   13   55    0    3   17    0   164    0   0   0 100
 38    0   0    2   124   56  126    0    2    1    0  1716    0   0   0  99
 39   30   0   21     5    0    7    0    2    4    0    13    0   0   0 100
 40    0   0    1     6    2    3    0    1    0    0    50    0   0   0 100
 41    0   0    0     3    0    5    0    3    5    0     0    0   0   0 100
 42    0   0   10    22   10    0    0    0    0    0     0    0   0   0 100
 43    0   0    1     7    1   10    0    5   12    0     1    0   0   0 100
 44    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 45    0   0    1     6    0    8    0    4    7    0     2    0   0   0 100
 46    0   0   29    68   30    0    6    0  510    0 795891   72  28   0   0
 47    0   0    2     5    1    2    0    1    4    0     0    0   0   0 100
 48    0   0   26   346  153  331    1    6    8    0  4467    1   0   0  99
 49    0   0    0     3    0    7    0    4    8    0     0    0   0   0 100
 50    0   0   53   129   92   22    0    0    3    0     8    0   0   0 100
 51    0   0   65   131   42  142    0    7   61    0   568    1   0   0  99
 52    0   0   23   878  671  382    0    4   22    0  3765    1   2   0  97
 53    0   0   43    43   32   15    0    3    9    0     0    0   0   0 100
 54    0   0    0    22    6   27    0    0    0    0    39    0   0   0 100
 55    0   0   91   237  108  241    0    6   11    0   213    0   0   0 100
 56    0   0   38   303  143  231    0    5    2    0  3360    1   1   0  99
 57    0   0    8    59   14   81    0    2   19    0   352    3   0   0  97
 58    0   0    1     5    1    3    0    1    0    0     7    0   0   0 100
 59    0   0    9    61   18   57    0    3   13    0    62    0   1   0  99
 60    0   0   23   212   96  183    1    6    2    0  1159    0   0   0  99
 61    0   0   25   134   38  169    0    5   81    0   921    5   0   0  95
 62    0   0    0    19    3   27    0    0    2    0    27    0   0   0 100
 63    0   0    6    62   22   62    0    6    7    0   271    0   0   0 100


csw : the number of context switch (both voluntary and involuntary);
occurs when a thread perfomrs a blocking system call e.g. performing I/O when
the thread sleeps until the I/O event has completed.

icsw: the number of involuntary context switches; indicates if there were more
threads ready to run than available phyiscal processors.

migr: migrations of threads between processors; the number of times  the OS
scheduler moves ready-to-run threads to an idle prcessor. OS normally keeps
the threads on the last processor it ran unless the process is busy, then the
thread migrates. Such migration is bad for performance.

intr: the number of interrupts taken on CPU.

ithr: the number of interrupts that are converted to real threads, as result of
inbound network packets, blocking for a mutex, or a synchronization event.


smtx: the number of mutex contentions in the kernel, high shows high %sys time.

srw : the number of reader-writer lock contention events in the kernel.
Excessive reader/write lock contention results in nonscaling performance and
systems that are unable to use all the available CPU resources (symptom is idle time).


$ mpstat -ap 5 20
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0  183   1 1186   874  207 1104    7  158  156    0  4229    0   1   0  99  16
  1  112 104 2329  5085 2222 4805  215  710 1958  187 723869   17   4   0  80  16
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0  107   0 1505   995  323 1128    5   78  368    0  3755    2   1   0  97  16
  1    0   0  374  3011 1364 2399    7   68   35    0 38155    0   0   0  99  16
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0    4   0  244  1010  322 1170    4   74  304    0  3549    2   0   0  98  16
  1    0   0  472  2907 1313 2303    5   62   25    0 45064    0   0   0  99  16
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0    1   0  290   990  323 1156    4   79  322    0  3577    2   1   0  97  16
  1    0   0  265  2600 1119 2080    6   51   37    0 48610    1   0   0  99  16
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0   74   0  715  1074  323 1275    6   90  287    0  3816    3   1   0  96  16
  1    0   0  325  2695 1203 2077    6   67   31    0 59557    1   0   0  99  16


$prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 29302 sybase     11G   11G sleep   59    0 195:35:53 0.1% dataserver/259
 29347 sybase     11G   11G sleep   59    0 151:18:42 0.1% dataserver/259
 29348 sybase     11G   11G sleep   59    0 153:15:41 0.0% dataserver/259
 29340 sybase     11G   11G sleep   59    0 147:23:56 0.0% dataserver/259
 29341 sybase     11G   11G sleep   59    0 153:08:31 0.0% dataserver/259
 29342 sybase     11G   11G sleep   59    0 149:55:02 0.0% dataserver/259
 29346 sybase     11G   11G sleep   59    0 149:56:52 0.0% dataserver/259
 29334 sybase     11G   11G sleep   59    0 153:15:53 0.0% dataserver/259
 29349 sybase     11G   11G sleep   59    0 182:37:54 0.0% dataserver/259
 29343 sybase     11G   11G sleep   59    0 148:44:13 0.0% dataserver/259
 29345 sybase     11G   11G sleep   59    0 152:04:00 0.0% dataserver/259
 29333 sybase     11G   11G sleep   59    0 154:10:37 0.0% dataserver/259
 29332 sybase     11G   11G sleep   59    0 146:54:46 0.0% dataserver/259
 29335 sybase     11G   11G sleep   59    0 156:39:37 0.0% dataserver/259
 29344 sybase     11G   11G sleep   59    0 144:58:21 0.0% dataserver/259
  1540 sybase     14M   11M sleep   59    0   0:00:00 0.0% xpserver/1
 29320 sybase     10M 9152K sleep   59    0   0:00:15 0.0% backupserver/1
 28892 sybase   4296K 2136K sleep   59    0   0:00:06 0.0% SYBASE/1
 29304 sybase     10M 6872K sleep   59    0   0:00:00 0.0% backupserver/1
 28891 sybase   3440K 1880K sleep   59    0   0:00:06 0.0% lmgrd/1
Total: 20 processes, 3890 lwps, load averages: 1.02, 1.73, 2.12

Quick note for prstat:
CPU field is thread->pr_pctcpu
load average is the same as 'uptime'
When average load average exceeds the number of CPUs in the system (psrinfo),
the system is very likely over loaded.

prstat -mL  ; turn on microstates '-m' and show per-thread '-L'
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 14853 sjing     21  72 0.7 0.0 0.0 0.0 6.5 0.0  73  73 78K   0 prstat/1
 12047 betapm    34 0.8 0.0 0.0 0.0 0.0  65 0.1 773  21 12K   0 MathKernel/1
 29302 sybase    22 4.5 0.0 0.0 0.0 0.0  74 0.0 227  71 .3M   9 dataserver/1
 29349 sybase    20 3.7 0.0 0.0 0.0 0.0  76 0.0 234  75 .2M   7 dataserver/1
 29335 sybase    17 3.4 0.0 0.0 0.0 0.0  79 0.0 240  67 .2M   7 dataserver/1
 29333 sybase    17 3.4 0.0 0.0 0.0 0.0  79 0.0 242  67 .2M   8 dataserver/1
 29341 sybase    17 3.2 0.0 0.0 0.0 0.0  80 0.0 244  66 .2M   9 dataserver/1
 29348 sybase    17 3.2 0.0 0.0 0.0 0.0  80 0.0 243  67 .2M   8 dataserver/1
 29334 sybase    17 3.2 0.0 0.0 0.0 0.0  80 0.0 242  67 .2M   7 dataserver/1
 29345 sybase    17 3.2 0.0 0.0 0.0 0.0  80 0.0 242  66 .2M   7 dataserver/1
 29347 sybase    17 3.1 0.0 0.0 0.0 0.0  80 0.0 243  67 .2M   8 dataserver/1
 29342 sybase    17 3.2 0.0 0.0 0.0 0.0  80 0.0 245  66 .2M  11 dataserver/1
 29346 sybase    17 3.2 0.0 0.0 0.0 0.0  80 0.0 244  67 .2M   8 dataserver/1
 29343 sybase    17 3.0 0.0 0.0 0.0 0.0  80 0.0 244  67 .2M   8 dataserver/1
 29340 sybase    17 3.1 0.0 0.0 0.0 0.0  80 0.0 245  66 .2M   8 dataserver/1
Total: 141 processes, 4479 lwps, load averages: 0.87, 1.24, 1.83
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 29302 sybase    10 4.5 0.0 0.0 0.0 0.0  85 0.0 367   9 .3M 126 dataserver/1
 29347 sybase   9.9 4.0 0.0 0.0 0.0 0.0  86 0.0 373   8 .2M 127 dataserver/1
 12047 betapm   9.1 0.4 0.0 0.0 0.0 0.0  90 0.1 536   6  8K   0 MathKernel/1
 29348 sybase   1.0 0.4 0.0 0.0 0.0 0.0  99 0.0 303   1 29K  12 dataserver/1
 29340 sybase   0.8 0.4 0.0 0.0 0.0 0.0  99 0.0 302   2 26K   9 dataserver/1
 14853 sjing    0.3 0.8 0.0 0.0 0.0 0.0  99 0.0  17   1 727   0 prstat/1
 14831 dbop     0.0 0.9 0.0 0.0 0.0 0.0  99 0.1 826   2  1K   0 find/1
 29341 sybase   0.6 0.3 0.0 0.0 0.0 0.0  99 0.0 297   1 19K   7 dataserver/1
 29342 sybase   0.6 0.2 0.0 0.0 0.0 0.0  99 0.0 298   0 17K   4 dataserver/1
 29349 sybase   0.5 0.2 0.0 0.0 0.0 0.0  99 0.0 298   1 16K   5 dataserver/1
 12063 betapm   0.2 0.4 0.0 0.0 0.0 0.0  99 0.0 505   1  6K   0 MathAddon/1
 29346 sybase   0.4 0.2 0.0 0.0 0.0 0.0  99 0.0 297   1 14K   4 dataserver/1
 29333 sybase   0.4 0.2 0.0 0.0 0.0 0.0  99 0.0 297   0 11K   3 dataserver/1
 29334 sybase   0.4 0.2 0.0 0.0 0.0 0.0  99 0.0 299   0 12K   3 dataserver/1
 29335 sybase   0.4 0.2 0.0 0.0 0.0 0.0  99 0.0 297   1  9K   2 dataserver/1
Total: 141 processes, 4479 lwps, load averages: 0.86, 1.23, 1.82

LAT : (latency) is the amount of time spent waiting for CPU.

No comments:

Post a Comment