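By default, systemd-journald.service keeps its data only in memory (under /run/log/journal). As a result, once a server has been running for a long time, or after a reboot, journalctl can no longer show older service status information. To keep the journal permanently on disk, edit /etc/systemd/journald.conf and add `Storage=persistent` under the `[Journal]` section.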
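A minimal sketch of that edit as a shell function. It assumes a stock journald.conf in which the option is either absent or present as a commented default such as `#Storage=auto`. The function name `set_journald_persistent` is hypothetical. It takes the file path as an argument so it can be tried on a copy first, and it relies on GNU sed's `-i`.

```sh
# Hypothetical helper: make a journald.conf-style file use persistent storage.
# $1: path to the config file (normally /etc/systemd/journald.conf).
set_journald_persistent() (
    conf=$1
    if grep -Eq '^[#[:space:]]*Storage=' "$conf"; then
        # Replace an existing, possibly commented-out, Storage= line.
        sed -i -E 's/^[#[:space:]]*Storage=.*/Storage=persistent/' "$conf"
    elif grep -q '^\[Journal\]' "$conf"; then
        # Insert the option directly after the [Journal] section header.
        sed -i '/^\[Journal\]/a Storage=persistent' "$conf"
    else
        # No [Journal] section at all: append one.
        printf '[Journal]\nStorage=persistent\n' >> "$conf"
    fi
)
```

On the real file, run `set_journald_persistent /etc/systemd/journald.conf && systemctl restart systemd-journald`. With the default `Storage=auto`, creating the /var/log/journal directory has a similar effect.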
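Kernel parameters tuned under /proc/sys (path, value, purpose):

- `/proc/sys/kernel/panic` 10 (seconds to wait before rebooting after a kernel panic)
- `/proc/sys/kernel/perf_event_max_sample_rate` 100000 (maximum perf sampling rate, i.e. sample interrupts per second)
- `/proc/sys/kernel/printk` 7 4 1 7 (kernel message log levels: console, default message, minimum console, default console)
- `/proc/sys/net/netfilter/nf_conntrack_max` 4194304 (maximum number of connections tracked by netfilter/iptables), computed for 128 GB of RAM on a 64-bit system with `echo $((128*1024*1024*1024/16384/2))`
- `/proc/sys/net/netfilter/nf_conntrack_buckets` 524288 (conntrack hash table size), computed with `echo $((128*1024*1024*1024/131072/2))`
- `/proc/sys/vm/dirty_ratio` 30 (percentage of available memory that may hold dirty pages before writing processes are forced to write them back)
- `/proc/sys/vm/swappiness` 10 (lower values make the kernel less eager to swap)
- `/proc/sys/vm/overcommit_memory` 2 (refuse memory overcommit)
- `/proc/sys/vm/max_map_count` may be raised above 65535 (maximum number of memory map areas a single process may use)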
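The two `echo` calculations follow the commonly cited netfilter sizing rule:

- maximum entries = RAM in bytes / 16384 / (word size / 32)
- hash table size = RAM in bytes / 131072 / (word size / 32), which is one eighth of the maximum entries

The divisor 2 corresponds to a 64-bit kernel. The sketch below parameterizes these calculations by RAM size and prints the settings in sysctl.conf form. It assumes a 64-bit kernel, and the function names are hypothetical.

```sh
# Hypothetical helpers reproducing the article's arithmetic (64-bit kernel).
# $1: installed RAM in GiB.
conntrack_max() {
    echo $(( $1 * 1024 * 1024 * 1024 / 16384 / 2 ))
}
conntrack_buckets() {
    echo $(( $1 * 1024 * 1024 * 1024 / 131072 / 2 ))
}

# Print the article's values as sysctl.conf lines for a given RAM size.
print_tuning() {
    cat <<EOF
kernel.panic = 10
kernel.perf_event_max_sample_rate = 100000
kernel.printk = 7 4 1 7
net.netfilter.nf_conntrack_max = $(conntrack_max "$1")
vm.dirty_ratio = 30
vm.swappiness = 10
vm.overcommit_memory = 2
EOF
}
```

To apply the settings, run `print_tuning 128 > /etc/sysctl.d/90-tuning.conf`, then `sysctl -p /etc/sysctl.d/90-tuning.conf`. The file name is hypothetical. The `net.netfilter.*` keys exist only once the nf_conntrack module is loaded. `nf_conntrack_buckets` is deliberately left out: on older kernels such as 3.10 it may be read-only, and the hash size is then usually set through the nf_conntrack module parameter `hashsize`.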
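Reference: http://www.pcp.io/docs/guide.html

PCP (Performance Co-Pilot) is used here to analyze system and process performance. It shows current resource usage, and the values it returns can be used to judge how processes are using resources.

List the available metrics:

```
pminfo -h localhost
```

Example: query how much data each disk partition has read since boot: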
```
[root@gx-yun-084036 .ssh]# pminfo -h localhost -dfmtT disk.partitions.read_bytes

disk.partitions.read_bytes PMID: 60.10.6 [number of bytes read for storage partitions]
    Data Type: 32-bit unsigned int  InDom: 60.10 0xf00000a
    Semantics: counter  Units: Kbyte
Help:
Cumulative number of bytes read since system boot time (subject to counter wrap) for individual disk partitions or logical volumes.
    inst [0 or "sda1"] value 22367
    inst [1 or "sda2"] value 592513
    inst [2 or "sda3"] value 1424
    inst [3 or "sdb1"] value 557470
    inst [4 or "sdc1"] value 1568
```
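Note that the descriptor says `Units: Kbyte`. Despite the metric name, the values are kilobytes, so sda2 has read about 578 MiB since boot. The sketch below converts the `inst [...] value N` lines of `pminfo -f` output to MiB. It assumes the line format shown above, and the function name `kbyte_insts_to_mib` is hypothetical.

```sh
# Hypothetical helper: read `pminfo -f` output for a Kbyte-valued metric on
# stdin and print one "instance MiB" pair per instance line.
kbyte_insts_to_mib() {
    awk '/^[ \t]*inst \[/ {
        # The instance name is the double-quoted string, e.g. "sda2".
        split($0, q, "\"")
        # The value is the last field; convert KB to MiB.
        printf "%s %.1f\n", q[2], $NF / 1024
    }'
}
```

Usage: `pminfo -h localhost -f disk.partitions.read_bytes | kbyte_insts_to_mib`.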
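Continuously watch current disk write activity. `-t 2sec` takes a sample every 2 seconds, and `-f 3` prints values with 3 decimal places: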
```
[root@gx-yun-084036 .ssh]# pmval -t 2sec -f 3 disk.partitions.write -h localhost

metric:    disk.partitions.write
host:      gx-yun-084036.vclound.com
semantics: cumulative counter (converting to rate)
units:     count (converting to count / sec)
samples:   all

        sda1        sda2        sda3        sdb1        sdc1
       0.000       0.000       0.000       0.000       0.000
       0.000       0.000       0.000       0.000       0.000
       0.000       2.498       0.000       0.500       0.500
       0.000       2.498       0.000       0.000       0.000
       0.000       0.000       0.000       0.000       0.000
```
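`disk.partitions.write` is a cumulative counter of write operations. pmval therefore reports the difference between two consecutive samples divided by the time between them. A sketch of that conversion follows. The function name and sample values are hypothetical, and counter wrap is ignored. Because the real sampling interval is rarely exactly 2 seconds, this is likely why a value such as 2.498 appears instead of 2.5.

```sh
# Hypothetical helper: per-second rate of a cumulative counter.
# $1: earlier counter value, $2: later counter value,
# $3: seconds between the two samples. Counter wrap is not handled.
counter_rate() {
    awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.3f\n", (b - a) / dt }'
}
```

For example, `counter_rate 1000 1005 2` prints `2.500`, while the same 5 writes over 2.0016 seconds give `2.498`.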
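Another example samples the 1-minute load average, used memory and per-partition write rates every 2 seconds. The first row shows `?` for the write rates because a rate needs two samples: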
```
[root@gx-yun-084036 .ssh]# pmdumptext -Xlimu -t 2sec 'kernel.all.load[1]' mem.util.used disk.partitions.write -h localhost
[ 1] localhost:kernel.all.load["1 minute"]
[ 2] localhost:mem.util.used
[ 3] localhost:disk.partitions.write["sda1"]
[ 4] localhost:disk.partitions.write["sda2"]
[ 5] localhost:disk.partitions.write["sda3"]
[ 6] localhost:disk.partitions.write["sdb1"]
[ 7] localhost:disk.partitions.write["sdc1"]

             Column       1       2       3       4       5       6       7
             Source  localh  localh  localh  localh  localh  localh  localh
             Metric    load    used   write   write   write   write   write
               Inst  1 minu     n/a    sda1    sda2    sda3    sdb1    sdc1
              Units    none       b     c/s     c/s     c/s     c/s     c/s
Mon Sep 26 16:20:07    0.08  14.78G       ?       ?       ?       ?       ?
Mon Sep 26 16:20:09    0.08  14.78G    0.00    0.00    0.00    0.00    0.00
Mon Sep 26 16:20:11    0.08  14.78G    0.00   36.00    0.00    0.00    0.00
Mon Sep 26 16:20:13    0.07  14.78G    0.00    2.50    0.00    0.50    0.50
Mon Sep 26 16:20:15    0.07  14.78G    0.00    0.00    0.00    0.50    0.00
Mon Sep 26 16:20:17    0.07  14.78G    0.00    0.00    0.00    0.00    0.00
```
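Saved pmdumptext output is easy to post-process. The sketch below averages one numbered column over the data rows and skips the `?` entries. It assumes each data row begins with a four-field timestamp as above, and the function name is hypothetical.

```sh
# Hypothetical helper: average one numbered column of pmdumptext data rows,
# skipping "?" (no value yet, e.g. the first sample of a rate).
# $1: column number as printed in the "Column" header row.
# Data rows are recognised by an HH:MM:SS time in field 4.
pmdump_col_avg() {
    awk -v col="$1" '
        $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            v = $(col + 4)
            if (v != "?") { sum += v; n++ }
        }
        END { if (n) printf "%.2f\n", sum / n; else print "n/a" }'
}
```

For example, `pmdump_col_avg 4 < saved.txt` averages the sda2 write rate (column 4) from a hypothetical file, saved.txt, holding the output above. For this run the result is 7.70 writes per second.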
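`pcp atop` can monitor current resource usage in real time (available on RHEL 7.2 and later).

Monitor system resources in real time with pmcollectl: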
```
[root@gx-yun-084036 .ssh]# pmcollectl
#<---------CPU---------><------------Disks----------><--------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut
   0   0   888   1773      0      0      8      6     54     81      7     73
   0   0   620   1712      0      0      4      3     55     96      7     88
   0   0   736   1539      0      0      0      0    187    229     32    208
   0   0   647   1222      0      0      0      0    210    206     18    159
   0   0   555   1061      0      0     64      2     61     90     10     83
   0   0   432    840      0      0      0      0     84    110     11    101
   0   0   511   1025      0      0      0      0      7     49      7     49
```
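SystemTap (`stap`) is a good system monitoring tool and is well suited to monitoring processes. Install it with:

```
yum install -y 'systemtap*'
```

The systemtap-client package includes a number of commonly used scripts. You can also write your own scripts to monitor and display whatever you need.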
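The stap command also needs the kernel debuginfo packages that exactly match the running kernel (3.10.0-327.el7 here):

```
rpm -ivh kernel-debuginfo-3.10.0-327.el7.x86_64.rpm kernel-debuginfo-common-x86_64-3.10.0-327.el7.x86_64.rpm
```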
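The debuginfo packages must match the running kernel release exactly. SystemTap also needs the matching kernel-devel package to build its kernel module. The sketch below derives the two file names used above from a kernel release string such as the output of `uname -r`. The function name is hypothetical, and it assumes the release string ends in the architecture, as on RHEL/CentOS (`3.10.0-327.el7.x86_64`).

```sh
# Hypothetical helper: print the kernel-debuginfo RPM file names matching a
# kernel release string (defaults to the running kernel, `uname -r`).
debuginfo_rpms() {
    rel=${1:-$(uname -r)}
    arch=${rel##*.}    # text after the last dot, e.g. x86_64
    echo "kernel-debuginfo-$rel.rpm"
    echo "kernel-debuginfo-common-$arch-$rel.rpm"
}
```

For example, run `rpm -ivh $(debuginfo_rpms)` in the directory holding the downloaded RPMs. Where the debuginfo repository is configured, `debuginfo-install kernel-$(uname -r)` from yum-utils can fetch the same packages. To check the setup, run `stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'`.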
Which processes are currently doing the most I/O
iotop.stp
For reference, the script:
```
[root@gx-yun-084036 io]# cat iotop.stp
#!/usr/bin/stap

global reads, writes, total_io

probe vfs.read.return {
  reads[execname()] += bytes_read
}

probe vfs.write.return {
  writes[execname()] += bytes_written
}

# print top 10 IO processes every 5 seconds
probe timer.s(5) {
  foreach (name in writes)
    total_io[name] += writes[name]
  foreach (name in reads)
    total_io[name] += reads[name]
  printf("%16s\t%10s\t%10s\n", "Process", "KB Read", "KB Written")
  foreach (name in total_io- limit 10)
    printf("%16s\t%10d\t%10d\n", name,
           reads[name]/1024, writes[name]/1024)
  delete reads
  delete writes
  delete total_io
  print("\n")
}
```
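Running `stap iotop.stp` produces output like the following. These figures count every VFS read and write, including page-cache hits, pipes and /proc, not only I/O that reaches a disk: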
```
         Process     KB Read  KB Written
              dd     2048036     2048000
          docker          41           1
   zabbix_agentd          39           0
        cadvisor          24           0
         dmsetup          15           0
    ovsdb-server           0           7
            bash           7           0
       netplugin           5           0
    ovs-vswitchd           2           0
            sshd           1           1
```
Which files in the system were read or written during a given period
iotime.stp
```
#!/usr/bin/stap

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global filehandles, fileread, filewrite

probe syscall.open.return {
  filename = user_string($filename)
  # the kernel returns -errno on failure, not -1
  if ($return >= 0) {
    filehandles[pid(), $return] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)
  }
}

probe syscall.read.return {
  p = pid()
  fd = $fd
  bytes = $return
  # microseconds spent inside this read() call
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.write.return {
  p = pid()
  fd = $fd
  bytes = $return
  # microseconds spent inside this write() call
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] += bytes
  time_io[p, fd] <<< time
}

# on close, report bytes read/written and total I/O time for the file
probe syscall.close {
  if ([pid(), $fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), $fd],
           fileread[pid(), $fd], filewrite[pid(), $fd])
    if (@count(time_io[pid(), $fd]))
      printf("%d %s iotime %s time: %d\n", timestamp(), proc(),
             filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
  }
  delete fileread[pid(), $fd]
  delete filewrite[pid(), $fd]
  delete filehandles[pid(), $fd]
  delete time_io[pid(), $fd]
}
```
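Running it as `stap iotime.stp -c "sleep 1"` produces output like the following. `-c` runs the command and stops the script when the command exits. Because the script does not filter on `target()`, other processes such as zabbix_agentd and dmsetup appear as well. The first field is microseconds since the script started, and `time:` is the total time spent in read/write calls on that file, also in microseconds: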
```
66449 28145 (sleep) access /etc/ld.so.cache read: 0 write: 0
66515 28145 (sleep) access /lib64/libc.so.6 read: 832 write: 0
66519 28145 (sleep) iotime /lib64/libc.so.6 time: 2
66739 28145 (sleep) access /usr/lib/locale/locale-archive read: 0 write: 0
573033 2747 (zabbix_agentd) access /proc/stat read: 8191 write: 0
573046 2747 (zabbix_agentd) iotime /proc/stat time: 1317
1034282 28148 (dmsetup) access /etc/ld.so.cache read: 0 write: 0
1034350 28148 (dmsetup) access /lib64/libdevmapper.so.1.02 read: 832 write: 0
1034355 28148 (dmsetup) iotime /lib64/libdevmapper.so.1.02 time: 2
1034394 28148 (dmsetup) access /lib64/librt.so.1 read: 832 write: 0
```
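Each `iotime` line reports the microseconds spent in read/write calls on one file before that file was closed. The sketch below totals those times per process from saved output. The function name is hypothetical. It assumes the line format printed above and does not handle command names that contain spaces.

```sh
# Hypothetical helper: sum the "iotime ... time: N" lines of iotime.stp
# output per "PID (command)" and print the totals, largest first.
iotime_per_proc() {
    awk '$4 == "iotime" && $(NF-1) == "time:" {
        t[$2 " " $3] += $NF
    }
    END { for (p in t) print t[p], p }' | sort -k1,1nr
}
```

For example, `iotime_per_proc < iotime.log` works on output saved to a hypothetical iotime.log. For the run above it ranks zabbix_agentd (1317 microseconds on /proc/stat) first.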
Which process is mainly responsible for the current disk I/O
disktop.stp
```
#!/usr/bin/stap

global io_stat, device
global read_bytes, write_bytes

probe vfs.read.return {
  if ($return > 0) {
    /* skip files with no backing block device (pipes, sockets, /proc, ...);
       reads of disk files served from the page cache are still counted */
    if (devname != "N/A") {
      io_stat[pid(), execname(), uid(), ppid(), "R"] += $return
      device[pid(), execname(), uid(), ppid(), "R"] = devname
      read_bytes += $return
    }
  }
}

probe vfs.write.return {
  if ($return > 0) {
    /* same filter; writes are counted when write() returns,
       i.e. when data reaches the page cache, not when it is flushed */
    if (devname != "N/A") {
      io_stat[pid(), execname(), uid(), ppid(), "W"] += $return
      device[pid(), execname(), uid(), ppid(), "W"] = devname
      write_bytes += $return
    }
  }
}

probe timer.ms(5000) {
  /* print the summary and header only if there was disk I/O */
  if (read_bytes + write_bytes) {
    printf("\n%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb\n\n",
           ctime(gettimeofday_s()),
           "Average:", ((read_bytes + write_bytes) / 1024) / 5,
           "Read:", read_bytes / 1024,
           "Write:", write_bytes / 1024)

    /* print header */
    printf("%8s %8s %8s %25s %8s %4s %12s\n",
           "UID", "PID", "PPID", "CMD", "DEVICE", "T", "BYTES")
  }
  /* print the top ten entries by bytes, largest first */
  foreach ([process, cmd, userid, parent, action] in io_stat- limit 10)
    printf("%8d %8d %8d %25s %8s %4s %12d\n",
           userid, process, parent, cmd,
           device[process, cmd, userid, parent, action],
           action, io_stat[process, cmd, userid, parent, action])

  /* reset for the next 5-second interval */
  delete io_stat
  delete device
  read_bytes = 0
  write_bytes = 0
}

probe end {
  delete io_stat
  delete device
  delete read_bytes
  delete write_bytes
}
```
Sample output:
```
[root@gx-yun-084036 io]# stap disktop.stp

Thu Sep 22 07:40:41 2016 , Average:415962Kb/sec, Read:      68Kb, Write: 2079746Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
       0    48463    44947                        dd     sda2    W    209715200
       0    48510    44947                        dd     sda2    W    209715200
       0    48511    44947                        dd     sda2    W    209715200
       0    48512    44947                        dd     sda2    W    209715200
       0    48513    44947                        dd     sda2    W    209715200
       0    48514    44947                        dd     sda2    W    209715200
       0    48515    44947                        dd     sda2    W    209715200
       0    48516    44947                        dd     sda2    W    209715200
       0    48517    44947                        dd     sda2    W    209715200
       0    48518    44947                        dd     sda2    W    209715200
```
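Because the timer probe clears io_stat after every report, each BYTES value covers only the last 5-second interval. Here, ten dd processes each wrote 209715200 bytes (200 MiB) in that interval. The sketch below totals BYTES per command from saved disktop output. It assumes the seven-column row layout above and command names without spaces, and the function name is hypothetical.

```sh
# Hypothetical helper: total the BYTES column of disktop.stp rows per command
# and print "command MiB", largest first.
# Data rows have 7 fields: UID PID PPID CMD DEVICE T BYTES
disktop_mib_per_cmd() {
    awk 'NF == 7 && $1 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ {
        b[$4] += $7
    }
    END { for (c in b) printf "%s %.1f\n", c, b[c] / 1048576 }' | sort -k2,2nr
}
```

For the ten dd rows above, it reports `dd 2000.0`.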
Reposted from: http://llnni.baihongyu.com/