analyselinuxkernelbyusingperf(代码片段)

X-Solomon-X X-Solomon-X     2023-02-15     204

关键词:

Perf简介:Perf是Linux内核自带的系统性能调优工具,支持CPU PMU事件和软件事件计数,可用于找到程序的热点代码,以及找到程序的性能瓶颈。

Perf参考文档:

linux/tools/perf/Documentation at master · torvalds/linux · GitHubhttps://github.com/torvalds/linux/tree/master/tools/perf/Documentation

perf Static Tracepointsperf Static Tracepointshttps://www.brendangregg.com/blog/2014-06-29/perf-static-tracepoints.html

  • perf list:枚举perf支持的所有的事件,可以在各种perf命令中通过-e选项选择对应的事件。
[root@VM-centos ~]# perf list

List of pre-defined events (to be used in -e):

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

  duration_time                                      [Tool event]

  msr/tsc/                                           [Kernel PMU event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
   (see 'man perf-list' on how to encode it)

  mem:<addr>[/len][:access]                          [Hardware breakpoint]

  alarmtimer:alarmtimer_cancel                       [Tracepoint event]
  alarmtimer:alarmtimer_fired                        [Tracepoint event]
  alarmtimer:alarmtimer_start                        [Tracepoint event]
  alarmtimer:alarmtimer_suspend                      [Tracepoint event]
  binder:binder_alloc_lru_end                        [Tracepoint event]
  binder:binder_alloc_lru_start                      [Tracepoint event]
 
  ...


#统计所有Tracepoint的类别

[root@VM-centos ~]# perf list | awk -F: '/Tracepoint event/  lib[$1]++  END 
>     for (l in lib)  printf "  %-16.16s %d\\n", l, lib[l]  ' | sort | column
    alarmtimer     4        fib            1        iscsi          7        nmi            1        rpm            4        tlb            1
    binder         31       fib6           1        jbd2           16       nvme           4        rseq           2        udp            1
    block          18       filelock       12       kmem           12       oom            8        rtc            12       ufs            13
    bpf_test_run   1        filemap        4        kyber          3        page_isolation 1        sched          24       vmscan         18
    bridge         4        fs_dax         14       libata         6        pagemap        2        scsi           5        vsyscall       1
    cgroup         13       ftrace         2        mce            1        page_pool      3        signal         2        wbt            4
    clk            16       huge_memory    4        mdio           1        percpu         5        skb            3        workqueue      4
    compaction     14       hwmon          3        migrate        1        power          23       sock           3        writeback      35
    cpuhp          3        initcall       3        module         5        printk         1        sunrpc         52       x86_fpu        11
    damon          2        intel_iommu    6        mpx            5        qdisc          1        swiotlb        1        xdp            12
    devfreq        1        iocost         5        msr            3        random         15       syscalls       664      xhci-hcd       51
    devlink        5        iommu          7        napi           1        ras            6        task           2
    dma_fence      7        irq            5        nbd            5        raw_syscalls   2        tcp            7
    exceptions     2        irq_matrix     12       neigh          7        rcu            1        thermal        3
    ext4           105      irq_vectors    34       net            18       resctrl        3        timer          13
  • perf list 'keyword':以'keyword'字符串匹配所有perf事件。
[root@VM-centos ~]# perf list 'sched:*'

List of pre-defined events (to be used in -e):

  sched:sched_kthread_stop                           [Tracepoint event]
  sched:sched_kthread_stop_ret                       [Tracepoint event]
  sched:sched_migrate_task                           [Tracepoint event]
  sched:sched_move_numa                              [Tracepoint event]
  sched:sched_pi_setprio                             [Tracepoint event]
  sched:sched_process_exec                           [Tracepoint event]
  sched:sched_process_exit                           [Tracepoint event]
  sched:sched_process_fork                           [Tracepoint event]
  sched:sched_process_free                           [Tracepoint event]
  sched:sched_process_hang                           [Tracepoint event]
  sched:sched_process_wait                           [Tracepoint event]
  sched:sched_stat_blocked                           [Tracepoint event]
  sched:sched_stat_iowait                            [Tracepoint event]
  sched:sched_stat_runtime                           [Tracepoint event]
  sched:sched_stat_sleep                             [Tracepoint event]
  sched:sched_stat_wait                              [Tracepoint event]
  sched:sched_stick_numa                             [Tracepoint event]
  sched:sched_swap_numa                              [Tracepoint event]
  sched:sched_switch                                 [Tracepoint event]
  sched:sched_wait_task                              [Tracepoint event]
  sched:sched_wake_idle_without_ipi                  [Tracepoint event]
  sched:sched_wakeup                                 [Tracepoint event]
  sched:sched_wakeup_new                             [Tracepoint event]
  sched:sched_waking                                 [Tracepoint event]
  • perf stat [-e <event>] <command>:执行一个命令,统计这个命令在执行过程中事件的发生情况。
#执行sleep命令,并且进行事件统计
[root@VM-centos ~]# perf stat sleep 5

 Performance counter stats for 'sleep 5':

              0.47 msec task-clock                #    0.000 CPUs utilized          
                 1      context-switches          #    0.002 M/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                74      page-faults               #    0.156 M/sec                  
   <not supported>      cycles                                                      
   <not supported>      instructions                                                
   <not supported>      branches                                                    
   <not supported>      branch-misses                                               

       5.000771757 seconds time elapsed

       0.000724000 seconds user
       0.000000000 seconds sys

#执行sleep命令,对系统所有CPU事件进行统计
[root@VM-centos ~]# perf stat -a sleep 5

 Performance counter stats for 'system wide':

         40,004.92 msec cpu-clock                 #    8.000 CPUs utilized          
             6,325      context-switches          #    0.158 K/sec                  
               211      cpu-migrations            #    0.005 K/sec                  
            20,939      page-faults               #    0.523 K/sec                  
   <not supported>      cycles                                                      
   <not supported>      instructions                                                
   <not supported>      branches                                                    
   <not supported>      branch-misses                                               

       5.000791182 seconds time elapsed


#执行sleep命令,对系统所有CPU事件进行统计,并且每隔1000ms打印统计信息。
[root@VM-centos ~]# perf stat -a -I 1000 sleep 5
#           time             counts unit events
     1.000251373           8,002.05 msec cpu-clock                 #    8.002 CPUs utilized          
     1.000251373                950      context-switches          #    0.119 K/sec                  
     1.000251373                 12      cpu-migrations            #    0.001 K/sec                  
     1.000251373              1,239      page-faults               #    0.155 K/sec                  
     1.000251373    <not supported>      cycles                                                      
     1.000251373    <not supported>      instructions                                                
     1.000251373    <not supported>      branches                                                    
     1.000251373    <not supported>      branch-misses                                               
     2.000639272           8,003.10 msec cpu-clock                 #    8.003 CPUs utilized          
     2.000639272                906      context-switches          #    0.113 K/sec                  
     2.000639272                  6      cpu-migrations            #    0.001 K/sec                  
     2.000639272                 10      page-faults               #    0.001 K/sec                  
     2.000639272    <not supported>      cycles                                                      
     2.000639272    <not supported>      instructions                                                
     2.000639272    <not supported>      branches                                                    
     2.000639272    <not supported>      branch-misses                                               
     3.000999689           8,002.96 msec cpu-clock                 #    8.003 CPUs utilized          
     3.000999689                917      context-switches          #    0.115 K/sec                  
     3.000999689                  7      cpu-migrations            #    0.001 K/sec                  
     3.000999689                125      page-faults               #    0.016 K/sec                  
     3.000999689    <not supported>      cycles                                                      
     3.000999689    <not supported>      instructions                                                
     3.000999689    <not supported>      branches                                                    
     3.000999689    <not supported>      branch-misses                                               
     4.001390263           8,003.21 msec cpu-clock                 #    8.003 CPUs utilized          
     4.001390263                932      context-switches          #    0.116 K/sec                  
     4.001390263                  4      cpu-migrations            #    0.000 K/sec                  
     4.001390263                  1      page-faults               #    0.000 K/sec                  
     4.001390263    <not supported>      cycles                                                      
     4.001390263    <not supported>      instructions                                                
     4.001390263    <not supported>      branches                                                    
     4.001390263    <not supported>      branch-misses                                               
     5.000611536           7,993.74 msec cpu-clock                 #    7.994 CPUs utilized          
     5.000611536                990      context-switches          #    0.124 K/sec                  
     5.000611536                  7      cpu-migrations            #    0.001 K/sec                  
     5.000611536              1,176      page-faults               #    0.147 K/sec                  
     5.000611536    <not supported>      cycles                                                      
     5.000611536    <not supported>      instructions                                                
     5.000611536    <not supported>      branches                                                    
     5.000611536    <not supported>      branch-misses


#如果不使用-e进行指定监控的事件,perf默认只监控perf list中前8个事件,可以使用通配符来统计一类事件。
[root@VM-centos ~]# perf stat -e 'sched:*' sleep 5

 Performance counter stats for 'sleep 5':

                 0      sched:sched_kthread_stop                                    
                 0      sched:sched_kthread_stop_ret                                   
                 0      sched:sched_waking                                          
                 0      sched:sched_wakeup                                          
                 0      sched:sched_wakeup_new                                      
                 1      sched:sched_switch                                          
                 0      sched:sched_migrate_task                                    
                 0      sched:sched_process_free                                    
                 1      sched:sched_process_exit                                    
                 0      sched:sched_wait_task                                       
                 0      sched:sched_process_wait                                    
                 0      sched:sched_process_fork                                    
                 1      sched:sched_process_exec                                    
                 0      sched:sched_stat_wait                                       
                 0      sched:sched_stat_sleep                                      
                 0      sched:sched_stat_iowait                                     
                 0      sched:sched_stat_blocked                                    
           459,952      sched:sched_stat_runtime                                    
                 0      sched:sched_pi_setprio                                      
                 0      sched:sched_process_hang                                    
                 0      sched:sched_move_numa                                       
                 0      sched:sched_stick_numa                                      
                 0      sched:sched_swap_numa                                       
                 0      sched:sched_wake_idle_without_ipi                                   

       5.000703957 seconds time elapsed

       0.000638000 seconds user
       0.000000000 seconds sys


#可以使用-p选项来指定监控的进程PID
[root@VM-centos ~]# perf stat -e 'sched:*' -p 105605
^C
 Performance counter stats for process id '105605':

     <not counted>      sched:sched_kthread_stop                                    
     <not counted>      sched:sched_kthread_stop_ret                                   
     <not counted>      sched:sched_waking                                          
     <not counted>      sched:sched_wakeup                                          
     <not counted>      sched:sched_wakeup_new                                      
     <not counted>      sched:sched_switch                                          
     <not counted>      sched:sched_migrate_task                                    
     <not counted>      sched:sched_process_free                                    
     <not counted>      sched:sched_process_exit                                    
     <not counted>      sched:sched_wait_task                                       
     <not counted>      sched:sched_process_wait                                    
     <not counted>      sched:sched_process_fork                                    
     <not counted>      sched:sched_process_exec                                    
     <not counted>      sched:sched_stat_wait                                       
     <not counted>      sched:sched_stat_sleep                                      
     <not counted>      sched:sched_stat_iowait                                     
     <not counted>      sched:sched_stat_blocked                                    
     <not counted>      sched:sched_stat_runtime                                    
     <not counted>      sched:sched_pi_setprio                                      
     <not counted>      sched:sched_process_hang                                    
     <not counted>      sched:sched_move_numa                                       
     <not counted>      sched:sched_stick_numa                                      
     <not counted>      sched:sched_swap_numa                                       
     <not counted>      sched:sched_wake_idle_without_ipi                                   

      19.698014307 seconds time elapsed
  • perf record [-e <event>] <command>:执行一个命令,收集这个命令在运行时更全面的信息,并且保存在perf.data文件中。
#-c perf采样的间隔,每次发生minor-faults事件都要进行采样。
[root@VM-centos ~]# perf record -e minor-faults -c 1 sleep 10
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 2.191 MB perf.data (25261 samples) ]


#-a 选项指定采样系统中所有的 CPU。
#-g 选项指定记录下用户进程或者内核的调用栈。其中,--call-graph dwarf 指定调用栈收集的方式为 dwarf,即 libdwarf 和 libdunwind 的方式。Perf 还支持 fp 和 lbs 方式。
#-F 选项指定 perf 以 997 次每秒的频率对 CPU 上运行的用户进程或者内核上下文进行采样 (Sampling)。由于 Linux 内核的时钟中断是以 1000 次每秒的频率周期触发,所以按照 997 频率采样可以避免每次采样都采样到始终中断相关的处理,减少干扰。
#sleep 60 则是通过 perf 指定运行的命令,这个命令起到了让 perf 运行 60 秒然后退出的效果。

[root@VM-centos ~]# perf record -a -g --call-graph dwarf -F 997 sleep 10
[ perf record: Woken up 39 times to write data ]
[ perf record: Captured and wrote 10.978 MB perf.data (63514 samples) ]

[root@VM-centos ~]# ls
perf.data
  • perf script:使用perf script可以将perf.data中所有的采样点信息进行展示。
[root@VM-centos ~]# perf script
swapper     0 [000] 32190.034251:    1003009 cpu-clock:pppH: 
        ffffffff81c86235 default_idle+0x35 (/lib/modules/.../build/vmlinux)
        ffffffff8102b59f arch_cpu_idle+0xf (/lib/modules/.../build/vmlinux)
        ffffffff81c867a6 default_idle_call+0x26 (/lib/.../build/vmlinux)

#-F选项可以指定展示的列
[root@VM-centos ~]# perf script -F comm,pid
            perf 120910 
            perf 120910 
            perf 120910 
            perf 120911 
            perf 120910 
            perf 120910 
            perf 120910 
            perf 120911 
            perf 120911
  • perf report:使用perf record可以将perf.data中的信息更结构化的展示。
#用stdio对采样结果以调用栈的形式呈现
[root@VM-centos ~]# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 63K of event 'cpu-clock:pppH'
# Event count (approx.): 63705113626
#
# Children      Self  Command     Shared Object               Symbol                                                                           
# ........  ........  ..........  ..........................  .................................................................................
#
    99.84%     0.00%  swapper     [kernel.vmlinux]            [k] 0xffffffff810000e6
            |
            ---0xffffffff810000e6
               |          
               |--84.16%--start_secondary
               |          cpu_startup_entry
               |          do_idle
               |          |          
               |           --84.14%--default_idle_call
               |                     arch_cpu_idle
               |                     default_idle
               |          
                --15.68%--x86_64_start_kernel
                          x86_64_start_reservations
                          start_kernel
                          arch_call_rest_init
                          rest_init
                          cpu_startup_entry
                          do_idle
                          default_idle_call
                          arch_cpu_idle
                          default_idle

    99.84%     0.00%  swapper     [kernel.vmlinux]            [k] cpu_startup_entry
            |
            ---cpu_startup_entry
               do_idle
               |          
                --99.82%--default_idle_call
                          arch_cpu_idle
                          default_idle

#查看header里面的captured on时间,应该表示结束时间,time of last sample最后采集时间戳,单位是秒,可往前追溯现场时间
[root@VM-centos ~]# perf report --header-only
...
# CACHE info available, use -I to display
# time of first sample : 7898.887141
# time of last sample : 7908.898817
...

#根据时间戳索引
[root@VM-centos ~]# perf report --time start_tsc,end_tsc
...
  • 祭出终极法宝——火焰图
[root@VM-centos ~]# git clone https://github.com/brendangregg/FlameGraph
Cloning into 'FlameGraph'...
remote: Enumerating objects: 1147, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 1147 (delta 13), reused 13 (delta 5), pack-reused 1119
Receiving objects: 100% (1147/1147), 1.90 MiB | 3.84 MiB/s, done.
Resolving deltas: 100% (659/659), done.

[root@VM-centos ~]# ls
perf.data

[root@VM-centos ~]# cp perf.data ./FlameGraph/

[root@VM-centos ~]# cd ./FlameGraph/

[root@VM-centos FlameGraph]# ls

aix-perf.pl                 example-perf.svg  record-test.sh                   stackcollapse-java-exceptions.pl  stackcollapse-sample.awk
demos                       files.pl          stackcollapse-aix.pl             stackcollapse-jstack.pl           stackcollapse-stap.pl
dev                         flamegraph.pl     stackcollapse-bpftrace.pl        stackcollapse-ljp.awk             stackcollapse-vsprof.pl
difffolded.pl               jmaps             stackcollapse-chrome-tracing.py  stackcollapse-perf.pl             stackcollapse-vtune.pl
docs                        perf.data         stackcollapse-elfutils.pl        stackcollapse-perf-sched.awk      stackcollapse-wcp.pl
example-dtrace-stacks.txt   pkgsplit-perf.pl  stackcollapse-gdb.pl             stackcollapse.pl                  stackcollapse-xdebug.php
example-dtrace.svg          range-perf.pl     stackcollapse-go.pl              stackcollapse-pmc.pl              test
example-perf-stacks.txt.gz  README.md         stackcollapse-instruments.pl     stackcollapse-recursive.pl        test.sh

[root@VM-centos FlameGraph]# perf script | ./stackcollapse-perf.pl > out.perf-folded

[root@VM-centos FlameGraph]# cat out.perf-folded | ./flamegraph.pl > perf-kernel.svg

#用浏览器打开perf-kernel.svg矢量图。