proc
- NAME:
proc - process information pseudo-filesystem
- DESCRIPTION:
The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures. It is commonly mounted at /proc. Most of it is read-only, but some files allow kernel variables to be changed.
For the meaning of each file, see man proc. The descriptions quoted in this post come from that man page.
loadavg
cat /proc/loadavg
/proc/loadavg
The first three fields in this file are load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5, and 15 minutes.
The fourth field consists of two numbers separated by a slash (/). The first of these is the number of currently runnable kernel scheduling entities (processes, threads). The value after the slash is the number of kernel scheduling entities that currently exist on the system.
The fifth field is the PID of the process that was most recently created on the system.
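To make the field layout concrete, here is a minimal user-space sketch in C that splits a line in /proc/loadavg format into the five fields described above. The struct and function names (loadavg_info, parse_loadavg, read_loadavg) are invented for this illustration and are not part of any kernel or libc API.

```c
#include <stdio.h>

/* Parsed fields of one /proc/loadavg line (illustrative struct, not a real API). */
struct loadavg_info {
    double avg1, avg5, avg15;  /* 1-, 5- and 15-minute load averages */
    long runnable;             /* scheduling entities currently runnable */
    long total;                /* scheduling entities that currently exist */
    long last_pid;             /* PID of the most recently created process */
};

/* Parse a line in /proc/loadavg format; returns 0 on success, -1 on error. */
int parse_loadavg(const char *line, struct loadavg_info *out)
{
    int n = sscanf(line, "%lf %lf %lf %ld/%ld %ld",
                   &out->avg1, &out->avg5, &out->avg15,
                   &out->runnable, &out->total, &out->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read and parse the live file; returns 0 on success, -1 on any failure. */
int read_loadavg(struct loadavg_info *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_loadavg(buf, out);
}
```

Calling parse_loadavg() on a line such as "70.12 68.40 65.03 2/1873 24511" fills runnable with 2, total with 1873 and last_pid with 24511.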
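1: Origin of the problem
While working on large-screen devices I ran into a problem: the values in loadavg were extremely high. An 8-core phone typically shows 3+ or 4+, yet the device I have at hand reports a loadavg of 70+. This puzzled me for a long time, so I recently set aside a block of time to study how the value is calculated.
The kernel's loadavg.c (fs/proc/loadavg.c) contains the following function, which produces the final output.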
    static int loadavg_proc_show(struct seq_file *m, void *v)
    {
        unsigned long avnrun[3];

        get_avenrun(avnrun, FIXED_1/200, 0);

        seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n",
            LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),  // 1-minute average
            LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),  // 5-minute average
            LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),  // 15-minute average
            // runnable entities come from nr_running(); nr_threads is every entity that exists
            nr_running(), nr_threads,
            // PID of the most recently created process
            task_active_pid_ns(current)->last_pid);
        return 0;
    }
As the code above shows, the load averages themselves are fetched by get_avenrun(), so let's look at its implementation next.
    unsigned long avenrun[3];
    EXPORT_SYMBOL(avenrun); /* should be removed */

    /**
     * get_avenrun - get the load average array
     * @loads:  pointer to dest load array
     * @offset: offset to add
     * @shift:  shift count to shift the result left
     *
     * These values are estimates at best, so no need for locking.
     */
    void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
    {
        // the data comes from the avenrun array
        loads[0] = (avenrun[0] + offset) << shift;
        loads[1] = (avenrun[1] + offset) << shift;
        loads[2] = (avenrun[2] + offset) << shift;
    }
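The values in avenrun[] are fixed-point numbers, which is why loadavg_proc_show() passes FIXED_1/200 as the offset and prints with LOAD_INT and LOAD_FRAC. The sketch below reproduces that formatting in user space. The macro definitions mirror the kernel headers (with a UL suffix added here), while format_load() is a hypothetical helper written only for this example.

```c
#include <stdio.h>

/* Fixed-point constants and macros, mirroring the kernel's definitions. */
#define FSHIFT       11                 /* bits of fractional precision */
#define FIXED_1      (1UL << FSHIFT)    /* 1.0 in fixed point (2048) */
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/*
 * Format one avenrun[] value the way loadavg_proc_show() does:
 * add FIXED_1/200 (about 0.005) so the two printed decimals are rounded,
 * then split the value into integer and fractional parts.
 * format_load() is a hypothetical helper, not a kernel function.
 */
int format_load(unsigned long avenrun_value, char *buf, size_t len)
{
    unsigned long v = avenrun_value + FIXED_1 / 200;

    return snprintf(buf, len, "%lu.%02lu", LOAD_INT(v), LOAD_FRAC(v));
}
```

For example, a raw value of 70 * 2048 + 512 is formatted as "70.25".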
2: Data source
Next, let's find where avenrun[] is assigned, starting with where its input data comes from.
- Kernel version 4.9; code paths kernel/sched/core.c and kernel/sched/loadavg.c.
2.1: scheduler_tick
    /*
     * This function gets called by the timer code, with HZ frequency.
     * We call it with interrupts disabled.
     * (The comment is clear: it is driven by the timer, at a frequency of HZ.)
     */
    void scheduler_tick(void)
    {
        int cpu = smp_processor_id();
        struct rq *rq = cpu_rq(cpu);
        struct task_struct *curr = rq->curr;

        sched_clock_tick();

        raw_spin_lock(&rq->lock);
        walt_set_window_start(rq);
        walt_update_task_ravg(rq->curr, rq, TASK_UPDATE,
                walt_ktime_clock(), 0);
        update_rq_clock(rq);
        curr->sched_class->task_tick(rq, curr, 0);
        cpu_load_update_active(rq);
        calc_global_load_tick(rq);  // the call we are following
        raw_spin_unlock(&rq->lock);

        perf_event_task_tick();

    #ifdef CONFIG_SMP
        rq->idle_balance = idle_cpu(cpu);
        trigger_load_balance(rq);
    #endif
        rq_last_tick_reset(rq);

        if (curr->sched_class == &fair_sched_class)
            check_for_migration(rq, curr);
    }
2.2: calc_global_load_tick
    /*
     * Called from scheduler_tick() to periodically update this CPU's
     * active count.
     */
    void calc_global_load_tick(struct rq *this_rq)
    {
        long delta;

        // Skip repeated load updates; the check is based on jiffies (explained in the summary)
        if (time_before(jiffies, this_rq->calc_load_update))
            return;

        // Fold this CPU's active count
        delta = calc_load_fold_active(this_rq, 0);
        if (delta)
            // Add the delta to calc_load_tasks; atomic_long_add is a kernel atomic operation
            atomic_long_add(delta, &calc_load_tasks);

        // Time of the next load update. LOAD_FREQ is defined in include/linux/sched.h:
        // #define LOAD_FREQ (5*HZ+1) /* 5 sec intervals */
        this_rq->calc_load_update += LOAD_FREQ;
    }
2.3: calc_load_fold_active
    long calc_load_fold_active(struct rq *this_rq, long adjust)
    {
        long nr_active, delta = 0;

        // Count the runqueue's nr_running tasks; adjust is passed as 0 here, so ignore it
        nr_active = this_rq->nr_running - adjust;
        // Add the runqueue's nr_uninterruptible tasks
        nr_active += (long)this_rq->nr_uninterruptible;

        // calc_load_active holds the previous nr_running + nr_uninterruptible total;
        // if the total changed, compute the difference
        if (nr_active != this_rq->calc_load_active) {
            delta = nr_active - this_rq->calc_load_active;
            this_rq->calc_load_active = nr_active;
        }

        // The caller adds the returned delta to calc_load_tasks
        return delta;
    }
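The delta-folding idea can be tried out in user space. The sketch below is a simplified model, not kernel code: struct rq_model, fold_active(), sample_all() and the plain global calc_load_tasks_model are stand-ins invented for illustration (the kernel uses struct rq and an atomic counter). It shows that each CPU only contributes the change since its last sample, so the global counter always equals the sum of running plus uninterruptible tasks across CPUs.

```c
/* Simplified per-CPU runqueue model (illustrative, not the kernel's struct rq). */
struct rq_model {
    long nr_running;
    long nr_uninterruptible;
    long calc_load_active;   /* total last folded into the global counter */
};

/* Stands in for the kernel's atomic calc_load_tasks counter. */
static long calc_load_tasks_model;

/* Same logic as calc_load_fold_active() with adjust == 0. */
long fold_active(struct rq_model *rq)
{
    long nr_active = rq->nr_running + rq->nr_uninterruptible;
    long delta = 0;

    if (nr_active != rq->calc_load_active) {
        delta = nr_active - rq->calc_load_active;
        rq->calc_load_active = nr_active;
    }
    return delta;
}

/* One sampling pass over all CPUs; returns the updated global count. */
long sample_all(struct rq_model *rqs, int ncpu)
{
    for (int i = 0; i < ncpu; i++)
        calc_load_tasks_model += fold_active(&rqs[i]);
    return calc_load_tasks_model;
}
```

With two modelled CPUs holding 2 running + 1 uninterruptible and 0 running + 30 uninterruptible tasks, the first pass yields 33; a second pass with unchanged counts adds nothing.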
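3: Data calculation
With the data-source logic covered, let's go through the calculation logic.
The first half of this path involves the low-level high-resolution timer module, which I do not know very well. I will only introduce it briefly; interested readers can dig into it themselves. (The file is tick-sched.c; PlantUML does not allow "-" in class names, so it is spelled differently in the diagram.)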
3.1: tick_sched_timer
    /*
     * High resolution timer specific code
     */
    // Check whether the kernel has high-resolution timers enabled: CONFIG_HIGH_RES_TIMERS=y
    #ifdef CONFIG_HIGH_RES_TIMERS
    /*
     * We rearm the timer until we get disabled by the idle code.
     * Called with interrupts disabled.
     */
    // tick_sched_timer is the expiry callback of the high-resolution timer,
    // so it runs at the end of every timer period
    static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
    {
        struct tick_sched *ts =
            container_of(timer, struct tick_sched, sched_timer);
        struct pt_regs *regs = get_irq_regs();
        ktime_t now = ktime_get();

        tick_sched_do_timer(now);
        ...
        return HRTIMER_RESTART;
    }
3.2: calc_global_load
The timer-module functions in between are skipped: they are beyond the scope of this article, and I do not fully understand their logic either.
    /*
     * calc_load - update the avenrun load estimates 10 ticks after the
     * CPUs have updated calc_load_tasks.
     *
     * Called from the global timer code.
     */
    void calc_global_load(unsigned long ticks)
    {
        long active, delta;

        // The update time seen earlier, plus 10 more ticks: the total interval is 5 s + 10 ticks
        if (time_before(jiffies, calc_load_update + 10))
            return;

        /*
         * Fold the 'old' idle-delta to include all NO_HZ cpus.
         */
        // Pick up task counts that were missed while CPUs were idle in NO_HZ mode
        delta = calc_load_fold_idle();
        if (delta)
            atomic_long_add(delta, &calc_load_tasks);  // update the counter

        active = atomic_long_read(&calc_load_tasks);  // atomically read the global stored earlier
        active = active > 0 ? active * FIXED_1 : 0;   // scale by FIXED_1

        avenrun[0] = calc_load(avenrun[0], EXP_1, active);   // 1-minute load
        avenrun[1] = calc_load(avenrun[1], EXP_5, active);   // 5-minute load
        avenrun[2] = calc_load(avenrun[2], EXP_15, active);  // 15-minute load

        calc_load_update += LOAD_FREQ;  // schedule the next update

        /*
         * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
         */
        // Having folded the NO_HZ task counts, the intervals missed in NO_HZ mode
        // must be accounted for as well, otherwise the result would be inaccurate.
        calc_global_nohz();
    }
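This code mentions a NO_HZ mode, which concerns how timer ticks are delivered to idle CPUs; it is covered in the summary at the end. Next comes the rule used to calculate the load.
3.3: Calculation rule: calc_load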
    /*
     * a1 = a0 * e + a * (1 - e)
     */
    static unsigned long
    calc_load(unsigned long load, unsigned long exp, unsigned long active)
    {
        unsigned long newload;

        newload = load * exp + active * (FIXED_1 - exp);
        if (active >= load)
            newload += FIXED_1-1;

        return newload / FIXED_1;
    }
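The rule is spelled out in the comment and is not complicated: each update is an exponentially weighted moving average of the previous value and the current active count. Taken as a whole, the code matches what man proc says: the system load counts nr_running plus nr_uninterruptible tasks. Both counters live in struct rq, the per-CPU runqueue structure used throughout core.c and one of the scheduler's key data structures.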
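To see how the rule behaves numerically, the following user-space sketch reuses the calc_load() logic shown above and applies it repeatedly with a constant number of active tasks. FSHIFT, FIXED_1 and the EXP_* constants are the values from the kernel headers; simulate() is a hypothetical helper added for this example, and each call to calc_load() stands for one 5-second update.

```c
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884              /* 1/exp(5sec/1min) in fixed point */
#define EXP_5    2014              /* 1/exp(5sec/5min) */
#define EXP_15   2037              /* 1/exp(5sec/15min) */

/* a1 = a0 * e + a * (1 - e), same arithmetic as the kernel function. */
unsigned long calc_load(unsigned long load, unsigned long exp,
                        unsigned long active)
{
    unsigned long newload = load * exp + active * (FIXED_1 - exp);

    if (active >= load)
        newload += FIXED_1 - 1;   /* round up while the load is rising */
    return newload / FIXED_1;
}

/*
 * Apply n consecutive updates with a constant number of active tasks.
 * Hypothetical helper: returns the resulting fixed-point load.
 */
unsigned long simulate(unsigned long load, unsigned long exp,
                       unsigned long nr_active, int n)
{
    unsigned long active = nr_active * FIXED_1;

    while (n-- > 0)
        load = calc_load(load, exp, active);
    return load;
}
```

Starting from 0 with 70 active tasks, a single 1-minute-series update gives 11480 in fixed point, about 5.6; if the count stays at 70, repeated updates converge on 70 * FIXED_1. This is how a persistently large nr_uninterruptible count ends up as a loadavg of 70+.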
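Analyzing the problem
Back to the original question: why does our device not grind to a halt with a system load of 70+? The code above does not answer that directly, but once the logic is understood the rest is straightforward.
- 1: I printed the nr_running and nr_uninterruptible task counts and found that nr_running was normal; the problem was the nr_uninterruptible count.
- 2: Given that the nr_uninterruptible task count is the culprit, are that many tasks on our device really waiting for I/O? If they were, the device would be very sluggish, yet the systrace I captured looked completely normal.
- 3: At this point I had to turn to a search engine. Searching for the keyword nr_uninterruptible turned up some clues.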
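Findings in brief
First, UNIX systems did not count uninterruptible tasks (nr_uninterruptible) in the load average. Linux started counting them after it was argued that, without tasks waiting on I/O, the figure could not reflect the real load on the system.
Later, in articles by several Linux experts, I read that when an NFS file system runs into trouble, every thread accessing it ends up in the uninterruptible state and is counted in nr_uninterruptible. This is very close to the kernel internals. (P.S. If anyone can recommend kernel books on this, please do.)
- Conclusion: because the nr_uninterruptible count is abnormal, the system load figure does not reflect the device's real state.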
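One way to look for such tasks from user space, without patching the kernel, is to read the state field of /proc/[pid]/stat and look for state D. This is only an approximation of the kernel's nr_uninterruptible counter and is not the method used in this post. The helper names below are invented for illustration.

```c
#include <string.h>

/*
 * Extract the state character from one /proc/[pid]/stat line.
 * The comm field is wrapped in parentheses and may itself contain spaces
 * or ')', so search for the last ')' instead of splitting on spaces.
 * Returns the state character (for example 'R', 'S' or 'D'), or 0 if the
 * line is malformed. Hypothetical helper for illustration.
 */
char task_state_from_stat(const char *stat_line)
{
    const char *p = strrchr(stat_line, ')');

    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Per man proc, tasks in state R or D are the ones the load average counts. */
int counts_toward_load(char state)
{
    return state == 'R' || state == 'D';
}
```

Running task_state_from_stat() over the stat file of every entry under /proc (and /proc/[pid]/task/ for threads) and listing those that return 'D' shows which tasks are inflating the figure.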
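Takeaways and summary
- 1: The HZ mentioned in the comment on scheduler_tick is the frequency of the kernel's periodic timer interrupt (the tick). It is tied to the kernel config options CONFIG_HZ_250, CONFIG_HZ_1000 and so on. For example, with CONFIG_HZ_1000=y and CONFIG_HZ=1000 the kernel takes 1000 timer interrupts per second, one every 1s/1000. (CONFIG_HZ=250 is a common setting.)
- 2: jiffies is the number of timer interrupts (ticks) counted so far; one jiffy lasts 1s / HZ.
- 3: The rq structure is too long to show in full. It is defined in kernel/sched/sched.h; have a look if you are interested.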
    struct rq *rq = cpu_rq(cpu);

    /*
     * This is the main, per-CPU runqueue data structure.
     *
     * Locking rule: those places that want to lock multiple runqueues
     * (such as the load balancing or the thread migration code), lock
     * acquire operations must be ordered by ascending &runqueue.
     */
    struct rq {
        /* runqueue lock: */
        raw_spinlock_t lock;

        /*
         * nr_running and cpu_load should be in the same cacheline because
         * remote CPUs use both these fields when doing load calculation.
         */
        unsigned int nr_running;  // this one
    #ifdef CONFIG_NUMA_BALANCING
        unsigned int nr_numa_running;
        unsigned int nr_preferred_running;
    #endif
        #define CPU_LOAD_IDX_MAX 5
        unsigned long cpu_load[CPU_LOAD_IDX_MAX];
        unsigned int misfit_task;
    #ifdef CONFIG_NO_HZ_COMMON
    #ifdef CONFIG_SMP
        unsigned long last_load_update_tick;
    #endif /* CONFIG_SMP */
        unsigned long nohz_flags;
    #endif /* CONFIG_NO_HZ_COMMON */
    #ifdef CONFIG_NO_HZ_FULL
        unsigned long last_sched_tick;
    #endif

    #ifdef CONFIG_CPU_QUIET
        /* time-based average load */
        u64 nr_last_stamp;
        u64 nr_running_integral;
        seqcount_t ave_seqcnt;
    #endif

        /* capture load from *all* tasks on this cpu: */
        struct load_weight load;
        unsigned long nr_load_updates;
        u64 nr_switches;

        struct cfs_rq cfs;
        struct rt_rq rt;
        struct dl_rq dl;

    #ifdef CONFIG_FAIR_GROUP_SCHED
        /* list of leaf cfs_rq on this cpu: */
        struct list_head leaf_cfs_rq_list;
        struct list_head *tmp_alone_branch;
    #endif /* CONFIG_FAIR_GROUP_SCHED */

        /*
         * This is part of a global counter where only the total sum
         * over all CPUs matters. A task can increase this counter on
         * one CPU and if it got migrated afterwards it may decrease
         * it on another CPU. Always updated under the runqueue lock:
         */
        unsigned long nr_uninterruptible;  // and this one

        struct task_struct *curr, *idle, *stop;
        unsigned long next_balance;
        struct mm_struct *prev_mm;

        unsigned int clock_skip_update;
        u64 clock;
        u64 clock_task;

        atomic_t nr_iowait;

    #ifdef CONFIG_SMP
        struct root_domain *rd;
        struct sched_domain *sd;

        unsigned long cpu_capacity;
        unsigned long cpu_capacity_orig;

        struct callback_head *balance_callback;

        unsigned char idle_balance;
        /* For active balancing */
        int active_balance;
        int push_cpu;
        struct task_struct *push_task;
        struct cpu_stop_work active_balance_work;
        /* cpu of this runqueue: */
        int cpu;
        int online;
        ...
    };
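- 4: High-resolution timers are maintained per CPU and give the kernel nanosecond-resolution timing instead of being limited to tick granularity. Kernel config: CONFIG_HIGH_RES_TIMERS=y
- 5: NO_HZ means that once a CPU goes idle the periodic timer tick is no longer delivered to it, which reduces power consumption. Kernel config: CONFIG_NO_HZ=y and CONFIG_NO_HZ_IDLE=y. Conversely, if the device is not power-sensitive and runs from an external power supply, this mode can be turned off to improve performance.
- 6: To pull the kernel config from an Android device: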
adb pull /proc/config.gz .
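As a small arithmetic check on items 1 and 2 above, the sketch below computes the length of one tick and the load-update interval for a given HZ. LOAD_FREQ = 5*HZ+1 is the definition quoted earlier from include/linux/sched.h; the function names are invented for this example.

```c
/* Length of one tick (one jiffy) in microseconds for a given HZ. */
unsigned long tick_usec(unsigned long hz)
{
    return 1000000UL / hz;
}

/* LOAD_FREQ as defined in the kernel: 5*HZ+1 ticks. */
unsigned long load_freq_ticks(unsigned long hz)
{
    return 5 * hz + 1;
}

/* Interval between load-average updates, in milliseconds. */
unsigned long load_freq_msec(unsigned long hz)
{
    return load_freq_ticks(hz) * 1000UL / hz;
}
```

With HZ=250 a tick lasts 4000 microseconds and the load is sampled every 1251 ticks, about 5004 ms; with HZ=1000 the figures are 1000 microseconds and 5001 ticks.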
Original article: https://juejin.cn/post/7169417417599401998