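Linux Power Management (2): the Kernel's CPUFreq (DVFS) and ARM's SCPI
For more on Linux system power management, see: https://blog.csdn.net/u010936265/article/details/146436725?spm=1011.2415.3001.5331
1 Introduction
The CPUFreq subsystem lives under drivers/cpufreq and dynamically adjusts CPU frequency and voltage at run time, i.e. DVFS (Dynamic Voltage and Frequency Scaling).
《Linux设备驱动开发详解:基于最新的Linux4.0内核》 19.2 CPUFreq drivers
In the Linux kernel, power management of a CPU in its working state is called the CPUFreq subsystem (some literature also calls it DVFS). It mainly targets scenarios where CPU utilization (per CPU core) varies dynamically between 5% and 100%, and its basic methods are dynamic frequency scaling and dynamic voltage scaling.
《用“芯”探核:基于龙芯的Linux内核探索解析》 8.2 Run-time power management
An SoC CPUFreq driver only defines the CPU's frequency parameters and provides a way to set the frequency; it does not decide which frequency the CPU should actually run at. Which criteria drive the frequency, and how it changes, is determined entirely by the CPUFreq policy (governor).
The system state and the CPUFreq policy together determine the target of a frequency change. The CPUFreq core then passes the target frequency to the SoC-specific CPUFreq driver, which programs the hardware to complete the switch.
《Linux设备驱动开发详解:基于最新的Linux4.0内核》 19.2.2 CPUFreq policies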
2 cpufreq_driver
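2.1 Introduction
Each SoC's concrete CPUFreq driver instance only needs to provide the voltage/frequency table and carry out the changes at the hardware level.
《Linux设备驱动开发详解:基于最新的Linux4.0内核》 19.2.1 Implementing an SoC CPUFreq driver
2.2 Data structures
//include/linux/cpufreq.h
struct cpufreq_driver {
    char name[CPUFREQ_NAME_LEN];
    ......
    int (*target)(struct cpufreq_policy *policy,
                  unsigned int target_freq,
                  unsigned int relation); /* Deprecated */
    int (*target_index)(struct cpufreq_policy *policy,
                        unsigned int index);
    ......
};
target() and target_index()
These are the interfaces that perform the actual frequency change; a driver may implement the switch itself or call the CLK interface.
This is the most important callback and is invoked whenever the frequency is switched. It sets the current CPU core's frequency to the target frequency supplied by the CPUFreq policy.
《SoC底层软件低功耗系统设计与实现》 13.1.4 Main data structures; 3. Driver-related data structures
《用“芯”探核:基于龙芯的Linux内核探索解析》 8.2.1 Dynamic frequency scaling; (1) The mechanism part of CPUFreq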
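The step that turns a target frequency into the index handed to target_index() can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel's implementation: struct freq_entry, enum relation and pick_index() are hypothetical names, TABLE_END merely mirrors the value of CPUFREQ_TABLE_END, and returning -1 when nothing matches is a simplification.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_END (~1u)   /* mirrors CPUFREQ_TABLE_END; end-of-table marker */

/* Hypothetical model of the two lookup relations. */
enum relation {
    RELATION_L,   /* lowest frequency at or above the target */
    RELATION_H    /* highest frequency at or below the target */
};

struct freq_entry {
    unsigned int frequency;   /* kHz */
};

/*
 * Return the index of the entry that best matches target_khz, or -1 if no
 * entry satisfies the relation. The table does not need to be sorted.
 */
static int pick_index(const struct freq_entry *table, unsigned int target_khz,
                      enum relation rel)
{
    int best = -1;

    assert(table != NULL);
    for (int i = 0; table[i].frequency != TABLE_END; i++) {
        unsigned int f = table[i].frequency;

        if (rel == RELATION_L && f >= target_khz &&
            (best < 0 || f < table[best].frequency))
            best = i;
        if (rel == RELATION_H && f <= target_khz &&
            (best < 0 || f > table[best].frequency))
            best = i;
    }
    return best;
}
```

With a table of 2200000, 550000 and 1100000 kHz, a target of 1000000 kHz resolves to the 1100000 entry under RELATION_L and to the 550000 entry under RELATION_H; a driver's target_index() would then only have to program the hardware for that index.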
Registration and unregistration interfaces:
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver);
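2.3 Case study: the cpufreq_driver for Phytium (ARM) CPUs
2.3.1 Overview of the relevant Phytium CPU features
Take the Phytium FT-2000/4 CPU as an example.
FT-2000/4 supports several processor power-management techniques and exposes them to system power-management software through the ARM-defined SCPI (System Control and Power Interface) [2] and PSCI (Power State Coordination Interface) [3].
It supports dynamic adjustment of the core operating frequency: through the SCPI interface, software can query the set of frequency points the CPU supports and switch between them dynamically.
《FT-2000/4软件编程手册》(V1.4); 7.1 CPU power management
2.3.2 Overview of ARM's SCP and SCPI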
A System Control Processor (SCP) is a processor-based capability that provides a flexible and
extensible platform for provision of power management functions and services.
《ARM Compute Subsystem SCP Message Interface Protocols》
1.1 The System Control Processor
System Control and Power Interface (SCPI)
The SCPI is one of the primary interfaces to the SCP in an ARM CSS-based platform. It is used
to access many of the services that are exposed to the AP. The SCP is expected to be idle and
waiting for SCPI commands for most of the time after the system boot process completes.
《ARM Compute Subsystem SCP Message Interface Protocols》
Chapter 3 CSS System Control and Power Interface (SCPI)
2.3.3 Data structure
//drivers/cpufreq/scpi-cpufreq.c
static struct cpufreq_driver scpi_cpufreq_driver = {
    .name = "scpi-cpufreq",
    .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
             CPUFREQ_NEED_INITIAL_FREQ_CHECK |
             CPUFREQ_IS_COOLING_DEV,
    .verify = cpufreq_generic_frequency_table_verify,
    .attr = cpufreq_generic_attr,
    .get = scpi_cpufreq_get_rate,
    .init = scpi_cpufreq_init,
    .exit = scpi_cpufreq_exit,
    .target_index = scpi_cpufreq_set_target,
};
2.3.4 Rough flow of scpi_cpufreq_init()
scpi_cpufreq_init();
    -> scpi_ops->add_opps_to_device(cpu_dev);
        -> scpi_dvfs_add_opps_to_device();
            -> scpi_dvfs_info();
                -> scpi_dvfs_get_info();
                    -> scpi_send_message(CMD_GET_DVFS_INFO, ...);
                    -> info->count = buf.opp_count;
                    -> opp->freq = le32_to_cpu(buf.opps[i].freq);
            -> dev_pm_opp_add();
    -> dev_pm_opp_init_cpufreq_table(); //create a cpufreq table for a device
scpi_cpufreq_init() uses the SCPI interface to fetch the CPU's frequency and voltage information, then builds a struct cpufreq_frequency_table from it.
For details, see the Get DVFS Info command among the SCPI commands (《ARM Compute Subsystem SCP Message Interface Protocols》 3.2.9 Get DVFS Info).
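To make the "fetch OPPs, then build a table" step more concrete, here is a rough user-space sketch of decoding a Get DVFS Info reply. The payload layout used here (one byte domain, one byte OPP count, a 16-bit little-endian latency, then pairs of 32-bit little-endian frequency and voltage) is my reading of the reply format and should be checked against the SCPI document; struct opp, struct dvfs_table, decode_dvfs_info() and MAX_OPPS are hypothetical names, and treating the frequency as Hz is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPPS 16   /* illustrative bound, an assumption of this sketch */

struct opp {
    uint32_t freq_hz;   /* assumed to be in Hz */
    uint32_t m_volt;    /* millivolts */
};

struct dvfs_table {
    unsigned int count;
    unsigned int latency;   /* raw 16-bit latency field */
    struct opp opps[MAX_OPPS];
};

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the buffer is too short or the count too large. */
static int decode_dvfs_info(const uint8_t *buf, size_t len,
                            struct dvfs_table *out)
{
    unsigned int count;

    assert(buf != NULL && out != NULL);
    if (len < 4)
        return -1;
    count = buf[1];
    if (count > MAX_OPPS || len < 4 + (size_t)count * 8)
        return -1;
    out->count = count;
    out->latency = buf[2] | (unsigned int)buf[3] << 8;
    for (unsigned int i = 0; i < count; i++) {
        out->opps[i].freq_hz = get_le32(buf + 4 + i * 8);
        out->opps[i].m_volt = get_le32(buf + 8 + i * 8);
    }
    return 0;
}

/* cpufreq tables hold kHz, so each decoded frequency is divided by 1000. */
static unsigned int opp_to_khz(const struct opp *o)
{
    return o->freq_hz / 1000;
}
```

The explicit byte-wise decoding plays the role of le32_to_cpu() in the flow above: the result is the same on little-endian and big-endian hosts.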
2.3.5 Frequency-setting flow
scpi_cpufreq_set_target();
    -> clk_set_rate(priv->clk, rate);
        -> clk_core_set_rate_nolock();
            -> clk_change_rate();
                -> core->ops->set_rate();
                    -> scpi_clk_set_rate();
                        -> clk->scpi_ops->clk_set_val();
                            -> scpi_clk_set_val();
                                -> scpi_send_message(CMD_SET_CLOCK_VALUE, ...);
《ARM Compute Subsystem SCP Message Interface Protocols》3.2.15 Set Clock Value
2.4 Checking the cpufreq_driver currently in use
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
or
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver
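The same check can be done from a program. The helper below is a minimal sketch that reads the first line of any text attribute; read_attr() is a hypothetical name, and nothing in it is specific to cpufreq.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a text attribute into buf and strip the trailing
 * newline. Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
static int read_attr(const char *path, char *buf, size_t len)
{
    FILE *f;
    char *line;

    assert(path != NULL && buf != NULL && len > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    line = fgets(buf, (int)len, f);
    fclose(f);
    if (!line)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling read_attr("/sys/devices/system/cpu/cpufreq/policy0/scaling_driver", name, sizeof name) on a machine with cpufreq enabled should leave the driver name in name; on the platform discussed here that would presumably be "scpi-cpufreq".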
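3 CPUFreq governors
3.1 Introduction
The main principle of a CPUFreq policy (governor) is to choose the most suitable frequency/voltage for the current system load.
《用“芯”探核:基于龙芯的Linux内核探索解析》 8.2 Run-time power management
3.2 Data structures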
3.2.1 struct cpufreq_governor
//include/linux/cpufreq.h
struct cpufreq_governor {
    char name[CPUFREQ_NAME_LEN];
    int (*init)(struct cpufreq_policy *policy);
    void (*exit)(struct cpufreq_policy *policy);
    int (*start)(struct cpufreq_policy *policy);
    void (*stop)(struct cpufreq_policy *policy);
    void (*limits)(struct cpufreq_policy *policy);
    ssize_t (*show_setspeed) (struct cpufreq_policy *policy,
                              char *buf);
    int (*store_setspeed) (struct cpufreq_policy *policy,
                           unsigned int freq);
    /* For governors which change frequency dynamically by themselves */
    bool dynamic_switching;
    struct list_head governor_list;
    struct module *owner;
};
Registration and unregistration functions:
int cpufreq_register_governor(struct cpufreq_governor *governor);
void cpufreq_unregister_governor(struct cpufreq_governor *governor);
3.2.2 struct dbs_governor;
//drivers/cpufreq/cpufreq_governor.h
/* Common Governor data across policies */
struct dbs_governor {
    struct cpufreq_governor gov;
    struct kobj_type kobj_type;
    /*
     * Common data for platforms that don't set
     * CPUFREQ_HAVE_GOVERNOR_PER_POLICY
     */
    struct dbs_data *gdbs_data;
    unsigned int (*gov_dbs_update)(struct cpufreq_policy *policy);
    struct policy_dbs_info *(*alloc)(void);
    void (*free)(struct policy_dbs_info *policy_dbs);
    int (*init)(struct dbs_data *dbs_data);
    void (*exit)(struct dbs_data *dbs_data);
    void (*start)(struct cpufreq_policy *policy);
};
3.2.3 Linked list: cpufreq_governor_list
Holds the nodes of all registered governors.
//drivers/cpufreq/cpufreq.c
static LIST_HEAD(cpufreq_governor_list);
cpufreq_register_governor();
    -> list_add(&governor->governor_list, &cpufreq_governor_list);
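The registration pattern can be sketched in user space with a plain singly linked list. This is an illustration only: struct governor, find_governor() and register_governor() are hypothetical stand-ins, the next pointer replaces the kernel's struct list_head, and the duplicate-name check with a -1 return models, as far as I understand it, the kernel refusing to register two governors with the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 16   /* mirrors CPUFREQ_NAME_LEN */

struct governor {
    char name[NAME_LEN];
    struct governor *next;   /* simplified stand-in for struct list_head */
};

static struct governor *governor_list;   /* models cpufreq_governor_list */

/* Look a governor up by name; NULL if it is not registered. */
static struct governor *find_governor(const char *name)
{
    for (struct governor *g = governor_list; g; g = g->next)
        if (strcmp(g->name, name) == 0)
            return g;
    return NULL;
}

/* Returns 0 on success, -1 if a governor with this name already exists. */
static int register_governor(struct governor *gov)
{
    assert(gov != NULL);
    if (find_governor(gov->name))
        return -1;
    gov->next = governor_list;   /* insert at the head, like list_add() */
    governor_list = gov;
    return 0;
}
```

Registering "performance" and then "powersave" leaves "powersave" at the head of the list, and a lookup by name is what a write to scaling_governor would need in order to find the matching governor.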
3.3 Existing governors
3.3.1 performance
this governor causes the highest frequency, within the ``scaling_max_freq`` policy limit, to be requested for that policy.
//drivers/cpufreq/cpufreq_performance.c
static struct cpufreq_governor cpufreq_gov_performance = {
    .name = "performance",
    .owner = THIS_MODULE,
    .limits = cpufreq_gov_performance_limits,
};
cpufreq_gov_performance_init();
    -> cpufreq_register_governor(&cpufreq_gov_performance);
3.3.2 powersave
this governor causes the lowest frequency, within the ``scaling_min_freq`` policy limit, to be requested for that policy.
//drivers/cpufreq/cpufreq_powersave.c
static struct cpufreq_governor cpufreq_gov_powersave = {
    .name = "powersave",
    .limits = cpufreq_gov_powersave_limits,
    .owner = THIS_MODULE,
};
cpufreq_gov_powersave_init();
    -> cpufreq_register_governor(&cpufreq_gov_powersave);
3.3.3 userspace
This governor does not do anything by itself. Instead, it allows user space to set the CPU frequency for the policy it is attached to by writing to the ``scaling_setspeed`` attribute of that policy.
//drivers/cpufreq/cpufreq_userspace.c
static struct cpufreq_governor cpufreq_gov_userspace = {
    .name = "userspace",
    .init = cpufreq_userspace_policy_init,
    .exit = cpufreq_userspace_policy_exit,
    .start = cpufreq_userspace_policy_start,
    .stop = cpufreq_userspace_policy_stop,
    .limits = cpufreq_userspace_policy_limits,
    .store_setspeed = cpufreq_set,
    .show_setspeed = show_speed,
    .owner = THIS_MODULE,
};
cpufreq_gov_userspace_init();
    -> cpufreq_register_governor(&cpufreq_gov_userspace);
3.3.4 schedutil
This governor uses CPU utilization data available from the CPU scheduler. It generally is regarded as a part of the CPU scheduler, so it can access the scheduler's internal data structures directly.
//kernel/sched/cpufreq_schedutil.c
struct cpufreq_governor schedutil_gov = {
    .name = "schedutil",
    .owner = THIS_MODULE,
    .dynamic_switching = true,
    .init = sugov_init,
    .exit = sugov_exit,
    .start = sugov_start,
    .stop = sugov_stop,
    .limits = sugov_limits,
};
sugov_register();
    -> cpufreq_register_governor(&schedutil_gov);
When the system load changes, the CPU frequency is adjusted according to that load. The flow is roughly:
cpufreq_update_util();
    -> data->func();
        -> sugov_update_single();
            -> sugov_deferred_update();
                -> irq_work_queue(&sg_policy->irq_work);
                    -> sugov_irq_work();
                        -> sugov_work();
                            -> __cpufreq_driver_target();
                                -> cpufreq_driver->target(); // or ->target_index() for drivers that implement it
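How utilization becomes a frequency request can be sketched with the formula the kernel documents for schedutil, next_freq = 1.25 * max_freq * util / max. The function below is a user-space model, not the kernel code: next_freq_khz() is a hypothetical name, and the final clamp to max_freq_khz is a simplification of the limit handling the cpufreq core performs.

```c
#include <assert.h>

/*
 * next_freq = 1.25 * max_freq * util / max_util, clamped to max_freq.
 * util and max_util are in the scheduler's capacity units.
 */
static unsigned int next_freq_khz(unsigned int max_freq_khz,
                                  unsigned long util, unsigned long max_util)
{
    unsigned long long f;

    assert(max_util > 0);
    f = (unsigned long long)max_freq_khz + (max_freq_khz >> 2);   /* x 1.25 */
    f = f * util / max_util;
    return f > max_freq_khz ? max_freq_khz : (unsigned int)f;
}
```

The 1.25 factor leaves headroom: with a 2000000 kHz maximum, a half-utilized CPU (512 of 1024) asks for 1250000 kHz, and the request already saturates at the maximum once utilization reaches 80%.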
3.3.5 ondemand
Ondemand policy: a CPU load threshold T is set. When the load is below T, the CPU is scaled to the lowest frequency/voltage that just satisfies the current load; when the load is above T, it jumps straight to the highest performance state.
//drivers/cpufreq/cpufreq_ondemand.c
static struct dbs_governor od_dbs_gov = {
    .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("ondemand"),
    .kobj_type = { .default_attrs = od_attributes },
    .gov_dbs_update = od_dbs_update,
    .alloc = od_alloc,
    .free = od_free,
    .init = od_init,
    .exit = od_exit,
    .start = od_start,
};
cpufreq_gov_dbs_init();
    -> cpufreq_register_governor(CPU_FREQ_GOV_ONDEMAND);
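The threshold rule described for ondemand can be written down as a small decision function. This is a sketch under assumptions: ondemand_next_khz() is a hypothetical name, the load is a percentage from 0 to 100, and "just enough for the current load" is modelled as a linear interpolation between the minimum and maximum frequency, which may differ in detail from what od_dbs_update() does.

```c
#include <assert.h>

/*
 * load is 0..100 (%). Above up_threshold go straight to the maximum;
 * otherwise scale linearly between min_khz and max_khz.
 */
static unsigned int ondemand_next_khz(unsigned int load,
                                      unsigned int up_threshold,
                                      unsigned int min_khz,
                                      unsigned int max_khz)
{
    assert(load <= 100 && min_khz <= max_khz);
    if (load > up_threshold)
        return max_khz;
    return min_khz +
           (unsigned int)((unsigned long long)load * (max_khz - min_khz) / 100);
}
```

With a sample threshold of 80 and a 500000 to 2000000 kHz range, a 40% load maps to 1100000 kHz while a 90% load goes directly to 2000000 kHz.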
3.3.6 conservative
Conservative policy: similar to ondemand, a CPU load threshold T is set. When the load is below T, the CPU is scaled to the lowest frequency/voltage that just satisfies the current load; when the load is above T, however, it does not jump straight to the highest performance state but raises the frequency/voltage step by step.
//drivers/cpufreq/cpufreq_conservative.c
static struct dbs_governor cs_governor = {
    .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("conservative"),
    .kobj_type = { .default_attrs = cs_attributes },
    .gov_dbs_update = cs_dbs_update,
    .alloc = cs_alloc,
    .free = cs_free,
    .init = cs_init,
    .exit = cs_exit,
    .start = cs_start,
};
cpufreq_gov_dbs_init();
    -> cpufreq_register_governor(CPU_FREQ_GOV_CONSERVATIVE);
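The step-by-step behaviour of conservative can be modelled in the same style. Again this is only a sketch: conservative_next_khz() and its parameters are hypothetical, and stepping down gradually below a separate down_threshold is an assumption of this model that goes beyond the single-threshold description above.

```c
#include <assert.h>

/*
 * Move one step up when load exceeds up_threshold, one step down when it
 * falls below down_threshold, and stay put in between. The result never
 * leaves the [min_khz, max_khz] range.
 */
static unsigned int conservative_next_khz(unsigned int cur_khz,
                                          unsigned int load,
                                          unsigned int up_threshold,
                                          unsigned int down_threshold,
                                          unsigned int step_khz,
                                          unsigned int min_khz,
                                          unsigned int max_khz)
{
    assert(min_khz <= cur_khz && cur_khz <= max_khz && step_khz > 0);
    if (load > up_threshold)
        return max_khz - cur_khz < step_khz ? max_khz : cur_khz + step_khz;
    if (load < down_threshold)
        return cur_khz - min_khz < step_khz ? min_khz : cur_khz - step_khz;
    return cur_khz;
}
```

Starting from 1000000 kHz with a 100000 kHz step, a sustained 90% load needs ten consecutive evaluations to reach 2000000 kHz, which is the trade-off against ondemand: smoother power draw at the cost of slower ramp-up.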
References
Documentation/admin-guide/pm/cpufreq.rst
《Linux设备驱动开发详解:基于最新的Linux4.0内核》 19.2.2 CPUFreq policies
《SoC底层软件低功耗系统设计与实现》 13.1.5 Main function implementations; 4. ondemand governor
《用“芯”探核:基于龙芯的Linux内核探索解析》 8.2 Run-time power management
3.4 Configuring the current governor
List the governors currently supported:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave
or
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors
performance powersave
Set the current governor:
echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
or
echo powersave > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
4 Other data structures
4.1 struct cpufreq_frequency_table;
The table of frequencies supported by the current CPU.
//include/linux/cpufreq.h
struct cpufreq_frequency_table {
    unsigned int flags;
    unsigned int driver_data; /* driver specific data, not used by core */
    unsigned int frequency;   /* kHz - doesn't need to be in ascending
                               * order */
};
4.2 struct cpufreq_policy;
Each CPU core is managed through a control policy (cpufreq_policy); CPUs that share a clock share one policy.
//include/linux/cpufreq.h
struct cpufreq_policy {
    /* CPUs sharing clock, require sw coordination */
    cpumask_var_t cpus;         /* Online CPUs only */
    cpumask_var_t related_cpus; /* Online + Offline CPUs */
    ......
    unsigned int min; /* in kHz */
    unsigned int max; /* in kHz */
    unsigned int cur; /* in kHz, only needed if cpufreq governors are used */
    ......
    struct cpufreq_governor *governor;
    ......
    struct cpufreq_frequency_table *freq_table; /* frequency table supported by this CPU */
    ......
};
Notes on the struct members
<1> cpus and related_cpus
cpus and related_cpus identify the CPUs managed by this policy: cpus holds those currently online, while related_cpus holds all of them, online and offline.
View the values of cpus and related_cpus:
cat /sys/devices/system/cpu/cpufreq/policy0/affected_cpus
cat /sys/devices/system/cpu/cpufreq/policy0/related_cpus
<2> min/max/cur
min/max/cur are the minimum, maximum and current frequency of this policy.
View or set min/max:
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
View cur:
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
《SoC底层软件低功耗系统设计与实现》
13.1.4 Main data structures; 1. The cpufreq_policy struct
Initialization function: cpufreq_init_policy();
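The role of the min and max members is easiest to see as a clamp applied to every requested frequency before it reaches the driver. The following is a minimal model of that idea; struct policy_limits and clamp_to_policy() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the min/max/cur members of struct cpufreq_policy. */
struct policy_limits {
    unsigned int min;   /* kHz */
    unsigned int max;   /* kHz */
    unsigned int cur;   /* kHz */
};

/* Force a requested frequency into the policy's [min, max] window. */
static unsigned int clamp_to_policy(const struct policy_limits *p,
                                    unsigned int target_khz)
{
    assert(p != NULL && p->min <= p->max);
    if (target_khz < p->min)
        return p->min;
    if (target_khz > p->max)
        return p->max;
    return target_khz;
}
```

Lowering scaling_max_freq therefore caps what any governor can obtain, whatever frequency it asks for.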
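5 notifier
5.1 Introduction
During a frequency change, two notifications are sent:
CPUFREQ_PRECHANGE: the frequency change is about to take place
CPUFREQ_POSTCHANGE: the frequency change has completed
Data structures: policy notifications use BLOCKING_NOTIFIER_HEAD(cpufreq_policy_notifier_list); the two transition notifications above are delivered on the SRCU notifier chain cpufreq_transition_notifier_list.
Code that sends the notifications:
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
                         CPUFREQ_PRECHANGE, freqs);
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
                         CPUFREQ_POSTCHANGE, freqs);
《Linux设备驱动开发详解:基于最新的Linux4.0内核》 19.2.4 CPUFreq notifications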
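The PRECHANGE/POSTCHANGE bracketing can be illustrated with a tiny callback chain in user space. Everything here is a hypothetical model (register_notifier(), call_chain(), change_frequency(), the fixed-size array); it shows only the ordering guarantee, not the kernel's SRCU-based implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { PRECHANGE, POSTCHANGE };   /* model CPUFREQ_PRECHANGE / CPUFREQ_POSTCHANGE */

struct freqs {
    unsigned int old_khz;
    unsigned int new_khz;
};

typedef void (*notifier_fn)(int event, const struct freqs *f, void *ctx);

#define MAX_NOTIFIERS 8
static struct { notifier_fn fn; void *ctx; } chain[MAX_NOTIFIERS];
static size_t chain_len;

/* Returns 0 on success, -1 if the chain is full. */
static int register_notifier(notifier_fn fn, void *ctx)
{
    assert(fn != NULL);
    if (chain_len == MAX_NOTIFIERS)
        return -1;
    chain[chain_len].fn = fn;
    chain[chain_len].ctx = ctx;
    chain_len++;
    return 0;
}

static void call_chain(int event, const struct freqs *f)
{
    for (size_t i = 0; i < chain_len; i++)
        chain[i].fn(event, f, chain[i].ctx);
}

/* Switch frequency, bracketing the hardware update with both notifications. */
static void change_frequency(unsigned int *cur_khz, unsigned int new_khz)
{
    struct freqs f = { *cur_khz, new_khz };

    call_chain(PRECHANGE, &f);
    *cur_khz = new_khz;   /* stands in for the driver programming the hardware */
    call_chain(POSTCHANGE, &f);
}

/* Example listener: records the order in which events arrive. */
struct event_log {
    int events[4];
    size_t count;
};

static void log_event(int event, const struct freqs *f, void *ctx)
{
    struct event_log *log = ctx;

    (void)f;
    if (log->count < 4)
        log->events[log->count++] = event;
}
```

A listener that depends on the CPU clock, for example code that rescales a delay loop, would use the PRECHANGE event to prepare and the POSTCHANGE event to commit to the new frequency.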
6 Debugging
6.1 cpufreq-stats
cpufreq-stats is a driver that provides CPU frequency statistics for each CPU.
/sys/devices/system/cpu/cpu0/cpufreq/stats # ls -l
total 0
drwxr-xr-x 2 root root 0 May 14 16:06 .
drwxr-xr-x 3 root root 0 May 14 15:58 ..
--w------- 1 root root 4096 May 14 16:06 reset
-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state
-r--r--r-- 1 root root 4096 May 14 16:06 total_trans
-r--r--r-- 1 root root 4096 May 14 16:06 trans_table
Documentation/cpu-freq/cpufreq-stats.txt
Note:
When the cpufreq_driver in use is intel_pstate, the stats/ directory does not exist.
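As an example of using these statistics, the sketch below computes the share of time spent at one frequency from the contents of time_in_state. It assumes the file holds one "<frequency> <time>" pair per line, as the cpufreq-stats documentation describes; residency_percent() is a hypothetical helper that works on text already read from the file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return the share (0..100) of total time spent at freq_khz, given the text
 * of a time_in_state file; -1 if the text contains no usable entries.
 */
static int residency_percent(const char *text, unsigned int freq_khz)
{
    unsigned int freq;
    unsigned long long ticks, total = 0, match = 0;
    int used;

    assert(text != NULL);
    /* %n stores how many characters were consumed so the scan can advance. */
    while (sscanf(text, "%u %llu%n", &freq, &ticks, &used) == 2) {
        total += ticks;
        if (freq == freq_khz)
            match += ticks;
        text += used;
    }
    return total ? (int)(match * 100 / total) : -1;
}
```

Because only the ratio matters, the unit of the time column cancels out and does not need to be converted.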
6.2 /sys/kernel/debug/tracing/events/power/
cpu_frequency_limits
cpu_frequency
6.3 cpufreq-bench
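Tool source: <kernel_src>/tools/power/cpupower/bench/
cpufreq-bench works by simulating an "idle → busy → idle → busy" run-time pattern to trigger dynamic frequency changes. Under governors such as ondemand, conservative or interactive, it then computes the ratio between the time needed to complete a computation and the time the same computation takes in the high-frequency performance mode.
The usual goal is that, with CPUFreq dynamically adjusting frequency and voltage, performance should reach about 90% of what the performance governor delivers; that is considered a good result.
《Linux设备驱动开发详解:基于最新的Linux4.0内核》 19.2.3 CPUFreq performance testing and tuning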
6.4 cpupower frequency-info|frequency-set
cpupower frequency-info
A small tool which prints out cpufreq information helpful to developers and interested users.
cpupower frequency-set
cpupower frequency-set allows you to modify cpufreq settings without having to type e.g. "/sys/devices/system/cpu/cpu0/cpufreq/scaling_set_speed" all the time.
6.5 cpufreq-info and cpufreq-set
cpufreq-info
A small tool which prints out cpufreq information helpful to developers and interested users.
cpufreq-set
cpufreq-set allows you to modify cpufreq settings without having to type e.g. "/sys/devices/system/cpu/cpu0/cpufreq/scaling_set_speed" all the time.