MYSQL之慢查询分析(Analysis of Slow MySQL Query)
MySQL之慢查询分析
阅读本文可以了解关于MYSQL慢查询分析的两个工具:mysqldumpslow 和 pt-query-digest,掌握这两个工具的使用有助于帮助我们诊断和优化数据库性能问题。
01
概述
-
mysqldumpslow 是 MySQL 自带的一个用于分析慢查询日志的工具;
-
pt-query-digest 是 Percona Toolkit 中的一个重要工具;
-
两者主要都用于分析 MySQL 数据库的慢查询日志,帮助DBA诊断执行效率低下的SQL,从而优化数据库性能。
02
用法和选项
-
mysqldumpslow 工具选项如下:
mysqldumpslow --help
Usage: mysqldumpslow [ OPTS... ] [ LOGS... ]
Parse and summarize the MySQL slow query log. Options are
--verbose verbose
--debug debug
--help write this text to standard output
-v verbose
-d debug
-s ORDER what to sort by (al, at, ar, c, l, r, t), 'at' is default
al: average lock time
ar: average rows sent
at: average query time
c: count
l: lock time
r: rows sent
t: query time
-r reverse the sort order (largest last instead of first)
-t NUM just show the top n queries
-a don't abstract all numbers to N and strings to 'S'
-n NUM abstract numbers with at least n digits within names
-g PATTERN grep: only consider stmts that include this string
-h HOSTNAME hostname of db server for *-slow.log filename (can be wildcard),
default is '*', i.e. match all
-i NAME name of server instance (if using mysql.server startup script)
-l don't subtract lock time from total time
-
pt-query-digest 工具的选项如下:
pt-query-digest --help
pt-query-digest analyzes MySQL queries from slow, general, and binary log files.
It can also analyze queries from C<SHOW PROCESSLIST> and MySQL protocol data
from tcpdump. By default, queries are grouped by fingerprint and reported in
descending order of query time (i.e. the slowest queries first). If no C<FILES>
are given, the tool reads C<STDIN>. The optional C<DSN> is used for certain
options like L<"--since"> and L<"--until">. For more details, please use the
--help option, or try 'perldoc
/usr/local/percona-toolkit-3.5.4/bin/pt-query-digest' for complete
documentation.
Usage: pt-query-digest [OPTIONS] [FILES] [DSN]
Options:
--ask-pass Prompt for a password when connecting to MySQL
--attribute-aliases=a List of attribute|alias, etc (default db|Schema)
--attribute-value-limit=i A sanity limit for attribute values (default 0)
--charset=s -A Default character set
--config=A Read this comma-separated list of config files;
if specified, this must be the first option on
the command line
--[no]continue-on-error Continue parsing even if there is an error (
default yes)
--[no]create-history-table Create the --history table if it does not exist (
default yes)
--[no]create-review-table Create the --review table if it does not exist (
default yes)
--daemonize Fork to the background and detach from the shell
--database=s -D Connect to this database
--defaults-file=s -F Only read mysql options from the given file
--embedded-attributes=a Two Perl regex patterns to capture pseudo-
attributes embedded in queries
--expected-range=a Explain items when there are more or fewer than
expected (default 5,10)
--explain=d Run EXPLAIN for the sample query with this DSN
and print results
--filter=s Discard events for which this Perl code doesn't
return true
--group-by=A Which attribute of the events to group by (
default fingerprint)
--help Show help and exit
--history=d Save metrics for each query class in the given
table. pt-query-digest saves query metrics (
query time, lock time, etc.) to this table so
you can see how query classes change over time
--host=s -h Connect to host
--ignore-attributes=a Do not aggregate these attributes (default arg,
cmd, insert_id, ip, port, Thread_id, timestamp,
exptime, flags, key, res, val, server_id,
offset, end_log_pos, Xid)
--inherit-attributes=a If missing, inherit these attributes from the
last event that had them (default db,ts)
--interval=f How frequently to poll the processlist, in
seconds (default .1)
--iterations=i How many times to iterate through the collect-
and-report cycle (default 1)
--limit=A Limit output to the given percentage or count (
default 95%:20)
--log=s Print all output to this file when daemonized
--max-hostname-length=i Trim host names in reports to this length. 0=Do
not trim host names (default 10)
--max-line-length=i Trim lines to this length. 0=Do not trim lines (
default 74)
--order-by=A Sort events by this attribute and aggregate
function (default Query_time:sum)
--outliers=a Report outliers by attribute:percentile:count (
default Query_time:1:10)
--output=s How to format and print the query analysis
results (default report)
--password=s -p Password to use when connecting
--pid=s Create the given PID file
--port=i -P Port number to use for connection
--preserve-embedded-numbers Preserve numbers in database/table names when
fingerprinting queries
--processlist=d Poll this DSN's processlist for queries, with --
interval sleep between
--progress=a Print progress reports to STDERR (default time,
30)
--read-timeout=m Wait this long for an event from the input; 0 to
wait forever (default 0). Optional suffix s=
seconds, m=minutes, h=hours, d=days; if no
suffix, s is used.
--[no]report Print query analysis reports for each --group-by
attribute (default yes)
--report-all Report all queries, even ones that have been
reviewed
--report-format=A Print these sections of the query analysis
report (default rusage,date,hostname,files,
header,profile,query_report,prepared)
--report-histogram=s Chart the distribution of this attribute's
values (default Query_time)
--resume=s If specified, the tool writes the last file
offset, if there is one, to the given filename
--review=d Save query classes for later review, and don't
report already reviewed classes
--run-time=m How long to run for each --iterations. Optional
suffix s=seconds, m=minutes, h=hours, d=days; if
no suffix, s is used.
--run-time-mode=s Set what the value of --run-time operates on (
default clock)
--sample=i Filter out all but the first N occurrences of
each query
--set-vars=A Set the MySQL variables in this comma-separated
list of variable=value pairs
--show-all=H Show all values for these attributes
--since=s Parse only queries newer than this value (parse
queries since this date)
--slave-password=s Sets the password to be used to connect to the
slaves
--slave-user=s Sets the user to be used to connect to the slaves
--socket=s -S Socket file to use for connection
--timeline Show a timeline of events
--type=A The type of input to parse (default slowlog)
--until=s Parse only queries older than this value (parse
queries until this date)
--user=s -u User for login if not current user
--variations=A Report the number of variations in these
attributes' values
--version Show version and exit
--[no]version-check Check for the latest version of Percona Toolkit,
MySQL, and other programs (default yes)
--[no]vertical-format Output a trailing "\G" in the reported SQL
queries (default yes)
--watch-server=s This option tells pt-query-digest which server
IP address and port (like "10.0.0.1:3306") to
watch when parsing tcpdump (for --type tcpdump);
all other servers are ignored
Option types: s=string, i=integer, f=float, h/H/a/A=comma-separated list, d=DSN, z=size, m=time
Rules:
This tool accepts additional command-line arguments. Refer to the SYNOPSIS and usage information for details.
DSN syntax is key=value[,key=value...] Allowable DSN keys:
KEY COPY MEANING
=== ==== =============================================
A yes Default character set
D yes Default database to use when connecting to MySQL
F yes Only read default options from the given file
P yes Port number to use for connection
S yes Socket file to use for connection
h yes Connect to host
p yes Password to use when connecting
t no The --review or --history table
u yes User for login if not current user
If the DSN is a bareword, the word is treated as the 'h' key.
Options and values after processing arguments:
--ask-pass FALSE
--attribute-aliases db|Schema
--attribute-value-limit 0
--charset (No value)
--config /etc/percona-toolkit/percona-toolkit.conf,/etc/percona-toolkit/pt-query-digest.conf,/home/mysql/.percona-toolkit.conf,/home/mysql/.pt-query-digest.conf
--continue-on-error TRUE
--create-history-table TRUE
--create-review-table TRUE
--daemonize FALSE
--database (No value)
--defaults-file (No value)
--embedded-attributes (No value)
--expected-range 5,10
--explain (No value)
--filter (No value)
--group-by fingerprint
--help TRUE
--history (No value)
--host (No value)
--ignore-attributes arg,cmd,insert_id,ip,port,Thread_id,timestamp,exptime,flags,key,res,val,server_id,offset,end_log_pos,Xid
--inherit-attributes db,ts
--interval .1
--iterations 1
--limit 95%:20
--log (No value)
--max-hostname-length 10
--max-line-length 74
--order-by Query_time:sum
--outliers Query_time:1:10
--output report
--password (No value)
--pid (No value)
--port (No value)
--preserve-embedded-numbers FALSE
--processlist (No value)
--progress time,30
--read-timeout 0
--report TRUE
--report-all FALSE
--report-format rusage,date,hostname,files,header,profile,query_report,prepared
--report-histogram Query_time
--resume (No value)
--review (No value)
--run-time (No value)
--run-time-mode clock
--sample (No value)
--set-vars
--show-all
--since (No value)
--slave-password (No value)
--slave-user (No value)
--socket (No value)
--timeline FALSE
--type slowlog
--until (No value)
--user (No value)
--variations
--version FALSE
--version-check TRUE
--vertical-format TRUE
--watch-server (No value)
03
常用技巧
-
mysqldump 常用技巧如下:
-- 预处理慢查询日志,输出指定时间段的日志
sed -n '/# Time: 241011 15:44:.../,/# Time: 241011 17:30:.../'p /data/mysql/slow.log
-- 输出记录次数最多的10条sql语句
mysqldumpslow -s -c -t 10 /data/mysql/slow.log
-- 得到返回记录集最多的10条sql语句
mysqldumpslow -s -r -t 10 /data/mysql/slow.log
-- 得到按照时间排序的前10条里边含有左连接的查询语句
mysqldumpslow -s t -t 10 -g "left join" /data/mysql/slow.log
参数解析:
-s参数:表示按照何种方式排序
c、t、l、r 按照记录次数、时间、查询时间、返回的记录数来排序
-t参数:即为返回前面多少条数据
-g参数:可以写正则匹配模式,大小写不敏感
-a参数:说明不合并类似的sql语句,显示具体的sql语句中的数字和字符串
-- 输出结果会使用N和S代替SQL中出现的数字和字符串。
-
pt-query-digest 常用技巧如下:
-- 直接分析慢日志文件
pt-query-digest [参数] /mysql/log/slow.log > slow_report.log
-- 分析最近12h内的查询
pt-query-digest --since=12h /mysql/log/slow.log > slow_report.log
-- 分析指定时间范围内的查询
pt-query-digest --since '2024-10-01 10:00:00' --until '2024-10-02 10:00:00' /mysql/log/slow.log > slow_report.log
-- 分析只含有select语句的慢查询
pt-query-digest --filter '($event->{fingerprint} || "") =~ m/^select/i' mysql-slow.log > slow_report.log
-- 针对某个用户的慢查询
pt-query-digest --filter '($event->{user} || "") =~ m/^root/i' mysql-slow.log > slow_report.log
-- 查询所有的全表扫描或full join慢查询
pt-query-digest --filter '((($event->{Full_scan} || "") eq "yes") || ($event->{Full_join} || "") eq "yes")' mysql-slow.log > slow_report.log
-- 把查询结果保存到slowlog_review表
pt-query-digest --user=root --password=root --review h=localhost,D=slowlog,t=slowlog_review --create-review-table /mysql/log/slow.log
-- 通过tcpdump抓取mysql的tcp协议数据后再用工具分析
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt > slow_report.log
-- 分析binlog
mysqlbinlog mysql-bin.000001 > mysql-bin000001.sql
pt-query-digest --type=binlog mysql-bin000001.sql > slow_report.log
-- 分析general.log
pt-query-digest --type=genlog general.log > slow_report.log
04
报告解读
-
mysqldumpslow 示例解读如下:
mysqldumpslow -s -c -t 1 slow.txt
-- 执行返回结果如下
Reading mysql slow query log from slow.txt
Count: 1 Time=13.42s (13s) Lock=0.48s (0s) Rows=2.0 (2), root[root]@localhost
SELECT * FROM app.bigtable
Count:1 代表在解析的日志文件中总共出现了1次;
Time=13.42s (13s) 代表该查询累计总耗时13.42s,括号内的13s代表平均时间;
Lock=0.48s (0s) 代表这个查询在等待锁上花费了0.48秒的时间,括号内的0s表示平均等待锁的时间是0秒;
Rows=20000000.0 (20000000) 代表这个查询返回了20000000条数据,括号内的20000000表示平均每次查询返回的行数也是20000000;
root[root]@localhost 代表这个查询是由用户名为 root 的用户在 localhost 主机上执行的。
-
pt-query-digest 分三部分解读,示例如下:
-
第一部分:
-- 用户时间,系统时间,物理内存占用大小,虚拟内存占用大小
# 5.6s user time, 280ms system time, 37.48M rss, 122.02M vsz
-- 执行工具的时间
# Current date: Fri Oct 11 10:17:49 2024
-- 主机名
# Hostname: HOST01
-- 被分析的文件名字
# Files: /tmp/slow_statement-20241011.log
-- 语句总数量,唯一的语句数量,QPS,并发数
# Overall: 10.93k total, 76 unique, 1.02 QPS, 0.02x concurrency __________
-- 日志记录时间范围
# Time range: 2024-10-11T06:30:43 to 2024-10-11T09:29:59
-- 属性 总计 最小 最大 平均 95% 标准 中位数
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 245s 82us 30s 22ms 925us 511ms 301us
# Lock time 119ms 0 22ms 10us 4us 330us 1us
# Rows sent 10.67k 0 897 1.00 3.89 12.07 0.99
# Rows examine 13.84M 0 167.37k 1.30k 76.28 10.49k 76.28
# Query size 6.17M 42 34.24k 591.87 719.66 413.55 719.66
注意:
unique 唯一查询数量指的是对查询条件进行参数化以后,总共有多少个不同的查询;
95% 指的是所有值从小到大排序,位置位于95%的数
median 中位数指的是把所有值从小到大排序,位于中间的数
-
第二部分:
# Profile
# Rank Query ID Response time Calls R/Call V/M
# ==== =================================== ============== ===== ====== ===
# 1 0x64EF0EA126730002088884A136067321 211.0015 86.3% 180 1.1722 12.35
# 2 0xE00773AA9237B878155458CDB3CF7750 5.2254 2.1% 28 0.1866 0.00 UPDATE aa
# 3 0x3BD914AE329298E799F07AF97F53D5EA 5.1410 2.1% 21 0.2448 0.02 SELECT bb
# 4 0x95FAE319F439A358514176137A666198 3.1459 1.3% 7777 0.0004 0.00 SELECT cc
# 5 0x934ED5560D4E81AC2C4FCCB4573C899E 3.0805 1.3% 14 0.2200 0.00 SELECT dd
# 6 0xD2E066E9DDBCF5016F972A3A6896C311 2.3942 1.0% 1 2.3942 0.00 INSERT SELECT ee
# 7 0xD54F51F8D170B47CF1F0BD7262EE63A2 2.3062 0.9% 9 0.2562 0.00 UPDATE ff
# 8 0xF0862C304387BB4569628FA57B607BC2 1.7430 0.7% 9 0.1937 0.00 SELECT gg
# MISC 0xMISC 10.6013 4.3% 2893 0.0037 0.0 <68 ITEMS>
这部分对查询进行参数化并分组,然后对各类查询的执行情况进行分析,结果按总执行时长,从大到小排序。
response:总响应时间
time:该查询在本次分析中总的时间占比。
Calls:执行次数,即本次分析总共有多少条这种类型的查询语句。
R/Call:平均每次执行的响应时间。
V/M:响应时间Variance-to-mean的比率。
Item:查询对象。
-
第三部分:
列出第一个查询的详细统计结果,列出执行次数、最大、最小、平均、95%、标准、中位数的统计。
# Query 1: 0.02 QPS, 0.02x concurrency, ID 0x64EF0EA126730002088884A136067321 at byte 8739087
# Scores: V/M = 12.35
# Time range: 2024-10-11T06:30:43 to 2024-10-11T09:29:47
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 1 180
# Exec time 86 211s 9ms 30s 1s 8s 4s 30ms
# Lock time 31 37ms 34us 1ms 207us 881us 238us 131us
# Rows sent 1 165 0 1 0.92 0.99 0.27 0.99
# Rows examine 0 167 0 1 0.93 0.99 0.26 0.99
# Query size 0 10.55k 60 60 60 60 0 60
# String:
# Databases aadb... (158/87%), bbdb (11/6%)... 2 more
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms #
# 10ms ################################################################
# 100ms #####
# 1s #######
# 10s+ ##
throttle: 246 'index not used' warning(s) suppressed.\G
...
Exec time:表示查询的执行时间;
Lock time:表示查询在等待锁的时间。
Rows sent:表示查询返回的行数。
Rows examined:表示查询扫描的行数。
Rows affected:表示查询影响的行数。
Bytes sent:表示查询发送的字节数。
Query size:表示查询的大小。
Query_time distribution:查询时间的分布,可以看到这个SQL查询执行时间都是10秒以上。
Tables:该SQL查询涉及的表。
EXPLAIN:查询的SQL语句。
结合上述解析,我们可以解析上述这个案例:
QUERY1占所有查询响应时间的86.3%,平均响应时间要1s以上;
这个查询涉及到aadb和bbdb两个数据库,其中aadb占比较高;
QUERY1中提示throttle: 246 'index not used' warning(s) suppressed.\G,代表检测到246次未使用索引的记录;说明数据库中存在较多未使用索引的sql,建议为对应的业务sql创建合理的索引,以提高数据库性能。