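Can you believe your eyes when you read the title? Someone actually dares to touch the ancestral legacy code? Yes, that someone is me, and this time I not only have to touch it but tune it as well (with great reluctance; there really was no other way). Still, this tuning job turned out to be a fairly classic case, so I wrote it up and am sharing it here in the hope that it is useful to you.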
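The difficulties of this tuning job:
- The script is far too long: some earlier expert put almost all of the business logic into the SQL;
- From what I was told, three experts have already adjusted this script three times without ever getting it right. I later learned that the script takes two different branches for the "logged-in" and "not logged-in" cases, which is the fallout of misusing MyBatis dynamic SQL;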
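First, let's look at the interface's response time in the "not logged-in" state, as shown in the figure below:
As the figure shows, the interface takes 1.76 seconds in the "not logged-in" state. One caveat: the 7.83 seconds displayed in the screenshot is the response time of the whole transaction (it contains a lot of real-time statistics and computation, and at the time nobody optimized the computation or the code logic... frankly, nobody dared to, so the whole transaction is slow). The interface in the screenshot is not the same interface this article is about; the problematic interface was measured at 1.76 seconds during troubleshooting. The screenshots in this article are therefore only meant to give a visual impression of the performance and do not show the real execution time of the interface in question (in a word, I was lazy and did not want to write logging to show the database execution time...).
Back to the point: once logged in, query performance drops sharply, as shown below: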
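I asked the last expert who modified the code and learned that he had already optimized the Java layer and that, short of a refactor, there was nothing left to optimize there. So this round of tuning concentrates on the SQL query. Let's look at the query issued after login. The SQL that gets executed is as follows: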
SELECT *
FROM
(SELECT
p.procurement_id,
p.display_type,
p.publish_type,
p.valid_time,
p.pay_type,
p.cust_id,
p.add_user,
t.trade_name,
p.add_time,
p.oper_user,
p.oper_time,
p.platform_audit_status,
p.platform_back_reason,
p.platform_audit_user,
p.platform_audit_time,
p.status,
p.procurement_title,
p.alive_flag,
c.is_gsp,
c.is_gmp,
c.customer_service_user,
IFNULL(IF(p.display_type = 2, sui.CONTACT_NAME, fc.CONTACT_NAME), '暂无') AS CONTACT_NAME,
IFNULL(IF(p.display_type = 2, sui.CELLPHONE, fc.cell_phone), '暂无') AS cellphone,
IF(fc.SEX = 1, '先生', '女士') AS sex,
IF(INSTR(GROUP_CONCAT(t.TRADE_PUBLISH_STATE), '0') > 0, 0, 1) AS TRADE_PUBLISH_STATE,
IF(p.display_type = 2, '*******', c.CUST_NAME) AS CUST_NAME,
SUM(IF((SELECT
COUNT(0)
FROM
spot_procurement_details spd
WHERE
FIND_IN_SET(spd.trade_name_id, '35,65,124,1145,1168,255,288,81,')
AND spd.procurement_detail_id = pd.procurement_detail_id) > 0, 1, 0)) AS flag,
pn.status AS inviteStatus,
pn.invitation_id,
pn.send_time,
(SELECT IF(p.valid_time >= DATE_FORMAT(NOW(), '%Y-%m-%d'), 1, 2)) AS info_status,
IF(p.status = 1, 1, IF(p.status = 6, 1.5, 2)) AS proc_status,
p.top_type AS topType,
p.top_time AS topTime
FROM spot_procurement p
LEFT JOIN spot_procurement_invitation pn ON pn.procurement_id = p.procurement_id
LEFT JOIN spot_procurement_details pd ON pd.procurement_id = p.procurement_id
LEFT JOIN spot_trade_name t ON t.trade_name_id = pd.trade_name_id
LEFT JOIN spot_frequent_contacts fc ON p.cust_id = fc.CUST_ID AND fc.ALIVE_FLAG = 1 AND fc.IS_FREQUENT = 1
LEFT JOIN spot_company c ON c.cust_id = p.cust_id
LEFT JOIN spot_user_info sui ON c.CUSTOMER_SERVICE_USER = sui.USER_ID
WHERE p.platform_audit_status = 1 AND p.alive_flag = 1 AND p.status >= 1
AND (pd.is_split IS NULL OR pd.is_split != 'Y')
AND (pn.receive_cust_id = '100000000000365' OR p.publish_type = 2)
AND p.top_Type IN (1 , '3')
GROUP BY p.procurement_id UNION (SELECT
p.procurement_id,
p.display_type,
p.publish_type,
p.valid_time,
p.pay_type,
p.cust_id,
p.add_user,
t.trade_name,
p.add_time,
p.oper_user,
p.oper_time,
p.platform_audit_status,
p.platform_back_reason,
p.platform_audit_user,
p.platform_audit_time,
p.status,
p.procurement_title,
p.alive_flag,
c.is_gsp,
c.is_gmp,
c.customer_service_user,
IFNULL(IF(p.display_type = 2, sui.CONTACT_NAME, fc.CONTACT_NAME), '暂无') AS CONTACT_NAME,
IFNULL(IF(p.display_type = 2, sui.CELLPHONE, fc.cell_phone), '暂无') AS cellphone,
IF(fc.SEX = 1, '先生', '女士') AS sex,
IF(INSTR(GROUP_CONCAT(t.TRADE_PUBLISH_STATE), '0') > 0, 0, 1) AS TRADE_PUBLISH_STATE,
IF(p.display_type = 2, '*******', c.CUST_NAME) AS CUST_NAME,
SUM(IF((SELECT COUNT(0)
FROM spot_procurement_details spd
WHERE FIND_IN_SET(spd.trade_name_id, '35,65,124,1145,1168,255,288,81,')
AND spd.procurement_detail_id = pd.procurement_detail_id) > 0, 1, 0)) AS flag,
pn.status AS inviteStatus,
pn.invitation_id,
pn.send_time,
(SELECT IF(p.valid_time >= DATE_FORMAT(NOW(), '%Y-%m-%d'), 1, 2)) AS info_status,
IF(p.status = 1, 1, IF(p.status = 6, 1.5, 2)) AS proc_status,
p.top_type AS topType,
p.top_time AS topTime
FROM spot_procurement p
LEFT JOIN spot_procurement_invitation pn ON pn.procurement_id = p.procurement_id
LEFT JOIN spot_procurement_details pd ON pd.procurement_id = p.procurement_id
LEFT JOIN spot_trade_name t ON t.trade_name_id = pd.trade_name_id
LEFT JOIN spot_frequent_contacts fc ON p.cust_id = fc.CUST_ID AND fc.ALIVE_FLAG = 1 AND fc.IS_FREQUENT = 1
LEFT JOIN spot_company c ON c.cust_id = p.cust_id
LEFT JOIN spot_user_info sui ON c.CUSTOMER_SERVICE_USER = sui.USER_ID
WHERE
p.platform_audit_status = 1 AND p.alive_flag = 1 AND p.status >= 1
AND (pd.is_split IS NULL OR pd.is_split != 'Y')
AND (pn.receive_cust_id = '100000000000365' OR p.publish_type = 2)
GROUP BY p.procurement_id)) sss
WHERE sss.TRADE_PUBLISH_STATE = 1
ORDER BY sss.info_status ASC , sss.add_time DESC
LIMIT 0 , 10
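A mere 107 lines of script... Breaking it down shows that the script can be split into two parts at the UNION keyword. Before doing that, I ran it directly in a client to see how it performs, as shown below:
The paged query returns 10 rows and takes 2.29 seconds in total.
Next I split the inner script of the nested query into its two parts and analyzed each with EXPLAIN. First part first, as shown below: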
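As the figure shows, apart from the joins on pn and pd, the joins on the other tables look normal; at least they all use an index (the key and key_len columns give the index name and the used key length). For pn and pd, the Extra column reads "Range checked for each record (index map: 0x2)".
"Range checked for each record" has come up in my earlier tuning posts. The join column of the table does have a candidate index listed in possible_keys, but for "some" reason the MySQL optimizer could not commit to it when planning the query (as the figure shows, pn and pd both have possible_keys, yet key and key_len are both NULL, so no index was chosen). Hence the "Range checked" note: for every row combination coming from the preceding tables, MySQL has to check again whether an index can be used. Strictly speaking this is not an error but a note in the Extra column, and it marks one of the slowest join strategies in MySQL.
Since no index is used, the next step is to find out why. The joins on pn and pd look like this: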
FROM spot_procurement p
LEFT JOIN spot_procurement_invitation pn ON pn.procurement_id = p.procurement_id
LEFT JOIN spot_procurement_details pd ON pd.procurement_id = p.procurement_id
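Both tables are attached to p with a LEFT JOIN on the procurement_id column. procurement_id is the primary key of p and a foreign key in pn and pd, so there should be no problem. Comparing the three tables, however, shows that procurement_id is a bigint in p but a varchar in pn and pd, so the most likely reason EXPLAIN shows no index being used is the data type mismatch **(yet another performance problem caused by mismatched data types)**.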
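A quick way to confirm such a mismatch is to put the column definitions side by side. The following is a minimal sketch that assumes all three tables live in the currently selected schema; the table and column names are taken from the query above, and the rest is standard information_schema.

```sql
-- Compare the declared type of procurement_id across the three joined tables.
-- Assumption: all three tables are in the schema returned by DATABASE().
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, COLLATION_NAME, COLUMN_KEY
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND COLUMN_NAME = 'procurement_id'
  AND TABLE_NAME IN ('spot_procurement',
                     'spot_procurement_invitation',
                     'spot_procurement_details');

-- List the indexes that exist on the two problematic tables.
SHOW INDEX FROM spot_procurement_invitation;
SHOW INDEX FROM spot_procurement_details;
```

If COLUMN_TYPE differs between the primary-key side and the foreign-key side, the join has to convert one of them for every comparison.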
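Because the column types differ, the varchar column of the joined table has to be implicitly converted to a number in the ON clause before the comparison can be made (MySQL compares a string with a number numerically). In effect this behaves much like the following statement: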
FROM spot_procurement p
LEFT JOIN spot_procurement_invitation pn ON CAST(pn.procurement_id AS UNSIGNED integer) = p.procurement_id
LEFT JOIN spot_procurement_details pd ON CAST(pd.procurement_id AS UNSIGNED integer) = p.procurement_id
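Why an index on the varchar column is of no use here can be seen from how MySQL compares a string with a number: several different strings compare equal to the same number, so the server cannot turn the comparison into a single index lookup. A small illustration with literal values (not taken from the original tables):

```sql
-- String-vs-number comparisons are done numerically, so distinct strings
-- can all match the same integer.
SELECT '123'    = 123 AS plain,          -- 1
       '0123'   = 123 AS leading_zero,   -- 1
       '123abc' = 123 AS trailing_text;  -- 1, with a truncation warning
```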
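This exposes the real problem: the pn and pd columns on the left of the = have to go through a conversion such as CAST, and an index on a column cannot be used once the column is wrapped in a function.
So, without changing the original logic, I rewrote it as follows: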
SELECT
p.procurement_id,
p.display_type,
p.publish_type,
p.valid_time,
p.pay_type,
p.cust_id,
p.add_user,
t.trade_name,
p.add_time,
p.oper_user,
p.oper_time,
p.platform_audit_status,
p.platform_back_reason,
p.platform_audit_user,
p.platform_audit_time,
p.status,
p.procurement_title,
p.alive_flag,
c.is_gsp,
c.is_gmp,
c.customer_service_user,
IFNULL(IF(p.display_type = 2, sui.CONTACT_NAME, fc.CONTACT_NAME), '暂无') AS CONTACT_NAME,
IFNULL(IF(p.display_type = 2, sui.CELLPHONE, fc.cell_phone), '暂无') AS cellphone,
IF(fc.SEX = 1, '先生', '女士') AS sex,
IF(INSTR(GROUP_CONCAT(t.TRADE_PUBLISH_STATE), '0') > 0, 0, 1) AS TRADE_PUBLISH_STATE,
IF(p.display_type = 2, '*******', c.CUST_NAME) AS CUST_NAME,
SUM(IF((SELECT COUNT(0)
FROM spot_procurement_details spd
WHERE FIND_IN_SET(spd.trade_name_id, '35,65,124,1145,1168,255,288,81')
AND spd.procurement_detail_id = pd.procurement_detail_id) > 0, 1, 0)) AS flag,
pn.status AS inviteStatus,
pn.invitation_id,
pn.send_time,
(SELECT IF(p.valid_time >= DATE_FORMAT(NOW(), '%Y-%m-%d'), 1, 2)) AS info_status,
IF(p.status = 1, 1, IF(p.status = 6, 1.5, 2)) AS proc_status,
p.top_type AS topType,
p.top_time AS topTime
FROM spot_procurement p
LEFT JOIN
(select a.receive_cust_id,a.status,a.invitation_id,a.send_time, CAST(a.procurement_id AS UNSIGNED integer) as procurement_id from spot_procurement_invitation a) pn ON pn.procurement_id = p.procurement_id
LEFT JOIN
(select b.procurement_detail_id,CAST(b.procurement_id AS UNSIGNED integer) as procurement_id,b.trade_name_id,b.is_split from spot_procurement_details b ) pd ON pd.procurement_id = p.procurement_id
LEFT JOIN spot_trade_name t ON t.trade_name_id = pd.trade_name_id
LEFT JOIN spot_frequent_contacts fc ON p.cust_id = fc.CUST_ID AND fc.ALIVE_FLAG = 1 AND fc.IS_FREQUENT = 1
LEFT JOIN spot_company c ON c.cust_id = p.cust_id
LEFT JOIN spot_user_info sui ON c.CUSTOMER_SERVICE_USER = sui.USER_ID
WHERE p.platform_audit_status = 1 AND (pd.is_split IS NULL OR pd.is_split != 'Y')
AND p.alive_flag = 1 AND p.status >= 1
AND (pn.receive_cust_id = '100000000000365' OR p.publish_type = 2)
AND p.top_Type IN (1 , '3')
GROUP BY p.procurement_id
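Here the columns that need conversion are cast explicitly first and the join is done afterwards. Running EXPLAIN gives the following execution plan:
The outer joins now use auto_key1 (an index that MySQL generates automatically on the materialized derived table) instead of the earlier NULL, while tables a and b are scanned in full because they only serve the type conversion. Note, though, that the Range checked note is gone from the Extra column.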
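For comparison, another rewrite that is sometimes tried in this situation converts the primary-key side instead, so that the indexed varchar columns stay untouched. This is not what was done above; it is only a sketch under two assumptions that have not been verified against these tables: every stored procurement_id string is in canonical decimal form (no leading zeros, spaces, or suffixes), and the character set and collation of the cast result are compatible with those of the varchar columns.

```sql
-- Sketch only: cast p.procurement_id to a string so that pn.procurement_id and
-- pd.procurement_id are compared as strings and remain eligible for index lookups.
-- Assumption: the stored strings are canonical decimals; otherwise the result
-- differs from the original numeric comparison.
EXPLAIN
SELECT p.procurement_id, pn.invitation_id, pd.procurement_detail_id
FROM spot_procurement p
LEFT JOIN spot_procurement_invitation pn
       ON pn.procurement_id = CAST(p.procurement_id AS CHAR)
LEFT JOIN spot_procurement_details pd
       ON pd.procurement_id = CAST(p.procurement_id AS CHAR)
WHERE p.platform_audit_status = 1
  AND p.alive_flag = 1;
```

Whether the optimizer actually picks the existing indexes on pn and pd has to be read off the key column of the resulting plan; if it does not, the derived-table rewrite shown above remains the safer choice.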
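Now for the second part. It is almost identical to the first, so the same optimization applies. The EXPLAIN plan of the whole statement after optimization is shown below:
Nothing else unusual shows up in the figure, so I ran the query directly to see the result, as shown below: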
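After changing the SQL, I verified the interface's load time again, as shown below:
With the account logged in, the interface went from 5.42 seconds to 0.82 seconds, which cuts the response time by about 84.9%.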
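The relative improvement follows directly from the two timings quoted above (5.42 s before, 0.82 s after). A quick check that can be run in any MySQL client:

```sql
-- Percentage reduction in response time: (before - after) / before * 100
SELECT ROUND((5.42 - 0.82) / 5.42 * 100, 1) AS pct_reduction;  -- 84.9
```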