实际工作中用到的MySQL细节

原创已于 2024-11-15 21:33:24 修改 · 619 阅读

3 ·

本内容遵循CC 4.0 BY-SA版权协议

标签

#mysql

于 2024-07-10 13:56:23 首次发布

MySQL 专栏收录该内容

2 篇文章

订阅专栏

文章目录

1. last_insert_id()函数特点
2. 子查询如何先排序再分组
3. 复合索引比分开的索引更高效

1. last_insert_id()函数特点

先看官方文档的解释

LAST_INSERT_ID() returns a BIGINT UNSIGNED (64-bit) value representing the first automatically generated value successfully inserted for an AUTO_INCREMENT column as a result of the most recently executed INSERT statement. The value of LAST_INSERT_ID() remains unchanged if no rows are successfully inserted.

LAST_INSERT_ID()返回一个无符号的BIGINT类型值，表示对一个自增的列插入数据时第一个自动生成的值，受最近执行的一条插入语句的影响。如果没有新行成功插入，函数的返回值不变。

以下为各种情况下的实际测试：

先创建三个测试表。

create table last_insert1(
    id int auto_increment primary key,
    v int default 0
) comment '主键为自增';

create table last_insert2(
    id int auto_increment,
    v int default 0,
    index (id)
) comment '无主键，某一列自增';

create table last_insert3(
    id int primary key ,
    v int auto_increment,
    index(v)
) comment '有主键，非主键自增';

一个表仅可有一个自增列，且该自增列必须是索引。

1. 什么也不做直接查询

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

对于新创建的连接，初次查询结果总为0。
(mysql客户端输出太占地方了，除了第一条，后续都简化了。)

2. 先插入非自增数据后查询

mysql> insert into last_insert1(v) values(3);
mysql> select last_insert_id();
# 1

mysql> insert into last_insert2(v) values(3);
mysql> select last_insert_id();
# 1

mysql> insert into last_insert3(id) values(3);
mysql> select last_insert_id();
# 1

自增列初始产生的值就是1，因此空表中首次插入后，last_insert_id()输出都是1。

3. 人为给定自增值插入后查询

mysql> insert into last_insert1(id, v) values(2,4);
mysql> select last_insert_id();
# 1

mysql> insert into last_insert2(id, v) values(3,4);
mysql> select last_insert_id();
# 1

mysql> insert into last_insert3(id, v) values(4,4);
mysql> select last_insert_id();
# 1

人为给定自增值插入后，输出的是上一个自动产生的自增值。

4. 多行插入查询

mysql> insert into last_insert1(v) values(4), (5), (6);
mysql> select last_insert_id();
# 3

mysql> insert into last_insert2(v) values(4), (5), (6);
mysql> select last_insert_id();
# 4

mysql> insert into last_insert3(id) values(5), (6), (7);
mysql> select last_insert_id();
# 5

多行插入时，输出插入的第一行产生的自增值。

5. 跨连接的测试

想看下针对同一个表，不同连接对它的插入之间会否互相影响。
上述的4种测试结果，三个表没区别，所以后续测试只在第一个表进行。

# 连接1先运行
mysql> insert into last_insert1(v) values(7);
mysql> select last_insert_id();
# 6

# 连接2插入
mysql> insert into last_insert1(v) values(8);
mysql> select last_insert_id();
# 7

# 连接1再查询
mysql> select last_insert_id();
# 6

连接之间是相互独立的，函数last_insert_id只会输出由当前连接执行插入产生的自增值。

2. 子查询如何先排序再分组

问题描述

有表如下，要找到 family_name 为 Green 和 Smith 的身高最高的两个人的 given_name。

+----+-------------+------------+--------+
| id | family_name | given_name | height |
+----+-------------+------------+--------+
|  1 | Smith       | Mike       |    157 |
|  2 | Smith       | Ashlin     |    165 |
|  3 | Smith       | Eason      |    181 |
|  4 | Smith       | Kaley      |    175 |
|  5 | Green       | Jack       |    164 |
|  6 | Green       | Kate       |    179 |
|  7 | Green       | Kathryn    |    168 |
|  8 | Green       | Waller     |    187 |
|  9 | Brown       | Mary       |    178 |
| 10 | Brown       | Lana       |    173 |
| 11 | Brown       | Lange      |    175 |
| 12 | Brown       | Randy      |    177 |
+----+-------------+------------+--------+

错误处理

首先想到的方式就是在子查询里先按身高排序，再在外面用 family_name 分组：

select given_name from (
	SELECT * from height 
	where family_name in ('Green','Smith') 
	order by height desc 
) b 
group by family_name;

结果外层的分组并未按照子查询内的顺序进行。

+------------+
| given_name |
+------------+
| Jack       |
| Mike       |
+------------+

这样的查询效果和直接查结果一样。

select given_name from height 
where family_name in ('Green','Smith') 
group by family_name
order by height desc

两句 sql explain 的结果也一样。

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	height	NULL	ALL	NULL	NULL	NULL	NULL	12	20.00	Using where; Using temporary; Using filesort

正解和原因

在上面的子查询 sql 中使用 group by，就能达到预期结果。

select given_name from (
	SELECT * from height 
	where family_name in ('Green','Smith') 
	group by id
	order by height desc 
) b 
group by family_name;

+------------+
| given_name |
+------------+
| Waller     |
| Eason      |
+------------+

再看该语句的 explain：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived2>	NULL	ALL	NULL	NULL	NULL	NULL	2	100.00	Using temporary; Using filesort
2	DERIVED	height	NULL	ALL	PRIMARY	NULL	NULL	NULL	12	20.00	Using where; Using filesort

出现上述结果的原因：mysql 5.7 版本对子查询进行了优化，子查询会被改为联结查询的形式以节省时间和资源，除非子查询内包含union 、group by、distinct、limit 或使用了聚合函数。

3. 复合索引比分开的索引更高效

情景

某个数据表需要用到多个列进行查询。不确定是用复合索引还是分别建索引较好。测试一下。
测试数据是视频弹幕数据，共291万条。

按两种索引建表

表1 复合主键

create table danmu_test1(
vid varchar(60) not null comment '视频id',
danmu_id bigint not null comment '弹幕id',
pub_time int null comment '发弹幕的现实时间, 10位时间戳',
time_offset int null comment '弹幕在视频中出现的时间, 单位为秒',
content varchar(100) null comment '弹幕内容',
up_count int null comment '弹幕的赞数',
create_time timestamp default CURRENT_TIMESTAMP null comment '数据入库时间',
primary key (vid, danmu_id)
)comment '复合索引';

表2 主键+索引

create table danmu_test2(
vid varchar(60) not null comment '视频id',
danmu_id bigint not null comment '弹幕id' primary key,
pub_time int null comment '发弹幕的现实时间, 10位时间戳',
time_offset int null comment '弹幕在视频中出现的时间, 单位为秒',
content varchar(100) null comment '弹幕内容',
up_count int null comment '弹幕的赞数',
create_time timestamp default CURRENT_TIMESTAMP null comment '数据入库时间',
index (vid)
)comment '主键+列索引';

测试

# 插入测试数据
insert into danmu_test1 select * from danmu;
# 全量计数
select count(1) from danmu_test1;
# 分组计数
select vid, count(1) from danmu_test1 group by vid;
# 非主键字段查询
select content from danmu_test1 where vid="取某个vid";