Understanding InnoDB clustered indexes

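This article describes in detail how the InnoDB storage engine in MySQL uses a clustered index to organize data, and discusses its advantages and disadvantages. It also explains how the clustered index is chosen and how secondary indexes work.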

Not everyone knows that indexes work differently in MyISAM and in InnoDB, and that the difference matters for performance. Now that InnoDB is widely used, it is important to understand how indexing works in InnoDB. Hence this post!

The first thing to know is that InnoDB uses a clustered index to store the data in a table. So what does clustered index mean?

Clustered Index

A clustered index determines the physical order of data in a table. Think of a telephone directory, where entries are physically arranged by last name. Because the clustered index decides the physical storage order of the data, a table can have only one clustered index. A clustered index can, however, consist of multiple columns (a composite index), in the same way that a telephone directory is ordered by last name and then by first name.

Clustered Index with respect to InnoDB

InnoDB stores indexes as B+tree data structures, and the clustered index is no exception. The difference is that, for the clustered index, InnoDB stores the index and the rows together in the same structure: the rows themselves are stored in the index's leaf pages. InnoDB tables can therefore also be called index-organized tables.

Now let's consider how InnoDB decides which index to use as the clustered index.

How does InnoDB select a clustered index?

With InnoDB, the PRIMARY KEY is typically synonymous with the clustered index. But what if no PRIMARY KEY exists, or the table has no index at all? InnoDB decides what to use as the clustered index as follows:

  • If a PRIMARY KEY is defined on the table, InnoDB uses it as the clustered index.
  • If no PRIMARY KEY is defined, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL as the clustered index.
  • If there is neither a PRIMARY KEY nor a suitable UNIQUE index, InnoDB internally generates a hidden clustered index (named GEN_CLUST_INDEX) on a synthetic row ID column. This hidden row ID is a 6-byte field that increases monotonically as new rows are inserted.
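
The three rules above can be sketched as a small decision function. This is only an illustration of the selection order, not InnoDB's actual code; the `IndexDef` class and its fields are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical description of one index on a table (not a MySQL API)."""
    name: str
    is_primary: bool = False
    is_unique: bool = False
    all_columns_not_null: bool = True


def choose_clustered_index(indexes):
    """Return the name of the index that would act as the clustered index.

    `indexes` is assumed to be listed in the order the indexes were defined.
    """
    # Rule 1: an explicit PRIMARY KEY always wins.
    for idx in indexes:
        if idx.is_primary:
            return idx.name
    # Rule 2: otherwise, the first UNIQUE index whose columns are all NOT NULL.
    for idx in indexes:
        if idx.is_unique and idx.all_columns_not_null:
            return idx.name
    # Rule 3: otherwise, the hidden index over the synthetic 6-byte row ID.
    return "GEN_CLUST_INDEX"


indexes = [
    IndexDef("uq_email", is_unique=True, all_columns_not_null=False),
    IndexDef("uq_code", is_unique=True),
    IndexDef("idx_name"),
]
print(choose_clustered_index(indexes))  # uq_code
```

Here `uq_email` is skipped because one of its columns is assumed to allow NULL, so the next suitable UNIQUE index is chosen.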

Hence, my advice is to always define a PRIMARY KEY for each table that you create. If the data has no natural key, add an auto-increment column and use it as the PRIMARY KEY.

Did you know that secondary indexes are tied to the primary key?

In InnoDB, every secondary index automatically contains the PRIMARY KEY column(s) together with the column(s) of the secondary index itself. That follows from the way InnoDB stores data. As explained above, the rows live in the leaf pages of the clustered index, so a secondary index entry has no physical row location to point to. Instead, it stores the row's PRIMARY KEY value. In other words, the PRIMARY KEY is the pointer to the row data.

This leads to another interesting conclusion.

Fetching a row through a secondary index requires two lookups: first a lookup in the secondary index to find the primary key value, then a lookup in the clustered index to find the row itself.
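
The two lookups can be illustrated with a toy in-memory model. This is a sketch of the idea only: a dict stands in for the clustered index and a sorted list for the secondary index, and the table, column names, and data are made up for the example.

```python
import bisect

# Clustered index: primary key -> full row (the rows live in the index itself).
clustered = {
    101: {"employee_id": 101, "last_name": "Khan", "dept": "ops"},
    102: {"employee_id": 102, "last_name": "Adams", "dept": "dev"},
    103: {"employee_id": 103, "last_name": "Khan", "dept": "dev"},
}

# Secondary index on last_name: sorted (last_name, primary key) entries.
# Each entry carries the primary key value, not a physical row address.
secondary = sorted((row["last_name"], pk) for pk, row in clustered.items())


def find_by_last_name(name):
    # Lookup 1: search the secondary index for entries with this last_name.
    start = bisect.bisect_left(secondary, (name,))
    pks = []
    for key, pk in secondary[start:]:
        if key != name:
            break
        pks.append(pk)
    # Lookup 2: use each primary key to fetch the row from the clustered index.
    return [clustered[pk] for pk in pks]


print([row["employee_id"] for row in find_by_last_name("Khan")])  # [101, 103]
```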

Advantages of clustering

The clustering that InnoDB provides has significant performance benefits, some of which are listed below:

  • Because the data is physically stored in PRIMARY KEY order, lookups by PRIMARY KEY are very fast. For example, the fastest way to find a particular employee by the unique employee_id column is to create a PRIMARY KEY on employee_id.
  • With clustering, range searches can be extremely efficient. Suppose an application frequently searches for records within a range of dates and the table is clustered on the date column. The clustered index can quickly locate the row containing the beginning date and then read all adjacent rows until the last date is reached, which improves the performance of range queries.
  • Clustering also helps when sorting data. If a column is frequently used to sort the rows retrieved from a table, it can be advantageous to cluster the table on that column and save the cost of a sort each time.
  • Because the clustered index holds both the index and the data together in one B-tree, retrieving rows from a clustered index is normally faster than a comparable lookup in a nonclustered index.
  • Because secondary indexes automatically include the primary key columns, a secondary index can act as a covering index when the query requests only the indexed columns and the primary key columns.
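
The range-search benefit can be sketched with a sorted list standing in for rows stored in clustered-key order. This is a simplified illustration, assuming a hypothetical table clustered on a unique `order_date` column; the data is invented.

```python
import bisect
from datetime import date

# Rows kept in primary-key order, as the leaf pages of a clustered index
# would keep them. The key here is a (hypothetical) unique order_date.
rows = [
    (date(2024, 1, 3), "order A"),
    (date(2024, 1, 9), "order B"),
    (date(2024, 1, 20), "order C"),
    (date(2024, 2, 2), "order D"),
    (date(2024, 2, 14), "order E"),
]
keys = [key for key, _ in rows]


def range_scan(first, last):
    """Return all rows with first <= order_date <= last."""
    # One search locates the first row in the range...
    start = bisect.bisect_left(keys, first)
    # ...and every other matching row sits right next to it, so the result
    # is one contiguous slice with no per-row lookups.
    end = bisect.bisect_right(keys, last)
    return rows[start:end]


print([name for _, name in range_scan(date(2024, 1, 5), date(2024, 2, 2))])
# ['order B', 'order C', 'order D']
```

Because the rows are already in key order, the same slice also comes back sorted by `order_date` without a separate sort step.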

These benefits can boost performance considerably if you design your tables and queries accordingly. But clustered indexes have disadvantages as well.

Disadvantages of clustering

Some of the disadvantages of clustering are:

  • If the clustering key is large, every secondary index defined on the same table will be significantly larger, because each secondary index entry contains the clustering key.
  • Because of the way the data is stored, fetching a row through a secondary index requires two lookups.
  • A clustered index can be expensive when the clustering key columns are updated frequently, because each such update forces InnoDB to move the row to a new location.
  • Insertions can be slow if rows are not inserted in PRIMARY KEY order, so insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table.
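
The effect of insertion order can be illustrated with a toy page model. This is a deliberately simplified sketch, not InnoDB's real page-split logic: the page capacity, the midpoint split rule, and the "start a fresh page when appending at the end" rule are all assumptions made for the example.

```python
import bisect
import random

PAGE_CAPACITY = 4  # rows per page; tiny on purpose so the effect is visible


def load(keys):
    """Insert keys one by one and return (page_count, split_count)."""
    pages = [[]]  # each page is a sorted list of keys; pages are in key order
    splits = 0
    for key in keys:
        # Find the page whose key range should contain this key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(first_keys, key)
        page = pages[i]
        bisect.insort(page, key)
        if len(page) > PAGE_CAPACITY:
            if i == len(pages) - 1 and page[-1] == key:
                # Appending at the very end: open a fresh page, move nothing else.
                pages.append([page.pop()])
            else:
                # Inserting in the middle: split the page and move half its rows.
                mid = len(page) // 2
                pages.insert(i + 1, page[mid:])
                del page[mid:]
                splits += 1
    return len(pages), splits


ordered = list(range(1000))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)

print("in key order:", load(ordered))    # (250, 0): full pages, no splits
print("random order:", load(shuffled))   # more pages, many splits
```

In this model, loading in key order fills every page completely, while random order leaves pages partly empty and moves rows around on every split.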

UPDATE (thanks to Sunny):

Here is another thing worth knowing about secondary indexes:

Records in InnoDB secondary indexes are never updated in place. An UPDATE of a secondary index column therefore means deleting the old index record and inserting a new one.
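
A minimal sketch of why an in-place update is not possible, again using a sorted list as a stand-in for a secondary index. The names and data are hypothetical, and the model ignores how InnoDB actually marks and purges old records.

```python
import bisect

# Secondary index on last_name: sorted (last_name, primary key) entries.
secondary = [("Adams", 102), ("Khan", 101), ("Khan", 103)]


def update_last_name(pk, old, new):
    """Change the indexed value for one row: delete the old entry, insert a new one."""
    # An entry's position is determined by its key, so changing the key
    # means the entry belongs somewhere else in the index.
    i = bisect.bisect_left(secondary, (old, pk))
    if i == len(secondary) or secondary[i] != (old, pk):
        raise KeyError((old, pk))
    del secondary[i]                      # step 1: remove the old entry
    bisect.insort(secondary, (new, pk))   # step 2: insert at the new position


update_last_name(101, "Khan", "Baker")
print(secondary)  # [('Adams', 102), ('Baker', 101), ('Khan', 103)]
```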

Although I have pointed out some disadvantages, they are outweighed by the considerable benefits that come with clustering in InnoDB. If you study the aspects covered in this article and apply them to your own schema, you can expect to see real performance gains. After all, clustering is another important step in bringing MySQL closer to MSSQL and Oracle.

Ref: http://www.ovaistariq.net/521/understanding-innodb-clustered-indexes/
