Monday, September 17, 2007

Partitioning in MySQL 5.1, Part 2

Read Partitioning In MySQL, Part 1 before reading this post.

partition pruning Partition pruning is the most critical performance component about partitioning. Partitioning can ease management of your data (for example the ability to just backup or restore a single partition or the ability to drop a no longer needed partition) but the main reason people add partitions is for performance.

Pruning helps performance by only reading the partitions that need to be read for a query. For example, if you have a table range partitioned by year that has partitions for the years 2000 through 2010. If you write a query selecting data between 2002 and 2005, you don't want to have to scan all 10 partitions.

The optimizer needs to be smart enough to select and scan just those partitions that hold the data. This is called partition pruning. All of the databases that I have used that support partitioning (Oracle, Postgres, EnterpriseDB and DB2) support partition pruning. The better the optimizer, the less often the optimizer will erroneously scan partitions that aren't really required.

partition management As I said above, partition management is important and this is one area that postgres (and enterprisedb) lacks. The management and support functions in MySQL are pretty robust. The way it will dynamically move data when a partition is dropped or added is pretty sweet. This is very advanced logic. I will need to play with it to see how robust and performant it is but in most databases, moving data like this requires manual intervention.

I also like the ability to convert a table to a partitioned table online. That is another feature that requires manual intervention in most databases.

partitioning limitations The last slide I will cover from the presentation are the limitations. Right now there are several severe limitations.

Most of the limitations listed here are no big deal. All partitions must use the same storage engine (no biggie) and increased response time for partitioned tables with many partitions (expected). The time you save by partition pruning should outweigh the overhead by a large margin.

The things that bother me are the integer result requirement. I'm just not sure how I would implement range partitioning using a numeric. That really sounds like a list partition to me.

The lack of foreign keys in a partition is also a limitation of postgres and is a show stopper in many cases. If you can't ensure the integrity of your database, do you really need that data? I'm glad the fix for that is on the road map.

What do you think of partitioning in MySQL?

1 comment:

jbalint said...

good 2-part series on partitioning, thanks!