Monday, August 4, 2008

High Performance MySQL: Review

High Performance MySQL, Second Edition
Optimization, Backups, Replication, and More

By Baron Schwartz , Peter Zaitsev , Vadim Tkachenko , Jeremy Zawodny , Arjen Lentz , Derek J. Balling
Second Edition June 2008
Pages: 708
ISBN 10: 0-596-10171-6 | ISBN 13: 9780596101718

When I first read about this book, I figured many sections would be over my head. I was pleasantly surprised when I started reading it. In the Preface, the authors say (and I partially paraphrase for brevity):

"We wanted a book that wasn't just a SQL primer. We wanted a book with a title that didn't start or end in some arbitrary time frame and didn't talk down to the reader. Most of all, we wanted a book that would help you take your skills to the next level and build fast, reliable systems with MySQL."

"We decided to write a book that focused not just on the needs of the MySQL application developer but also on the rigorous demands of the MySQL administrator, who needs to keep the system up and running no matter what the programmers or users may throw at the server."

They are trying to write the "mythical, perfect book". That is a tall order. In many ways though, the authors accomplish what they set out to do. They may have accomplished even more than they intended to. While there is plenty of high performance here, the book goes a bit further than that. I'm not complaining but the title may put off some users who could really benefit from this book.

This is not the book if you are trying to learn about databases in general. The book assumes that you have at least some hands on experience in your background and some familiarity with MySQL. As the authors say, again in the Preface, "We assume you are already relatively experienced with MySQL and, ideally, have read an introductory book on it".

With that in mind, I'll begin the review.

Chapter 1: MySQL Architecture

Chapter 1 is an overview of the MySQL architecture. The chapter doesn't get very deep into MySQL internals (that's not the books focus) but this chapter provides an excellent fast track understanding of how MySQL works at a fairly detailed level. This chapter covers locking, transactions and the storage engine concept as well as details about each individual storage engine.

There is some explanatory content (i.e. what is a lock, what is ACID, what is a deadlock, etc) but most of the content concentrates on MySQL specifically. In a couple of places, the explanatory content and MySQL specifics were not in the same part of the text. For example, on page 8, database isolation levels are defined but it's not until page 11 in a section on autocommit that I finally read, "MySQL recognizes all four ANSI standard isolation levels, and InnoDB supports all of them....." There are a couple of other places where the specifics are oddly separate from the MySQL details. It's a minor nit though.

A particular eye opener for me was the discussion on MVCC in MySQL. If you ask most Oracle people (who are not MySQL also), they will almost all say that MySQL does not do MVCC. The book provides a nicely detailed example of how MVCC works in InnoDB.

The chapter ends with a discussion on storage engines. The book gives a paragraph or two about each available engine (including Maria and Falcon) and a table summarizing the differences between engines. I wonder if anyone is working on a columnar store for MySQL?

Chapter 2: Benchmarking and Profiling

The beginning of chapter 2 can apply to any database (and really most any application). It helps define what to benchmark, how to benchmark, how not to benchmark, etc. The benchmarking section ends with a list of useful tools for benchmarking an application and with a set of MySQL benchmarking utilities. There are examples using http_load, MySQL's Benchmark(), dbt2 and the MySQL benchmark suite of perl scripts. The examples are a mini how-to and results explanation all in one.

The second half of the chapter details profiling. There is some generic profiling discussion but most of the text covers MySQL specifics. I don't want to show my lack of knowledge, but I had never even heard of the "slow log" until I read this chapter. The authors recommend enabling it but to a DBA it has a very scary name. ;-)

This chapter explains what you should be looking for when profiling. This isn't really any different than profiling an Oracle or Postgres database. You want to start with the low hanging fruit and work your way up the tree. The chapter ends with some examples of profiling and a little bit of discussion about profiling when you can't change your database.

Chapter 3: Schema Optimization and Indexing

Having a correctly designed schema is important for any database and MySQL is no exception. This chapter concentrates on what that means for MySQL specifically. An example I wasn't aware of is how NULLable columns in MySQL can impact the database. I wouldn't have guessed that a nullable column would use more space than a NOT NULL column.

A large portion of the chapter is dedicated to the various MySQL data types, and considerations for each, followed by the various types of indexes allowed by the storage engines. It even includes a way to build your own hash index if your particular storage engine doesn't support them.

I didn't realize that MySQL supports "covering indexes". A covering index is called a "fast, full index scan" in Oracle. Basically, all of the data to satisfy a query exists in an index so a table read is never required. This can save a tremendous amount of IO and increase performance. This is a fairly sophisticated optimization that is not supported by many "advanced" databases.

This is a large chapter and includes a huge amount of useful information. Pros and cons of normalization, an indexing case study, summary tables and more. This should be mandatory reading for anyone who is designing real world database schemas in MySQL.

Chapter 4: Query Performance Optimization

How do I tune a query? The age old question asked by developers around the world. There are some general answers to this question but each database has its own quirks and considerations. This chapter addresses those issues for MySQL.

You get the usual: don't fetch more rows than needed, reduce IO and don't use "SELECT *". You get a lot more than that, though. in "Ways to restructure queries" you read about "chopping up a query". In that section, the authors recommend, in certain scenarios, using procedural code to chunk out operations. And in "Join Decomposition", the authors recommend, again in certain scenarios, to reduce a query with joins to its component parts and merge the data in the application.

They give the reasons why to do this with details on how it impacts the internals (like caching). If you read Oracle optimization books, you will get exactly the opposite advice. This is the reason it is important for designers and developers to not assume that every database works the same and follows the same rules. To take advantage of a database, you need to understand the database.

This is another chapter that is required reading for anyone designing MySQL databases. The coverage of the limitations in the MySQL optimizer is worth the cost of the book. This chapter also covers optimizer hints and user defined variables. The user defined variables might not be something you would consider when tuning but maybe they should be.

Chapter 5: Advanced MySQL Features

Chapter 5 is a mix of "other" stuff. A bit of this, a bit of that. It covers the query cache, stored code, prepared statements, updateable views (and limitations of), character sets and conversions, recent full text advances and distributed transactions. There is a really good section on merge tables (which I haven't used) and partitions (which I have used).

A new type of stored code n MySQL 5.1 is an event. An event is kind of like a DBMS_JOB in Oracle. You can schedule an event to run at a certain time or on a certain frequency. Like a DBMS_JOB, you can't send in variables or return results (well you can fudge those pragmatically, of course). Also like a DBMS_JOB, errors show up in the log file.

Chapter 6: Optimizing Server Settings

Chapter 6 gives you coverage of many (all? - most?) of the server settings. It goes beyond that though. It also gives you an understanding of what the setting does as well as when and how to use them.

Chapter 7: OS and Hardware Optimization

Hardware, the bane of most database developers. I know that I prefer to spend my time within the database not in the OS. This chapter explains what, and why, hardware to buy. How to select a CPU(s), memory and disk. It even covers the various flavors of RAID. Closing out the hardware section is a discussion of SAN, NAS and network configuration.

The OS portion of this chapter deals more with configuring the OS rather than choosing the OS. It starts with a little bit of info about the various OSes that run MySQL and which file systems you might want to choose. The rest after that is configuration.

Chapter 8: Replication

Replication is a favorite topic of mine. Most of my replication experience has been with Oracle and a little bit with Postgres. I've not had to replicate MySQL but this chapter gives me a good starting place should I need to. This chapter gives a quick overview of replication and then dives into MySQL specifics.

The nice thing about MySQL replication is that it is integrated with the server and has been for a long time. That means it's pretty stable and mature. This chapter gives a step by step guide to setting replication up and running with it. Because the process is so mature, it's really not that hard.

This chapter also digs into some of the inner details of how replication in MySQL works and some of the various configurations (master-slave, master-multi-slave, master-master, etc). I like that it also covers common problems with replication and the problem solutions. That's very handy to have on hand.

Chapter 9: Scaling and High Availability

Chapter 9 is another chapter that should be mandatory but this time for anyone working on high volume MySQL implementations. It starts with terminology to ensure that everyone is on the same page. After that we get into the goodies.

There is a discussion of data sharding. This is splitting data across different nodes in a cluster. This is very different than scaling in Oracle. If you work with Oracle and MySQL, some of these rules are exact opposites of each other. The book spends quite a bit of time on this topic and that's good because it is counter intuitive to me.

The rest of the chapter covers clustering, load balancing and high availability concepts. This is a good chapter that is pretty deep. I will have to refer back to it in the future.

Chapter 10: Application-Level Optimization

Chapter 10 is a fairly short chapter that discusses some common application issues and possible fixes. It includes a discussion of caching.

Chapter 11: Backup and Recovery

Backup and recovery is arguably the most important task for a DBA. It's also just about the most boring thing to read about. Chapter 11 covers why it's important, when to do it and how to do it. An added wrinkle in the MySQL backup and recovery scenario are the various storage engines and their impact on any particular backup methodology.

Chapter 12: Security

Chapter 12 outlines basic security in MySQL: accounts, privileges, and grant tables. It moves on to how to grant privileges and how MySQL checks them at runtime. It also covers common problems and solutions. It gets into OS, network and application level security and encryption. I'm not sure how important this topic is in a book called High Performance MySQL but it is handy in a MySQL Complete Reference.

Chapter 13: MySQL Server Status

MySQL includes a server command, SHOW STATUS, that can give plenty of information about the status of the server. This chapter walks you through various sections of the command results and what they mean. The authors give you clues about what to look for to interpret the results.

Chapter 14: Tools for High Performance

This chapter should probably be called "Tools Everyone Needs." These aren't so much performance tools as they are tools for everyday usage. Included are the MySQL Visual Tools, SQLyog, phpMyAdmin, Maatkit, innotop and more.

Appendices

There are three appendices: Transferring Large Files, Using Explain Plan and Using Sphinx (Full-Text Search) with MySQL.

My Summary

This is a good book that is well worth the cost. While it is not a newbie book, there is plenty here for novice and expert alike. I can pretty much guarantee that if you work with, or want to work with, MySQL, you will get some value from it.

I found some sections much more valuable than others. That's not unexpected. I also found the information to be at just the right level of detail. I have been working with databases for a long time though, and off and on with MySQL for a while. I think for someone newer it would still be the right amount. In the sections where there might be confusion, there is usually a discussion of terminology. For a MySQL guru, it might be a bit too explanatory and not detailed enough. I just have to say that this book is targeted more toward a novice to intermediate level rather than complete newbies or experts.

I do have a nit to pick about the title though. While the majority of the book does focus on performance, I think the title is misleading. I don't mean that in a bad way as you get more than you might expect from a book with this title. If it was named more like "MySQL Performance and Usage" or "The MySQL Reference including Performance" it might get a larger audience.

If you buy this book along with MySQL in Nutshell and MySQL Cookbook, I don't think you would need another MySQL book in your library.

I enjoyed reading this book, it is well written and, for the most part, flows logically from one topic to another. I didn't concentrate on any typos or oopsies as there is an updated version on the way with most of those already fixed. I didn't find many anyway. I can pretty much guarantee that I will refer back to this book in the future.

LewisC

Technorati : , , ,

No comments: