Sunday, September 30, 2007

Tablespaces in MySQL, Oracle and Postgres

If you are not familiar with tablespaces you may be wondering what the big deal about them is. Tablespaces are a logical addition to a database that helps maintenance, and potentially, can improve performance.

In Oracle and MySQL, a tablespace is a logical unit meant to store segments (i.e. tables and indexes). In Postgres, a tablespace is a physical unit. It is a symbolic link to a directory. Postgres does not allow tablespaces on operating systems that do not support symbolic links (such as windows).

The data file is the actual physical storage mechanism in Oracle and MySQL. Postgres stores tables in individual files. Postgres support of tablespaces is minimal. In MySQL and Oracle, performance can be improved by a more granular spread of data across disks. Ease of maintenance is maintained due to the logical grouping of tablespaces.

Oracle syntax for creating a tablespace is much the same as MySQL but with many more options. Oracle also allows a single tablespace to be made up of many data files.

Below is a very simple example of creating and using a tablespace in MySQL:

mysql> create tablespace testts
-> add datafile 'myfirstfile'
-> engine = falcon;
Query OK, 0 rows affected (0.48 sec)
mysql> create table testmyts (
-> abc integer )
-> tablespace testts;
Query OK, 0 rows affected (0.11 sec)
mysql>



The first command creates the tablespace (naming it testts) and assigning it a file name of myfirstfile. Of course, we are using the falcon engine. If you look in your MySQL data directory after running this command, you should a new file named myfirstfile.

The second command creates a table using our new tablespace. If you look at the tablespace now, it should be bigger. If not, insert a bunch of rows and watch it grow.

Oracle syntax, in its simplest form, is very close to MySQL syntax. Here is the same example in Oracle.

SQL> create tablespace testts
2 datafile 'myfirstfile'
3 size 10M;
Tablespace created.
SQL> create table testmyts (
2 abc integer )
3 tablespace testts;
Table created.
SQL>



In Oracle, we leave off the engine keyword and we need to declare a size. We can create a small data file and use autoextend if we want, but we MUST declare an initial size.

In Oracle, it is possible to create each partition of a partitioned table in its own tablespace. I'm not sure if that's possible with MySQL, but I plan to try it out!

Saturday, September 29, 2007

How do I log into MySQL?

I remember the first time I downloaded MySQL. I think I was using Mandrake Linux. Anyway, the install was fairly painless but once it was installed, I had no clue how to run queries.

I was coming from an Oracle background and was used to SQL*Plus. I was also familiar with PostgreSQL and psql. For the life of me, I could not figure out how to get into MySQL.

So, for you developers and brand new users, you can easily start MySQL and start using it. This is not meant for a production installation, just for playing on your laptop or desktop.

Start MySQL by running mysqld (mysqld.exe on Windows). It will be in your MySQL home/bin directory. That gets the server portion of our program running.

The SQL*Plus equivalent is mysql (or mysql.exe). If you are logging in for the first time, you can use root. Once you are in, you can create other users.

To log in and run commands, type:

mysql -u root

That will load the character based MySQL Monitor. At this point you are in and ready to play.

LewisC

Falcon Test

I just downloaded Falcon and was running some tests. I was getting an error:
C:\MySQL\bin>mysqladmin version status proc 
mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'ODBC'@'localhost' 
                             (using password: NO)'
To fix this error, just add a -u parameter, i.e.:
C:\MySQL\bin>mysqladmin version status proc -u root
mysqladmin  Ver 8.42 Distrib 6.0.2-alpha, for Win32 on ia32
Copyright (C) 2000-2006 MySQL AB
This software comes with ABSOLUTELY NO WARRANTY. This is free 
software, and you are welcome to modify and redistribute it 
under the GPL license
Server version          6.0.2-alpha-community-nt-debug
Protocol version        10
Connection              localhost via TCP/IP
TCP port                3306
Uptime:                 27 min 43 sec
Threads: 1  Questions: 4  Slow queries: 0  Opens: 15  Flush tables: 1  
Open tables: 8  Queries per second avg: 0.2
Uptime: 1663  Threads: 1  Questions: 4  Slow queries: 0  Opens: 15  
Flush tables: 1  Open tables: 8 
 Queries per second avg: 0.2
+----+------+----------------+----+---------+------+-------+------------------+
| Id | User | Host           | db | Command | Time | State | Info             |
+----+------+----------------+----+---------+------+-------+------------------+
| 5  | root | localhost:4682 |    | Query   | 0    |       | show processlist |
+----+------+----------------+----+---------+------+-------+------------------+
I'm going to be blogging about my attempts to use some of the newer enterprise features of Falcon, as well as general MySQL programming info. LewisC

Friday, September 28, 2007

Free MySQL Magazine

I stumbled across a free MySQL magazine while reading the 64th edition of Pyhthian's Log Buffer at Diamond Notes (a MySQL DBA Blog). The magazine is MySQL Magazine. There are currently two editions, Summer 2007 and Fall 2007. They're in PDF format and, at least for a MySQL beginner like me, are very useful. The Summer 2007 edition has a very good "Securing MySQL Server" article. One improvement I would like to see om the site would be to have a table of contents in the HTML. That way I wouldn't have to load the PDF just to see what's inside. That's a minor nit though. LewisC

Saturday, September 22, 2007

Differences Between MySQL and Oracle

Augusto Bott at Pythian recently posted a good entry detailing some differences between Oracle and MySQL from a MySQL DBA's perspective, From MySQL to Oracle: A Few Differences. My viewpoint is exactly the opposite, I know Oracle extremely well but I have a lot of learning to do with MySQL. Augusto ran into a SQL*Plus issue, where it is less than obvious where a SQL syntax error is. He could get around this error by using SQL Developer, Oracle's free SQL Development IDE. SQL Developer can connect to and browse MySQL databases in addition to Oracle and SQL Server. He also noticed some syntactic differences between MySQL and Oracle. That is something I have been running into in my experimentation with Oracle, Postgres, MySQL and DB2. For something touted as a standard, SQL is VERY different from one database to another. This is a good article. I can't wait to see more from Augusto. I like seeing his viewpoint as it might help me avoid some of the pain I would otherwise experience. LewisC

Monday, September 17, 2007

Partitioning in MySQL 5.1, Part 2

Read Partitioning In MySQL, Part 1 before reading this post.

partition pruning Partition pruning is the most critical performance component about partitioning. Partitioning can ease management of your data (for example the ability to just backup or restore a single partition or the ability to drop a no longer needed partition) but the main reason people add partitions is for performance.

Pruning helps performance by only reading the partitions that need to be read for a query. For example, if you have a table range partitioned by year that has partitions for the years 2000 through 2010. If you write a query selecting data between 2002 and 2005, you don't want to have to scan all 10 partitions.

The optimizer needs to be smart enough to select and scan just those partitions that hold the data. This is called partition pruning. All of the databases that I have used that support partitioning (Oracle, Postgres, EnterpriseDB and DB2) support partition pruning. The better the optimizer, the less often the optimizer will erroneously scan partitions that aren't really required.

partition management As I said above, partition management is important and this is one area that postgres (and enterprisedb) lacks. The management and support functions in MySQL are pretty robust. The way it will dynamically move data when a partition is dropped or added is pretty sweet. This is very advanced logic. I will need to play with it to see how robust and performant it is but in most databases, moving data like this requires manual intervention.

I also like the ability to convert a table to a partitioned table online. That is another feature that requires manual intervention in most databases.

partitioning limitations The last slide I will cover from the presentation are the limitations. Right now there are several severe limitations.

Most of the limitations listed here are no big deal. All partitions must use the same storage engine (no biggie) and increased response time for partitioned tables with many partitions (expected). The time you save by partition pruning should outweigh the overhead by a large margin.

The things that bother me are the integer result requirement. I'm just not sure how I would implement range partitioning using a numeric. That really sounds like a list partition to me.

The lack of foreign keys in a partition is also a limitation of postgres and is a show stopper in many cases. If you can't ensure the integrity of your database, do you really need that data? I'm glad the fix for that is on the road map.

What do you think of partitioning in MySQL?

Friday, September 14, 2007

Partitioning in MySQL 5.1, Part 1

I was browsing around the MySQL web site tonight and ran across some free webinars (recordings). Seeing as how I once did a podcast on Oracle partitioning, one webinar that jumped out at me was Partitioning in MySQL 5.1 and onwards.

I didn't even know MySQL did partitioning. Here is a description of the webinar:

In this webinar we will provide further insight into MySQL partitioning, including:

  • Introduction to MySQL Partitioning
  • Linear Key Partitioning
  • Partition Function
  • Partition Options
  • Information Schema for Partitions
  • EXPLAIN PARTITIONS
  • Partition Management
  • Partition Pruning
  • Partitioning and NULL Values
  • Partitioning Implementation
  • Partitioning for MySQL Cluster
  • Future Roadmap for Partitioning
  • Partitioning Limitations in 5.1


Sounds good. It's a good presentation but kind of slow. If you're like me and would prefer to move at your own speed, you can download the presentation itself.

For those of you even more impatient than me, I will point out a few highlights. The beginning of the presentation was about what partitioning is and why use it. I'll skip over those parts. If you are interested in those details, post me a note and I will do a blog entry on that topic.

partitioning type One of the more important pieces of information is the types of partitioning supported.

Oracle supports all of these types of partitioning but some other very good databases do not. For example, PostgreSQL only supports range partitioning (and list becuase of that) but you would have to manually implement a hash partition and any composites are out of scope for now.

I am fairly impressed with this support. You can also mix and match engines (in future versions). That's pretty exciting.

A quick description of the types for those not familiar:

  • Range Partition - Things like by year, by month, etc. Used very frequently in data warehousing. You might have a partition for each year. You can drop partitions (old years) and add new parititions (for future years). Range partitions tend to be dynamic (you'll be adding and dropping partitions over time).
  • List Partition - A known list of partitioning values. You may partition by state, country, region etc. Each value gets its own partition. You can add partitions but a list tends to be fairly static.
  • Hash Partition - A hashing algorithm creates a numeric value. You'll use this on large data sets that don't have a logical partitioning key. The idea is to give you the ability to partition on numeric data like a primary key.
  • Composite Partition - A combination of the three partitioning types above.



partition function The partition function is the function that determines which partition a particular row will reside in. The current implementation has some serious limitations but those can be worked around. Basically you write a function, say by comparing dates, and that determines which partitions will hold your record. I'm sure that I could live with two of the current limitations: always an integer value (how do you get that to work with dates?) and the fact that you can only write the function on your primary key.

I'll finish up my coverage of this presentation in part 2. All in all, I am quite impressed with the partitioning function in MySQL 5.1.

LewisC

Saturday, September 8, 2007

MySQL 6.0 in Alpha Test

I just found out that MySQL 6.0 with the Falcon Storage Engine is available as Alpha. Falcon adds ACID compliance, tablespaces and performance improvements. It's supposed to also simplify administration but I'm not sure how MySQL can be simplified. It's pretty simple now. I've always thought that the ability to swap out the storage engine is the best feature of MySQL. It reminds me of the replaceable database drivers from my clipper days in the late 80s and early 90s. From my reading about Falcon, it looks like one advantage will be the fact that Falcon has been engineered to take advantage of 64 bit architectures and large memory caches. User data files in Falcon can be up to 100 terabytes. That should pretty much cover most uses. It looks like you only get a single data file per Falcon database. Falcon has what it calles the Falcon serial Log. It's the equivalent of the Redo Log in Oracle or the Write Ahead Log (WAL) in Postgres/EnterpriseDB. This log file is also used for crash recovery (which I believe is completely automated in MySQL 6.0). Falcon uses 4 different memory caches: log, system and index, page and record. It uses a sophisticated algorithm to determine what data remains when the max is hit. Supposedly this algorithm is faster and more complex that the LRU.

Sunday, September 2, 2007

Thoughts on MySQL Proxy

I am reading up on MySQL Proxy and find it to be very interesting. It looks like the proxy is a network layer between clients and servers. A client may think it is connecting to a single server but in reality, the client's query could be running against multiple distributed servers. Another nice feature is the ability to monitor and secure data at the query and result level. A client can send a query and the proxy can log that query. When the results come back, the proxy could strip out data (or mangle it) before it goes back to the client. That would be a nice feature for protecting data during development and testing. The developers and testers can run their queries against production and the proxy could mangle sensitive data on the way back out. It's only compatible with various Unix/Linux flavors at this time so no Windows. That's probably not that big of an issue. I don't know anyone running MySQL in Windows in production. It's pretty slick. I need to do some additional reading before I can start playing with it.