Postgres vs. MSSQL
Michael L Torrie
torriem at chem.byu.edu
Wed Nov 8 11:17:58 MST 2006
On Wed, 2006-11-08 at 10:59 -0700, Daniel C. wrote:
> Is it possible to get more information from your friend about this?
I think the information posted in response to my post probably will tell
you the technical details. Here is what he had to say to me:
(10:16:25) Mark Lewis: PostgreSQL is nirvana, as far as development is
concerned. Oracle is almost as pleasant to develop with, and has more
enterprise features to boot, so it's on my 'A' list too. SQLServer and
MySQL seem more or less on the same level; they're based on an outmoded
and difficult to work with underlying data model, which prevents them
from really behaving as nicely as PG and Oracle, but they're at least
decent implementations of a flawed idea (although SQLServer has some
absurdly annoying documentation where MS tries to repeatedly convince
you that you don't really want all those pesky ACID features that are so
difficult to implement with their code base). Then there's DB2. Based on
the same old model as SQLServer and MySQL, it's a dinosaur. It hasn't
substantially changed since hard disks were invented. Working with it is
like chipping fossils out of the rock.
(10:17:36) Mark Lewis: Sorry, didn't really mean to go into a tirade
there, just venting some frustration because I need to pick up some DB2
work again today.
(10:34:16) Mark Lewis: Anyway, PG and Oracle are interesting because
although they both implement the same basic idea (keeping multiple
versions of the same row around), they come at it from two different
angles. Oracle always keeps only the most recent version of the row in
its main tablespace, and older versions get copied into the redo logs.
PostgreSQL keeps everything in the main table and differentiates each
row with a version id.
(10:36:35) torriem: interesting
(10:37:09) Mark Lewis: This explains why you need to periodically vacuum
PG databases, to get rid of the old versions, but with Oracle you just
need to back up or throw away the old redo logs once you don't need them
anymore. On the other hand, rollbacks in PG are an O(1) operation, it
just invalidates the transaction ID that modified a row. With Oracle, it
takes as long to rollback a transaction as it took to do it in the first
place, because everything needs to be copied back from the redo logs.
(10:38:35) torriem: true
(10:38:57) torriem: In my experience, though, on huge tables, vacuuming
takes a long time (4GB table) and during that time the table is
(10:39:14) Mark Lewis: This is also why you'll almost certainly never
see flashback queries in any database besides Oracle; the lock-based
DB's by definition only store the most recent version of the row in a
usable format, and when PG vacuums it discards all inactive rows, so
you'd have holes in your history.
(10:39:37) Mark Lewis: In newer versions of PG, vacuum doesn't lock the
(10:39:43) torriem: nice
(10:39:48) torriem: I'll have to move to 8.x soon then
(10:40:33) Mark Lewis: They've also added the autovacuum daemon into the
core, so it'll be smart enough to run vacuums exactly as often as you
(10:41:01) Mark Lewis: Although I still think you need to turn
autovacuum on, it doesn't run by default.
(10:43:20) torriem: good. that's at the expense of rollback history,
(10:43:29) torriem: (forgive me; my database experience and knowledge is
not too great)
(10:44:52) Mark Lewis: Well, not really. Vacuum will never remove
anything that could still be rolled back. It will only remove rows which
have definitely been outdated by committed transactions.
(10:45:16) Mark Lewis: But yes in the sense that vacuuming is why
flashback queries won't work in PG.
(10:47:22) Mark Lewis: 8.0 introduced the WAL, which is nice from an
administrative perspective (point-in-time recovery, much easier online
backups), but also significantly improved write performance, especially
for lots of small transactions. They also did a good amount of
performance tuning on the hotspots, resulting in respectable
across-the-board query improvements.
(10:48:38) Mark Lewis: 8.1 is really nice especially if you have
complicated queries with multiple 'and' terms, because it introduces
bitmap index scans which can make it much faster to use the intersection
of two indexes to answer a query.
(10:49:50) Mark Lewis: They did some of the work of making vacuum less
intrusive in 7.4, but most of it was in the 8.0 release.
(10:51:43) Mark Lewis: Oh, and with 8.0 they got rid of the big shared
memory allocator lock, so performance increases significantly when you
have lots of memory and multiple processors.
(10:53:44) torriem: sorry about that. had to go away for a moment
(10:54:20) Mark Lewis: No prob. I probably gave you more than you really
wanted to know :)
(10:54:32) torriem: no I appreciate what you've told me
(10:55:05) torriem: I definitely need to roll my database server over to
8.0. It's still 7.3. Of course we don't do anything heavy, but I do have
one application (the pr0n url sniffer) that has 4 GB of data)
(10:55:23) torriem: Seems to me that with 8.0 I shouldn't really need
mysql for much.
(10:55:47) torriem: Although mysql's auto-incrementing indexes are nice.
With PG I have to use a sequence and a default function
(10:57:06) Mark Lewis: You can use the serial data type in PG, which
makes a PG column look like an auto-increment column in MySQL by
creating a sequence behind the scenes for you.
(10:57:51) Mark Lewis:
(10:59:15) torriem: nice!
(11:02:25) Mark Lewis: For non-concurrent applications, MySQL still has
a performance advantage especially with trivial inserts and selects,
although the difference is much smaller than it used to be. So there's
still a place for MySQL. Although I wouldn't touch it for other reasons,
most notably that it has a bad habit of automatically guessing what you
really meant to do and doing something bad without giving you an error.
(11:03:28) Mark Lewis: And if you actually need referential integrity
then you need to use InnoDB tables which aren't any faster than PG.
> > Well it's likely that PHP and Django don't use any advanced DB features
> > like triggers, constraints, or stored procedures, so it really doesn't
> > matter in the least what db you pick, as long as the load is manageable.
> PHP can, Django doesn't have it (except foreign key constraints)
> natively. But since we'll be running reports and things from outside
> of Django, those things do come into play.
> PLUG: http://plug.org, #utah on irc.freenode.net
> Unsubscribe: http://plug.org/mailman/options/plug
> Don't fear the penguin.
More information about the PLUG