petergeoghegan 3 months ago

"So far, so good. What's the problem here? The DDL statement can simply wait patiently until it's able to acquire its ACCESS EXCLUSIVE lock, right? The problem is that any other statements that require a lock on the users table are now queued behind this ALTER TABLE statement, including other SELECT statements that only require ACCESS SHARE locks."

This is the most important individual point that the blog post makes in my view. Lots of Postgres users are left with the wrong general idea about how things in this area work (e.g., the locking implications of autovacuum), just because they missed this one subtlety.

I'm not sure what can be done about that. It seems like Postgres could do a better job of highlighting when an interaction between two disparate transactions/sessions causes disruption.

  • Fire-Dragon-DoL 3 months ago

    Yes this is the keypoint. There is a famous blogpost on how to address this which boils down to acquiring a lock explicitly with a very short timeout, and only after acquiring the lock the alter statement is executed. Do this in a loop and essentially it prevents the disaster, but it might fail the migration if unable to take the lock (just repeat the migration)

  • Jcowell 3 months ago

    > the locking implications of autovacuum

    I thought the only case this would be a worry is if the auto vacuum is to prevent wraparound

jauntywundrkind 3 months ago

Our head of department basically put the kibosh on everyone's massive enthusiasm for migrating more and more MySQL systems to postgres, because once upon a time someone tried to run a schema change and postgres seemed to never be able to give the schema change the lock.

We didn't really get told about the problem at the time, so we don't know much about what happened. And the anger-driven (anti) development seems like it's forever going to sit there, unresolved.

Very frustrating situation for us.

  • klysm 3 months ago

    Shoving your head in the sand instead of learning from mistakes doesn’t seem very productive

alflervag 3 months ago

If this is of interest I suggest taking a look at what Robin is doing at https://kaveland.no/eugene/

TLDR; Eugene helps you write zero downtime schema migration scripts for PostgreSQL databases by giving you a friendly report warning you about any anti patterns or potential problems.

modestygrime 3 months ago

So how do people handle this in practice if the users table in this example has a ton of traffic? It might not ever succeed even with exponential backoff. It also seems strange that Postgres would need to lock the entire table just to add a new column.

  • klysm 3 months ago

    Set a short timeout on lock acquisition, eventually it will get through or you have to tolerate a larger timeout