[Dbix-class] RFC - some DBMS features you use

Fri Feb 5 02:30:04 GMT 2010

On Thu, Feb 4, 2010 at 17:07, Darren Duncan <darren at darrenduncan.net> wrote:
> Hello,
> 1.  Locking.  Do you explicitly place locks (including with "select ... for
> update" or otherwise) on either the whole or parts of the database to ensure
> no one else updates them for the duration of the locks?  What DBMSs do or
> don't support doing this?

All DBMSes worth talking about support locking at some level. Whether
or not they should is a different story.

Locking, afaict, is a way of managing the conflicting transactions
problem. Even though it's harder to program (at least, initially),
database engines should really support indicating that a transaction
commit would have a "merging conflict", similar to how git works.
Then, the client can query the engine as to what the nature of the
conflict is, resolve it, and attempt to commit again with resolved
fixes. Normally, this would be done by discarding actions done in the
transaction and restarting with a fresh view of the database.

> 1a.  At what levels of granularity do you care about being supported?
>  Examples:
> - whole database at once
> - whole database relvars/tables at once
> - just specific tuples/rows with specific primary key or other key values
> (eg, where person id is 3 or 7)
> - generic predicate locks, covering any tuples/rows that are or might be
> visible to a particular query (eg, where name starts with 'f' and date_a is
> before date_b and the person is a member of clubs started before this date)
> - other kinds?

All less-granular locks are degenerate forms of more granular locks.
Having implemented file-level locking on dbm-deep, doing anything more
granular is HARD. But, once you have one level, you can do all levels
above it.

> 1b.  Do you want locks to be freed only explicitly or automatically at the
> end of a transaction commit or rollback?  What do current DBMSs do and what
> do you want to be able to do?  (Personally, I would think being able to
> maintain locks independently of transactions is useful, especially of inner
> transactions or savepoints.)

A lock should NEVER persist beyond the enclosing transaction. It's a
nesting thing.

> 2.  Read consistency.  After you start a transaction, are you seeing a
> consistent database state besides any changes you make yourself, or if you
> repeated the same query without making changes, would you get the same
> results both times?  That is, are committed changes by other database users
> visible to you to affect the results.  Do you want to be able to choose a
> desired behavior and have it enforced, such as whether you want to see other
> committed changes immediately or not see them at all during your
> transaction, the latter especially if it is a read-only transaction?  For
> picking either behavior, how do you currently tell the DBMS to do this?  And
> do you do it natively with SQL or does your Perl wrapper handle it for you
> and if so then how?

This concept of non-serializable transactions is absolutely insane.
Transactions need to be serializable, period. Anything else is an
optimization and a poor one given the current push towards
parallelization. Serializable transactions can be transferred between
multi-masters.

> 2a.  Is there anything you do differently if you are using cursors versus if
> you are not?

Not when dealing with transactions. q.v. my answer to #2.

> 2b.  Is your DBMS multi-versioned, so that someone can see a consistent
> database state for an extended period without blocking others that want to
> update it?  Do you desire such functionality or do you prefer that updates
> of a common-interest section will block other users?  What do current DBMSs
> do in this regard or which let you choose?

When implementing transactions in dbm-deep, I ended up doing pure
MVCC. When I tried to add SQLite and MySQL as optional backends, I
found that their concept of transactions was so poor that I couldn't
support transactions in dbm-deep if I was using them as a backend.
MVCC is, afaik, the only way to do properly serializable transactions.
Oracle does this, but no-one else does.

> 2c.  Do you explicitly concern yourself with the above at all, or do you
> just do your read and update queries without worrying about whether things
> might have changed in the middle, and just let the DBMS sort it out
> implicitly, and you let things fall as they may?  Note that conceivably this
> is the scenario easiest to support as I just don't do anything special,
> sometimes.  Do any Perl DBMS wrappers care about this?

Yes, DBIx::Class will care about this.

> 3.  Transition constraints.  Can you currently or do you want to be able to
> declare declarative transition constraints that the DBMS enforces.  In
> contrast to state constraints, which look at the current state of a whole
> database and see if it is "consistent" with respect to itself, a transition
> constraint compares the state of the database before the change you made
> with the state afterwards and says whether the database may directly
> transition from the first to the second.  Any SQL TRIGGER which looks at
> both "OLD" and "NEW" together for the purpose of veto power on a database
> change or transaction would qualify.

Yes, I want transition constraints. These are very important when it
comes to properly modeling financial systems. There are several
situations where state constraints are too complex (and so are never
built), but transition constraints would be helpful.

> 3a.  Do you expect transition constraints to evaluate per statement or inner
> transaction / savepoint or only at the end of a main transaction, comparing
> the state before the transaction to after it?

Yes. :)

In other words, all of those options should be possible.

> 4.  Do you use triggers to optionally cause a database update that isn't
> conceptually part of the triggering action, such as write to a log table, or
> do you only use them to define constraints that say can't be expressed with
> a CHECK clause or unique/foreign/etc key declaration?  Note that if the
> trigger is simply implementing an updateable view, where all of the values
> being written are from the user in the same statement/statement group, that
> doesn't count as I would consider that lot simultaneous, the cause of
> itself.

Triggers . . . I have a love/hate relationship with them.

> 5.  Users.  Do you use multiple DBMS-recognized users in your database or
> just one?  I refer to what you give to DBI's connect() as its user/pass.  Do
> all users of your applications share the same user/pass or does each human
> user of your application have their own user/pass that is used there?  If
> you only use one, do you wish you could have multiple or does your Perl
> layer emulate that you are without you actually using multiple at the DBMS
> level?

Applications have their own database user. Sometimes, a given
application might have multiple users, but those are for application
functionality reasons, NEVER for human users. Humans don't know about
the database(s) - they have application-level authorizations. Those
translate into datastore-level actions.

> 5a.  Do you generate temporary database users at runtime, ostensibly for
> security purposes.  For example, the user/pass a user of your app logs in
> with isn't an actual DBMS user, but when they want to login, a DBMS user
> whose only privilege is to invoke a specific stored procedure is used to
> validate the user/pass that the user gave, and then if accepted a temporary
> user/pass is generated by the stored procedure where those credentials are
> returned by the procedure, and then the application logs into the DBMS with
> those to do all the other/normal work of the user for that session.
>  (Similar to this has actually been done in a major government database
> application.)

Ugh. This is a security hole.

> 5b.  Do you define your users using plain SQL or can your Perl layer do it?

q.v. above.

> 6.  User privileges.  Do you have differing ranges of privileges in your
> databases for what database users can see or change or invoke?  Is this
> enforced by the DBMS itself or just by your application, or is it not
> enforced at all but is just a "supposed to"?  Do you want this definable and
> enforceable at a lower level such that if say your higher level code is
> overly broad and makes a mistake, it won't end up showing or changing
> something the user shouldn't be allowed to?

The datastore should provide and enforce one set of privileges. The
application should provide and enforce another set of privileges. To
conflate the two is to muddy the waters.

> 6a.  What DBMSs currently support roles or per-user privileges other than
> simply whether they may login or not?  And what sorts of granularity do they
> have?

All of them do, after a fashion. MySQL's granularity goes to the level
of what columns in what tables a user can even know exists, let alone
select vs. update. Oracle's is similar (if I remember correctly).

> 6b.  How important is it to you to be able to have per-user privileges in
> your databases?  Or how important is it to be able to define these without
> using SQL?

This isn't an SQL problem and should be taken out of SQL. Most DBMS
authn/authz systems are unnecessarily hampered by the "requirement"
that all interaction with the engine be in SQL-like syntax. SQL should
be about interacting with the data. Administrivia should be handled
with a language designed for that purpose.

> 7.  Spatial types and operators.  Who uses spatial or GIS or such types or
> operators?  How valuable is it to be able to do so without writing SQL?

I don't use them. I know people who do. Instead of building these into
the engine, much better for the engine to provide the ability to
plugin new column types.

> 8.  Special values for numerics.  Who uses those IEEE float special values
> such as NaN or infinities or under/overflows or distinct +/- zero or
> whatever in their databases, or do you only care about storing or working
> with normal numbers?  If you do use any special values, then which ones and
> why?  Do you currently distinguish between multiple kinds of NaN (eg, 0/0
> versus square root of -1 versus whatever), or is a NaN just a NaN?  Note
> that "not using" would mean that an operation that would otherwise produce
> one instead returns an error / throws an exception.

NaN is a NULL. Of course, q.v. my answer to #7 (type plugins) for
people who think otherwise.

> 9.  Full text search.  Who uses this versus a simple substring match?

It's absolutely key to be able to provide this. q.v. my answer to #7
(type plugins).

> 10.  Who uses regular expressions of any kind in the database?  Either
> Perl-compatible or otherwise.  Basically any kind of pattern matching
> besides the trivial kind that LIKE supports (literals plus . plus .*).

I do, for full-text searching. q.v. my answer to #7 (type plugins).

> 11.  Out of band sequence generators.  If you read from / increment a
> sequence generator from within a transaction, do you want a rollback of that
> transaction to also rollback the sequence generator or not?  Do you want
> both possibilities to be supported.  What do current DBMSs support?

I don't know of a current DBMS that supports rolling back a sequence
generator because too many people have (mis)used sequences as an
insertion ordering. Frankly, I don't think sequence generators should
be supported. Instead, provide nano-second timestamping so that you
can always know that -this- row came before -that- row and UUID
generators for primary keys (the most common use for sequence
generators).

> 12.  Sequence generators that produce values other than integers?  Who uses
> this or what DBMSs support them?  And what would be the semantics for
> non-integers?
>
> 13.  Transactions and data definition.  If you start an explicit
> transaction, and then do a data definition operation, such as changing the
> type of a table column, do you want the latter to be subject to the
> transaction such that you can roll it back?  (I believe SQLite subjugates
> everything to transactions.)

YES YES YES!!!1! **ALL** SQL statements inside a transaction are
subject to the transaction. Anyone who provides an "implicit commit"
should be shot. Period, end of story.

q.v. my answer to #6b about SQL for data interaction and some other
language for administrivia.

> 14.  Implicit commit of explicit transaction.  If you have an explicit
> transaction, should any operation you do in it other than an explicit
> 'commit' cause it to commit, or should only 'commit' do that?  If someone
> attempts data definition within a transaction and the DBMS doesn't support
> subjugating those to transactions, then should the data definition attempt
> fail or should it cause an automatic commit.  (I think the subjugate-or-fail
> should be the case, and the implicit commit is always bad.)  What DBMSs
> currently support each behavior? (AFAIK, SQLite does not automatically
> commit, but MySQL does.  Know otherwise?)

I agree with subjugate-or-fail. Implicit -anything- is _ALWAYS_ bad.
I'm talking about implicit commits, implicit type coercions, implicit
case-insensitive matching . . . ANYTHING. I know why MySQL did things
this way, but it's a bad practice. It's unpredictable and if you want
your engine to have any traction in places that will pay you, you have
to be predictable. Even Oracle could do more about this.

Rob