[Dbix-class] A more accurate redux of zby's comments

Fri Jan 25 07:07:45 GMT 2008

On Thu, Jan 24, 2008 at 01:43:36PM +0100, Zbigniew Lukasiak wrote:
> On Jan 24, 2008 12:39 PM, Matt S Trout <dbix-class at trout.me.uk> wrote:
> > On Thu, Jan 24, 2008 at 08:53:08AM +0100, Zbigniew Lukasiak wrote:
> > > This started with me trying to add some new features to create - they
> > > were rejected because I was using update_or_create and it does not
> > > always work, so I went and tried to fix update_or_create and I found
> > > out that this bug is a result of two things:
> > >
> > > 1. that you need to delete the primary key when you have them
> > > undefined and try to insert the row in PostgreSQL
> > > 2. that when you delete the pk instead of leaving it undef then find
> > > will not work as advertised
> > >
> > > My plead to a workaround for 1) - by simply deleting the PK in the Pg
> > > storage driver was rejected so those that use PostgreSQL are forced to
> > > live with the consequences of using find with the PK deleted.
> >
> > That's because Postgres' behaviour in this situation is correct.
> >
> > The fact that MySQL and SQLite both silently accept bad input is not a
> > reason to emulate the bug under other DBs that actually care about data
> > integrity.
> 
> OK - I can agree with that.  My thinking was that you are not slave of
> the SQL semantics here - you can introduce your own (in fact you
> already do because you don't insert undef - you change it to NULL in
> the query). But maybe you are right that this should not go into the
> core - I am now looking into ways to introduce that as an add on.

undef ~~ NULL is the DBI standard; no special semantic here at all.

If anything there should be an add-on that makes MySQL and SQLite behave
the same way as postgres, since that's the correct behaviour.

MySQL 5 in strict mode may already do this actually - or at least one of
the many knobs you can flip saying "actually care about data integrity"
may.

> >
> > > First about the nature of the problem. When called on a ResultSet
> > > object find tries to determine if the query it receives and the
> > > internal conditions of the ResultSet object do include at least one
> > > full unique constraint (for example a primary key). But it is not
> > > always possible to do that. Let me quote from one of the core
> > > developers about that problem:
> > >
> > > > Of course you can't always determine. But if you don't know, then you default
> > > > to "no it doesn't produce a unique row" and provide some way for the user to
> > > > say "actually I know it does and I accept you can't help me if I'm wrong" :)
> >
> > I was talking about join conditions there and introspection onto those.
> >
> > Please don't quote me out of context.
> >
> > My comments there were for the *_related stuff - that it couldn't -always-
> > be possible to know that a join condition would result in a unique record
> > on the other side, but that we should provide a flag to allow the user to
> > specify that they know it -does- and DBIC should proceed as normal.
> 
> *_related can sneak in in many ways. For example when you filter the
> ResultSet by some access condition (rows that a user owns or
> something).  But I agree that this is the hard case.  And I think I
> agree with your solution with that flag as well - just what would that
> 'proceed as normal' mean? Would it mean that the uniqueness is not
> checked at all, the query is reduced by all columns that don't belong
> to any of the unique constraints (or that don't belong to the chosen
> unique constraint), then it is run and the first row is returned?

It would mean that DBIC would accept the user saying "I know you can't tell
but this is unique". Possibly we should also check to see if the sth is
exhausted after getting the single row and produce a warning or exception
if not.

> > > The consequence of that is that you should not use these methods in
> > > libraries where you cannot say "actually I know it does and I accept
> > > you can't help me if I'm wrong" because you don't know what ResultSet
> > > and query you receive.
> >
> > Well, this is true. But it doesn't actually matter since if you -don't-
> > know that you shouldn't be calling find() or any of the methods you
> > enumerate in the first place.
> 
> When you do know the PK then you would call find, when you do know
> that you have no PK then you would directly call create or new.  It is
> only when you don't know that you need any of these listed methods.

I think you're (a) assuming PK rather than one of N unique keys, (b)
assuming an auto-increment or other surrogate PK rather than a natural one.

Much of the utility of the *_or_* methods is when one or both of these is
not the case.

-- 
      Matt S Trout       Need help with your Catalyst or DBIx::Class project?
   Technical Director                    http://www.shadowcat.co.uk/catalyst/
 Shadowcat Systems Ltd.  Want a managed development or deployment platform?
http://chainsawblues.vox.com/            http://www.shadowcat.co.uk/servers/