[Dbix-class] find_or_create and unique constraints

Sat Oct 16 08:52:04 GMT 2010

Bill Moseley wrote:
> 
> 
> On Fri, Oct 15, 2010 at 1:18 AM, Peter Rabbitson <rabbit+dbic at rabbit.us 
> <mailto:rabbit%2Bdbic at rabbit.us>> wrote:
> 
>     Bill Moseley wrote:
> 
>         Two issues I'd like to see discussed here is if DBIC should
>         throw an exception if find() returns more than one row (instead
>         of a warning), and if it makes sense to fall back to the full
>         criteria passed to find() if no unique key can be determined.
> 
> 
>     I want to make sure I am getting this right - are you proposing that
>     we discuss drastically changing a stable API, potentially breaking
>     DBIC for hundreds if not thousands of users, all for the sake of
>     correctness? I am still not sure what the discussion scope is, thus
>     I'll reserve judgment until more details emerge.
> 
> 
> Peter, I'm first just trying to understand why the code works as it 
> does.   Let me try again with question marks.
> 
> I started out researching duplicate key violations as those generate an 
> error.  According to the Postgresql logs some are indeed race 
> conditions, but others are simply inserts.  Some of those turned out to 
> be calling find() with a missing or wrong unique condition set so that 
> the find failed because all columns were used in the select which 
> returned no rows and then on insert the unique constraint hit in the db 
> and generated the error.  Another bug we are seeing is find returned 
> multiple rows so the wrong row gets updated.
> 
> Those are serious problems because the data did not get into the db 
> correctly.  These are problem due to our mistakes (e.g. unique 
> constraint out of sync with db, and incorrect use of find() ), but 
> something anyone could easily experience.
> 
> We can fix our code, but the discussion here is what can or should DBIC 
> do to prevent this from happening to other users.
> 
> Peter, do you know why it was decided that find() falls back to the full 
> criteria if no unique constraint is found?  Clearly, there was concern 
> about it not being unique because the code will issue a warning if more 
> than one row is returned.  (Obviously, the code cannot issue a warning 
> if no rows are returned when one exiss because it searched on non-unique 
> columns.)
> 
> Are there situations where a DBIC user would want or expect find() to 
> select more than one row or no rows when one existed?
> 
>     But if I was to guess what you actually need - I can propose extracting
>     the "ooh you got too many rows" code into a single overridable point, so
>     you can hook it in your base resultset and make it die, or do whatever
>     else you wish it to do.
> 
> 
> I can do that now.  Unless I'm not understanding the reasoning for the 
> current behavior, which is possible, my concern is other users could 
> still end up updating the wrong rows or attempting to insert when a 
> unique row already exists.
> 
> Do you think that a stronger warning about multiple rows and non unique 
> constraints used in find() with, say, an option "strict_find => 1" that 
> could be added to enforce it with the goal to make it the default in 
> later versions would still break DBIC for many existing users?

I went to look at the docs and realized that they do not actually talk
about this at all, so we started with different initial info. This
option had existed since the very beginning and is called 'key', which
basically means "you either satisfy this particular unique constraint
or you don't find anything". Without a 'key' argument, find() becomes
a best-effort wrapper around search() with some extra juice.

I am now painfully aware that the docs do not say anything about this
*at all*, and while trawling the code I also discovered what is a
genuine bug, so expect both being fixed before the release the end of
this weekend.

Thanks for persisting! :)