[Dbix-class] Recursive update_or_create - RFC

luke saunders luke.saunders at gmail.com
Mon Jun 23 18:49:55 BST 2008


On Mon, Jun 23, 2008 at 9:57 AM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
> On Mon, Jun 23, 2008 at 12:23 AM, luke saunders <luke.saunders at gmail.com> wrote:
>> On Sun, Jun 22, 2008 at 3:21 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>> On Sat, Jun 21, 2008 at 9:45 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>> On Sat, Jun 21, 2008 at 10:09 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>>> On Sat, Jun 21, 2008 at 4:05 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>>> On Sat, Jun 21, 2008 at 4:44 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>>>>> On Sat, Jun 21, 2008 at 1:51 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>>>>> On Sat, Jun 21, 2008 at 1:51 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>>>>>>> On Thu, Jun 19, 2008 at 1:29 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>>>>>>> In the attachments you'll find DBIx::Class::ResultSet::RecursivePUT -
>>>>>>>>>> it is a base class for ResultSets and it provides just one method:
>>>>>>>>>> recursive_PUT which works just like update_or_create - but can update
>>>>>>>>>> a whole tree of related records.  I wrote it as a method for REST PUT
>>>>>>>>>> operations, i.e. creating and updating complex objects.  There is one
>>>>>>>>>> more difference from update_or_create - it will work on tables with
>>>>>>>>>> auto_increment primary keys - you just need to set the key to 'undef'
>>>>>>>>>> - then it will be deleted from the INSERT sent to the database (this
>>>>>>>>>> will only happen for auto_increment keys - other columns will be set
>>>>>>>>>> to 'NULL' just as usual).  This adds complexity to the
>>>>>>>>>> semantics - but with the benefit of more universal and uniform usage.
>>>>>>>>>>
>>>>>>>>>> I am waiting for comments - especially about the names - and also
>>>>>>>>>> about the interface.  I am thinking that perhaps the object oriented
>>>>>>>>>> interface creates too much hassle for the programmer, as it requires
>>>>>>>>>> explicitly setting up the ResultSet classes - while normally they can be
>>>>>>>>>> omitted.  So maybe I should change the interface to a simple
>>>>>>>>>> procedural interface and change $rs->recursive_PUT( $update_hash ) to
>>>>>>>>>> recursive_PUT( $rs, $update_hash).  What do you think?  Any other
>>>>>>>>>> comments are welcome as well.
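>>>>>>>>>>
>>>>>>>>>> For example, against a hypothetical schema (a Cd source with a
>>>>>>>>>> 'tracks' relationship and auto_increment 'id' columns), a call
>>>>>>>>>> could look roughly like this:
>>>>>>>>>>
>>>>>>>>>> ```perl
>>>>>>>>>> my $cd = $schema->resultset('Cd')->recursive_PUT({
>>>>>>>>>>     id     => 5,                  # existing row - will be updated
>>>>>>>>>>     title  => 'Changed Title',
>>>>>>>>>>     tracks => [
>>>>>>>>>>         { id => 17,    name => 'Renamed Track' },   # updated
>>>>>>>>>>         # undef on an auto_increment PK: the column is removed
>>>>>>>>>>         # from the INSERT, so this row is created
>>>>>>>>>>         { id => undef, name => 'Brand New Track' },
>>>>>>>>>>     ],
>>>>>>>>>> });
>>>>>>>>>> ```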
>>>>>>>>>
>>>>>>>>> so essentially this provides another $rs->update_or_create which can
>>>>>>>>> work recursively, right? so if that's the case then why not just
>>>>>>>>> override the normal $rs->update_or_create method rather than providing
>>>>>>>>> another one? which then raises the question: if this is worth having at
>>>>>>>>> all, maybe it should just be part of DBIC itself. though if this
>>>>>>>>> were to be a separate ResultSet class, then I wouldn't call it
>>>>>>>>> RecursivePUT - it's nothing to do with PUT requests really - so maybe
>>>>>>>>> RecursiveUpdateOrCreate or something.
>>>>>>>>
>>>>>>>> Certainly it would be an ideal solution for me if it was included in
>>>>>>>> the DBIC core - unfortunately I don't think this can happen - because
>>>>>>>> it requires a special treatment of 'undef's for auto_increment columns
>>>>>>>> - see below.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> i don't quite follow the point about it working with auto increment
>>>>>>>>> PKs. are you just saying that if you provide { pk => undef } to the
>>>>>>>>> existing update_or_create implementation, it'll try to insert a row
>>>>>>>>> with a NULL PK? but presumably it won't do that if you don't set the
>>>>>>>>> pk at all, so I'm not sure I see the problem.
>>>>>>>>
>>>>>>>> Yes - the current update_or_create will try to insert a NULL PK and
>>>>>>>> this will fail in PostgreSQL.  Unfortunately you also cannot just
>>>>>>>> remove the pk from the query - because then the find part will not
>>>>>>>> work - I've described this at
>>>>>>>> http://perlalchemy.blogspot.com/2008/06/dbixclass-gotchas.html .  I
>>>>>>>> consider my arrangement with special treatment of 'undef' for
>>>>>>>> auto_increment columns as the only way to have it working universally
>>>>>>>> - otherwise you cannot have just one method to do the update and
>>>>>>>> create - you need to tell the resultset if you want to do update or
>>>>>>>> create. This also means that it would be very complicated to do it
>>>>>>>> recursively - because you'll need additional parameters to tell if it
>>>>>>>> is an update or a create for each row in the structure.
>>>>>>>
>>>>>>> if either all of the primary key columns or all of the columns
>>>>>>> from one of the unique constraints are provided then it makes sense
>>>>>>> for update_or_create to attempt the find but if this isn't the case
>>>>>>> then there's no point doing the find and it should skip straight to
>>>>>>> the create.
>>>>>>
>>>>>> This cannot be done - because some of the columns can be set in the
>>>>>> resultset internal condition and it is hard to determine what columns
>>>>>> were set there.  See the example from my blog article:
>>>>>>
>>>>>> my $new_cd_rs = $cd_rs->search(
>>>>>>     { -and => [ id => [1, 2], id => { '>' => 1 } ] }
>>>>>> );
>>>>>> $new_cd_rs->update_or_create( { artist => 1 } );
>>>>>>
>>>>>> This is important when accessing related rows (because part of the
>>>>>> key, or the whole of it, can be the foreign key - so it will be in the
>>>>>> resultsource and does not need to be in the query) and in the case of
>>>>>> RestrictedWithObject.  Both cases require that you take into account
>>>>>> the conditions internal to the resultset, but even without them it
>>>>>> would be just more consistent with the rest of resultset methods (like
>>>>>> search - which does take into account the internal resultset
>>>>>> conditions).
>>>>>
>>>>> Yeah, I stand corrected, we can't do this check. So given that I guess
>>>>> we won't be able to have this in DBIC core until after the SQLA
>>>>> refactor, but I think that's ultimately what *should* happen.
>>>>>
>>>>> So back to your current implementation. The undef thing seems like a
>>>>> bit of a hack - I'm not sure how well it's going to play with unique
>>>>> constraints. For example if I have PK 'id' and a unique on 'name' then
>>>>> ->find({ id => undef, name => 'luke' }) is going to return nothing
>>>>> even if there already exists a row with that name, and so when the
>>>>> subsequent ->create({ name => 'luke' }) happens we're going to get a
>>>>> nasty surprise. Right?
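>>>>>
>>>>> Concretely (hypothetical resultset with PK 'id' and a unique
>>>>> constraint on 'name'):
>>>>>
>>>>> ```perl
>>>>> # a row with name => 'luke' already exists
>>>>> my $row = $rs->find({ id => undef, name => 'luke' });
>>>>> # $row is undef: "id IS NULL" matches nothing, so the find
>>>>> # fails and update_or_create falls through to create
>>>>> $rs->create({ name => 'luke' });
>>>>> # dies with a unique constraint violation on 'name'
>>>>> ```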
>>>>
>>>> Yes - I am assuming that the rows are validated elsewhere.  I have no
>>>> idea how to include validation in this method.  But it is actually not
>>>> related to the 'undef' thing - this query is an error - and it should
>>>> fail.  If you let it update the row with name => 'luke' then a new
>>>> user would be able to change someone else's data.
>>>
>>> Right, so by specifying pk => undef you're basically saying "don't
>>> bother with the find, just do the create". I think we probably need a
>>> more elegant way to specify this.
>>>
>>> Other than that I think this could be useful.
>>
>> Also, I currently need the update method of the
>> Catalyst::Controller::DBIC::API module we've talked about before to
>> allow updates to related rows, so it would be good if your module also
>> provided a recursive update method which would error if a related row
>> was not found rather than trying to create it, then I could use that.
>> Would that be simple to do?
>>
>
> I think this can be easily added as an additional param (with a global
> effect).  Patches are welcome :)
>

Fair enough.

> But why do you need that?
>

The functionality? Well I have an update endpoint (like
/api/rpc/track/id/x/update for example) to which you can not only pass
params like
 name => 'new track name'

but also
 cd.name => 'new cd name'

where 'cd' is a relation from Track. If the track in question doesn't
have a CD then that's a bad request and it should error rather than
just go ahead and create it.
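
A rough sketch of the method I'm after (the recursive_update name and
its signature are hypothetical - the controller would unflatten the
dotted request params first):

```perl
# 'cd.name' => 'new cd name' from the request becomes:
my $updates = {
    name => 'new track name',
    cd   => { name => 'new cd name' },
};

# unlike recursive_PUT, this should die if the track has no
# related cd row, instead of going ahead and creating one
$track_rs->recursive_update($track_id, $updates);
```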

> As to the second question - i.e. about eliminating the need of putting
> id => undef into the query to indicate that a new record needs to be
> created.  One solution is again an additional parameter, this one
> inside the query structure (because it needs to be supplied
> separately for each record).  I don't like that solution - because it
> means an additional reserved word - something like create_new_row => 1
> added to the query params for each new row.  And it would not be
> consistent with the case when pk => 'some new value' and you have no
> record with such a primary key.  Maybe you have other proposals.

Yeah, I don't have anything better sadly.

Which kind of makes me think that only recursive update makes sense
until an SQLA refactor allows the correct approach for recursive
update_or_create.



More information about the DBIx-Class mailing list