[Dbix-class] Recursive update_or_create - RFC

Zbigniew Lukasiak zzbbyy at gmail.com
Mon Jun 23 19:17:41 BST 2008


On Mon, Jun 23, 2008 at 7:49 PM, luke saunders <luke.saunders at gmail.com> wrote:
> On Mon, Jun 23, 2008 at 9:57 AM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>> On Mon, Jun 23, 2008 at 12:23 AM, luke saunders <luke.saunders at gmail.com> wrote:
>>> On Sun, Jun 22, 2008 at 3:21 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>> On Sat, Jun 21, 2008 at 9:45 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>> On Sat, Jun 21, 2008 at 10:09 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>>>> On Sat, Jun 21, 2008 at 4:05 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>>>> On Sat, Jun 21, 2008 at 4:44 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>>>>>> On Sat, Jun 21, 2008 at 1:51 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>>>>>> On Sat, Jun 21, 2008 at 1:51 PM, luke saunders <luke.saunders at gmail.com> wrote:
>>>>>>>>>> On Thu, Jun 19, 2008 at 1:29 PM, Zbigniew Lukasiak <zzbbyy at gmail.com> wrote:
>>>>>>>>>>> In the attachments you'll find  DBIx::Class::ResultSet::RecursivePUT -
>>>>>>>>>>> it is a base class for ResultSets and it provides just one method:
>>>>>>>>>>> recursive_PUT which works just like update_or_create - but can update
>>>>>>>>>>> a whole tree of related records.  I wrote it as a method for REST PUT
>>>>>>>>>>> operations, i.e. creating and updating complex objects.  There is one
>>>>>>>>>>> more difference from update_or_create - it will work on tables with
>>>>>>>>>>> auto_increment primary keys - you just need to set the key to 'undef'
>>>>>>>>>>> - then it will be deleted from the INSERT sent to the database (this
>>>>>>>>>>> will only happen for auto_increment keys - other columns will be set
>>>>>>>>>>> to 'NULL' just as usual).  This is additional complexity of the
>>>>>>>>>>> semantics - but with the benefit of more universal and uniform usage.
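>>>>>>>>>>>
>>>>>>>>>>> For illustration, a usage sketch (the schema, column, and
>>>>>>>>>>> relationship names here are invented):
>>>>>>>>>>>
>>>>>>>>>>>   $schema->resultset('Dvd')->recursive_PUT( {
>>>>>>>>>>>       dvd_id => 1,                   # existing row - gets updated
>>>>>>>>>>>       name   => 'Picnic at Hanging Rock',
>>>>>>>>>>>       tags   => [
>>>>>>>>>>>           { tag_id => 5,     name => 'drama' },    # existing related row
>>>>>>>>>>>           { tag_id => undef, name => 'mystery' },  # undef auto_increment PK
>>>>>>>>>>>                                                    # is dropped from the INSERT
>>>>>>>>>>>       ],
>>>>>>>>>>>   } );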
>>>>>>>>>>>
>>>>>>>>>>> I am waiting for comments - especially about the names - and also
>>>>>>>>>>> about the interface.  I am thinking that perhaps the object-oriented
>>>>>>>>>>> interface creates too much hassle for the programmer, as it requires
>>>>>>>>>>> explicitly setting up the ResultSet classes - while normally they can be
>>>>>>>>>>> procedural interface and change $rs->recursive_PUT( $update_hash ) to
>>>>>>>>>>> recursive_PUT( $rs, $update_hash).  What do you think?  Any other
>>>>>>>>>>> comments are welcome as well.
>>>>>>>>>>
>>>>>>>>>> so essentially this provides another $rs->update_or_create which can
>>>>>>>>>> work recursively, right? so if that's the case then why not just
>>>>>>>>>> override the normal $rs->update_or_create method rather than providing
>>>>>>>>>> another one? which then raises the question: if this is worth having at
>>>>>>>>>> all, maybe it should just be part of DBIC itself. though if this
>>>>>>>>>> were to be a separate ResultSet class, then I wouldn't call it
>>>>>>>>>> RecursivePUT - it's nothing to do with PUT requests really - so maybe
>>>>>>>>>> RecursiveUpdateOrCreate or something.
>>>>>>>>>
>>>>>>>>> Certainly it would be an ideal solution for me if it were included in
>>>>>>>>> the DBIC core - unfortunately I don't think this can happen, because
>>>>>>>>> it requires special treatment of 'undef' for auto_increment columns
>>>>>>>>> - see below.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> i don't quite follow the point about it working with auto increment
>>>>>>>>>> PKs. are you just saying that if you provide { pk => undef } to the
>>>>>>>>>> existing update_or_create implementation, it'll try to insert a row
>>>>>>>>>> with a NULL PK? but presumably it won't do that if you don't set the
>>>>>>>>>> pk at all, so I'm not sure I see the problem.
>>>>>>>>>
>>>>>>>>> Yes - the current update_or_create will try to insert a NULL PK and
>>>>>>>>> this will fail in PostgreSQL.  Unfortunately you also cannot just
>>>>>>>>> remove the pk from the query - because then the find part will not
>>>>>>>>> work - I've described this at
>>>>>>>>> http://perlalchemy.blogspot.com/2008/06/dbixclass-gotchas.html .  I
>>>>>>>>> consider my arrangement - special treatment of 'undef' for
>>>>>>>>> auto_increment columns - the only way to make it work universally.
>>>>>>>>> Otherwise you cannot have just one method to do both the update and
>>>>>>>>> the create - you need to tell the resultset whether you want an
>>>>>>>>> update or a create.  That also means it would be very complicated to
>>>>>>>>> do it recursively, because you would need additional parameters to
>>>>>>>>> say whether it is an update or a create for each row in the structure.
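>>>>>>>>>
>>>>>>>>> To spell that out with a small sketch (column names invented):
>>>>>>>>>
>>>>>>>>>   # the stock update_or_create passes the undef straight through:
>>>>>>>>>   $cd_rs->update_or_create( { cdid => undef, title => 'New CD' } );
>>>>>>>>>   # -> INSERT INTO cd (cdid, title) VALUES (NULL, 'New CD')  -- Pg rejects this
>>>>>>>>>   # recursive_PUT instead drops the undef auto_increment column from
>>>>>>>>>   # the INSERT, so the sequence default is used.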
>>>>>>>>
>>>>>>>> if either all of the primary key columns or all of the columns
>>>>>>>> from one of the unique constraints are provided, then it makes sense
>>>>>>>> for update_or_create to attempt the find, but if this isn't the case
>>>>>>>> then there's no point doing the find and it should skip straight to
>>>>>>>> the create.
>>>>>>>
>>>>>>> This cannot be done - because some of the columns can be set in the
>>>>>>> resultset internal condition and it is hard to determine what columns
>>>>>>> were set there.  See the example from my blog article:
>>>>>>>
>>>>>>> $new_cd_rs = $cd_rs->search( { -and => [ id => [1, 2], id => { '>' => 1 } ] } )
>>>>>>> and then $new_cd_rs->update_or_create( { artist => 1 } ).
>>>>>>>
>>>>>>> This is important when accessing related rows (because part of the
>>>>>>> key, or the whole of it, can be the foreign key - so it will be in the
>>>>>>> resultsource and does not need to be in the query) and in the case of
>>>>>>> RestrictedWithObject.  Both cases require that you take the conditions
>>>>>>> internal to the resultset into account, but even without them it
>>>>>>> would simply be more consistent with the rest of the resultset methods
>>>>>>> (like search, which does take the internal resultset conditions into
>>>>>>> account).
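>>>>>>>
>>>>>>> A concrete case of such an internal condition (schema names invented,
>>>>>>> in the style of the DBIC example schema):
>>>>>>>
>>>>>>>   my $artist = $schema->resultset('Artist')->find(1);
>>>>>>>   my $cd_rs  = $artist->cds;   # internal condition: artist = 1
>>>>>>>   # 'artist' is part of what identifies the row, but update_or_create
>>>>>>>   # only sees the hash below and cannot tell whether the combined
>>>>>>>   # condition is unique:
>>>>>>>   $cd_rs->update_or_create( { title => 'Some CD' } );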
>>>>>>
>>>>>> Yeah, I stand corrected, we can't do this check. So given that I guess
>>>>>> we won't be able to have this in DBIC core until after the SQLA
>>>>>> refactor, but I think that's ultimately what *should* happen.
>>>>>>
>>>>>> So back to your current implementation. The undef thing seems like a
>>>>>> bit of a hack - I'm not sure how well it's going to play with unique
>>>>>> constraints. For example if I have PK 'id' and a unique on 'name' then
>>>>>> ->find({ id => undef, name => 'luke' }) is going to return nothing
>>>>>> even if there already exists a row with that name, and so when the
>>>>>> subsequent ->create({ name => 'luke' }) happens we're going to get a
>>>>>> nasty surprise. Right?
>>>>>
>>>>> Yes - I am assuming that the rows are validated elsewhere.  I have no
>>>>> idea how to include validation in this method.  But it is actually not
>>>>> related to the 'undef' thing - this query is an error and it should
>>>>> fail.  If you let it update the row with name => 'luke' then a new
>>>>> user would be able to change someone else's data.
>>>>
>>>> Right, so by specifying pk => undef you're basically saying "don't
>>>> bother with the find, just do the create". I think we probably need a
>>>> more elegant way to specify this.
>>>>
>>>> Other than that I think this could be useful.
>>>
>>> Also, I currently need the update method of the
>>> Catalyst::Controller::DBIC::API module we've talked about before to
>>> allow updates to related rows, so it would be good if your module also
>>> provided a recursive update method which would error if a related row
>>> was not found rather than trying to create it, then I could use that.
>>> Would that be simple to do?
>>>
>>
>> I think this can be easily added as an additional param (with a global
>> effect).  Patches are welcome :)
>>
>
> Fair enough.
>
>> But why do you need that?
>>
>
> The functionality? Well I have an update endpoint (like
> /api/rpc/track/id/x/update for example) to which you can not only pass
> params like
>  name => 'new track name'
>
> but also
>  cd.name => 'new cd name'
>
> where 'cd' is a relation from Track. If the track in question doesn't
> have a CD then that's a bad request and it should error rather than
> just go ahead and create it.
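>
> To make the shape concrete: after the controller unflattens the dotted
> params, what I would want to hand to a recursive update is roughly this
> (hypothetical method name, column names simplified):
>
>   # params from /api/rpc/track/id/7/update:
>   #   name    => 'new track name'
>   #   cd.name => 'new cd name'
>   $track_rs->recursive_update( {
>       id   => 7,
>       name => 'new track name',
>       cd   => { name => 'new cd name' },  # should error if the track has no CD
>   } );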

Well - I think you (or I) can add an option to do that.  It should not
be too difficult.  But it looks more like a validation thing.  Maybe we
could include validation here - but for now I have no idea how to do
it properly.
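
For the record, what I have in mind is just a global switch, roughly
like this (option name not final, purely a sketch):

  # hypothetical: die instead of creating when a related row is not found
  $track_rs->recursive_PUT( $update_hash, { if_missing => 'die' } );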

>
>> As to the second question - i.e. about eliminating the need to put
>> id => undef into the query to indicate that a new record needs to be
>> created.  One solution is again an additional parameter, this time one
>> that would live inside the query structure (because it needs to be
>> supplied separately for each record).  I don't like that solution,
>> because it means an additional reserved word - something like
>> create_new_row => 1 added to the query params for each new row.  And it
>> would not be consistent with the case where pk => 'some new value' and
>> you have no record with that primary key.  Maybe you have other proposals.
>
> Yeah, I don't have anything better sadly.
>
> Which kind of makes me think that only recursive update makes sense
> until an SQLA refactor allows the correct approach for recursive
> update_or_create.

I hope that this means that you agree with me - at least for now.

As for the future, with the hypothetical refactoring - I need to ask
what you have in mind when you talk about "the correct approach".
Analysing the query and checking whether it implies a unique condition
on the primary key?  And if it does not, then skipping ->find and going
directly to ->create?  The problem is that however SQLA is refactored,
the question of whether a given condition implies a unique constraint
on the primary key remains undecidable in general.  This is because the
condition can contain arbitrary arithmetic, and arithmetic is
undecidable (Goedel).  So you would need to define some additional
restrictions on the queries allowed, which would complicate the
solution considerably.  I don't know if I'd wait for that.
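
Just to illustrate the kind of condition I mean (resultset and column
names invented) - once literal SQL is allowed there is no general way
to tell whether the condition pins down exactly one row:

  # is this condition unique on the primary key?  statically, who knows -
  # it happens to be satisfied by cdid = 2 or cdid = 3:
  my $rs = $schema->resultset('CD')->search( \'cdid * cdid - 5 * cdid + 6 = 0' );
  $rs->update_or_create( { title => 'New Title' } );   # find, or create?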

-- 
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/


