[Catalyst] More detailed proposal for changes related to content negotiation and REST

John Napiorkowski jjn1056 at yahoo.com
Fri Aug 9 19:11:52 GMT 2013






On Friday, August 9, 2013 12:00 PM, Bill Moseley <moseley at hank.org> wrote:


>
>
>On Thu, Aug 8, 2013 at 9:27 PM, John Napiorkowski <jjn1056 at yahoo.com> wrote:
>
>
>>
>>https://github.com/perl-catalyst/CatalystX-Proposals-REStandContentNegotiation
> 
>
>
>I currently extend HTTP::Body in a similar way to what you describe.  But I have them as separate classes, so I just do:
>
>
>use My::MultiPart;
>
>then in that I hack in my type for HTTP::Body:
>
>
>
>package My::MultiPart;
>use HTTP::Body;
>$HTTP::Body::TYPES->{'multipart/form-data'} = 'My::MultiPart';

Yes, I do this as well.  It feels a bit icky though, but this is a possible approach for us after all.

>>
>>
>
>
>As you propose, mapping a mime type to a sub seems pretty flexible.  I assume the sub could return a filehandle.   File uploads still need to stream the uploads to disk while making the parameters available as HTTP::Body does now.
>

Yes, my original thinking was that the subref would return something that would populate body_data.  I suppose that could be an object or a file handle.  This is one good reason to distinguish body_data from body_parameters (which is expected to be a hash ref).
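
Something like this is the rough shape I have in mind.  To be clear, this is a sketch only; the config key name and the exact calling convention are placeholders from the proposal, not settled API:

    # Rough sketch only: 'data_handlers' is a placeholder name, and the
    # calling convention ($_ holds the buffered request body, the return
    # value becomes $c->req->body_data) is just the proposal as drafted.
    use JSON::MaybeXS qw(decode_json);

    __PACKAGE__->config(
      data_handlers => {
        'application/json' => sub { decode_json $_ },
      },
    );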

I'm not really sure what the best thing to do here is with multipart, and particularly with the canonical file upload type 'multipart/form-data'.  My thinking is that if someone wants to roll their own, they can, and everything goes to body_data; otherwise one can simply use the existing built-in support for this, which really isn't broken.

Now, I am not sure what to do when the multipart contains JSON, for example (as in a file upload, but instead of a binary file plus urlencoded params, the params are application/json).  I'm very tempted to say this is out of scope for the first version.  Really, this does point to some underlying inflexibility in the existing design.

>
>I like the regex mimetype keys, but should they be an array instead of a hash so we can set the order they are checked in?
>

I guess I was thinking a hash would prevent people from listing 'application/json' more than once by accident.  And there are some issues with the best way to merge (for example, you might parse differently in dev than in production...).

What's the use case you have in mind?  Something like first checking for 'application/vnd.mycompany.user+json' and then falling back to 'application/(?:vnd\..+\+)?json' if you don't find it?  Is that an actual case you've come across?

I was also thinking that we do want to keep the global functionality a bit limited, so that in the next iteration we can support something more fine-grained, at the controller level perhaps.  I'd hate to introduce such terrible globalness to Catalyst, which in general has been decent at avoiding that.
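
That said, if ordering does turn out to matter, the array shape you suggest is easy enough to picture.  Purely for illustration (the parse_* subs here are hypothetical stand-ins, and none of this is in the proposal yet):

    # Ordered [match => handler] pairs instead of a hash, checked top to
    # bottom so the most specific type wins.  parse_user_json and
    # parse_generic_json are made-up stand-ins.
    my @handlers = (
      [ 'application/vnd.mycompany.user+json' => \&parse_user_json    ],
      [ qr{^application/(?:vnd\..+\+)?json$}  => \&parse_generic_json ],
    );

    sub handler_for {
      my ($type) = @_;
      for my $pair (@handlers) {
        my ($match, $code) = @$pair;
        return $code if ref $match eq 'Regexp' ? $type =~ $match : $type eq $match;
      }
      return;
    }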

>
>I think we must consider large inputs/streams.    You say $_ is the request content.  Is that the full request with headers?   Is the request already de-chunked, for example?  Or am I thinking too low level?

We've spoken before about parsing larger incoming and chunked data.  I would love to address this, but right now it seems like something we need better agreement on at the PSGI level.  For example, since Starman already buffers incoming input, it feels silly to me to have Catalyst then try to re-stream that.  You've already paid the full price of buffering in terms of memory and performance, right?  Or am I not understanding?

I'd really like to have something at the Catalyst level that sanely achieves this end, but I think part of the price we paid when going to PSGI at the core is that most of the popular Plack handlers are preloading and buffering input, even large request input.  This seems to be an area where it might behoove us to work with the PSGI group to find something stable.  Even the optional psgix.io isn't always going to work out, since some people don't want to support that in the handler (it's a somewhat vague definition, I guess, and makes people uncomfortable).

Until then, or until someone helps me understand that my thinking is totally wrong on this score, it seems the best thing to do is to put this out of scope for now.  That way we can move on to supporting a goodly number of real use cases.

I intended to say that $_ equals a string that is the buffered request body.  This way we can reserve other args for handling the future streaming case.  I was actually pondering something where the subref returns a subref that gets called over and over to do the parse.
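
Just to sketch what I mean by that (this is pure speculation, not part of the proposal; the EOF-as-undef convention is invented for illustration, and decode_json is JSON::MaybeXS as in the earlier sketch):

    # Speculative shape for a future streaming case: the registered subref
    # returns another subref that is fed chunks as they arrive and hands
    # back the parsed result when it sees an undef chunk (EOF).
    use JSON::MaybeXS qw(decode_json);

    my %streaming_handlers = (
      'application/json' => sub {
        my $buffer = '';
        return sub {
          my ($chunk) = @_;
          return decode_json($buffer) unless defined $chunk;  # EOF: parse it all
          $buffer .= $chunk;                                   # otherwise accumulate
          return;
        };
      },
    );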

>
>
>
>
>In some apps I'm using Catalyst::Action::REST, and in others I have some custom code where I use HTTP::Body to parse JSON.  I'm mixed about having the request data end up in $c->req->parameters vs $c->req->data.  I don't really see why form-data/urlencoded should be treated differently than other encodings like JSON.
>

I think my idea with using a new request attribute 'body_data' was so that we'd cleanly separate classic-style params.  Also, there is no certainty or expectation that body_data contains a hash ref; it could be an object or an arrayref (think of someone uploading a CSV file, for example), so there's no easy way to map this to body_parameters.  I know the Rails people tried to do this, and it did lead to some security problems, if I recall properly.
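
To illustrate with the CSV case (Text::CSV here is real; the idea that such a sub would feed body_data, and the $_-holds-the-body convention, are still just the proposal):

    # Why body_data can't simply collapse into body_parameters: a text/csv
    # body naturally parses to an arrayref of rows, not a hash of params.
    use Text::CSV;

    my $parse_csv = sub {
      my $csv = Text::CSV->new({ binary => 1 });
      open my $fh, '<', \(my $body = $_) or die "can't read body: $!";
      my @rows;
      while (my $row = $csv->getline($fh)) { push @rows, $row }
      return \@rows;   # an arrayref of rows, with no sane hash-of-params mapping
    };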

For a long time I did think it would be ideal for them to all be one and the same, but my thinking 'evolved' based on the issues described.  For what it is worth, 'body_data' was suggested by mst, if that carries any weight.


>
>I'm not quite sure about $c->res->body( \%data );  I think body should be the raw body.  What does $c->res->body return?  The serialized JSON?  The original hashref?
>
>

I'm not sure I like it either.  I would say body returns whatever you set it to, until the point where encoding happens.  It does feel a bit flaky, but I can't actually put my finger on a real code smell here.

Any other suggestions?  This is certainly a part of the proposal that is going to raise doubts, but I can't think of anything better, or assemble problematic use cases in my head over it either.

One thought I had here was to just overload response->format to allow passing a hashref or arrayref.  Then ->body would get populated immediately based on that (and you'd get a decent place to catch exceptions as well).
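
Spelled out, the idea would be something like this (none of it exists today, it's just the proposal sketched):

    # Sketch of the ->format alternative; not existing Catalyst API.
    # Handing ->format a reference would serialize it immediately against
    # the negotiated content type, so ->body already holds the encoded
    # string and any serialization error is thrown right here rather than
    # at finalize time.
    $c->res->format(\%data);
    my $encoded = $c->res->body;   # e.g. the already-serialized JSON string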

Any other thoughts?  I think the idea here was to start with the simplest thing I can think of and make it more complex as needed.

>
>
>If a parser dies, what kind of exception is thrown?   You say they would not set any response status, but wouldn't we want to catch the error and then set a 400?  (I use exception objects that carry an HTTP status, a message to return in the body, and a message used for logging at a given level.)
>

How people do exceptions in Perl tends to be nearly religious, and I didn't want to hold this up on figuring that stuff out :)  I was thinking to just raise an exception and let the existing Catalyst machinery do its thing.  I'm not planning to add anything special for this type of error, just keep the existing behavior, for better or worse.

>
>I'm not sure what to think of the Provides and Consumes controller attributes.   It's one thing to realize early that the client cannot handle any response I'm willing to provide (json, yaml, whatever), but I'm worried this would blur separation of concerns.  That is, by the time the controller runs we would have already decoded the request and want to just work with the data provided.   And would we want different data returned if the client is asking for json vs yaml?
>
>
>Maybe I'm missing the point of that.
>

I think the main thing is to do the match early, rather than perform the entire action or action chain, and then find out that in the end we don't have an agreed upon data exchange format.

Since request->body_data is intended to be lazy, we won't run that parse code until you ask for the data.  We don't need to parse the data to do the basic match here; this is just based on the HTTP metadata, not the actual content.  I think for common cases this is fine (I realize that yet again this might not be the best approach for multipart uploads...).

I think we do want to try and take a baby step toward dispatch based not just on path args and HTTP method, but also allowing real content negotiation via the action's metadata.  That way, if you want to respond differently depending on whether the request accepts JSON versus some other format, we can take advantage of Catalyst's built-in dispatching for that.

For example, you might format the data differently if the request is for Atom versus JSON (or you might set your HTTP headers differently to make up for how some types of encoding lack information; for example, it's common in REST circles to use the HTTP Link header with certain formats).

I do think there is a strong use case for saying that the data returned can vary based on what the request accepts.
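
To give a feel for the dispatch I mean (Consumes and Provides are only the proposed attribute names and could easily change; Path and the Method matching are the existing dispatch features):

    # Sketch only: Consumes/Provides are the proposed attributes, not
    # current Catalyst.  Path and Method matching already exist.
    sub create_user :Path('users') :Method('POST') :Consumes('application/json') {
      my ($self, $c) = @_;
      my $data = $c->req->body_data;   # lazily parsed per the matched type
      ...
    }

    sub user_feed :Path('users') :Method('GET') :Provides('application/atom+xml') {
      my ($self, $c) = @_;
      # here you might add an HTTP Link header, format the body as Atom, etc.
      ...
    }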

>
>
>
>BTW - in practice I find it pretty handy to be able to specify/override the response encoding via a query param.

Yup, I know that is common, but I think I'd prefer to handle that as Plack middleware instead of baking it into Catalyst.  I actually baked in all those HTTP header overrides for the HTTP method when I did the support for dispatch based on Method, and I regret it (it will need to be removed and refactored eventually).
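
Something along these lines is what I have in mind, rather than anything in Catalyst itself (the param name and the type map below are made up for illustration):

    # Hypothetical middleware: rewrite the Accept header from a query param
    # before the request ever reaches Catalyst.
    package My::Middleware::AcceptFromParam;
    use parent 'Plack::Middleware';
    use Plack::Request;

    my %map = ( json => 'application/json', yaml => 'text/x-yaml' );

    sub call {
      my ($self, $env) = @_;
      my $format = Plack::Request->new($env)->query_parameters->{format} || '';
      $env->{HTTP_ACCEPT} = $map{$format} if $map{$format};
      return $self->app->($env);
    }

    1;

Then it's just an enable '+My::Middleware::AcceptFromParam'; line in the app.psgi via Plack::Builder, and Catalyst never has to know about it.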

>
>-- 
>Bill Moseley
>moseley at hank.org


