[Catalyst] charset not needed for Catalyst::Action::REST?

Tomas Doran bobtfish at bobtfish.net
Sat Feb 25 09:05:15 GMT 2012


On 24 Feb 2012, at 15:50, Bill Moseley wrote:

> When using Catalyst::Action::REST the content-type response never includes a charset.  JSON seems to be handled correctly in code -- JSON strings are always UTF-8.  Does that mean there is no need to specify a charset on responses?


Theoretically, you don't need to, but I think we should. Specifically, I've heard reports of encoding issues when talking to some other stacks, which were fixed by us setting the charset explicitly.

> And what if a JSON request comes in with a non-UTF8 charset?  Should that be ignored?  It's application/json, not text/json, so maybe there are no encoding issues?

I thought that JSON was always UTF-8, but I read the spec (RFC 4627) recently, and whilst it's always Unicode, it can also be encoded as UTF-16 or UTF-32:

   JSON text SHALL be encoded in Unicode.  The default encoding is UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.
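
A minimal sketch of the detection that passage describes, assuming the input is a raw octet string with no byte order mark. The helper name guess_json_encoding is made up for illustration and is not part of Catalyst::Action::REST or any JSON module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Guess the Unicode encoding of a JSON octet stream from the pattern of
# NUL octets in its first four octets (RFC 4627, section 3).
# guess_json_encoding is a hypothetical helper, not an existing API.
sub guess_json_encoding {
    my ($octets) = @_;

    # Too short to inspect: fall back to the default encoding.
    return 'UTF-8' if length($octets) < 4;

    # Build a string such as "1010": 1 where the octet is NUL, 0 otherwise.
    my $pattern = join '', map { $_ == 0 ? 1 : 0 } unpack('C4', $octets);

    my %by_pattern = (
        '1110' => 'UTF-32BE',   # 00 00 00 xx
        '1010' => 'UTF-16BE',   # 00 xx 00 xx
        '0111' => 'UTF-32LE',   # xx 00 00 00
        '0101' => 'UTF-16LE',   # xx 00 xx 00
    );

    # Anything else (xx xx xx xx) is treated as UTF-8.
    return $by_pattern{$pattern} // 'UTF-8';
}

# Encode the same JSON text five ways and check what the helper reports.
for my $enc (qw(UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE)) {
    my $octets = encode($enc, '{"a":1}');
    printf "%-8s -> %s\n", $enc, guess_json_encoding($octets);
}
```

This only works because the first two characters of a JSON text are ASCII, as the spec says; it is not a general-purpose encoding detector.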


> What about other serializations?  YAML is UTF-8 or UTF-16.  Does that mean the charset needs to be included in response?  And again, if a request comes in with UTF-16 does it need to be decoded or does that happen in YAML::Syck?
> 

The latter, but yes I think the charset should also be included.

> Even text/html doesn't include a charset in the "serialized" response.

I would think that text/html should still be handled by Catalyst::Plugin::Unicode::Encoding, if that is loaded?

> Does there need to be an additional decoding and encoding layer when using Catalyst::Action::REST?  

I'm of the opinion there shouldn't need to be.

> Should I force a charset on all responses

I think we should fix this, at least for JSON and YAML, where the right thing to do is entirely clear.

> BTW -- doesn't seem like YAML survives a round trip like JSON does:
> <snip>
> But YAML drops the utf8 flag:
> 
> $ perl -MYAML::Syck  -MEncode -wle 'print length(YAML::Syck::Load(YAML::Syck::Dump( ["\x{263A}"]) )->[0])'
> 3
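
A length of 3 there is consistent with the string coming back as UTF-8 octets rather than as a decoded character: U+263A is one character but three bytes in UTF-8. A small sketch using only the core Encode module (it does not involve YAML::Syck at all) shows the difference:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars  = "\x{263A}";                # one character: WHITE SMILING FACE
my $octets = encode('UTF-8', $chars);   # its UTF-8 encoding: E2 98 BA

print length($chars),  "\n";                    # 1 (characters)
print length($octets), "\n";                    # 3 (octets)
print length(decode('UTF-8', $octets)), "\n";   # 1 (decoded back to characters)
```

So a round trip that yields 3 has most likely handed back the encoded bytes without decoding them.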

Eugh. This works as expected with YAML and YAML::XS. I vote that we stop using YAML::Syck, as it's less maintained (and clearly has encoding issues).

Anyone have strong reasons for not doing this?

Cheers
t0m