[Catalyst] CSV / UTF-8 / Unicode

Craig Chant craig at homeloanpartnership.com
Wed Jul 3 10:18:57 GMT 2013


Hi all , thanks for the replies.



>> $xls is a scalar you have in memory of the comma separated value data?



Yes, correct, and when output direct to $c->response->body , catalyst serve=
r falls over with an out of memory error and reboots itself. I had an IRC d=
iscussion about this and it was acknowledged as a limitation of Catalyst an=
d the solution was to wrap it up in a IO:File object.



>> but how Windows decides how to open files is something I'm not familiar =
with.
>> That depends on the Browser setting per mime type and not the OS.
>> In Firefox you can define it in Preferences / Applications.



Perhaps, but it appears default behaviour in IE / FF / Opera / Chrome to op=
en CSV as an XLS spread sheet in Excel seems to require that MIME type and =
works exactly as desired when used.



Reading the RFC link it states no MIME has ever officially been registered =
of CSV but does give the text/csv MIME type, yet as mentioned I didn't get =
the desired behaviour when used, is there an issue with using the MIME type=
 I have, if it is yielding desired results?



>> Maybe write a standalone test and take Catalyst and browser quirks out o=
f the picture.



I have already done this, I have two SQL wrapper modules one that uses DBI:=
:DBD::ODBC and one that uses Win32::ODBC, I applied it to the same standalo=
ne script that produces CSV output, the only difference between the test wa=
s one test accessed SQL with the DBI SQL wrapper and one test accessed SQL =
with the Win32::ODBC SQL wrapper, DBI outputted junk chars, Win32::ODBC did=
n't. What else should I be doing to test for the culprit of the corruption?



>> Also, you are aware that your data will probably be coming back as UCS2 =
if you're using SQL Server right?

No, what is UCS2 and is this handled differently in DBI::DBD::ODBC vs Win32=
::ODBC ?



Thanks,

Craig





________________________________
From: Anthony Lucas [anthonyjlucas at gmail.com]
Sent: 03 July 2013 06:42
To: The elegant MVC web framework
Subject: Re: [Catalyst] CSV / UTF-8 / Unicode

I don't see how DBI can be corrupting your data. DBD::ODBC, or more specifi=
cally the actual driver it is using will be more suspect in that case.

In either case, setting DBI's trace flag to something high should provide s=
ome insight  who's doing what, and you can make sure they are being handled=
 as wide characters. ODBC drivers are renowned for providing incomplete API=
s and lying about stuff.

Maybe write a standalone test and take Catalyst and browser quirks out of t=
he picture.
Also, you are aware that your data will probably be coming back as UCS2 if =
you're using SQL Server right?


On 2 July 2013 17:29, Craig Chant <craig at homeloanpartnership.com<mailto:cra=
ig at homeloanpartnership.com>> wrote:

>> All the above seems overkill.   I suspect what you want is closer to thi=
s: (but see notes below).



Tried that, didn=92t work, ended up in a long Catalyst discussion where it =
was worked out that I needed to wrap any XLS output to an IO:FILE handle ot=
herwise Catalyst dies with an =93out of memory=94 error something to do wit=
h streaming data support issues in Catalyst so the work round is to wrap th=
e output into an IO:File object.



>>Second, be aware that $c->response->content_length(length($xls));



Yes, I was doing the encode then using Length (I did read on perldocs about=
 requesting the length against the octet) , either way, the length was the =
least of my worries, keeping Catalyst from falling over with =91Wide Charac=
ter=92 errors, or not getting garbage was my main concern.



And yes, the output is CSV not strictly XLS but I have been told and looked=
 it up on the net that 'application/vnd.ms-excel'  Is the correct MIME head=
er to pass for CSV that you want MS Excel to open.



Of course, if I have the wrong MIME header for CSV -> MS Excel please can y=
ou provide the correct one, as it took me a long time to find that one, as =
the box standard =91text/csv=92 does not work properly when opened in MS Ex=
cel.



Though as it appears DBI is corrupting my Unicode data, it might be related=
 to that rather than CSV->MS Excel per sae!





From: Bill Moseley [mailto:moseley at hank.org<mailto:moseley at hank.org>]
Sent: 02 July 2013 16:59

To: The elegant MVC web framework
Subject: Re: [Catalyst] CSV / UTF-8 / Unicode





On Tue, Jul 2, 2013 at 2:59 AM, Craig Chant <craig at homeloanpartnership.com<=
mailto:craig at homeloanpartnership.com>> wrote:

        # output header
        $c->response->content_type('application/vnd.ms-excel');
        $c->response->content_length(length($xls));
        $c->response->header(Content_Disposition =3D> 'attachment;filename=
=3DNBCS_Export.csv');

        # create an IO::File for Catalyst
        use IO::File;
        my $iof =3D IO::File->new;

        $iof->open(\$xls, "r");
        $iof->binmode(":encoding(UTF-8)");

        # output XLS data
        $c->response->body($iof);



All the above seems overkill.   I suspect what you want is closer to this: =
(but see notes below).



        $c->response->content_type('text/csv');

        $c->response->body($xls);
        $c->response->header(Content_Disposition =3D> 'attachment;filename=
=3DNBCS_Export.csv');



Then with that content type the plugin would encode $xls as utf8 and add ;c=
harset=3Dutf8 (or whatever it is configured to encode as).



Notes:



First, you are not returning Excel, so the content type is not what you fir=
st listed above, right?



Second, be aware that $c->response->content_length(length($xls)); could be =
very wrong.  If $xls is really CSV text AND it's decoded then length($xls) =
is the length in characters, not octets.   Don't set the content length.





Third, Catalyst::Plugin::Unicode::Encoding, IMO, has some issues.



The plugin limits to just these content types.



    return $c->next::method(@_)

      unless $c->response->content_type =3D~ /^text|xml$|javascript$/;



Then it does this:



    $c->response->body( $c->encoding->encode( $body, $CHECK ) )

        if ref(\$body) eq 'SCALAR';



Personally, I think the correct approach is to only encode character data -=
- that is check to see if the utf8 flag is set before calling encode.



Maybe limit to the content types listed above, but throw an exception for o=
ther content types where the body is a scalar AND has the utf8 flag on.  Af=
ter all, we can only write out octets or else we get the Wide Character err=
or.













--
Bill Moseley
moseley at hank.org<mailto:moseley at hank.org>

This Email and any attachments contain confidential information and is inte=
nded solely for the individual to whom it is addressed. If this Email has b=
een misdirected, please notify the author as soon as possible. If you are n=
ot the intended recipient you must not disclose, distribute, copy, print or=
 rely on any of the information contained, and all copies must be deleted i=
mmediately. Whilst we take reasonable steps to try to identify any software=
 viruses, any attachments to this e-mail may nevertheless contain viruses, =
which our anti-virus software has failed to identify. You should therefore =
carry out your own anti-virus checks before opening any documents. HomeLoan=
 Partnership will not accept any liability for damage caused by computer vi=
ruses emanating from any attachment or other document supplied with this e-=
mail. HomeLoan Partnership reserves the right to monitor and archive all e-=
mail communications through its network. No representative or employee of H=
omeLoan Partnership has the authority to enter into any contract on behalf =
of HomeLoan Partnership by email. HomeLoan Partnership is a trading name of=
 H L Partnership Limited, registered in England and Wales with Registration=
 Number 5011722. Registered office: 26-34 Old Street, London, EC1V 9QQ. H L=
 Partnership Limited is authorised and regulated by the Financial Conduct A=
uthority.

_______________________________________________
List: Catalyst at lists.scsys.co.uk<mailto:Catalyst at lists.scsys.co.uk>
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/


This Email and any attachments contain confidential information and is inte=
nded solely for the individual to whom it is addressed. If this Email has b=
een misdirected, please notify the author as soon as possible. If you are n=
ot the intended recipient you must not disclose, distribute, copy, print or=
 rely on any of the information contained, and all copies must be deleted i=
mmediately. Whilst we take reasonable steps to try to identify any software=
 viruses, any attachments to this e-mail may nevertheless contain viruses, =
which our anti-virus software has failed to identify. You should therefore =
carry out your own anti-virus checks before opening any documents. HomeLoan=
 Partnership will not accept any liability for damage caused by computer vi=
ruses emanating from any attachment or other document supplied with this e-=
mail. HomeLoan Partnership reserves the right to monitor and archive all e-=
mail communications through its network. No representative or employee of H=
omeLoan Partnership has the authority to enter into any contract on behalf =
of HomeLoan Partnership by email. HomeLoan Partnership is a trading name of=
 H L Partnership Limited, registered in England and Wales with Registration=
 Number 5011722. Registered office: 26-34 Old Street, London, EC1V 9QQ. H L=
 Partnership Limited is authorised and regulated by the Financial Conduct A=
uthority.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.scsys.co.uk/pipermail/catalyst/attachments/20130703/a0579=
ebc/attachment.htm


More information about the Catalyst mailing list