[Catalyst] CSV / UTF-8 / Unicode
Craig Chant
craig at homeloanpartnership.com
Wed Jul 3 10:18:57 GMT 2013
Hi all , thanks for the replies.
>> $xls is a scalar you have in memory of the comma separated value data?
Yes, correct, and when output direct to $c->response->body , catalyst serve=
r falls over with an out of memory error and reboots itself. I had an IRC d=
iscussion about this and it was acknowledged as a limitation of Catalyst an=
d the solution was to wrap it up in a IO:File object.
>> but how Windows decides how to open files is something I'm not familiar =
with.
>> That depends on the Browser setting per mime type and not the OS.
>> In Firefox you can define it in Preferences / Applications.
Perhaps, but it appears default behaviour in IE / FF / Opera / Chrome to op=
en CSV as an XLS spread sheet in Excel seems to require that MIME type and =
works exactly as desired when used.
Reading the RFC link it states no MIME has ever officially been registered =
of CSV but does give the text/csv MIME type, yet as mentioned I didn't get =
the desired behaviour when used, is there an issue with using the MIME type=
I have, if it is yielding desired results?
>> Maybe write a standalone test and take Catalyst and browser quirks out o=
f the picture.
I have already done this, I have two SQL wrapper modules one that uses DBI:=
:DBD::ODBC and one that uses Win32::ODBC, I applied it to the same standalo=
ne script that produces CSV output, the only difference between the test wa=
s one test accessed SQL with the DBI SQL wrapper and one test accessed SQL =
with the Win32::ODBC SQL wrapper, DBI outputted junk chars, Win32::ODBC did=
n't. What else should I be doing to test for the culprit of the corruption?
>> Also, you are aware that your data will probably be coming back as UCS2 =
if you're using SQL Server right?
No, what is UCS2 and is this handled differently in DBI::DBD::ODBC vs Win32=
::ODBC ?
Thanks,
Craig
________________________________
From: Anthony Lucas [anthonyjlucas at gmail.com]
Sent: 03 July 2013 06:42
To: The elegant MVC web framework
Subject: Re: [Catalyst] CSV / UTF-8 / Unicode
I don't see how DBI can be corrupting your data. DBD::ODBC, or more specifi=
cally the actual driver it is using will be more suspect in that case.
In either case, setting DBI's trace flag to something high should provide s=
ome insight who's doing what, and you can make sure they are being handled=
as wide characters. ODBC drivers are renowned for providing incomplete API=
s and lying about stuff.
Maybe write a standalone test and take Catalyst and browser quirks out of t=
he picture.
Also, you are aware that your data will probably be coming back as UCS2 if =
you're using SQL Server right?
On 2 July 2013 17:29, Craig Chant <craig at homeloanpartnership.com<mailto:cra=
ig at homeloanpartnership.com>> wrote:
>> All the above seems overkill. I suspect what you want is closer to thi=
s: (but see notes below).
Tried that, didn=92t work, ended up in a long Catalyst discussion where it =
was worked out that I needed to wrap any XLS output to an IO:FILE handle ot=
herwise Catalyst dies with an =93out of memory=94 error something to do wit=
h streaming data support issues in Catalyst so the work round is to wrap th=
e output into an IO:File object.
>>Second, be aware that $c->response->content_length(length($xls));
Yes, I was doing the encode then using Length (I did read on perldocs about=
requesting the length against the octet) , either way, the length was the =
least of my worries, keeping Catalyst from falling over with =91Wide Charac=
ter=92 errors, or not getting garbage was my main concern.
And yes, the output is CSV not strictly XLS but I have been told and looked=
it up on the net that 'application/vnd.ms-excel' Is the correct MIME head=
er to pass for CSV that you want MS Excel to open.
Of course, if I have the wrong MIME header for CSV -> MS Excel please can y=
ou provide the correct one, as it took me a long time to find that one, as =
the box standard =91text/csv=92 does not work properly when opened in MS Ex=
cel.
Though as it appears DBI is corrupting my Unicode data, it might be related=
to that rather than CSV->MS Excel per sae!
From: Bill Moseley [mailto:moseley at hank.org<mailto:moseley at hank.org>]
Sent: 02 July 2013 16:59
To: The elegant MVC web framework
Subject: Re: [Catalyst] CSV / UTF-8 / Unicode
On Tue, Jul 2, 2013 at 2:59 AM, Craig Chant <craig at homeloanpartnership.com<=
mailto:craig at homeloanpartnership.com>> wrote:
# output header
$c->response->content_type('application/vnd.ms-excel');
$c->response->content_length(length($xls));
$c->response->header(Content_Disposition =3D> 'attachment;filename=
=3DNBCS_Export.csv');
# create an IO::File for Catalyst
use IO::File;
my $iof =3D IO::File->new;
$iof->open(\$xls, "r");
$iof->binmode(":encoding(UTF-8)");
# output XLS data
$c->response->body($iof);
All the above seems overkill. I suspect what you want is closer to this: =
(but see notes below).
$c->response->content_type('text/csv');
$c->response->body($xls);
$c->response->header(Content_Disposition =3D> 'attachment;filename=
=3DNBCS_Export.csv');
Then with that content type the plugin would encode $xls as utf8 and add ;c=
harset=3Dutf8 (or whatever it is configured to encode as).
Notes:
First, you are not returning Excel, so the content type is not what you fir=
st listed above, right?
Second, be aware that $c->response->content_length(length($xls)); could be =
very wrong. If $xls is really CSV text AND it's decoded then length($xls) =
is the length in characters, not octets. Don't set the content length.
Third, Catalyst::Plugin::Unicode::Encoding, IMO, has some issues.
The plugin limits to just these content types.
return $c->next::method(@_)
unless $c->response->content_type =3D~ /^text|xml$|javascript$/;
Then it does this:
$c->response->body( $c->encoding->encode( $body, $CHECK ) )
if ref(\$body) eq 'SCALAR';
Personally, I think the correct approach is to only encode character data -=
- that is check to see if the utf8 flag is set before calling encode.
Maybe limit to the content types listed above, but throw an exception for o=
ther content types where the body is a scalar AND has the utf8 flag on. Af=
ter all, we can only write out octets or else we get the Wide Character err=
or.
--
Bill Moseley
moseley at hank.org<mailto:moseley at hank.org>
This Email and any attachments contain confidential information and is inte=
nded solely for the individual to whom it is addressed. If this Email has b=
een misdirected, please notify the author as soon as possible. If you are n=
ot the intended recipient you must not disclose, distribute, copy, print or=
rely on any of the information contained, and all copies must be deleted i=
mmediately. Whilst we take reasonable steps to try to identify any software=
viruses, any attachments to this e-mail may nevertheless contain viruses, =
which our anti-virus software has failed to identify. You should therefore =
carry out your own anti-virus checks before opening any documents. HomeLoan=
Partnership will not accept any liability for damage caused by computer vi=
ruses emanating from any attachment or other document supplied with this e-=
mail. HomeLoan Partnership reserves the right to monitor and archive all e-=
mail communications through its network. No representative or employee of H=
omeLoan Partnership has the authority to enter into any contract on behalf =
of HomeLoan Partnership by email. HomeLoan Partnership is a trading name of=
H L Partnership Limited, registered in England and Wales with Registration=
Number 5011722. Registered office: 26-34 Old Street, London, EC1V 9QQ. H L=
Partnership Limited is authorised and regulated by the Financial Conduct A=
uthority.
_______________________________________________
List: Catalyst at lists.scsys.co.uk<mailto:Catalyst at lists.scsys.co.uk>
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
This Email and any attachments contain confidential information and is inte=
nded solely for the individual to whom it is addressed. If this Email has b=
een misdirected, please notify the author as soon as possible. If you are n=
ot the intended recipient you must not disclose, distribute, copy, print or=
rely on any of the information contained, and all copies must be deleted i=
mmediately. Whilst we take reasonable steps to try to identify any software=
viruses, any attachments to this e-mail may nevertheless contain viruses, =
which our anti-virus software has failed to identify. You should therefore =
carry out your own anti-virus checks before opening any documents. HomeLoan=
Partnership will not accept any liability for damage caused by computer vi=
ruses emanating from any attachment or other document supplied with this e-=
mail. HomeLoan Partnership reserves the right to monitor and archive all e-=
mail communications through its network. No representative or employee of H=
omeLoan Partnership has the authority to enter into any contract on behalf =
of HomeLoan Partnership by email. HomeLoan Partnership is a trading name of=
H L Partnership Limited, registered in England and Wales with Registration=
Number 5011722. Registered office: 26-34 Old Street, London, EC1V 9QQ. H L=
Partnership Limited is authorised and regulated by the Financial Conduct A=
uthority.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.scsys.co.uk/pipermail/catalyst/attachments/20130703/a0579=
ebc/attachment.htm
More information about the Catalyst
mailing list