[Catalyst] CSV / UTF-8 / Unicode

Craig Chant craig at homeloanpartnership.com
Thu Jul 4 08:56:15 GMT 2013


Thanks Anthony,



I will try your suggestions when I'm back in the office tomorrow; I'm on a study day today for my OU course.



It would be good to track this down and fix it without having to refactor to Win32::ODBC. I eventually want to look at replacing my own DBI wrapper with the DBIC ORM, but am concerned this wouldn't be possible if I can't get DBI to play ball with MS.



>> Your data is being stored in Unicode data typed columns right?



Yes, it's NVARCHAR(max), which I understood is MS's data type for uNicode VARiable CHARacters. Looking at some sample column data via the Windows SQL Management GUI, it appears to display OK.



I know that the data being pasted into it is coming from an MS Access front-end application that is linked to the same backend SQL server.



I also know that this is a memo / rich text input box control on the form (view), bound directly to the table column via a linked table definition with the backend SQL server, and some of what they enter is copied and pasted from emails and MS Word documents (and possibly PDFs).



I can't see any odd characters looking at a small amount of sample data on the SQL server, and the data comes out of Win32::ODBC looking OK too.
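
Text pasted from Word or email can contain non-ASCII punctuation that is easy to miss by eye. As a rough illustration only (a Python sketch, not the Perl wrappers discussed in this thread; the function name and the sample string are hypothetical), the following lists every non-ASCII character in a value so a visual check can be backed by code points:

```python
import unicodedata


def non_ascii_report(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    report = []
    for index, char in enumerate(text):
        if ord(char) > 127:
            name = unicodedata.name(char, "UNKNOWN")
            report.append((index, f"U+{ord(char):04X}", name))
    return report


# Hypothetical sample resembling text pasted from a Word document.
sample = "It\u2019s a \u201cquoted\u201d value"
for entry in non_ascii_report(sample):
    print(entry)
```

Running the same kind of check on the value fetched through each wrapper would show whether the two paths return the same characters.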



From what I can tell the data is in Unicode during capture and storage; it's just the retrieval with DBI where it seems to be breaking down.



I already have to include a longread setting when using DBI::DBD::ODBC with SQL, otherwise it falls over with the data being too long, so perhaps there is another parameter I need?



I really appreciate all the help you guys have given so far, thank you.



Regards,



Craig





________________________________
From: Anthony Lucas [anthonyjlucas at gmail.com]
Sent: 04 July 2013 01:09
To: The elegant MVC web framework
Subject: Re: [Catalyst] CSV / UTF-8 / Unicode


On 3 July 2013 11:18, Craig Chant <craig at homeloanpartnership.com> wrote:

>> Maybe write a standalone test and take Catalyst and browser quirks out of the picture.

I have already done this. I have two SQL wrapper modules, one that uses DBI::DBD::ODBC and one that uses Win32::ODBC. I applied them to the same standalone script that produces CSV output; the only difference between the tests was that one accessed SQL with the DBI SQL wrapper and the other accessed SQL with the Win32::ODBC SQL wrapper. DBI outputted junk chars, Win32::ODBC didn't. What else should I be doing to test for the culprit of the corruption?

You need to see how they are using the ODBC API underneath for handling the data and encoding.
Setting the trace flag on DBI (i.e. DBI->trace(n)) will expose the DBD::ODBC activity. I'm not sure of the debugging available for Win32::ODBC.

One thing I would check first is what they are treating the column data as. If DBD::ODBC is treating the columns as WCHAR but Win32::ODBC is treating them as CHAR and then doing extra "magic" decoding (or not), well then you've found a big clue. There has to be different handling or differing levels of ODBC support somewhere.
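
To make the CHAR versus WCHAR point concrete, here is a minimal sketch of what a mismatch between the bytes in a buffer and the encoding used to read them looks like. It is written in Python purely to show the byte-level behaviour, it does not reproduce what either Perl module actually does, and the sample text and variable names are hypothetical:

```python
# Hypothetical sample: a curly apostrophe and a pound sign.
text = "It\u2019s \u00a350"

wide_bytes = text.encode("utf-16-le")  # two bytes per character, as in a wide buffer
utf8_bytes = text.encode("utf-8")      # one to three bytes per character here

# Correct handling: decode with the encoding the bytes were written in.
assert wide_bytes.decode("utf-16-le") == text
assert utf8_bytes.decode("utf-8") == text

# Mismatch 1: UTF-8 bytes read as Windows-1252 give the familiar junk characters.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# Mismatch 2: wide bytes read one byte per character leave NULs between letters.
wrong_narrow = wide_bytes.decode("latin-1")
print(repr(wrong_narrow))
```

Comparing the junk in the CSV output against these two patterns may hint at which kind of mismatch is happening, though the trace remains the reliable source.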

I would assume that DBD::ODBC is doing "the right thing", and something else is amiss upstream (but well, never assume with Unicode handling, so make sure with the trace).



>> Also, you are aware that your data will probably be coming back as UCS2 if you're using SQL Server right?

No, what is UCS2 and is this handled differently in DBI::DBD::ODBC vs Win32::ODBC?


From what I understand, is this ultimately what you've got happening?
Original Input Data -> SQL Client -> Database Driver -> Database (UCS2) -> Windows ODBC Driver -> DBD::ODBC -> Catalyst(?)

If so, since you're storing the data as Unicode and the database driver knows this (because your column type is NVARCHAR etc.), conversion to UCS2 happens at the driver stage on Windows. This conversion is lossless between the different Unicode encodings, so just make sure your input is actual good Unicode up to that point and your data is being stored correctly.
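
The lossless claim can be checked with a small round trip. This is a sketch under stated assumptions: it is in Python rather than Perl, the function name is hypothetical, and it uses the UTF-16LE codec as a stand-in for UCS2 (for characters in the Basic Multilingual Plane the two produce the same bytes; UTF-16 additionally uses surrogate pairs for characters beyond it).

```python
def roundtrip(text):
    """Encode as UTF-16LE, convert to UTF-8, and decode back to a string."""
    as_utf16 = text.encode("utf-16-le")
    as_utf8 = as_utf16.decode("utf-16-le").encode("utf-8")
    return as_utf8.decode("utf-8")


# Hypothetical sample with curly quotes and a pound sign.
sample = "\u201cQuoted\u201d text costing \u00a350"
assert roundtrip(sample) == sample

# The byte lengths differ between encodings, but the characters do not change.
print(len(sample.encode("utf-16-le")), len(sample.encode("utf-8")))
```

The conversion only stays lossless when each step decodes with the encoding the previous step wrote, which is why a wrong assumption anywhere in the chain shows up as junk characters.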

Your data is being stored in Unicode data typed columns right?




