No subject

Sun Nov 20 20:48:46 GMT 2022

```=0D
Am Di 21. Nov 2017, 07:07:59, ilmari at ilmari.org schrieb:=0D
> "Felix Antonius Wilhelm Ostmann via RT"=0D
> <bug-DBIx-Class-Schema-Loader at rt.cpan.org> writes:=0D
> =0D
> > It is not really the same ...=0D
> =0D
> The _internal_ representation is not the same; the \x from will be=0D
> represented internally as one byte per code point ("downgraded"),=0D
> while=0D
> the literal form will be utf-8-encoded ("upgraded"). Semantically they=0D=

> are the same, as evidenced by "eq" returning true.=0D
> =0D
> > In the real code i have to make a Encode::decode('ISO-8859-15',=0D
> > $enum) as a quickfix.=0D
> =0D
> Please show where in the real code you have to do this.  It smells=0D
> like=0D
> something you're passing it to suffering from the Unicode Bug,=0D
> i.e. treating the characters in the 128..255 range differently=0D
> depending=0D
> on the internal representation (see=0D
> https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for=0D
> details).=0D
> =0D
> > $ cat ticket123698.pl=0D
> > use utf8;=0D
> > use 5.20.0;=0D
> > use Data::Dumper;=0D
> > say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
> > print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D
> > $ perl ticket123698.pl=0D
> > 1=0D
> > $VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
> > $VAR2 =3D "zur\x{fc}ckgestellt";=0D
> =0D
> The different outputs here are a quirk of how Data::Dumper deals with=0D=

> downgraded vs. upgraded strings (which could be viewed as an instance=0D=

> of=0D
> the Unicode Bug, but doesn't actually affect semantics).  The first=0D
> one=0D
> is showing as =C3=AF=C2=BF=C2=BD because you haven't thold perl that yo=
ur terminal=0D
> expects UTF-8-encoded strings.  Adding=0D
> =0D
> use open qw(:std :utf8);=0D
> =0D
> to the script will make it apply a UTF-8 encoding layer to the=0D
> standard=0D
> input/output/error filehandles, so non-ASCII charcters show correctly.=0D=

> =0D
> - ilmari=0D


OK, here is the real-world scenario in pseudocode. I am using DBIx::Class + Catalyst + Template Toolkit.

ResultSet:
use Encode ();

sub enum_status {
    my ($self) = @_;
    # FIXME see https://rt.cpan.org/Public/Bug/Update.html?id=123698
    return map { Encode::decode("ISO-8859-15", $_) }
        @{ $self->result_source->column_info('status')->{extra}->{list} };

    # Original version, unreachable while the workaround above is in place:
    # return @{ $self->result_source->column_info('status')->{extra}->{list} };
}

Catalyst-Controller:
$c->stash->{status_order} = [ $rs->enum_status ];

Template:
[% FOREACH status IN status_order %]
<a href="[% c.request.uri_with({status => status}) %]">[% status | html %]</a>
[% END %]

Without the FIXME workaround, the links come out ISO-8859-15-encoded.


After reading your reply and the docs about the Unicode Bug, I changed the code to the following:

__PACKAGE__->add_columns(
...
  {
    data_type => "enum",
    default_value => "offen",
    extra => {
      custom_type_name => "enum_tasks_status",
      list => ["offen", "erledigt", "zur\xFCckgestellt"],
    },
    is_nullable => 0,
  },
...
);
...
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:W4KhHAXiEW35h5XWiZwhFg
# Upgrade the enum values in place (foreach aliases $_ to each list element).
utf8::upgrade($_) for @{ __PACKAGE__->column_info('status')->{extra}->{list} };



But in my opinion this is kind of a bug. Why are all other strings coming from the database already upgraded, but not this one?


```
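A minimal Perl sketch of what the utf8::upgrade line in the changed code does. It assumes a hypothetical %column_info hash that only mimics the structure column_info('status') returns for this column; it is not the real DBIx::Class object. The point it illustrates: the loop changes the internal representation of the \x value from downgraded to upgraded, while every value still compares equal to what it was before.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hashref returned by
# __PACKAGE__->column_info('status') in the generated Result class.
my %column_info = (
    data_type     => 'enum',
    default_value => 'offen',
    extra         => {
        custom_type_name => 'enum_tasks_status',
        list             => [ 'offen', 'erledigt', "zur\xFCckgestellt" ],
    },
    is_nullable => 0,
);

my $list   = $column_info{extra}{list};
my @before = @$list;    # copies of the original (downgraded) values

# The \x escape produces a downgraded string (UTF-8 flag off).
print 'before: ', (utf8::is_utf8($list->[2]) ? 'upgraded' : 'downgraded'), "\n";

# Same statement shape as the workaround: foreach aliases $_ to each
# array element, so utf8::upgrade modifies the stored values in place.
utf8::upgrade($_) for @$list;

print 'after:  ', (utf8::is_utf8($list->[2]) ? 'upgraded' : 'downgraded'), "\n";

# Only the internal representation changed; every value still compares equal.
for my $i (0 .. $#before) {
    die "value changed at index $i\n" unless $before[$i] eq $list->[$i];
}
print "values unchanged\n";
```

Run as-is, it prints `before: downgraded`, `after:  upgraded` and `values unchanged`.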
=0D
=0D
-- 
Reply to this email directly or view it on GitHub:
https://github.com/dbsrgits/dbix-class-schema-loader/issues/52
You are receiving this because you are subscribed to this thread.

Message ID: <dbsrgits/dbix-class-schema-loader/issues/52 at github.com>

Migrated from rt.cpan.org#123698 (https://rt.cpan.org/Ticket/Display.html?id=123698) (status was 'open')

Requestors:
- felix.ostmann at gmail.com

From felix.ostmann at gmail.com on 2017-11-21 09:54:01:
```
The {extra}{list} enum values are not correctly encoded. I use the same connection settings for the app itself, and all data from the database is correctly encoded except this enum.


> \dT+
...
 steinhaus_main | enum_tasks_status   | enum_tasks_status   | 4     | offen         +|
                |                     |                     |       | erledigt      +|
                |                     |                     |       | zurückgestellt |
...


$ grep status -C5 Tasks.pm
...
  "status",
  {
    data_type => "enum",
    default_value => "offen",
    extra => {
      custom_type_name => "enum_tasks_status",
      list => ["offen", "erledigt", "zur\xFCckgestellt"],
    },
    is_nullable => 0,
  },
...

The file is in UTF-8 with "use utf8;" at the beginning, so I expected:

      list => ["offen", "erledigt", "zurückgestellt"],
```

From ilmari+cpan at ilmari.org on 2017-11-21 11:08:27:
```
On 2017-11-21 09:54:01, felix.ostmann at gmail.com wrote:
> The {extra}{list} enum values are not correctly encoded.
> ...
> The file is in UTF-8 with "use utf8;" at the beginning, so I expected:
>
> list => ["offen", "erledigt", "zurückgestellt"],

These representations of the string are equivalent:

    $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zurückgestellt"'
    1

Schema::Loader uses Data::Dump to serialise method call arguments in the generated files, and it encodes all non-ASCII (and non-printable) characters using \x notation.

For aesthetic reasons it might be desirable to output Unicode word characters literally too, but the current output is not incorrect.

- ilmari
```
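To make the equivalence concrete, a small Perl sketch comparing the \x form that Data::Dump writes with a copy of the same string stored UTF-8-encoded internally. Here utf8::upgrade stands in for a literal "ü" in a "use utf8" source file; that substitution is an assumption made to keep the example ASCII-only.

```perl
use strict;
use warnings;

# The form Data::Dump writes into the generated file: an \x escape,
# which Perl stores downgraded (one byte per code point).
my $from_escape = "zur\xFCckgestellt";

# The same characters stored UTF-8-encoded internally ("upgraded").
my $upgraded = $from_escape;
utf8::upgrade($upgraded);

printf "eq: %s\n", ($from_escape eq $upgraded ? 'yes' : 'no');
printf "length: %d vs %d\n", length($from_escape), length($upgraded);
printf "4th character: U+%04X vs U+%04X\n",
    ord(substr($from_escape, 3, 1)), ord(substr($upgraded, 3, 1));
printf "UTF-8 flag: %d vs %d\n",
    (utf8::is_utf8($from_escape) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);
```

Everything that looks at characters (eq, length, ord) agrees, reporting `yes`, `14 vs 14` and `U+00FC vs U+00FC`; only the internal flag differs (`0 vs 1`).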

From felix.ostmann at gmail.com on 2017-11-21 11:43:13:
<pre class=3D"notranslate"><code class=3D"notranslate">Am Di 21. Nov 2017=
, 06:08:27, ilmari schrieb:=0D
&gt; On 2017-11-21 09:54:01, felix.ostmann at gmail.com wrote:=0D
&gt; &gt; The {extra}{list} enum values are not correct encoded. I use th=
e same=0D
&gt; &gt; connection settings for the app itself and all data from the da=
tabase=0D
&gt; &gt; are correctly encoded except this enum.=0D
&gt; &gt;=0D
&gt; &gt;=0D
&gt; &gt; &gt; \dT+=0D
&gt; &gt; ...=0D
&gt; &gt;   steinhaus_main | enum_tasks_status   | enum_tasks_status   | =
4=0D
&gt; &gt; |=0D
&gt; &gt; offen         +|=0D
&gt; &gt;                  |                     |                     |=0D=

&gt; &gt; |=0D
&gt; &gt; erledigt      +|=0D
&gt; &gt;                  |                     |                     |=0D=

&gt; &gt; |=0D
&gt; &gt; zur=C3=83=C2=BCckgestellt |=0D
&gt; &gt; ...=0D
&gt; &gt;=0D
&gt; &gt;=0D
&gt; &gt; $ grep status -C5 Tasks.pm=0D
&gt; &gt; ...=0D
&gt; &gt;   "status",=0D
&gt; &gt;   {=0D
&gt; &gt;     data_type =3D&gt; "enum",=0D
&gt; &gt;     default_value =3D&gt; "offen",=0D
&gt; &gt;     extra =3D&gt; {=0D
&gt; &gt;       custom_type_name =3D&gt; "enum_tasks_status",=0D
&gt; &gt;       list =3D&gt; ["offen", "erledigt", "zur\xFCckgestellt"],=0D=

&gt; &gt;     },=0D
&gt; &gt;     is_nullable =3D&gt; 0,=0D
&gt; &gt;   },=0D
&gt; &gt; ...=0D
&gt; &gt;=0D
&gt; &gt; the file is in utf8 with use utf8; in the beginning so i expect=
ed:=0D
&gt; &gt;=0D
&gt; &gt; list =3D&gt; ["offen", "erledigt", "zur=C3=83=C2=BCckgestellt"]=
,=0D
&gt; =0D
&gt; These representations of the string are equivalent:=0D
&gt; =0D
&gt; $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgeste=
llt"'=0D
&gt; 1=0D
&gt; =0D
&gt; Schema::Loader uses Data::Dump to serialise method call arguments in=
=0D
&gt; the generated files, and it encodes all non-ASCII (and non-printable=
)=0D
&gt; characters using \x notation.=0D
&gt; =0D
&gt; For aesthetic reasons it might be desirable to output Unicode word=0D=

&gt; characters literally too, but the current output is not incorrect.=0D=

&gt; =0D
&gt; - ilmari=0D
=0D
It is not really the same ...=0D
=0D
In the real code i have to make a Encode::decode('ISO-8859-15', $enum) as=
 a quickfix. =0D
=0D
$ cat ticket123698.pl =0D
use utf8;=0D
use 5.20.0;=0D
use Data::Dumper;=0D
say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D
$ perl ticket123698.pl =0D
1=0D
$VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
$VAR2 =3D "zur\x{fc}ckgestellt";=0D
</code></pre>=0D
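A minimal Perl check of whether the Data::Dumper difference in the script above matters: it dumps a downgraded and an upgraded copy of the same string, then evaluates both dumps back. It assumes the default Data::Dumper configuration, with $Data::Dumper::Terse set only to drop the "$VAR1 =" prefix; whether the two dumps are written identically can depend on that configuration, so the sketch only reports it.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Terse = 1;    # emit just the value, without "$VAR1 ="

my $downgraded = "zur\xFCckgestellt";
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);

# The dumped Perl source may be written differently for the two
# representations (compare $VAR1 and $VAR2 in the script above)...
my $dump_down = Dumper($downgraded);
my $dump_up   = Dumper($upgraded);
print 'dumps: ', $dump_down eq $dump_up ? "identical\n" : "different\n";

# ...but evaluating either dump yields the same characters.
my $back_down = eval $dump_down;
die $@ if $@;
my $back_up = eval $dump_up;
die $@ if $@;

print 'round trip: ',
    ($back_down eq $downgraded && $back_up eq $downgraded ? 'ok' : 'MISMATCH'), "\n";
```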

From ilmari at ilmari.org on 2017-11-21 12:07:59:
<pre class=3D"notranslate"><code class=3D"notranslate">"Felix Antonius Wi=
lhelm Ostmann via RT"=0D
&lt;bug-DBIx-Class-Schema-Loader at rt.cpan.org&gt; writes:=0D
=0D
&gt; It is not really the same ...=0D
=0D
The _internal_ representation is not the same; the \x from will be=0D
represented internally as one byte per code point ("downgraded"), while=0D=

the literal form will be utf-8-encoded ("upgraded"). Semantically they=0D=

are the same, as evidenced by "eq" returning true.=0D
=0D
&gt; In the real code i have to make a Encode::decode('ISO-8859-15', $enu=
m) as a quickfix. =0D
=0D
Please show where in the real code you have to do this.  It smells like=0D=

something you're passing it to suffering from the Unicode Bug,=0D
i.e. treating the characters in the 128..255 range differently depending=0D=

on the internal representation (see=0D
https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).=0D=

=0D
&gt; $ cat ticket123698.pl =0D
&gt; use utf8;=0D
&gt; use 5.20.0;=0D
&gt; use Data::Dumper;=0D
&gt; say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
&gt; print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D
&gt; $ perl ticket123698.pl =0D
&gt; 1=0D
&gt; $VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
&gt; $VAR2 =3D "zur\x{fc}ckgestellt";=0D
=0D
The different outputs here are a quirk of how Data::Dumper deals with=0D
downgraded vs. upgraded strings (which could be viewed as an instance of=0D=

the Unicode Bug, but doesn't actually affect semantics).  The first one=0D=

is showing as =C3=AF=C2=BF=C2=BD because you haven't thold perl that your=
 terminal=0D
expects UTF-8-encoded strings.  Adding=0D
=0D
    use open qw(:std :utf8);=0D
=0D
to the script will make it apply a UTF-8 encoding layer to the standard=0D=

input/output/error filehandles, so non-ASCII charcters show correctly.=0D=

=0D
- ilmari=0D
-- =0D
"I use RMS as a guide in the same way that a boat captain would use=0D
 a lighthouse.  It's good to know where it is, but you generally=0D
 don't want to find yourself in the same spot." - Tollef Fog Heen=0D
</code></pre>=0D
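A minimal Perl sketch of the kind of Unicode Bug described above, applied to building a link. buggy_escape is a hypothetical percent-escaper, not code taken from Catalyst, URI or Template Toolkit: it UTF-8-encodes its input only when the internal UTF-8 flag is on, so the same characters yield different links depending on representation. If the real link-building code behaves like it (an assumption), that would match the ISO-8859-15-looking links reported in this thread. fixed_escape always encodes to UTF-8 first and is representation-independent.

```perl
use strict;
use warnings;

# Hypothetical escaper exhibiting the Unicode Bug: it UTF-8-encodes only
# strings whose internal UTF-8 flag is on.
sub buggy_escape {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Representation-independent version: always turn characters into
# UTF-8 bytes before percent-escaping.
sub fixed_escape {
    my ($s) = @_;
    utf8::encode($s);
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

my $downgraded = "zur\xFCckgestellt";    # as parsed from the generated file
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);                 # same characters, UTF-8 internally

print buggy_escape($downgraded), "\n";    # zur%FCckgestellt
print buggy_escape($upgraded),   "\n";    # zur%C3%BCckgestellt
print fixed_escape($downgraded), "\n";    # zur%C3%BCckgestellt
print fixed_escape($upgraded),   "\n";    # zur%C3%BCckgestellt
```

Either fixing the consumer (as in fixed_escape) or upgrading the input makes the output consistent.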

From felix.ostmann at gmail.com on 2017-11-21 13:35:39:
```
On Tue 21 Nov 2017, 07:07:59, ilmari at ilmari.org wrote:
> Please show where in the real code you have to do this.  It smells like
> something you're passing it to suffering from the Unicode Bug,
> i.e. treating the characters in the 128..255 range differently depending
> on the internal representation (see
> https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).


OK, here is the real-world scenario in pseudocode. I am using DBIx::Class + Catalyst + Template Toolkit.

ResultSet:
use Encode ();

sub enum_status {
    my ($self) = @_;
    # FIXME see https://rt.cpan.org/Public/Bug/Update.html?id=123698
    return map { Encode::decode("ISO-8859-15", $_) }
        @{ $self->result_source->column_info('status')->{extra}->{list} };

    # Original version, unreachable while the workaround above is in place:
    # return @{ $self->result_source->column_info('status')->{extra}->{list} };
}

Catalyst-Controller:
$c->stash->{status_order} = [ $rs->enum_status ];

Template:
[% FOREACH status IN status_order %]
<a href="[% c.request.uri_with({status => status}) %]">[% status | html %]</a>
[% END %]

Without the FIXME workaround, the links come out ISO-8859-15-encoded.


After reading your reply and the docs about the Unicode Bug, I changed the code to the following:

__PACKAGE__->add_columns(
...
  {
    data_type => "enum",
    default_value => "offen",
    extra => {
      custom_type_name => "enum_tasks_status",
      list => ["offen", "erledigt", "zur\xFCckgestellt"],
    },
    is_nullable => 0,
  },
...
);
...
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:W4KhHAXiEW35h5XWiZwhFg
# Upgrade the enum values in place (foreach aliases $_ to each list element).
utf8::upgrade($_) for @{ __PACKAGE__->column_info('status')->{extra}->{list} };



But in my opinion this is kind of a bug. Why are all other strings coming from the database already upgraded, but not this one?
```
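One caveat about the ISO-8859-15 quickfix, sketched in Perl: Encode::decode('ISO-8859-15', ...) reinterprets each code point as an ISO-8859-15 byte. That happens to leave ü (0xFC) unchanged, but ISO-8859-15 differs from Latin-1 at a few positions, e.g. 0xA4 is the euro sign (U+20AC) rather than the currency sign (U+00A4). utf8::upgrade changes only the representation. The trailing \xA4 in the sample string is a hypothetical value, not one of the real enum values.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical value: "zurückgestellt" plus U+00A4 (CURRENCY SIGN), written
# with \x escapes the way Data::Dump writes them, so it is downgraded.
my $original = "zur\xFCckgestellt \xA4";

# The quickfix: reinterpret each code point as an ISO-8859-15 byte.
my $decoded = decode('ISO-8859-15', $original);

# The alternative: change only the internal representation.
my $upgraded = $original;
utf8::upgrade($upgraded);

printf "decode  keeps the value: %s\n", ($decoded  eq $original ? 'yes' : 'no');
printf "upgrade keeps the value: %s\n", ($upgraded eq $original ? 'yes' : 'no');
printf "last character after decode: U+%04X\n", ord(substr($decoded, -1));
```

This prints `no`, `yes` and `U+20AC`: for the values in this ticket both approaches give the same result, but only the upgrade is correct for every code point in the 128..255 range.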