No subject
Sun Nov 20 20:48:46 GMT 2022
```
On Tue 21 Nov 2017, 07:07:59, ilmari at ilmari.org wrote:
> "Felix Antonius Wilhelm Ostmann via RT"
> <bug-DBIx-Class-Schema-Loader at rt.cpan.org> writes:
>
> > It is not really the same ...
>
> The _internal_ representation is not the same; the \x form will be
> represented internally as one byte per code point ("downgraded"), while
> the literal form will be utf-8-encoded ("upgraded"). Semantically they
> are the same, as evidenced by "eq" returning true.
>
> > In the real code I have to call Encode::decode('ISO-8859-15', $enum)
> > as a quick fix.
>
> Please show where in the real code you have to do this. It smells like
> something you're passing it to is suffering from the Unicode Bug,
> i.e. treating the characters in the 128..255 range differently depending
> on the internal representation (see
> https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).
>
> > $ cat ticket123698.pl=0D
> > use utf8;=0D
> > use 5.20.0;=0D
> > use Data::Dumper;=0D
> > say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
> > print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D
> > $ perl ticket123698.pl=0D
> > 1=0D
> > $VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
> > $VAR2 =3D "zur\x{fc}ckgestellt";=0D
> =0D
> The different outputs here are a quirk of how Data::Dumper deals with=0D=
> downgraded vs. upgraded strings (which could be viewed as an instance=0D=
> of=0D
> the Unicode Bug, but doesn't actually affect semantics). The first=0D
> one=0D
> is showing as =C3=AF=C2=BF=C2=BD because you haven't thold perl that yo=
ur terminal=0D
> expects UTF-8-encoded strings. Adding=0D
> =0D
> use open qw(:std :utf8);=0D
> =0D
> to the script will make it apply a UTF-8 encoding layer to the=0D
> standard=0D
> input/output/error filehandles, so non-ASCII charcters show correctly.=0D=
> =0D
> - ilmari=0D
=0D
=0D
OK, here is the real-world scenario in pseudocode. I am using DBIx::Class + Catalyst + Template Toolkit.

ResultSet:
sub enum_status {
    my ($self) = @_;
    # FIXME see https://rt.cpan.org/Public/Bug/Update.html?id=123698
    return map { Encode::decode("ISO-8859-15", $_) } @{ $self->result_source->column_info('status')->{extra}->{list} };
    # Original line, unreachable while the workaround above is in place:
    # return @{ $self->result_source->column_info('status')->{extra}->{list} };
}

Catalyst-Controller:
$c->stash->{status_order} = [ $rs->enum_status ];

Template:
[% FOREACH status IN status_order %]
<a href="[% c.request.uri_with({status => status}) %]">
[% END %]

Without the FIXME workaround, the links are encoded as ISO-8859-15.

After reading your reply and the docs about the Unicode Bug, I changed the code to the following:

__PACKAGE__->add_columns(
  ...
  {
    data_type => "enum",
    default_value => "offen",
    extra => {
      custom_type_name => "enum_tasks_status",
      list => ["offen", "erledigt", "zur\xFCckgestellt"],
    },
    is_nullable => 0,
  },
  ...
);
...
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:W4KhHAXiEW35h5XWiZwhFg
utf8::upgrade($_) for @{ __PACKAGE__->column_info('status')->{extra}->{list} };

But in my opinion this is kind of a bug. Why are all other strings coming from the database already upgraded, but not this one?

```
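
The downgraded/upgraded distinction discussed in this thread can be illustrated with a minimal Perl sketch. It assumes a perl where `use feature 'unicode_strings'` is not in effect (so no `use v5.12` or later in the file); the variable names are illustrative and do not come from the ticket. It shows that both spellings compare equal, that only the internal storage flag differs, that `\w` is one operation affected by the Unicode Bug, and that `utf8::upgrade` changes the storage without changing the characters.

```perl
use strict;
use warnings;
use utf8;    # non-ASCII literals in this file are decoded as UTF-8

my $escaped = "zur\xFCckgestellt";   # \x escape: one byte per code point ("downgraded")
my $literal = "zurückgestellt";      # literal ü: UTF-8-encoded internally ("upgraded")

# Same code points, so the strings compare equal.
my $equal = ($escaped eq $literal) ? 1 : 0;

# utf8::is_utf8 reports the internal storage flag (a debugging aid only).
my $escaped_flag = utf8::is_utf8($escaped) ? 1 : 0;
my $literal_flag = utf8::is_utf8($literal) ? 1 : 0;

# The Unicode Bug: with default regex rules, whether ü (U+00FC) counts as a
# word character depends on how the string happens to be stored.
my $word_before = ($escaped =~ /^\w+$/) ? 1 : 0;

# utf8::upgrade changes the storage only, never the characters.
utf8::upgrade($escaped);
my $word_after = ($escaped =~ /^\w+$/) ? 1 : 0;

print "eq: $equal\n";
print "flags: escaped=$escaped_flag literal=$literal_flag\n";
print "\\w+ matches whole string: before=$word_before after=$word_after\n";
```

This mirrors the workaround in the message above, which upgrades each enum value once after the generated section so that code sensitive to the storage format sees the same representation as strings read from the database.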

-- 
Reply to this email directly or view it on GitHub:
https://github.com/dbsrgits/dbix-class-schema-loader/issues/52
You are receiving this because you are subscribed to this thread.

Message ID: <dbsrgits/dbix-class-schema-loader/issues/52 at github.com>
<p></p>
<p dir="auto">Migrated from <a href="https://rt.cpan.org/Ticket/Display.html?id=123698" rel="nofollow">rt.cpan.org#123698</a> (status was 'open')</p>
<p dir="auto">Requestors:</p>
<ul dir="auto">
<li><a href="mailto:felix.ostmann at gmail.com">felix.ostmann at gmail.com</a></li>
</ul>
<p dir="auto">From <a href="mailto:felix.ostmann at gmail.com">felix.ostmann at gmail.com</a> on 2017-11-21 09:54:01<br>
:</p>
<pre class="notranslate"><code class="notranslate">The {extra}{list} enum values are not correctly encoded. I use the same connection settings for the app itself, and all data from the database is correctly encoded except this enum.

> \dT+
...
 steinhaus_main | enum_tasks_status | enum_tasks_status | 4 | offen          +|
                |                   |                   |   | erledigt       +|
                |                   |                   |   | zurückgestellt  |
...

$ grep status -C5 Tasks.pm
...
  "status",
  {
    data_type => "enum",
    default_value => "offen",
    extra => {
      custom_type_name => "enum_tasks_status",
      list => ["offen", "erledigt", "zur\xFCckgestellt"],
    },
    is_nullable => 0,
  },
...

The file is in UTF-8 with "use utf8;" at the beginning, so I expected:

      list => ["offen", "erledigt", "zurückgestellt"],
</code></pre>
<p dir="auto">From <a href="mailto:ilmari+cpan at ilmari.org">ilmari+cpan at ilmari.org</a> on 2017-11-21 11:08:27<br>
:</p>
<pre class="notranslate"><code class="notranslate">On 2017-11-21 09:54:01, felix.ostmann at gmail.com wrote:
> The {extra}{list} enum values are not correctly encoded. I use the same
> connection settings for the app itself, and all data from the database
> is correctly encoded except this enum.
>
> > \dT+
> ...
>  steinhaus_main | enum_tasks_status | enum_tasks_status | 4 | offen          +|
>                 |                   |                   |   | erledigt       +|
>                 |                   |                   |   | zurückgestellt  |
> ...
>
> $ grep status -C5 Tasks.pm
> ...
>   "status",
>   {
>     data_type => "enum",
>     default_value => "offen",
>     extra => {
>       custom_type_name => "enum_tasks_status",
>       list => ["offen", "erledigt", "zur\xFCckgestellt"],
>     },
>     is_nullable => 0,
>   },
> ...
>
> The file is in UTF-8 with "use utf8;" at the beginning, so I expected:
>
>       list => ["offen", "erledigt", "zurückgestellt"],

These representations of the string are equivalent:

$ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zurückgestellt"'
1

Schema::Loader uses Data::Dump to serialise method call arguments in the generated files, and it encodes all non-ASCII (and non-printable) characters using \x notation.

For aesthetic reasons it might be desirable to output Unicode word characters literally too, but the current output is not incorrect.

- ilmari
</code></pre>
<p dir=3D"auto">From <a href=3D"mailto:felix.ostmann at gmail.com">felix.ost=
mann at gmail.com</a> on 2017-11-21 11:43:13<br>=0D
:</p>=0D
<pre class=3D"notranslate"><code class=3D"notranslate">Am Di 21. Nov 2017=
, 06:08:27, ilmari schrieb:=0D
> On 2017-11-21 09:54:01, felix.ostmann at gmail.com wrote:=0D
> > The {extra}{list} enum values are not correct encoded. I use th=
e same=0D
> > connection settings for the app itself and all data from the da=
tabase=0D
> > are correctly encoded except this enum.=0D
> >=0D
> >=0D
> > > \dT+=0D
> > ...=0D
> > steinhaus_main | enum_tasks_status | enum_tasks_status | =
4=0D
> > |=0D
> > offen +|=0D
> > | | |=0D=
> > |=0D
> > erledigt +|=0D
> > | | |=0D=
> > |=0D
> > zur=C3=83=C2=BCckgestellt |=0D
> > ...=0D
> >=0D
> >=0D
> > $ grep status -C5 Tasks.pm=0D
> > ...=0D
> > "status",=0D
> > {=0D
> > data_type =3D> "enum",=0D
> > default_value =3D> "offen",=0D
> > extra =3D> {=0D
> > custom_type_name =3D> "enum_tasks_status",=0D
> > list =3D> ["offen", "erledigt", "zur\xFCckgestellt"],=0D=
> > },=0D
> > is_nullable =3D> 0,=0D
> > },=0D
> > ...=0D
> >=0D
> > the file is in utf8 with use utf8; in the beginning so i expect=
ed:=0D
> >=0D
> > list =3D> ["offen", "erledigt", "zur=C3=83=C2=BCckgestellt"]=
,=0D
> =0D
> These representations of the string are equivalent:=0D
> =0D
> $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgeste=
llt"'=0D
> 1=0D
> =0D
> Schema::Loader uses Data::Dump to serialise method call arguments in=
=0D
> the generated files, and it encodes all non-ASCII (and non-printable=
)=0D
> characters using \x notation.=0D
> =0D
> For aesthetic reasons it might be desirable to output Unicode word=0D=
> characters literally too, but the current output is not incorrect.=0D=
> =0D
> - ilmari=0D
=0D
It is not really the same ...=0D
=0D
In the real code i have to make a Encode::decode('ISO-8859-15', $enum) as=
a quickfix. =0D
=0D
$ cat ticket123698.pl =0D
use utf8;=0D
use 5.20.0;=0D
use Data::Dumper;=0D
say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D
$ perl ticket123698.pl =0D
1=0D
$VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
$VAR2 =3D "zur\x{fc}ckgestellt";=0D
</code></pre>=0D
<p dir=3D"auto">From <a href=3D"mailto:ilmari at ilmari.org">ilmari at ilmari.o=
rg</a> on 2017-11-21 12:07:59<br>=0D
:</p>=0D
<pre class=3D"notranslate"><code class=3D"notranslate">"Felix Antonius Wi=
lhelm Ostmann via RT"=0D
<bug-DBIx-Class-Schema-Loader at rt.cpan.org> writes:=0D
=0D
> It is not really the same ...=0D
=0D
The _internal_ representation is not the same; the \x from will be=0D
represented internally as one byte per code point ("downgraded"), while=0D=
the literal form will be utf-8-encoded ("upgraded"). Semantically they=0D=
are the same, as evidenced by "eq" returning true.=0D
=0D
> In the real code i have to make a Encode::decode('ISO-8859-15', $enu=
m) as a quickfix. =0D
=0D
Please show where in the real code you have to do this. It smells like=0D=
something you're passing it to suffering from the Unicode Bug,=0D
i.e. treating the characters in the 128..255 range differently depending=0D=
on the internal representation (see=0D
https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).=0D=
=0D
> $ cat ticket123698.pl =0D
> use utf8;=0D
> use 5.20.0;=0D
> use Data::Dumper;=0D
> say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
> print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D
> $ perl ticket123698.pl =0D
> 1=0D
> $VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
> $VAR2 =3D "zur\x{fc}ckgestellt";=0D
=0D
The different outputs here are a quirk of how Data::Dumper deals with=0D
downgraded vs. upgraded strings (which could be viewed as an instance of=0D=
the Unicode Bug, but doesn't actually affect semantics). The first one=0D=
is showing as =C3=AF=C2=BF=C2=BD because you haven't thold perl that your=
terminal=0D
expects UTF-8-encoded strings. Adding=0D
=0D
use open qw(:std :utf8);=0D
=0D
to the script will make it apply a UTF-8 encoding layer to the standard=0D=
input/output/error filehandles, so non-ASCII charcters show correctly.=0D=
=0D
- ilmari=0D
-- =0D
"I use RMS as a guide in the same way that a boat captain would use=0D
a lighthouse. It's good to know where it is, but you generally=0D
don't want to find yourself in the same spot." - Tollef Fog Heen=0D
</code></pre>=0D
<p dir=3D"auto">From <a href=3D"mailto:felix.ostmann at gmail.com">felix.ost=
mann at gmail.com</a> on 2017-11-21 13:35:39<br>=0D
:</p>=0D
<pre class=3D"notranslate"><code class=3D"notranslate">Am Di 21. Nov 2017=
, 07:07:59, ilmari at ilmari.org schrieb:=0D
> "Felix Antonius Wilhelm Ostmann via RT"=0D
> <bug-DBIx-Class-Schema-Loader at rt.cpan.org> writes:=0D
> =0D
> > It is not really the same ...=0D
> =0D
> The _internal_ representation is not the same; the \x from will be=0D=
> represented internally as one byte per code point ("downgraded"),=0D=
> while=0D
> the literal form will be utf-8-encoded ("upgraded"). Semantically th=
ey=0D
> are the same, as evidenced by "eq" returning true.=0D
> =0D
> > In the real code i have to make a Encode::decode('ISO-8859-15',=
=0D
> > $enum) as a quickfix.=0D
> =0D
> Please show where in the real code you have to do this. It smells=0D=
> like=0D
> something you're passing it to suffering from the Unicode Bug,=0D
> i.e. treating the characters in the 128..255 range differently=0D
> depending=0D
> on the internal representation (see=0D
> https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for=0D
> details).=0D
> =0D
> > $ cat ticket123698.pl=0D
> > use utf8;=0D
> > use 5.20.0;=0D
> > use Data::Dumper;=0D
> > say "zur\xFCckgestellt" eq "zur=C3=83=C2=BCckgestellt";=0D
> > print Dumper("zur\xFCckgestellt","zur=C3=83=C2=BCckgestellt");=0D=
> > $ perl ticket123698.pl=0D
> > 1=0D
> > $VAR1 =3D 'zur=C3=AF=C2=BF=C2=BDckgestellt';=0D
> > $VAR2 =3D "zur\x{fc}ckgestellt";=0D
> =0D
> The different outputs here are a quirk of how Data::Dumper deals wit=
h=0D
> downgraded vs. upgraded strings (which could be viewed as an instanc=
e=0D
> of=0D
> the Unicode Bug, but doesn't actually affect semantics). The first=0D=
> one=0D
> is showing as =C3=AF=C2=BF=C2=BD because you haven't thold perl that=
your terminal=0D
> expects UTF-8-encoded strings. Adding=0D
> =0D
> use open qw(:std :utf8);=0D
> =0D
> to the script will make it apply a UTF-8 encoding layer to the=0D
> standard=0D
> input/output/error filehandles, so non-ASCII charcters show correctl=
y.=0D
> =0D
> - ilmari=0D
=0D
=0D
OK, here is the real world scenario with pseudo code. I am using DBIx::Cl=
ass + Catalyst + Template Toolkit=0D
=0D
ResultSet:=0D
sub enum_status {=0D
my ($self) =3D @_;=0D
# FIXME see https://rt.cpan.org/Public/Bug/Update.html?id=3D123698=0D=
return map { Encode::decode("ISO-8859-15", $_) } @{ $self->result_=
source->column_info('status')->{extra}->{list} };=0D
return @{ $self->result_source->column_info('status')->{extr=
a}->{list} };=0D
}=0D
=0D
Catalyst-Controller:=0D
$c->stash->{status_order} =3D [ $rs->enum_status ];=0D
=0D
Template:=0D
[% FOREACH status IN status_order %]=0D
<a href=3D"[% c.request.uri_with({status =3D> status}) %]">=0D
[% END %]=0D
=0D
Without the FIXME the links are ISO-8859-15=0D
=0D
=0D
After reading your reply and docs about unicode-Bug i changed the code to=
the following:=0D
=0D
__PACKAGE__->column_adds(=0D
...=0D
{ =0D
data_type =3D> "enum",=0D
default_value =3D> "offen", =0D
extra =3D> {=0D
custom_type_name =3D> "enum_tasks_status",=0D
list =3D> ["offen", "erledigt", "zur\xFCckgestellt"],=0D
}, =0D
is_nullable =3D> 0, =0D
},=0D
...=0D
);=0D
...=0D
# DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:W4KhHAXiEW35h5XWiZwhFg=0D
utf8::upgrade($_) for @{ __PACKAGE__->column_info('status')->{extra=
}->{list} };=0D
=0D
=0D
=0D
But in my option this is kind of a bug. Why are all other strings comming=
from the database already upgraded but not this?=0D
=0D
=0D
</code></pre>=0D