From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: [PATCH v2 2/2] dns.cgi: Fixes bug#12395 - German umlauts not
 correctly displayed in remarks
Date: Tue, 12 Mar 2024 14:56:28 +0000
Message-ID: <AD19D12D-19A0-41DF-B477-8FDC613FF6FA@ipfire.org>
In-Reply-To: <37fc4478-061b-4273-b8d7-d8e2f6bceac2@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1303106120820928946=="
List-Id: <development.lists.ipfire.org>

--===============1303106120820928946==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello,

> On 12 Mar 2024, at 12:27, Adolf Belka <adolf.belka(a)ipfire.org> wrote:
>=20
> Hi Michael,
>=20
> On 12/03/2024 11:02, Michael Tremer wrote:
>> Thank you.
>> I merged this for now so that we can fix this problem quickly.
>> However I was wondering whether we should consider making the decode state=
ment a part of the =E2=80=9Ccleanhtml=E2=80=9D function.
> That makes a lot of sense. It would also mean that the problem of umlauts e=
tc would be fixed everywhere that cleanhtml is used rather than needing to fi=
x every invocation of cleanhtml.
>=20
> I will look at putting something together for that.

This is a common problem with any kind of software. I suppose a lot of=
 people are used to it and leave out special characters where possible. I=
 generally comment in English because it=E2=80=99s shorter, avoids the=
 special character issue and lets customers hire staff whose first language=
 isn=E2=80=99t German. For international customers I use English anyway.=
 Hence I never really ran into this problem.

>> I am still unsure why this is happening in the first place. We should be r=
eceiving UTF-8 from the browser, and I believe that perl doesn=E2=80=99t nati=
vely store things in UTF-8. That is however not a problem, because it should =
read files the same way it wrote them and so there should not be any differen=
ce when we re-read the configuration files. Unless some parts of the code spe=
cify any kind of encoding.
> We do receive UTF-8 from the browser. The problem seems to be that the HTML=
::Entities::encode_entities command doesn't work with UTF-8 but with ISO-8859=
-1 encoding. I can't find where I found this the other day when I was searchi=
ng on this topic to understand how to overcome it.

Ah, yeah that would make sense. So basically we store it correctly but once w=
e pass it through cleanhtml() we mess it up. Okay. That is good to know.

So we just need to encode/decode back to UTF-8.
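
For the archives, a minimal sketch of what I think is going on. It assumes
that cleanhtml() essentially boils down to calling
HTML::Entities::encode_entities(), as the patch description says. The
variable names are made up for illustration and are not taken from dns.cgi:

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Raw UTF-8 bytes, roughly as the CGI parameter arrives from the browser.
# \xC3\xBC is the two-byte UTF-8 sequence for a lowercase u-umlaut.
my $bytes = "Freifunk M\xC3\xBCnchen e.V.";

# Without decoding, each byte is treated as its own Latin-1 character,
# so the umlaut turns into two unrelated entities.
my $mangled = encode_entities($bytes);

# Decoding first yields a single character (U+00FC), which maps to a
# single entity.
my $fixed = encode_entities(decode("UTF-8", $bytes));

print "$mangled\n";   # Freifunk M&Atilde;&frac14;nchen e.V.
print "$fixed\n";     # Freifunk M&uuml;nchen e.V.
```

If the decode step moved into cleanhtml() itself, every caller would get the
second behaviour without any changes to the individual CGI scripts.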

> The fix is not encoding the text from the browser remark box into UTF-8 but=
 decoding it from UTF-8. Once the text is in the files then it is fine.

We should always use UTF-8 everywhere. We might have some problems with=
 Chinese or so (no idea really), but for 99% of our user base UTF-8 should=
 work fine.

> Of course my reasoning for doing the decoding may or may not be right, so I=
 am always open to alternative suggestions.

I believe you found the right path :)

-Michael

> Regards,
>=20
> Adolf.
>> -Michael
>>> On 11 Mar 2024, at 12:19, Adolf Belka <adolf.belka(a)ipfire.org> wrote:
>>>=20
>>> - If Freifunk M=C3=BCnchen e.V. is entered as a remark it gets converted =
to
>>>   Freifunk M=C3=83=C2=BCnchen e.V.
>>> - This is because cleanhtml is used on the UTF-8 remark text before savin=
g it to the file
>>>   and the HTML::Entities::encode_entities command that is run on that rem=
ark text does
>>>   not work with UTF-8 text.
>>> - If the UTF-8 text in the remark is decoded before running through the c=
leanhtml command
>>>   then the characters with diacritical marks are correctly shown.
>>> - Have tested out the fix on a remark with a range of different character=
s with
>>>   diacritical marks and all of the ones tested were displayed correctly w=
ith the fix while
>>>   in the original form they were mangled.
>>>=20
>>> Fixes: Bug#12395
>>> Tested-by: Adolf Belka <adolf.belka(a)ipfire.org>
>>> Signed-off-by: Adolf Belka <adolf.belka(a)ipfire.org>
>>> ---
>>> html/cgi-bin/dns.cgi | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>=20
>>> diff --git a/html/cgi-bin/dns.cgi b/html/cgi-bin/dns.cgi
>>> index 0a34d3fd6..eb6f908d5 100644
>>> --- a/html/cgi-bin/dns.cgi
>>> +++ b/html/cgi-bin/dns.cgi
>>> @@ -142,6 +142,13 @@ if (($cgiparams{'SERVERS'} eq $Lang::tr{'save'}) || =
($cgiparams{'SERVERS'} eq $L
>>> # Go further if there was no error.
>>> if ( ! $errormessage) {
>>> # Check if a remark has been entered.
>>> +
>>> + # decode the UTF-8 text so that characters with diacritical marks such =
as
>>> + # umlauts are treated correctly by the following cleanhtml command
>>> + $cgiparams{'REMARK'} =3D decode("UTF-8", $cgiparams{'REMARK'});
>>> +
>>> + # run the REMARK text through cleanhtml to ensure all unsafe html chara=
cters
>>> + # are correctly encoded to their html entities
>>> $cgiparams{'REMARK'} =3D &Header::cleanhtml($cgiparams{'REMARK'});
>>>=20
>>> my %dns_servers =3D ();
>>> --=20
>>> 2.44.0
>>>=20



--===============1303106120820928946==--