From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: [PATCH v2 2/2] dns.cgi: Fixes bug#12395 - German umlauts not correctly displayed in remarks
Date: Fri, 15 Mar 2024 10:53:34 +0000
Message-ID: <05A98501-B183-41D5-B3AF-D4D153FF8A1C@ipfire.org>
In-Reply-To: <8a50ed88-d578-4e1e-b56f-52ebfb6e8d5b@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1719900586203229643=="
List-Id:

--===============1719900586203229643==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello,

> On 13 Mar 2024, at 22:05, Adolf Belka wrote:
>
> Hi Michael,
>
> On 12/03/2024 15:56, Michael Tremer wrote:
>> Hello,
>>> On 12 Mar 2024, at 12:27, Adolf Belka wrote:
>>>
>>> Hi Michael,
>>>
>>> On 12/03/2024 11:02, Michael Tremer wrote:
>>>> Thank you.
>>>> I merged this for now so that we can fix this problem quickly.
>>>> However I was wondering whether we should consider making the decode statement a part of the “cleanhtml” function.
>>> That makes a lot of sense. It would also mean that the problem of umlauts etc would be fixed everywhere that cleanhtml is used rather than needing to fix every invocation of cleanhtml.
>>>
>>> I will look at putting something together for that.
>> This is a common problem with any kind of software. I suppose a lot of people are used to it and leave out any special characters where possible. I generally tend to comment in English because it’s shorter, doesn’t have the special character issue and allows customers to hire any staff where German isn’t their first language for example. For international customers I am using English anyways. Hence I never really ran into this problem.
>>>> I am still unsure why this is happening in the first place. We should be receiving UTF-8 from the browser, and I believe that perl doesn’t natively store things in UTF-8.
>>>> That is however not a problem, because it should read files the same way it wrote them and so there should not be any difference when we re-read the configuration files. Unless some parts of the code specify any kind of encoding.
>>> We do receive UTF-8 from the browser. The problem seems to be that the HTML::Entities::encode_entities command doesn't work with UTF-8 but with ISO-8859-1 encoding. I can't find where I found this the other day when I was searching on this topic to understand how to overcome it.
>> Ah, yeah that would make sense. So basically we store it correctly but once we pass it through cleanhtml() we mess it up. Okay. That is good to know.
>> So we just need to encode/decode back to UTF-8.
> Two things with the previous patch I did.
>
> Firstly I forgot to add
>
> use Encode

I try to always explicitly name the function that I call from a module, for example &Encode::encode(…) instead of just “encode()”. I am not sure whether that changes anything other than making it clear that the function belongs to a certain module.

> at the top of the dns.cgi file. I had it in the vm test system but forgot to copy it across to the patch I created.
>
> Secondly, I decode the UTF-8 so cleanhtml works correctly with the text but I didn't then encode it back to UTF-8 after doing the HTML::Entities cleaning as you have indicated would be the right approach.
>
> I tested that today in my vm testbed and I confirm it works correctly, keeping the text with the correct html entity values but also then ensures that the text is back in UTF-8 format.
>
> As the patch set from this original email has been merged into next I will submit a separate patch to add the use Encode and the encode back to UTF-8 lines for adding to next.
>
> Regards,
>
> Adolf.
>>> The fix is not encoding the text from the browser remark box into UTF-8 but decoding it from UTF-8. Once the text is in the files then it is fine.
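The decode, escape, encode sequence discussed in this thread can be illustrated with a minimal, self-contained Perl sketch. It assumes the remark arrives as raw UTF-8 bytes, and it uses a hypothetical `escape_html_sketch()` as a simplified stand-in for `HTML::Entities::encode_entities()` (and therefore for `&Header::cleanhtml()`, whose exact implementation is not shown here). The stand-in emits numeric entities where the real module prefers named ones such as `&uuml;`, but both escape one Perl character at a time, which is why undecoded bytes get mangled.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for HTML::Entities::encode_entities() (and thus for
# cleanhtml()): every character outside printable ASCII, plus the
# HTML-special characters < > & " ', becomes a numeric entity. Like the real
# module it works on Perl characters, so an undecoded byte string is
# escaped byte by byte.
sub escape_html_sketch {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E]|[<>&"'])/sprintf("&#x%X;", ord($1))/ge;
    return $text;
}

# Assumed input: the remark as raw UTF-8 bytes, as received from the browser.
# "ü" arrives as the two bytes 0xC3 0xBC.
my $raw = "Freifunk M\xC3\xBCnchen e.V.";

# Escaping the bytes directly yields &#xC3;&#xBC;, which a browser renders
# as "Ã¼", i.e. the mangled remark described in bug #12395.
my $broken = escape_html_sketch($raw);

# Decode first, so "ü" is the single character U+00FC, then escape, then
# encode the result back to UTF-8 before it is written to the settings file.
my $chars = Encode::decode("UTF-8", $raw);
my $fixed = Encode::encode("UTF-8", escape_html_sketch($chars));

print "$broken\n";    # Freifunk M&#xC3;&#xBC;nchen e.V.
print "$fixed\n";     # Freifunk M&#xFC;nchen e.V.
```

In this sketch the escaped string is already pure ASCII, so the final `Encode::encode()` changes nothing here; it matters when some non-ASCII characters are left unescaped and must be written back as UTF-8. The functions are called by their fully qualified names, in line with the `&Encode::encode(…)` convention mentioned in this thread.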
>> We should always use UTF-8 everywhere. We might have some problems with Chinese or so (no idea really), but for 99% of our user-base UTF-8 should work fine.
>>> Of course my reasoning for doing the decoding may or may not be right, so I am always open to alternative suggestions.
>> I believe you found the right path :)
>> -Michael
>>> Regards,
>>>
>>> Adolf.
>>>> -Michael
>>>>> On 11 Mar 2024, at 12:19, Adolf Belka wrote:
>>>>>
>>>>> - If Freifunk München e.V. is entered as a remark it gets converted to
>>>>>   Freifunk MÃ¼nchen e.V.
>>>>> - This is because cleanhtml is used on the UTF-8 remark text before saving it to the file
>>>>>   and the HTML::Entities::encode_entities command that is run on that remark text does
>>>>>   not work with UTF-8 text.
>>>>> - If the UTF-8 text in the remark is decoded before running through the cleanhtml command
>>>>>   then the characters with diacritical marks are correctly shown.
>>>>> - Have tested out the fix on a remark with a range of different characters with
>>>>>   diacritical marks and all of the ones tested were displayed correctly with the fix while
>>>>>   in the original form they were mangled.
>>>>>
>>>>> Fixes: Bug#12395
>>>>> Tested-by: Adolf Belka
>>>>> Signed-off-by: Adolf Belka
>>>>> ---
>>>>> html/cgi-bin/dns.cgi | 7 +++++++
>>>>> 1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/html/cgi-bin/dns.cgi b/html/cgi-bin/dns.cgi
>>>>> index 0a34d3fd6..eb6f908d5 100644
>>>>> --- a/html/cgi-bin/dns.cgi
>>>>> +++ b/html/cgi-bin/dns.cgi
>>>>> @@ -142,6 +142,13 @@ if (($cgiparams{'SERVERS'} eq $Lang::tr{'save'}) || ($cgiparams{'SERVERS'} eq $L
>>>>>  # Go further if there was no error.
>>>>>  if ( ! $errormessage) {
>>>>>      # Check if a remark has been entered.
>>>>> +
>>>>> +    # decode the UTF-8 text so that characters with diacritical marks such as
>>>>> +    # umlauts are treated correctly by the following cleanhtml command
>>>>> +    $cgiparams{'REMARK'} = decode("UTF-8", $cgiparams{'REMARK'});
>>>>> +
>>>>> +    # run the REMARK text through cleanhtml to ensure all unsafe html characters
>>>>> +    # are correctly encoded to their html entities
>>>>>      $cgiparams{'REMARK'} = &Header::cleanhtml($cgiparams{'REMARK'});
>>>>>
>>>>>      my %dns_servers = ();
>>>>> -- 
>>>>> 2.44.0
>>>>>
>>>
>>> -- 
>>> Sent from my laptop

--===============1719900586203229643==--