From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adolf Belka
To: development@lists.ipfire.org
Subject: Re: [PATCH v2 2/2] dns.cgi: Fixes bug#12395 - German umlauts not correctly displayed in remarks
Date: Tue, 12 Mar 2024 13:27:00 +0100
Message-ID: <37fc4478-061b-4273-b8d7-d8e2f6bceac2@ipfire.org>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============6181246753800973251=="
List-Id:

--===============6181246753800973251==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hi Michael,

On 12/03/2024 11:02, Michael Tremer wrote:
> Thank you.
>
> I merged this for now so that we can fix this problem quickly.
>
> However I was wondering whether we should consider making the decode statement a part of the “cleanhtml” function.

That makes a lot of sense. It would also mean that the problem of
umlauts etc would be fixed everywhere that cleanhtml is used rather than
needing to fix every invocation of cleanhtml.

I will look at putting something together for that.

>
> I am still unsure why this is happening in the first place. We should be receiving UTF-8 from the browser, and I believe that perl doesn’t natively store things in UTF-8. That is however not a problem, because it should read files the same way it wrote them and so there should not be any difference when we re-read the configuration files. Unless some parts of the code specify any kind of encoding.

We do receive UTF-8 from the browser. The problem seems to be that the
HTML::Entities::encode_entities command doesn't work with UTF-8 but with
ISO-8859-1 encoding. I can't find where I found this the other day when
I was searching on this topic to understand how to overcome it.

The fix is not encoding the text from the browser remark box into UTF-8
but decoding it from UTF-8. Once the text is in the files then it is fine.
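
As a minimal sketch of the effect described above (this is not the dns.cgi
code itself: it calls encode_entities directly instead of going through
cleanhtml, and the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Assumption: this is how the remark arrives from the browser, as raw
# UTF-8 bytes. The "ü" is the two bytes 0xC3 0xBC.
my $raw = "Freifunk M\xC3\xBCnchen e.V.";

# Without decoding, each of the two bytes is treated as a separate
# ISO-8859-1 character and gets its own entity.
my $mangled = encode_entities($raw);

# After decoding, "ü" is a single character and maps to a single entity.
my $fixed = encode_entities(decode("UTF-8", $raw));

print "$mangled\n";   # Freifunk M&Atilde;&frac14;nchen e.V.
print "$fixed\n";     # Freifunk M&uuml;nchen e.V.
```

The first output is what a browser renders as the mangled "MÃ¼nchen" from
the bug report, the second renders as the intended "München".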

Of course my reasoning for doing the decoding may or may not be right,
so I am always open to alternative suggestions.

Regards,

Adolf.

>
> -Michael
>
>> On 11 Mar 2024, at 12:19, Adolf Belka wrote:
>>
>> - If Freifunk München e.V. is entered as a remark it gets converted to
>>   Freifunk MÃ¼nchen e.V.
>> - This is because cleanhtml is used on the UTF-8 remark text before saving it to the file
>>   and the HTML::Entities::encode_entities command that is run on that remark text does
>>   not work with UTF-8 text.
>> - If the UTF-8 text in the remark is decoded before running through the cleanhtml command
>>   then the characters with diacritical marks are correctly shown.
>> - Have tested out the fix on a remark with a range of different characters with
>>   diacritical marks and all of the ones tested were displayed correctly with the fix while
>>   in the original form they were mangled.
>>
>> Fixes: Bug#12395
>> Tested-by: Adolf Belka
>> Signed-off-by: Adolf Belka
>> ---
>>  html/cgi-bin/dns.cgi | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/html/cgi-bin/dns.cgi b/html/cgi-bin/dns.cgi
>> index 0a34d3fd6..eb6f908d5 100644
>> --- a/html/cgi-bin/dns.cgi
>> +++ b/html/cgi-bin/dns.cgi
>> @@ -142,6 +142,13 @@ if (($cgiparams{'SERVERS'} eq $Lang::tr{'save'}) || ($cgiparams{'SERVERS'} eq $L
>>  	# Go further if there was no error.
>>  	if ( ! $errormessage) {
>>  		# Check if a remark has been entered.
>> +
>> +		# decode the UTF-8 text so that characters with diacritical marks such as
>> +		# umlauts are treated correctly by the following cleanhtml command
>> +		$cgiparams{'REMARK'} = decode("UTF-8", $cgiparams{'REMARK'});
>> +
>> +		# run the REMARK text through cleanhtml to ensure all unsafe html characters
>> +		# are correctly encoded to their html entities
>>  		$cgiparams{'REMARK'} = &Header::cleanhtml($cgiparams{'REMARK'});
>>
>>  		my %dns_servers = ();
>> -- 
>> 2.44.0
>>
>

-- 
Sent from my laptop

--===============6181246753800973251==--