From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adolf Belka
To: development@lists.ipfire.org
Subject: Re: [PATCH v2 2/2] dns.cgi: Fixes bug#12395 - German umlauts not correctly displayed in remarks
Date: Wed, 13 Mar 2024 23:05:01 +0100
Message-ID: <8a50ed88-d578-4e1e-b56f-52ebfb6e8d5b@ipfire.org>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============5208883124853815601=="
List-Id:

--===============5208883124853815601==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hi Michael,

On 12/03/2024 15:56, Michael Tremer wrote:
> Hello,
>
>> On 12 Mar 2024, at 12:27, Adolf Belka wrote:
>>
>> Hi Michael,
>>
>> On 12/03/2024 11:02, Michael Tremer wrote:
>>> Thank you.
>>> I merged this for now so that we can fix this problem quickly.
>>> However I was wondering whether we should consider making the decode statement a part of the “cleanhtml” function.
>> That makes a lot of sense. It would also mean that the problem of umlauts etc would be fixed everywhere that cleanhtml is used rather than needing to fix every invocation of cleanhtml.
>>
>> I will look at putting something together for that.
>
> This is a common problem with any kind of software. I suppose a lot of people are used to it and leave out any special characters where possible. I generally tend to comment in English because it’s shorter, doesn’t have the special character issue and allows customers to hire any staff where German isn’t their first language for example. For international customers I am using English anyways. Hence I never really ran into this problem.
>
>>> I am still unsure why this is happening in the first place. We should be receiving UTF-8 from the browser, and I believe that perl doesn’t natively store things in UTF-8.
>>> That is however not a problem, because it should read files the same way it wrote them and so there should not be any difference when we re-read the configuration files. Unless some parts of the code specify any kind of encoding.
>> We do receive UTF-8 from the browser. The problem seems to be that the HTML::Entities::encode_entities command doesn't work with UTF-8 but with ISO-8859-1 encoding. I can't find where I found this the other day when I was searching on this topic to understand how to overcome it.
>
> Ah, yeah that would make sense. So basically we store it correctly but once we pass it through cleanhtml() we mess it up. Okay. That is good to know.
>
> So we just need to encode/decode back to UTF-8.

Two things with the previous patch I did.

Firstly I forgot to add use Encode at the top of the dns.cgi file. I had it in the vm test system but forgot to copy it across to the patch I created.

Secondly, I decode the UTF-8 so cleanhtml works correctly with the text but I didn't then encode it back to UTF-8 after doing the HTML::Entities cleaning as you have indicated would be the right approach.

I tested that today in my vm testbed and I confirm it works correctly, keeping the text with the correct HTML entity values while also ensuring that the text is back in UTF-8 format.

As the patch set from this original email has been merged into next I will submit a separate patch to add the use Encode and the encode back to UTF-8 lines for adding to next.

Regards,

Adolf.
>
>> The fix is not encoding the text from the browser remark box into UTF-8 but decoding it from UTF-8. Once the text is in the files then it is fine.
>
> We should always use UTF-8 everywhere. We might have some problems with Chinese or so (no idea really), but for 99% of our user-base UTF-8 should work fine.
>
>> Of course my reasoning for doing the decoding may or may not be right, so I am always open to alternative suggestions.
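
To make the decode / clean / encode order discussed above concrete, here is a minimal sketch. It is an illustration under stated assumptions and not the code in dns.cgi: entity_encode is a made-up stand-in for the entity encoding that cleanhtml performs (it emits numeric entities only and is not the HTML::Entities or IPFire implementation), and the sample string is the München remark from the bug report.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the entity encoding done inside cleanhtml.
# It is NOT the IPFire or HTML::Entities code. Like encode_entities it
# works on characters, so each undecoded UTF-8 byte is treated as one
# Latin-1 character.
sub entity_encode {
	my ($text) = @_;
	$text =~ s/([<>&"']|[^\x00-\x7F])/sprintf("&#%d;", ord($1))/ge;
	return $text;
}

# Raw bytes as they arrive from the browser: "München" in UTF-8.
my $raw = "M\xC3\xBCnchen";

# Without decoding, the two bytes of the umlaut become two entities.
my $mangled = entity_encode($raw);

# Decoding first turns the two bytes into the single character U+00FC.
my $fixed = entity_encode(decode("UTF-8", $raw));

# Encode back to UTF-8 before storing. The result is plain ASCII in
# this example, so the bytes are unchanged.
my $stored = encode("UTF-8", $fixed);

print "$mangled\n";   # M&#195;&#188;nchen
print "$stored\n";    # M&#252;nchen
```

The first output line is the mangled form (two entities for one umlaut); the second is what we get when the decode happens before the entity encoding.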
>
> I believe you found the right path :)
>
> -Michael
>
>> Regards,
>>
>> Adolf.
>>> -Michael
>>>> On 11 Mar 2024, at 12:19, Adolf Belka wrote:
>>>>
>>>> - If Freifunk München e.V. is entered as a remark it gets converted to
>>>>   Freifunk MÃ¼nchen e.V.
>>>> - This is because cleanhtml is used on the UTF-8 remark text before saving it to the file
>>>>   and the HTML::Entities::encode_entities command that is run on that remark text does
>>>>   not work with UTF-8 text.
>>>> - If the UTF-8 text in the remark is decoded before running through the cleanhtml command
>>>>   then the characters with diacritical marks are correctly shown.
>>>> - Have tested out the fix on a remark with a range of different characters with
>>>>   diacritical marks and all of the ones tested were displayed correctly with the fix while
>>>>   in the original form they were mangled.
>>>>
>>>> Fixes: Bug#12395
>>>> Tested-by: Adolf Belka
>>>> Signed-off-by: Adolf Belka
>>>> ---
>>>> html/cgi-bin/dns.cgi | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/html/cgi-bin/dns.cgi b/html/cgi-bin/dns.cgi
>>>> index 0a34d3fd6..eb6f908d5 100644
>>>> --- a/html/cgi-bin/dns.cgi
>>>> +++ b/html/cgi-bin/dns.cgi
>>>> @@ -142,6 +142,13 @@ if (($cgiparams{'SERVERS'} eq $Lang::tr{'save'}) || ($cgiparams{'SERVERS'} eq $L
>>>>  	# Go further if there was no error.
>>>>  	if ( ! $errormessage) {
>>>>  		# Check if a remark has been entered.
>>>> +
>>>> +		# decode the UTF-8 text so that characters with diacritical marks such as
>>>> +		# umlauts are treated correctly by the following cleanhtml command
>>>> +		$cgiparams{'REMARK'} = decode("UTF-8", $cgiparams{'REMARK'});
>>>> +
>>>> +		# run the REMARK text through cleanhtml to ensure all unsafe html characters
>>>> +		# are correctly encoded to their html entities
>>>>  		$cgiparams{'REMARK'} = &Header::cleanhtml($cgiparams{'REMARK'});
>>>>
>>>>  		my %dns_servers = ();
>>>> -- 
>>>> 2.44.0
>>>>
>>
>> -- 
>> Sent from my laptop
>
>
--===============5208883124853815601==--