Hi Michael,
On 12/03/2024 15:56, Michael Tremer wrote:
Hello,
On 12 Mar 2024, at 12:27, Adolf Belka adolf.belka@ipfire.org wrote:
Hi Michael,
On 12/03/2024 11:02, Michael Tremer wrote:
Thank you. I merged this for now so that we can fix this problem quickly. However I was wondering whether we should consider making the decode statement a part of the “cleanhtml” function.
That makes a lot of sense. It would also mean that the problem of umlauts etc would be fixed everywhere that cleanhtml is used rather than needing to fix every invocation of cleanhtml.
I will look at putting something together for that.
This is a common problem with any kind of software. I suppose a lot of people are used to it and leave out any special characters where possible. I generally tend to comment in English because it’s shorter, doesn’t have the special character issue and allows customers to hire any staff where German isn’t their first language for example. For international customers I am using English anyways. Hence I never really ran into this problem.
I am still unsure why this is happening in the first place. We should be receiving UTF-8 from the browser, and I believe that perl doesn’t natively store things in UTF-8. That is however not a problem, because it should read files the same way it wrote them and so there should not be any difference when we re-read the configuration files. Unless some parts of the code specify any kind of encoding.
We do receive UTF-8 from the browser. The problem seems to be that the HTML::Entities::encode_entities command doesn't work with UTF-8 but with ISO-8859-1 encoding. I can't find where I found this the other day when I was searching on this topic to understand how to overcome it.
Ah, yeah that would make sense. So basically we store it correctly but once we pass it through cleanhtml() we mess it up. Okay. That is good to know.
So we just need to encode/decode back to UTF-8.
Two things with the previous patch I did.
Firstly I forgot to add
use Encode
at the top of the dns.cgi file. I had it in the vm test system but forgot to copy it across to the patch I created.
Secondly, I decode the UTF-8 so cleanhtml works correctly with the text but I didn't then encode it back to UTF-8 after doing the html::entities cleaning as you have indicated would be the right approach.
I tested that today in my vm testbed and I confirm it works correctly, keeping the text with the correct html entity values but also then ensures that the text is back in UTF-8 format.
As the patch set from this original email has been merged into next I will submit a separate patch to add the use Encode and the encode back to UTF-8 lines for adding to next.
Regards,
Adolf.
The fix is not encoding the text from the browser remark box into UTF-8 but decoding it from UTF-8. Once the text is in the files then it is fine.
We should always us UTF-8 everywhere. We might have some problems with Chinese or so (no idea really), but for 99% of our user-base UTF-8 should work fine.
Of course my reasoning for doing the decoding may or may not be right, so I am always open to alternative suggestions.
I believe you found the right path :)
-Michael
Regards,
Adolf.
-Michael
On 11 Mar 2024, at 12:19, Adolf Belka adolf.belka@ipfire.org wrote:
- If Freifunk München e.V. is entered as a remark it gets converted to Freifunk München e.V.
- This is because cleanhtml is used on the UTF-8 remark text before saving it to the file and the HTML::Entities::encode_entities command that is run on that remark text does not work with UTF-8 text.
- If the UTF-8 text in the remark is decoded before running through the cleanhtml command then the characters with diacritical marks are correctly shown.
- Have tested out the fix on a remark with a range of different characters with diacritical marks and all of the ones tested were displayed correctly with the fix while in the original form they were mangled.
Fixes: Bug#12395 Tested-by: Adolf Belka adolf.belka@ipfire.org Signed-off-by: Adolf Belka adolf.belka@ipfire.org
html/cgi-bin/dns.cgi | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/html/cgi-bin/dns.cgi b/html/cgi-bin/dns.cgi index 0a34d3fd6..eb6f908d5 100644 --- a/html/cgi-bin/dns.cgi +++ b/html/cgi-bin/dns.cgi @@ -142,6 +142,13 @@ if (($cgiparams{'SERVERS'} eq $Lang::tr{'save'}) || ($cgiparams{'SERVERS'} eq $L # Go further if there was no error. if ( ! $errormessage) { # Check if a remark has been entered.
- # decode the UTF-8 text so that characters with diacritical marks such as
- # umlauts are treated correctly by the following cleanhtml command
- $cgiparams{'REMARK'} = decode("UTF-8", $cgiparams{'REMARK'});
- # run the REMARK text through cleanhtml to ensure all unsafe html characters
- # are correctly encoded to their html entities
$cgiparams{'REMARK'} = &Header::cleanhtml($cgiparams{'REMARK'});
my %dns_servers = ();
2.44.0
-- Sent from my laptop