From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: pakfire-builder problems after git pull
Date: Wed, 11 Oct 2023 14:34:00 +0100
Message-ID: <34139A5A-6C86-4BED-AE36-C13D6FFB7927@ipfire.org>
In-Reply-To: <791c0fe8-7982-4c83-a663-fafda2691993@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8464745764287280373=="
List-Id:

--===============8464745764287280373==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello Adolf,

Thank you for giving this another go :)

I am currently sidetracked with a couple of other things (will send an email about this hopefully soon), and there might be some more problems in Pakfire that I might have recently introduced.

However, it exits normally, but with an error code. Normally it should log something to syslog. If not, please run the same thing with --debug and if it still does not tell you why it is unhappy, let me know.

Best,
-Michael

> On 11 Oct 2023, at 14:08, Adolf Belka wrote:
>
> Hi Michael,
>
> I tried to see if gdb would show anything about what was happening.
>
> I ran gdb pakfire-builder and then within gdb ran the command run build beep/beep.nm
>
> Here is the output from the gdb pakfire-builder command running as root. I can't really figure out what is happening, except that pakfire-builder seems to have exited very early
>
> gdb pakfire-builder
> GNU gdb (GDB) 13.2
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-pc-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from pakfire-builder...
> (gdb) run build beep/beep.nm
> Starting program: /usr/bin/pakfire-builder build beep/beep.nm
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib/libthread_db.so.1".
> [New Thread 0x7ffff57e16c0 (LWP 1342)]
> [Thread 0x7ffff57e16c0 (LWP 1342) exited]
> [Inferior 1 (process 1338) exited with code 0377]
> (gdb)
>
> Regards,
> Adolf
>
>
> On 06/10/2023 19:11, Adolf Belka wrote:
>> Hi Michael,
>>
>> On 06/10/2023 17:21, Michael Tremer wrote:
>>> Hello everyone,
>>>
>>> Finally there is some white smoke on Friday!
>>>
>>> After a *very* long time, I found the reason why Pakfire was crashing as soon as a thread was launched (which is what actually happened). The reason is a compiler bug which has remained unresolved since 2017: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142
>>>
>>> The fix for this is: https://git.ipfire.org/?p=pakfire.git;a=commitdiff;h=b4d087f3353a174be936da4cc959dc169491162a
>>>
>>> Thank you for your patience. Please pull and let me know what new bugs I have created in the meantime.
>> I did a git pull and then ran the build commands but still ran into the progressbar vs progressbar2 issue.
>>
>> I changed progressbar2 in configure.ac to progressbar and repeated the ./autogen etc and everything then went through without a problem.
>>
>> Then I tried sudo pakfire-builder build beep/beep.nm and the program came straight back to a new line without appearing to do anything and without any messages at all.
>>
>> I then did sudo pakfire-builder --debug build beep/beep.nm and exactly the same happened: it went to a new line without any messages or anything.
>>
>> Regards,
>> Adolf.
>>>
>>> Best,
>>> -Michael
>>>
>>>> On 1 Oct 2023, at 17:11, Adolf Belka wrote:
>>>>
>>>> I downgraded curl and glibc to the previous versions just to see if that stopped the segfault problem but it didn't
>>>>
>>>> Regards,
>>>>
>>>> Adolf.
>>>>
>>>> On 01/10/2023 16:24, Adolf Belka wrote:
>>>>> Hi Michael,
>>>>>
>>>>> I eventually figured out how to use gdb in this situation and managed to get a backtrace when running the build of the python3-build package.
>>>>>
>>>>> The output I got was
>>>>>
>>>>> sudo gdb pakfire-builder
>>>>> GNU gdb (GDB) 13.2
>>>>> Copyright (C) 2023 Free Software Foundation, Inc.
>>>>> License GPLv3+: GNU GPL version 3 or later
>>>>> This is free software: you are free to change and redistribute it.
>>>>> There is NO WARRANTY, to the extent permitted by law.
>>>>> Type "show copying" and "show warranty" for details.
>>>>> This GDB was configured as "x86_64-pc-linux-gnu".
>>>>> Type "show configuration" for configuration details.
>>>>> For bug reporting instructions, please see:
>>>>> .
>>>>> Find the GDB manual and other documentation resources online at:
>>>>> .
>>>>>
>>>>> For help, type "help".
>>>>> Type "apropos word" to search for commands related to "word"...
>>>>> Reading symbols from pakfire-builder...
>>>>> (gdb) run build python3-build/python3-build.nm
>>>>> Starting program: /usr/bin/pakfire-builder build python3-build/python3-build.nm
>>>>> [Thread debugging using libthread_db enabled]
>>>>> Using host libthread_db library "/usr/lib/libthread_db.so.1".
>>>>> [New Thread 0x7ffff55276c0 (LWP 61317)]
>>>>>
>>>>> Thread 2 "pakfire-builder" received signal SIGSEGV, Segmentation fault.
>>>>> [Switching to Thread 0x7ffff55276c0 (LWP 61317)]
>>>>> 0x00007ffff7fd4fa4 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> (gdb) bt
>>>>> #0  0x00007ffff7fd4fa4 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #1  0x00007ffff7fd2544 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #2  0x00007ffff7fcc715 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #3  0x00007ffff7fcb4e1 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
>>>>> #4  0x00007ffff7fccb75 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #5  0x00007ffff7fd60b1 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #6  0x00007ffff7fcb4e1 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
>>>>> #7  0x00007ffff7fd581a in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #8  0x00007ffff7fcb4e1 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
>>>>> #9  0x00007ffff7fd5bec in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #10 0x00007ffff7ea58a1 in ?? () from /usr/lib/libc.so.6
>>>>> #11 0x00007ffff7fcb4e1 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
>>>>> #12 0x00007ffff7fcb603 in ?? () from /lib64/ld-linux-x86-64.so.2
>>>>> #13 0x00007ffff7ea5811 in ?? () from /usr/lib/libc.so.6
>>>>> #14 0x00007ffff7ea5a4f in ?? () from /usr/lib/libc.so.6
>>>>> #15 0x00007ffff7e8d857 in ?? () from /usr/lib/libc.so.6
>>>>> #16 0x00007ffff7e8dc3d in ?? () from /usr/lib/libc.so.6
>>>>> #17 0x00007ffff7e8bd58 in __nss_next2 () from /usr/lib/libc.so.6
>>>>> #18 0x00007ffff7e72805 in gethostbyname2_r () from /usr/lib/libc.so.6
>>>>> #19 0x00007ffff7e33883 in getaddrinfo () from /usr/lib/libc.so.6
>>>>> #20 0x00007ffff7b7e2c9 in ?? () from /usr/lib/libcurl.so.4
>>>>> #21 0x00007ffff7b813bc in ?? () from /usr/lib/libcurl.so.4
>>>>> #22 0x00007ffff7ddd9eb in ?? () from /usr/lib/libc.so.6
>>>>> #23 0x00007ffff7e617cc in ?? () from /usr/lib/libc.so.6
>>>>> (gdb)
>>>>>
>>>>>
>>>>> I see libcurl and libc present there so that looks to be what you indicated.
>>>>>
>>>>> I have curl-8.3.0 and glibc-2.38 on my system.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Adolf.
>>>>>
>>>>> On 01/10/2023 14:51, Adolf Belka wrote:
>>>>>> Hi Michael,
>>>>>>
>>>>>> On 01/10/2023 14:22, Michael Tremer wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>> On 30 Sep 2023, at 13:39, Adolf Belka wrote:
>>>>>>>>
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> I just ran git pull which took me to commit
>>>>>>>>
>>>>>>>> build: Add implicit dist() when a makefile is passed -- followed by make and make install.
>>>>>>>>
>>>>>>>> Running pakfire-builder build beep/beep.nm gave the error message "Segmentation fault"
>>>>>>>
>>>>>>> I can confirm this, but it seems to be a crash in libcurl/glibc:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Rolling back to
>>>>>>>>
>>>>>>>> cli: pakfire-builder: Implement "image create"
>>>>>>>>
>>>>>>>> then worked with beep fine.
>>>>>>>
>>>>>>> However, if you say rolling back works, then we must trigger the crash in Pakfire somehow, because libcurl/glibc stays the same.
>>>>>> My earlier rolling back worked. However when I then tried to create a new package, python3-build, it then segfaulted again.
>>>>>>
>>>>>> Then I rolled back a long way and tried again with the python3-build and again it segfaulted.
>>>>>>
>>>>>> So it looks like the segfault is not always consistent but rolling back did not actually help as there are still packages that end up with a segfault.
>>>>>>
>>>>>> Regards,
>>>>>> Adolf.
>>>>>>>
>>>>>>> In IPFire we use c-ares as a resolver for cURL which does not crash like this.
>>>>>>>
>>>>>>>> Then tried git pull, make, make install again and this time beep built without any problems. Then I tried sqlite and I got the segmentation fault again.
>>>>>>>
>>>>>>> This is very early when Pakfire is starting up and refreshing its repository information.
>>>>>>>
>>>>>>>> Then I rolled back and sqlite built okay. Then ran sqlite again and this time part way through building my laptop logged out from my session.
>>>>>>>>
>>>>>>>> Logged back in and ran sqlite again and this time, after a short while, everything froze and I had to reset the laptop.
>>>>>>>>
>>>>>>>> So rolling back to
>>>>>>>>
>>>>>>>> cli: pakfire-builder: Implement "image create"
>>>>>>>>
>>>>>>>> does not stop the problem happening; it just seems to show itself in different forms.
>>>>>>>>
>>>>>>>> I have now rolled back to
>>>>>>>>
>>>>>>>> cli: pakfire: Implement --yes
>>>>>>>>
>>>>>>>> which I think is where my last git pull had placed me.
>>>>>>>>
>>>>>>>> Running pakfire-builder on beep twice and sqlite four times has given no problems so that commit stage is confirmed good.
>>>>>>>>
>>>>>>>> Let me know if there is anything else I should try to narrow the problem down.
>>>>>>>
>>>>>>> Good question. I have a VM with archlinux where I can reproduce the crash. I will have a look what I can do to fix this…
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Adolf.
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Sent from my laptop
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> Sent from my laptop
>>>>
>>>
>>
>
> -- 
> Sent from my laptop
>

--===============8464745764287280373==--