Discussion:
Bug#983379: Panic on startup
Sjoerd Simons
2021-02-23 08:40:01 UTC
Package: user-mode-linux
Version: 5.10um1+b1
Severity: grave

On startup (e.g. in fakemachine), UML panics straight away:

```
$ fakemachine -b uml "uname -a"
kmsg_dump:
<5>Linux version 5.10.5 (***@x86-conova-01) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #1 Mon Jan 11 20:40:53 UTC 2021
<6>Zone ranges:
<6> Normal [mem 0x0000000000000000-0x00000000e164ffff]
<6>Movable zone start for each node
<6>Early memory node ranges
<6> node 0: [mem 0x0000000000000000-0x000000008164ffff]
<6>Initmem setup node 0 [mem 0x0000000000000000-0x000000008164ffff]
<7>On node 0 totalpages: 530000
<7> Normal zone: 8282 pages used for memmap
<7> Normal zone: 0 pages reserved
<7> Normal zone: 530000 pages, LIFO batch:63
<7>pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
<7>pcpu-alloc: [0] 0
<6>Built 1 zonelists, mobility grouping on. Total pages: 521718
<5>Kernel command line: mem=2048M initrd=/tmp/fakemachine-981932232/initramfs.cpio panic=-1 nosplash systemd.unit=fakemachine.service console=tty0 vec0:transport=fd,fd=3,vec=0 quiet con1=fd:0,fd:1 con0=null con=none root=98:0
<6>Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
<6>Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
<6>mem auto-init: stack:off, heap alloc:off, heap free:off
<6>Memory: 2044088K/2120000K available (5830K kernel code, 1535K rwdata, 1744K rodata, 191K init, 225K bss, 75912K reserved, 0K cma-reserved)
<6>SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
<6>NR_IRQS: 24
<6>clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
<6>Calibrating delay loop... 4213.14 BogoMIPS (lpj=21065728)
<6>pid_max: default: 32768 minimum: 301
<6>LSM: Security Framework initializing
<6>Yama: disabled by default; enable with sysctl kernel.yama.*
<6>SELinux: Initializing.
<6>TOMOYO Linux initialized
<6>Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
<6>Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
<4>
<4>Modules linked in:
<6>Pid: 0, comm: swapper Not tainted 5.10.5
<6>RIP: 0033:[<00000000604d4201>]
<6>RSP: 00007ffdc7521cb0 EFLAGS: 00010206
<6>RAX: 0000000000000000 RBX: 0000000000000059 RCX: 00007ffdc7520000
<6>RDX: 0000000000000035 RSI: 0000000060b69a71 RDI: 0000000060d8ac3b
<6>RBP: 0000000000000000 R08: 0000000060b69a72 R09: 0000000060d8abe2
<6>R10: 0000000080000000 R11: 3d74696e695f676e R12: 0000000000000002
<6>R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000001
<0>Kernel panic - not syncing: Segfault with no mm
<4>CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.5 #1
<4>Stack:
<4> 623e3d20 8000000000000000 7fd9ee013908 7fd9ee013ae5
<4> 7fd9ee4629e8 00000000 7ffdc7521cf0 00000400
<4> 7fd9ee409f20 7fd9ee4629e8 00000000 00000000
Call Trace:
<4> [<604d4fa3>] ? __printk_safe_enter+0x0/0x35
<4> [<604d154a>] ? arch_local_irq_save+0x0/0x22
<4> [<604d46f5>] ? vprintk_emit+0x9d/0x185
<4> [<604d49d3>] ? vprintk_deferred+0x1d/0x32
<4> [<60a26ee2>] ? printk_deferred+0x93/0x9b
<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
<4> [<60a26e4f>] ? printk_deferred+0x0/0x9b
<4> [<6049cddb>] ? set_signals+0x0/0x38
<4> [<60589588>] ? arch_local_irq_save+0x0/0x22
<4> [<6055c928>] ? kvmalloc_node+0x56/0x96
<4> [<6058d3c0>] ? __kmalloc+0x1e2/0x1f9
<4> [<608e3d32>] ? ___ratelimit+0xd0/0xde
<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
<4> [<60901485>] ? _warn_unseeded_randomness+0x60/0x8f
<4> [<6090295b>] ? get_random_u32+0x29/0x98
<4> [<6088f79f>] ? bucket_table_alloc.isra.0+0x115/0x13d
<4> [<6088f68a>] ? bucket_table_alloc.isra.0+0x0/0x13d
<4> [<6088ff7a>] ? rhashtable_init+0x175/0x1ca
<4> [<607ef317>] ? ipc_init_ids+0x4e/0x6f
<4> [<600153bd>] ? sem_init+0x17/0x45
fakemachine: error starting uml backend: <nil>

```


-- System Information:
Debian Release: bullseye/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'stable-debug'), (500, 'proposed-updates'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: armhf, arm64

Kernel: Linux 5.10.0-3-amd64 (SMP w/32 CPU threads)
Kernel taint flags: TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages user-mode-linux depends on:
ii libc6 2.31-9

Versions of packages user-mode-linux recommends:
ii uml-utilities 20070815.4-1

Versions of packages user-mode-linux suggests:
ii gnome-terminal [x-terminal-emulator] 3.38.3-1
pn rootstrap <none>
ii rxvt-unicode [x-terminal-emulator] 9.22-8+b1
ii slirp 1:1.0.17-11
pn user-mode-linux-doc <none>
pn vde2 <none>
ii xterm [x-terminal-emulator] 366-1

-- no debconf information
Ritesh Raj Sarraf
2021-02-23 17:40:01 UTC
Added the Debian bug report in CC.
The current Debian user-mode-linux package in unstable is based on
the 5.10.5 stable source which includes the mentioned patch, but is
still causing an error for some users.
After updating the tree to 5.10.5 and applying all Debian patches
from the package, I cannot reproduce the bug.
I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
without issues. Hosts are all up to date Debian 10.8 and so is the
UML userspace.
Did you mean 5.10, 5.2 and 4.19 (UML) guests?

We've seen this happen on Debian Testing and Unstable hosts (the former
will soon be the next stable release, i.e. Debian Bullseye).

In our tests, the same Linux UML binary (5.10) works fine when run on a
Debian Stable host.
--
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System
Anton Ivanov
2021-02-23 18:40:02 UTC
Post by Ritesh Raj Sarraf
Added the debian bug report in CC.
The current Debian user-mode-linux package in unstable is based on
the 5.10.5 stable source which includes the mentioned patch, but is
still causing an error for some users.
After updating the tree to 5.10.5 and applying all Debian patches
from the package, I cannot reproduce the bug.
I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
without issues. Hosts are all up to date Debian 10.8 and so is the
UML userspace.
Did you mean 5.10, 5.2 and 4.19 (UML) guests?
No. Hosts.

I have several 6core/12thread Ryzens which are used for development
testing.

They all use identical userspace with the sole difference being the
kernel. They all use a selection of 5.x because 4.19 does not support
the hardware properly.

The 4.19 testing is done on my old "test farm" which is all A8s and
Athlon X760.
Post by Ritesh Raj Sarraf
We've seen this happen on Debian Testing and Unstable hosts (the former
will soon be the next stable release, i.e. Debian Bullseye).
In our tests, the same Linux UML binary (5.10) works fine when run on a
Debian Stable host.
OK. I will upgrade one of my systems to Debian testing to try to
reproduce this.
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Anton Ivanov
2021-02-24 11:50:02 UTC
Post by Ritesh Raj Sarraf
Added the debian bug report in CC.
The current Debian user-mode-linux package in unstable is based on
the 5.10.5 stable source which includes the mentioned patch, but is
still causing an error for some users.
After updating the tree to 5.10.5 and applying all Debian patches
from the package, I cannot reproduce the bug.
I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
without issues. Hosts are all up to date Debian 10.8 and so is the
UML userspace.
Did you mean 5.10, 5.2 and 4.19 (UML) guests?
We've seen this happen on Debian Testing and Unstable hosts (the former
will soon be the next stable release, i.e. Debian Bullseye).
In our tests, the same Linux UML binary (5.10) works fine when run on a
Debian Stable host.
I cannot reproduce it on a physical Bullseye host using the Debian user-mode-linux package compiled from source.

Environment - Bullseye minimal install and build deps. 6 cores/12 threads Ryzen

I cannot reproduce it using the upstream source and the patches from the user-mode-linux package

Environment - same as above.

I cannot reproduce it using the upstream source + patches and compiling on Buster using the following:

1. Bullseye physical host, minimal install, same hardware

2. Bullseye VM, minimal install, running with 4 vCPUs on the same host

3. Bullseye LXC container running on a Debian Buster host, minimal install, same hardware

In all cases it boots cleanly and there are no segfaults.

So, frankly, I have no idea what is causing it to crash. I have run most combinations of 5.10 guests on 5.10 hosts, and all work fine here.
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Ritesh Raj Sarraf
2021-03-02 09:20:03 UTC
Post by Anton Ivanov
In all cases it boots cleanly and there are no segfaults.
So, frankly, no idea what is causing it to crash - I have run most
combinations of 5.10 on a 5.10, all work fine here.
Is there any other way I can help you with this issue?
I do have the core dump available on my local machine.
--
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System
Ritesh Raj Sarraf
2021-03-02 17:40:01 UTC
So the best I can extract for you is to compile the kernel with as much
information as possible.
Can you try using one of the older kernels so we can verify whether this
is indeed a 5.10 thing?
That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4.
All 3 crashed. That's when I knew this one was going to be a painful one
to conclude.

The only other input I have is that one more user has reported being
able to reproduce the issue.

OTOH, I have one more user (other than you) who's not been able to
reproduce the issue.
I will bisect it the moment I figure out how to reproduce it. I will
try to do some more experiments on that tomorrow.
Meanwhile, I enabled some debug info in the kernel. Here's what I have
got so far:

```
(gdb) bt
#0 0x00007f89908dc087 in kill () at ../sysdeps/unix/syscall-template.S:120
#1 0x00000000604a3514 in uml_abort () at arch/um/os-Linux/util.c:94
#2 0x00000000604a3791 in os_dump_core () at arch/um/os-Linux/util.c:149
#3 0x000000006048d126 in panic_exit (self=0x2e66d5, unused1=6, unused2=0x0) at arch/um/kernel/um_arch.c:217
#4 0x00000000604c725a in notifier_call_chain (nl=0x2e66d5, val=0, v=0x60d82f40 <buf>, nr_to_call=-1, nr_calls=0x0) at kernel/notifier.c:83
#5 0x00000000604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5, val=6, v=0x0) at kernel/notifier.c:217
#6 0x0000000060a54607 in panic (fmt=0x60a55225 <printk> "UH\211\345H\201\354", <incomplete sequence \320>) at kernel/panic.c:272
#7 0x000000006048cca3 in segv (fi=<incomplete type>, ip=1615717312, is_user=0, regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:246
#8 0x000000006048ce64 in segv_handler (sig=3040981, unused_si=0x6, regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:190
#9 0x00000000604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0 <cpu0_irqstack+15344>, mc=0x60c2fae8 <cpu0_irqstack+15080>) at arch/um/os-Linux/signal.c:48
#10 0x00000000604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at arch/um/os-Linux/signal.c:81
#11 0x00000000604a265f in hard_handler (sig=3040981, si=0x60c2fbf0 <cpu0_irqstack+15344>, p=0x0) at arch/um/os-Linux/signal.c:180
#12 <signal handler called>
#13 0x00000000604de3c0 in printk_caller_id () at kernel/printk/printk.c:1924
#14 log_output (text_len=<optimized out>, text=<optimized out>, dev_info=<optimized out>, lflags=<optimized out>, level=<optimized out>, facility=<optimized out>) at kernel/printk/printk.c:1932
#15 vprintk_store (facility=1624806843, level=5, dev_info=0x0, fmt=0x35 <error: Cannot access memory at address 0x35>, args=0x1) at kernel/printk/printk.c:2004
#16 0x00000000604de8b7 in vprintk_emit (facility=1624806843, level=1622768673, dev_info=0x35, fmt=0x1 <error: Cannot access memory at address 0x1>, args=0x60b97c22) at kernel/printk/printk.c:2029
#17 0x00000000604debad in vprintk_deferred (fmt=0x1 <error: Cannot access memory at address 0x1>, args=0x60b97c21) at kernel/printk/printk.c:3079
#18 0x0000000060a554de in printk_deferred (fmt=0x60d895bb <textbuf+91> "\n") at kernel/printk/printk.c:3091
#19 0x000000006092680f in _warn_unseeded_randomness (previous=<optimized out>, caller=<optimized out>, func_name=<optimized out>) at drivers/char/random.c:1534
#20 _warn_unseeded_randomness (func_name=0x60abf380 <__func__.38> "get_random_u32", caller=0x608b5f25 <bucket_table_alloc+287>, previous=0x35) at drivers/char/random.c:1516
#21 0x0000000060927d47 in get_random_u32 () at drivers/char/random.c:2221
#22 0x00000000608b5f25 in bucket_table_alloc (nbuckets=64, gfp=3264, ht=<optimized out>) at lib/rhashtable.c:203
#23 0x00000000608b6733 in rhashtable_init (ht=0x60c60e30 <init_ipc_ns+80>, params=0x608b5e06 <bucket_table_alloc>) at lib/rhashtable.c:1061
#24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>) at ipc/util.c:119
#25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at ipc/sem.c:254
#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
#28 0x00007f8990ab8fb2 in call_init (l=<optimized out>, argc=***@entry=5, argv=***@entry=0x7ffe3e7a4c98, env=***@entry=0x7ffe3e7a4cc8) at dl-init.c:72
#29 0x00007f8990ab90b9 in call_init (env=0x7ffe3e7a4cc8, argv=0x7ffe3e7a4c98, argc=5, l=<optimized out>) at dl-init.c:30
#30 _dl_init (main_map=0x61497ea0, argc=5, argv=0x7ffe3e7a4c98, env=0x7ffe3e7a4cc8) at dl-init.c:119
#31 0x00007f89909d82bd in __GI__dl_catch_exception (exception=***@entry=0x0, operate=***@entry=0x7f8990abc5a0 <call_dl_init>, args=***@entry=0x7ffe3e7a1e80) at dl-error-skeleton.c:182
#32 0x00007f8990abd028 in dl_open_worker (a=***@entry=0x7ffe3e7a2020) at dl-open.c:758
#33 0x00007f89909d8260 in __GI__dl_catch_exception (exception=***@entry=0x7ffe3e7a2000, operate=***@entry=0x7f8990abcc70 <dl_open_worker>, args=***@entry=0x7ffe3e7a2020) at dl-error-skeleton.c:208
#34 0x00007f8990abc8ca in _dl_open (file=0x7ffe3e7a22a0 "libnss_nis.so.2", mode=-2147483646, caller_dlopen=0x7f89909bf3a6 <nss_load_library+294>, nsid=-2, argc=5, argv=0x7ffe3e7a2000, env=0x7ffe3e7a4cc8) at dl-open.c:837
#35 0x00007f89909d76dd in do_dlopen (ptr=***@entry=0x7ffe3e7a2260) at dl-libc.c:96
#36 0x00007f89909d8260 in __GI__dl_catch_exception (exception=***@entry=0x7ffe3e7a21e0, operate=***@entry=0x7f89909d76a0 <do_dlopen>, args=***@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:208
#37 0x00007f89909d831f in __GI__dl_catch_error (objname=***@entry=0x7ffe3e7a2238, errstring=***@entry=0x7ffe3e7a2240, mallocedp=***@entry=0x7ffe3e7a2237, operate=***@entry=0x7f89909d76a0 <do_dlopen>, args=***@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:227
#38 0x00007f89909d77b7 in dlerror_run (operate=***@entry=0x7f89909d76a0 <do_dlopen>, args=***@entry=0x7ffe3e7a2260) at dl-libc.c:46
#39 0x00007f89909d7846 in __GI___libc_dlopen_mode (name=***@entry=0x7ffe3e7a22a0 "libnss_nis.so.2", mode=***@entry=-2147483646) at dl-libc.c:195
#40 0x00007f89909bf3a6 in nss_load_library (ni=***@entry=0x61497db0) at nsswitch.c:359
#41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=<optimized out>, ***@entry=0x7f899089b020 "setgrent") at nsswitch.c:467
#42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
#43 init_nss_interface () at nss_compat/compat-grp.c:79
#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
#45 0x00007f8990968b85 in __getgrnam_r (name=***@entry=0x7f8990a2a1e0 "tty", resbuf=***@entry=0x7ffe3e7a2910, buffer=***@entry=0x7ffe3e7a24e0 "", buflen=1024, result=***@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315
#46 0x00007f89909d6b77 in grantpt (fd=***@entry=5) at ../sysdeps/unix/grantpt.c:152
#47 0x00007f8990a9394e in __GI_openpty (amaster=0x60c2bd94, aslave=0x60c2bd98, name=0x0, termp=0x0, winp=0x0) at openpty.c:103
#48 0x00000000604a1f65 in openpty_cb (arg=0x60c2bd94) at arch/um/os-Linux/sigio.c:407
#49 0x00000000604a58d0 in start_idle_thread (stack=0x60c28000 <init_thread_info>, switch_buf=0x60c31e08 <init_task+4936>) at arch/um/os-Linux/skas/process.c:598
#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
#51 0x00000000600047b2 in linux_main (argc=1624806843, argv=0x40709000) at arch/um/kernel/um_arch.c:334
#52 0x000000006000574f in main (argc=5, argv=0x7ffe3e7a4c98, envp=0x35) at arch/um/os-Linux/main.c:144
(gdb)

```
--
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System
Anton Ivanov
2021-03-03 09:40:02 UTC
Post by Ritesh Raj Sarraf
So the best I can extract for you is to compile the kernel with as much
information as possible.
Can you try using one of the older kernels so we can verify whether this
is indeed a 5.10 thing?
That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4.
All 3 crashed. That's when I knew this one was going to be a painful one
to conclude.
The only other input I have is that one more user has reported being
able to reproduce the issue.
OTOH, I have one more user (other than you) who's not been able to
reproduce the issue.
I will bisect it the moment I figure out how to reproduce it. I will
try to do some more experiments on that tomorrow.
I tried to alter the userspace a bit, but it makes no difference.

Out of curiosity, what are you running it on?
Post by Ritesh Raj Sarraf
Meanwhile, I enabled some debug info in the kernel. Here's what I have
```
(gdb) bt
#0 0x00007f89908dc087 in kill () at ../sysdeps/unix/syscall-template.S:120
#1 0x00000000604a3514 in uml_abort () at arch/um/os-Linux/util.c:94
#2 0x00000000604a3791 in os_dump_core () at arch/um/os-Linux/util.c:149
#3 0x000000006048d126 in panic_exit (self=0x2e66d5, unused1=6, unused2=0x0) at arch/um/kernel/um_arch.c:217
#4 0x00000000604c725a in notifier_call_chain (nl=0x2e66d5, val=0, v=0x60d82f40 <buf>, nr_to_call=-1, nr_calls=0x0) at kernel/notifier.c:83
#5 0x00000000604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5, val=6, v=0x0) at kernel/notifier.c:217
#6 0x0000000060a54607 in panic (fmt=0x60a55225 <printk> "UH\211\345H\201\354", <incomplete sequence \320>) at kernel/panic.c:272
#7 0x000000006048cca3 in segv (fi=<incomplete type>, ip=1615717312, is_user=0, regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:246
#8 0x000000006048ce64 in segv_handler (sig=3040981, unused_si=0x6, regs=0x60c2ee58 <cpu0_irqstack+11864>) at arch/um/kernel/trap.c:190
#9 0x00000000604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0 <cpu0_irqstack+15344>, mc=0x60c2fae8 <cpu0_irqstack+15080>) at arch/um/os-Linux/signal.c:48
#10 0x00000000604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at arch/um/os-Linux/signal.c:81
#11 0x00000000604a265f in hard_handler (sig=3040981, si=0x60c2fbf0 <cpu0_irqstack+15344>, p=0x0) at arch/um/os-Linux/signal.c:180
#12 <signal handler called>
The code here is:

```
static inline u32 printk_caller_id(void)
{
        return in_task() ? task_pid_nr(current) :
                0x80000000 + raw_smp_processor_id();
}
```


That should not bomb out unless we have memory corruption or something along those lines, such as current being invalid.

A.
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Johannes Berg
2021-03-03 23:00:03 UTC
Post by Ritesh Raj Sarraf
#24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>) at ipc/util.c:119
#25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at ipc/sem.c:254
#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
You're in the init of libcom_err.so.2, which is loaded by
Post by Ritesh Raj Sarraf
"libnss_nis.so.2"
nsswitch.c:359
#41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
nsswitch.c:467
#42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
#43 init_nss_interface () at nss_compat/compat-grp.c:79
#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
at ../nss/getXXbyYY_r.c:315
You have a strange nsswitch configuration that causes all of this
(libnss_nis.so.2 -> libcom_err.so.2) to get loaded.

Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
... Linux's sem_init() instead of libpthread's.

And then the crash.

Now, I don't know how to fix it (short of changing your nsswitch
configuration) - maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary a lower symbol resolution than the
libc/libpthread.


johannes
Anton Ivanov
2021-03-04 07:40:01 UTC
Post by Johannes Berg
Now, I don't know how to fix it (short of changing your nsswitch
configuration) - maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary a lower symbol resolution than the
libc/libpthread.
I have not looked in depth at how the linking process works, but it
should have picked up the sem_init from the kernel library, not libc.

We are already supposed to do that regarding kernel vs libc string.h
functions - memcpy, etc.

However, for all of those the libc version does the same thing, so
invoking the wrong one does not kill you. This may therefore have been
broken for a while and we were simply not noticing it.
Post by Johannes Berg
johannes
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Anton Ivanov
2021-03-04 07:50:01 UTC
On Thu, 04 Mar 2021 07:40:00 +0900,
Post by Johannes Berg
Now, I don't know how to fix it (short of changing your nsswitch
configuration) - maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary a lower symbol resolution than the
libc/libpthread.
objcopy (from binutils) can localize symbols (e.g., objcopy -L
sem_init $orig_file $new_file). It can also rename symbols. But I am
not sure this is the ideal solution.
How does UML handle symbol conflicts between userspace code and the
Linux kernel (like sem_init in this case)? AFAIK, libnl has the same
symbol as the Linux kernel (genlmsg_put), and others may as well.
It used to handle them. I do not think it does now - something broke and
it's fairly recent.

I actually have something which confirms this.

I worked on a patch around 5.8-5.9 which would give the option to pick
up libc equivalents for the functions from string.h, and there was a
clear performance difference of roughly 20% or more. This is because UML
has no means of optimizing them and picks up the worst-case x86 version.

I parked that for a while, because I had to look at other stuff at work.

I restarted working on it after 5.10. My first observation was that
despite not changing anything in the patches, the gain was no longer
there. The performance was the same as if it picked up libc equivalents.

I can either try to reproduce the nss config which causes the sem_init
issue or use my own libc patchset to try to bisect. The problem commit
should be roughly around the point where the performance difference from
applying the "switch to libc" patches goes away.

Brgds,

A.
-- Hajime
_______________________________________________
linux-um mailing list
http://lists.infradead.org/mailman/listinfo/linux-um
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Johannes Berg
2021-03-04 08:00:01 UTC
Post by Johannes Berg
Now, I don't know how to fix it (short of changing your nsswitch
configuration) - maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary a lower symbol resolution than the
libc/libpthread.
objcopy (from binutils) can localize symbols (e.g., objcopy -L
sem_init $orig_file $new_file). It can also rename symbols. But I am
not sure this is the ideal solution.
Yes, we started thinking about it but it was too late at night when I
replied ...

I think there's a way to provide an external list of symbols to export
(as used for symbol versioning) that we could/should use to avoid
exporting any of the kernel symbols to libraries.
How does UML handle symbol conflicts between userspace code and the
Linux kernel (like sem_init in this case)? AFAIK, libnl has the same
symbol as the Linux kernel (genlmsg_put), and others may as well.
I fear it doesn't?

johannes
