Ticket #752 (closed defect: fixed)

Opened 2 years ago

Last modified 23 months ago

[PATCH] asyncns timeout causes delay when pulseaudio service not running

Reported by: jezaustin Owned by: lennart
Milestone: 0.9.22 Component: core
Keywords: asyncns timeout socket DNS Cc:

Description

Client applications try to connect to pulseaudio, and if there is no service running the pa_context_connect function blocks for about 5s. I discovered this through testing the firefox betas (https://bugzilla.mozilla.org/show_bug.cgi?id=533470), and I've also raised the issue with libcanberra (http://bugs.freedesktop.org/show_bug.cgi?id=25742).

To reproduce: 1) compile pulseaudio with HAVE_ASYNCNS. 2) make sure pulse is not running 3) run any application which attempts a pa_context_connect.

There is no delay at all if either a) pulseaudio process is running, or b) pulseaudio is compiled without HAVE_ASYNCNS.

When the symptoms are showing, an error message "socket(): Address family not supported by protocol" is displayed immediately, then the delay is experienced after which the client application resumes. I think that pulsecore/socket-client.c:486 in pa_socket_client_new_string calls asyncns_getaddrinfo which produces the error message, then line 487 calls start_timeout which produces the delay.

Is there any way of avoiding the timeout when using ASYNCNS?

Attachments

pulseaudio.patch (390 bytes) - added by jezaustin 23 months ago.
patch to fix 5s hang

Change History

Changed 2 years ago by lennart

What distribution is this? What kind of kernel is this? does it lack ipv6?

Could you please do a "PULSE_LOG=99 pactl stat"? This should be more verbose while connecting to PA.

Changed 2 years ago by jezaustin

I'm running gentoo, kernel 2.6.31-gentoo-r6. No IPV6.

$ PULSE_LOG=99 pactl stat Using shared memory pool with 1024 slots of size 64.0 KiB each, total size is 64.0 MiB, maximum usable slot size is 65472 Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native... connect(): No such file or directory (2) Trying to connect to /home/jra/.pulse/native... connect(): No such file or directory (2) Trying to connect to /var/run/pulse/native... connect(): No such file or directory (2) Trying to connect to tcp4:127.0.0.1... connect(): Connection refused Trying to connect to tcp6:[::1]... socket(): Address family not supported by protocol Trying to autospawn... D: caps.c: Cleaning up privileges. Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native... SHM possible: yes Protocol version: remote 16, local 16 Negotiated SHM: yes Currently in use: 1 blocks containing 63.9 KiB bytes total. Allocated during whole lifetime: 3 blocks containing 127.9 KiB bytes total. Sample cache size: 0 B User name: jra Host Name: harrison Server Name: pulseaudio Server Version: 0.9.21 Default Sample Specification: s16le 2ch 44100Hz Default Channel Map: front-left,front-right Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo Cookie: 4e590cae

$ pulseaudio -D $ PULSE_LOG=99 pactl stat Using shared memory pool with 1024 slots of size 64.0 KiB each, total size is 64.0 MiB, maximum usable slot size is 65472 Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native... SHM possible: yes Protocol version: remote 16, local 16 Negotiated SHM: yes Currently in use: 1 blocks containing 63.9 KiB bytes total. Allocated during whole lifetime: 36 blocks containing 1.2 MiB bytes total. Sample cache size: 0 B User name: jra Host Name: harrison Server Name: pulseaudio Server Version: 0.9.21 Default Sample Specification: s16le 2ch 44100Hz Default Channel Map: front-left,front-right Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo Cookie: 9c353b82

Changed 2 years ago by lennart

And between which two lines is the long delay?

(tbh my interest in fixing this could be vastly increased by reproducing this problem on a sane distribution with a normal kernel. i kinda get the feeling that this is simply caused by some gentoo weirdness with a weird kernel)

Changed 2 years ago by coling

Hi - Please use WikiFormatting for pastes like that :)

Here is a cleaner version:

 $ PULSE_LOG=99 pactl stat
 Using shared memory pool with 1024 slots of size 64.0 KiB each, total size
 is 64.0 MiB, maximum usable slot size is 65472
 Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-
 runtime/native...
 connect(): No such file or directory (2)
 Trying to connect to /home/jra/.pulse/native...
 connect(): No such file or directory (2)
 Trying to connect to /var/run/pulse/native...
 connect(): No such file or directory (2)
 Trying to connect to tcp4:127.0.0.1...
 connect(): Connection refused
 Trying to connect to tcp6:[::1]...
 socket(): Address family not supported by protocol
 Trying to autospawn...
 D: caps.c: Cleaning up privileges.
 Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-
 runtime/native...
 SHM possible: yes
 Protocol version: remote 16, local 16
 Negotiated SHM: yes
 Currently in use: 1 blocks containing 63.9 KiB bytes total.
 Allocated during whole lifetime: 3 blocks containing 127.9 KiB bytes
 total.
 Sample cache size: 0 B
 User name: jra
 Host Name: harrison
 Server Name: pulseaudio
 Server Version: 0.9.21
 Default Sample Specification: s16le 2ch 44100Hz
 Default Channel Map: front-left,front-right
 Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo
 Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo
 Cookie: 4e590cae
 $ pulseaudio -D
 $ PULSE_LOG=99 pactl stat
 Using shared memory pool with 1024 slots of size 64.0 KiB each, total size
 is 64.0 MiB, maximum usable slot size is 65472
 Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-
 runtime/native...
 SHM possible: yes
 Protocol version: remote 16, local 16
 Negotiated SHM: yes
 Currently in use: 1 blocks containing 63.9 KiB bytes total.
 Allocated during whole lifetime: 36 blocks containing 1.2 MiB bytes total.
 Sample cache size: 0 B
 User name: jra
 Host Name: harrison
 Server Name: pulseaudio
 Server Version: 0.9.21
 Default Sample Specification: s16le 2ch 44100Hz
 Default Channel Map: front-left,front-right
 Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo
 Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo
 Cookie: 9c353b82

Can you highlight if the delay in the auto spawn case was obvious when it was outputting it's debug? e.g did each connect take a while to timeout? or was there a specific pause at a given connection attempt?

Changed 2 years ago by jezaustin

Previous comment was pulseaudio compiled with HAVE_LIBASYNCNS. recompiled without HAVE_LIBASYNCNS:

$ PULSE_LOG=99 pactl stat
Using shared memory pool with 1024 slots of size 64.0 KiB each, total size is 64.0 MiB, maximum usable slot size is 65472
Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native...
connect(): No such file or directory (2)
Trying to connect to /home/jra/.pulse/native...
connect(): No such file or directory (2)
Trying to connect to /var/run/pulse/native...
connect(): No such file or directory (2)
Trying to connect to tcp4:127.0.0.1...
connect(): Connection refused
Trying to connect to tcp6:[::1]...
socket(): Address family not supported by protocol
Trying to autospawn...
D: caps.c: Cleaning up privileges.
Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native...
SHM possible: yes
Protocol version: remote 16, local 16
Negotiated SHM: yes
Currently in use: 1 blocks containing 63.9 KiB bytes total.
Allocated during whole lifetime: 3 blocks containing 127.9 KiB bytes total.
Sample cache size: 0 B
User name: jra
Host Name: harrison
Server Name: pulseaudio
Server Version: 0.9.21
Default Sample Specification: s16le 2ch 44100Hz
Default Channel Map: front-left,front-right
Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo
Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo
Cookie: 4c571bdb

$ pulseaudio -D
$ PULSE_LOG=99 pactl stat
Using shared memory pool with 1024 slots of size 64.0 KiB each, total size is 64.0 MiB, maximum usable slot size is 65472
Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native...
connect(): No such file or directory (2)
Trying to connect to /home/jra/.pulse/native...
connect(): No such file or directory (2)
Trying to connect to /var/run/pulse/native...
connect(): No such file or directory (2)
Trying to connect to tcp4:127.0.0.1...
connect(): Connection refused
Trying to connect to tcp6:[::1]...
socket(): Address family not supported by protocol
Trying to autospawn...
D: caps.c: Cleaning up privileges.
Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native...
SHM possible: yes
Protocol version: remote 16, local 16
Negotiated SHM: yes
Currently in use: 1 blocks containing 63.9 KiB bytes total.
Allocated during whole lifetime: 3 blocks containing 127.9 KiB bytes total.
Sample cache size: 0 B
User name: jra
Host Name: harrison
Server Name: pulseaudio
Server Version: 0.9.21
Default Sample Specification: s16le 2ch 44100Hz
Default Channel Map: front-left,front-right
Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo
Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo
Cookie: 35deba41

Both cases reported in this comment were quick. In comment 2, the first case (without pulseaudio process running) was delayed approx. 5s before the "Trying to autospawn..." line.

I'm totally new to this code, but the delay seems to be about the length of CONNECT_TIMEOUT (I tried different values, and it is), and it seems as though asyncns_getaddrinfo() is done quickly, but we wait for the timeout to be reached anyway. I notice that the event bitmask is just PA_IO_EVENT_INPUT (socket-client.c:485). Would the callback have a chance of being made with a more liberal mask of eg. PA_IO_EVENT_INPUT | PA_IO_EVENT_ERROR ?

I've tested with libasyncns versions 0.3 and 0.7.

By the way, the gentoo kernel is certainly not "weird" -- it's almost completely stock with a few bug- and security-fixes applied (generally fixes that have already been accepted by the upstream kernel). Anyway, I'll compile up a vanilla kernel but I don't think it will affect this...

Changed 2 years ago by jezaustin

Actually no time to test with a vanilla kernel until January, but anyway I don't think it's important. When I have time, I'll check with an IPv6-enabled kernel. I suspect that pulseaudio is not learning about an error from asyncns (namely "Address family not supported by protocol") until the timeout finishes, but it can receive a "Connection refused" message in a timely fashion. I think that ignoring an error until a timeout is reached is a bug. It may also be a bug for asyncns to even attempt a tcp6 connection in an environment without IPv6.

Changed 2 years ago by jezaustin

As I suspected, a kernel configured with CONFIG_IPV6 doesn't pause:

$ PULSE_LOG=99 pactl stat
Using shared memory pool with 1024 slots of size 64.0 KiB each, total size is 64.0 MiB, maximum usable slot size is 65472
Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native...
connect(): No such file or directory (2)
Trying to connect to /home/jra/.pulse/native...
connect(): No such file or directory (2)
Trying to connect to /var/run/pulse/native...
connect(): No such file or directory (2)
Trying to connect to tcp4:127.0.0.1...
connect(): Connection refused
Trying to connect to tcp6:[::1]...
connect(): Connection refused
Trying to autospawn...
D: caps.c: Cleaning up privileges.
Trying to connect to /home/jra/.pulse/c00984e45ca680c114160c07498b171f-runtime/native...
SHM possible: yes
Protocol version: remote 16, local 16
Negotiated SHM: yes
Currently in use: 1 blocks containing 63.9 KiB bytes total.
Allocated during whole lifetime: 3 blocks containing 127.9 KiB bytes total.
Sample cache size: 0 B
User name: jra
Host Name: harrison
Server Name: pulseaudio
Server Version: 0.9.21
Default Sample Specification: s16le 2ch 44100Hz
Default Channel Map: front-left,front-right
Default Sink: alsa_output.pci-0000_00_1b.0.analog-stereo
Default Source: alsa_input.pci-0000_00_1b.0.analog-stereo
Cookie: 7b3d5ae8

Unless you consider a kernel without IPv6 to be broken, this is a bug.

Changed 2 years ago by lennart

Could you get me a full bt over all threads when the freeze happens?

Changed 2 years ago by jezaustin

Only one thread on pactl stat during the freeze:

#0  0x00007ffa13836a57 in ppoll (fds=0x1f7eeb0, nfds=2, 
    timeout=0x7fff5708f4e0, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:57
        oldtype = 0
        tval = {tv_sec = 1, tv_nsec = 432795426}
        result = <value optimized out>
#1  0x00007ffa176c2a87 in pa_mainloop_poll (m=0x1f73490)
    at pulse/mainloop.c:887
        ts = {tv_sec = 1, tv_nsec = 998698000}
        __func__ = "pa_mainloop_poll"
        __PRETTY_FUNCTION__ = "pa_mainloop_poll"
#2  0x00007ffa176c3da9 in pa_mainloop_iterate (m=0x1f73490, 
    block=<value optimized out>, retval=0x7fff5708f61c) at pulse/mainloop.c:961
        r = -514
        __func__ = "pa_mainloop_iterate"
        __PRETTY_FUNCTION__ = "pa_mainloop_iterate"
#3  0x00007ffa176c3e60 in pa_mainloop_run (m=0x1f73490, retval=0x7fff5708f61c)
    at pulse/mainloop.c:979
        r = <value optimized out>
#4  0x0000000000405e71 in main (argc=<value optimized out>, 
    argv=0x7fff5708f738) at utils/pactl.c:1289
        m = 0x1f73490
        ret = 1
        c = <value optimized out>
        server = 0x0
        bn = 0x7fff5709115d "pactl"
        long_options = {{name = 0x4072f2 "server", has_arg = 1, flag = 0x0, 
            val = 115}, {name = 0x4072f9 "client-name", has_arg = 1, 
            flag = 0x0, val = 110}, {name = 0x407305 "version", has_arg = 0, 
            flag = 0x0, val = 256}, {name = 0x407284 "help", has_arg = 0, 
            flag = 0x0, val = 104}, {name = 0x0, has_arg = 0, flag = 0x0, 
            val = 0}}
        __func__ = "main"
        __PRETTY_FUNCTION__ = "main"

Changed 2 years ago by lennart

So, there is no other thread?

Hmm, is there a child process around?

Changed 2 years ago by jezaustin

At the time of the hang, I think the other threads have finished already. There are no other processes.

Consider testing for system support of tcp6 before doing pulse/context.c line 1000:

c->server_list = pa_strlist_prepend(c->server_list, "tcp6:[::1]");

although, this would not fix the more general problem which might exist, where pulse needlessly waits for its timeout whenever libasyncns returns an error instead of "Connection refused".

Changed 23 months ago by jezaustin

Hi, had another look at this tonight, I think I found the real problem by comparing the call to sockaddr_prepare in pa_socket_client_new_sockaddr with the one in asyncns_cb, the former tests for failure, while the latter doesn't. By adding that test, I get timely completion running pactl stat on a kernel without ipv6.

Changed 23 months ago by jezaustin

patch to fix 5s hang

Changed 23 months ago by coling

  • summary changed from asyncns timeout causes delay when pulseaudio service not running to [PATCH] asyncns timeout causes delay when pulseaudio service not running
  • milestone set to 0.9.22

Great, thanks for that.

I'm not super familiar with this side of things so will leave for Lennart to fix. I've marked the milestone and updated the title to make sure it's not forgotten.

Thanks again.

Changed 23 months ago by lennart

  • status changed from new to closed
  • resolution set to fixed

Thanks a lot. Patch looks fine. Merged now.

Note: See TracTickets for help on using tickets.