Project

General

Profile

Bug #5295

mpegts: too much queued input data

Added by Luis Alves almost 4 years ago. Updated over 2 years ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
2018-10-27
Due date:
% Done:

0%

Estimated time:
Found in version:
github master
Affected Versions:

Description

I'm having this issue where at some point tvheadend starts spamming the "too much queued input data" message and locks out (and killing the process is the only thing that can be done).

This happens all the time I have several active recordings where the input streams are from an iptv network and because of this issue I switched the recordings to use the stream from dvb adapters.
It seemed to solve the issue and for around 3 days everything worked fine - but today happened again (had to kill tvheadend):

My recording profile use hw transcoding (intel/vaapi) with no cpu usage.

Started at around 7am:

Oct 27 07:00:33 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new
Oct 27 07:00:53 server tvheadend[10216]: message repeated 2 times: [ mpegts: too much queued input data (over 50MB) for NOS#11, discarding new]
Oct 27 07:01:03 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new
Oct 27 07:04:53 server tvheadend[10216]: message repeated 23 times: [ mpegts: too much queued input data (over 50MB) for NOS#11, discarding new]

[...]

Oct 27 10:08:51 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new
Oct 27 10:09:01 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new
Oct 27 10:09:11 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new
Oct 27 10:09:31 server tvheadend[10216]: message repeated 2 times: [ mpegts: too much queued input data (over 50MB) for NOS#11, discarding new]

What I have seen is that whenever this message starts, tvheadend it will likely to end up dead.
In what conditions does this message show up? (I suppose it's when tvheadend can't flush the input data to the "output")

Anyway, the action tvheadend takes when this happens (discarding new data) doesn't seem to be enough to handle the situation.

How can I further debug this issue?

History

#1

Updated by Luis Alves almost 4 years ago

I was doing some debug and I have noticed that when I abort multiple recordings at once I always get this issue.

Can someone try to start 4 recordings, then select all 4 and press "Abort" (try at least 2 or 3 times).

#2

Updated by Luis Alves almost 4 years ago

(by the way, do the above test using transcoding - I will try without)

Also, I just checked the logs and the original issue happened exactly when 1 recording ended and a new one started:

Oct 27 06:59:30 server tvheadend[10216]: dvr: "xxxxxxx" on "xxxxxxx" recorder starting
Oct 27 06:59:30 server tvheadend[10216]: subscription: 1202: "xxxxxxxxxxx" subscribing on channel "xxxxxxxxx", weight: 500, adapter: "NOS#11", network: "xxxx", mux: "xxxx", provider: "xxxxx", service: "xxxxxxx", profile="rec_hd", username="admin" 
Oct 27 06:59:34 server tvheadend[10216]: dvr: /STORAGE/tvheadend/recordings/2018-10-27/xxxxxxxxxxxxxxxxx from adapter: "NOS#11", network: "xxxx", mux: "xxxx", provider: "xxxxx", service: "xxxxx" 
Oct 27 06:59:34 server tvheadend[10216]: dvr:  #  type              lang  resolution  aspect ratio  sample rate  channels
Oct 27 06:59:34 server tvheadend[10216]: dvr:  1  H264                    1920x1080   ?                                    <disabled, no valid input>
Oct 27 06:59:34 server tvheadend[10216]: dvr:  2  AAC               eng                             48000        2
Oct 27 07:00:00 server tvheadend[10216]: subscription: 1201: "xxxxxxxxxxxxxxxxx" unsubscribing from "xxxxxxx", username="admin" 
Oct 27 07:00:00 server tvheadend[10216]: dvr: "xxxxxxxxxxxxxxx" on "xxxxxxxxxxxx": End of program: Completed OK
Oct 27 07:00:33 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new
Oct 27 07:00:53 server tvheadend[10216]: message repeated 2 times: [ mpegts: too much queued input data (over 50MB) for NOS#11, discarding new]
Oct 27 07:01:03 server tvheadend[10216]: mpegts: too much queued input data (over 50MB) for NOS#11, discarding new

I suspect that something wrong is happening when recordings end(?)

#3

Updated by Luis Alves almost 4 years ago

Just confirmed that I when NOT using transcoding, the issues doesn't happen.

#4

Updated by Luis Alves almost 4 years ago

And even when using transcoding with "no transcoding" (audio and video codecs = Copy) the issue still happens.

#5

Updated by Joe User almost 4 years ago

I do not have hw transcoding setup at the moment, so sorry, I cannot test.
At the end of transcoding, the headers are rewritten, but shouldn't cause any extra load on the system.
Does it occur if you use "stop" instead of "abort"? (abort also deletes the files...)

#6

Updated by Luis Alves almost 4 years ago

I was now testing with IPTV as input, and most of the times it hangs when start/stopping just 1 recording (much easier to replicate).

I do not have hw transcoding setup at the moment, so sorry, I cannot test.

No need for hw transcode, even with audio/video codec set to "Copy" it will hang (no transcoding at all).

You just need to use a "Transcode/av-lib" profile...

#7

Updated by Jaroslav Kysela almost 4 years ago

  • Priority changed from High to Normal
#8

Updated by Luis Alves almost 4 years ago

Hi Jaroslav,
Were you able to replicate this? Or someone else?

From the quick debug I've made it seemed to be related to a global lock being held forever somewhere along the recording stop actions.

And there seem to happen 2 situations (one that causes tvh to hang, other is ok):
Two functions grabbing the global lock simultaneously, where one depends on the other to end(?):
1 - OK) The first one to get the lock ends and releases for the next one.
2 - BAD) The first one to get the lock has to wait for the other one that is still waiting for the mutex to unlock.

It's my first time looking at the whole processing chain "input->stream(profile/trancode)->record".
Can you point me what files/functions I should look into?

Thank you.

#9

Updated by Jaroslav Kysela almost 4 years ago

If it's global lock, just run tvh in gdb (or attach gdb to tvh) and do 'print global_lock' when the lock occurs. Use the 'sid' value to determine the thread who owns the lock and do backtrace in this thread.

#10

Updated by Luis Alves almost 4 years ago

(gdb) print global_lock

$2 = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 17207, Robust = No, Shared = No, Protocol = None}

(gdb) thread find 17207

Thread 80 has target id 'Thread 0x7fffb77db700 (LWP 17207)'

(gdb) info threads 80

  80   Thread 0x7fffb77db700 (LWP 17207) "tvh:tcp-start" 0x00007ffff617fd2d in __GI___pthread_timedjoin_ex (threadid=140736255047424, [email protected]=0x7fffb77da410, 
    [email protected]=0x0, [email protected]=true) at pthread_join_common.c:89

(gdb) thread 80

[Switching to thread 80 (Thread 0x7fffb77db700 (LWP 17207))]
#0  0x00007ffff617fd2d in __GI___pthread_timedjoin_ex (threadid=140736255047424, [email protected]=0x7fffb77da410, [email protected]=0x0, [email protected]=true)
    at pthread_join_common.c:89
89      in pthread_join_common.c

(gdb) bt

#0  0x00007ffff617fd2d in __GI___pthread_timedjoin_ex (threadid=140736255047424, [email protected]=0x7fffb77da410, [email protected]=0x0, [email protected]=true)
    at pthread_join_common.c:89
#1  0x00007ffff617fb5c in __pthread_join (threadid=<optimized out>, [email protected]=0x7fffb77da410) at pthread_join.c:24
#2  0x00005555557bf628 in dvr_rec_unsubscribe (de=0x7fffa8007c50) at src/dvr/dvr_rec.c:204
#3  0x00005555557b72e5 in dvr_stop_recording (de=0x7fffa8007c50, [email protected]=300, [email protected]=1, [email protected]=0) at src/dvr/dvr_db.c:2728
#4  0x00005555557ba231 in dvr_entry_cancel (de=0x7fffa8007c50, rerecord=0) at src/dvr/dvr_db.c:4647
#5  0x0000555555798f94 in api_idnode_handler (idc=0x555556e04360 <dvr_entry_class>, perm=0x7fffa0014900, args=<optimized out>, resp=<optimized out>, [email protected]=0x55555579e360 <api_dvr_cancel>, 
    [email protected]=0x55555681ea57 "cancel", destroyed=0) at src/api/api_idnode.c:645
#6  0x000055555579e1a9 in api_dvr_entry_cancel (perm=<optimized out>, opaque=<optimized out>, op=<optimized out>, args=<optimized out>, resp=<optimized out>) at src/api/api_dvr.c:329
#7  0x00005555557cf58f in webui_api_handler (hc=0x7fffb77da800, remain=0x7fffa000ac9a "dvr/entry/cancel", opaque=<optimized out>) at src/webui/webui_api.c:43
#8  0x000055555573e81a in http_exec ([email protected]=0x7fffb77da800, [email protected]=0x7fffb77da580, remain=0x7fffa000ac9a "dvr/entry/cancel") at src/http.c:1220
#9  0x000055555573fc8d in http_cmd_post (spill=<optimized out>, hc=0x7fffb77da800) at src/http.c:1357
#10 http_process_request (hc=0x7fffb77da800, spill=<optimized out>) at src/http.c:1379
#11 0x000055555573ee89 in process_request ([email protected]=0x7fffb77da800, [email protected]=0x7fffb77da780) at src/http.c:1512
#12 0x000055555574002b in http_serve_requests ([email protected]=0x7fffb77da800) at src/http.c:1997
#13 0x000055555574033b in http_serve (fd=38, opaque=0x7fffb0001148, peer=0x7fffb0001160, self=0x7fffb00011e0) at src/http.c:2048
#14 0x0000555555736f02 in tcp_server_start (aux=0x7fffb0001120) at src/tcp.c:715
#15 0x0000555555731e88 in thread_wrapper (p=0x7fffb00012a0) at src/wrappers.c:181
#16 0x00007ffff617e6db in start_thread (arg=0x7fffb77db700) at pthread_create.c:463
#17 0x00007ffff4c0d88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It seems that it gets stuck on the "dvr_rec_unsubscribe".

Let me know if you want me to try anything else.

#11

Updated by Luis Alves almost 4 years ago

Is "dvr_rec_unsubscribe" stuck waiting for the "dvr_thread" to finish?

(gdb) thread find dvr
Thread 24 has target name 'tvh:dvr-inotify'
Thread 96 has target name 'tvh:dvr'
(gdb) thread 96
[Switching to thread 96 (Thread 0x7fffb6bd5700 (LWP 17707))]
#0  0x00007ffff61849f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x555557448320) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88      ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0  0x00007ffff61849f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x555557448320) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5555574482d0, cond=0x5555574482f8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait ([email protected]=0x5555574482f8, [email protected]=0x5555574482d0) at pthread_cond_wait.c:655
#3  0x00005555557325e5 in tvh_cond_wait ([email protected]=0x5555574482f8, [email protected]=0x5555574482d0) at src/wrappers.c:298
#4  0x00005555557beb5f in dvr_thread (aux=<optimized out>) at src/dvr/dvr_rec.c:1615
#5  0x0000555555731e88 in thread_wrapper (p=0x55555748d750) at src/wrappers.c:181
#6  0x00007ffff617e6db in start_thread (arg=0x7fffb6bd5700) at pthread_create.c:463
#7  0x00007ffff4c0d88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Seems that dvr_thread is stuck on "tvh_cond_wait":

[...]
  while(run) {
    sm = TAILQ_FIRST(&sq->sq_queue);
    if(sm == NULL) {
      tvh_cond_wait(&sq->sq_cond, &sq->sq_mutex);    <<<< here
      continue;
    }
[...]

#12

Updated by Jaroslav Kysela almost 4 years ago

There should be SMT_EXIT event queued before join:

https://github.com/tvheadend/tvheadend/blob/master/src/dvr/dvr_rec.c#L200

#13

Updated by Luis Alves almost 4 years ago

I've noticed that this also hangs tvheadend:

1) Setup 2 profiles:
- one using transcoding (a/v codecs can be "Copy")
- other using anything else (like mpeg-ts passthru).

2) Go to one of your channels and remove the associated services.
3) Go to the EPG and start recording the current programme.

You will get different results using each profile:

non-transcoding:

SM = 8
2018-10-29 16:37:45.828 [WARNING] subscription: 0003: restarting channel FOX News
SM = 8
2018-10-29 16:37:47.828 [WARNING] subscription: 0003: restarting channel FOX News
[...]

transcoding:

2018-10-29 16:40:23.022 [ NOTICE] subscription: 0003: No input source available for subscription "DVR: Happening Now" to channel "FOX News" 
2018-10-29 16:40:25.022 [ NOTICE] subscription: 0003: No input source available for subscription "DVR: Happening Now" to channel "FOX News" 
SM = 7
SM = 7
2018-10-29 16:40:25.022 [  ERROR] dvr: Recording unable to start: "Happening Now": No assigned adapters

And then pressing the dvr "Abort/Stop" buttons right after this message "2018-10-29 16:44:59.358 [ ERROR] dvr: Recording unable to start: "Happening Now": No assigned adapters" it hangs (thread never gets the SMT_EXIT).

SM is a printf on the code showing the message type:
7 = SMT_NOSTART
8 = SMT_NOSTART_WARN

#14

Updated by Jaroslav Kysela almost 4 years ago

A different 'restart' settings in the profile.

#15

Updated by Luis Alves almost 4 years ago

I traced down the problem to the SMT_EXIT message not being delivered to the dvr_thread.

Then I observed that the message arrives to the profile input queue function but the handler thread never gets signaled to "forward" that exit message.

Looking at the queue input function, the logic seems to be that it will only wake up the queue handler thread when there is already something in the queue "on a new message".
So, if the queue is empty, and a new message arrives, it only gets delivered when the next message arrives (in this case, never).

To test this theory I've done the following hack (deliver the exit message twice):

diff --git a/src/dvr/dvr_rec.c b/src/dvr/dvr_rec.c
index fe1b96a07..bc8287cd2 100644
--- a/src/dvr/dvr_rec.c
+++ b/src/dvr/dvr_rec.c
@@ -197,10 +197,24 @@ dvr_rec_unsubscribe(dvr_entry_t *de)

   de->de_in_unsubscribe = 1;

+  streaming_target_deliver(prch->prch_st, streaming_msg_create(SMT_EXIT));
   streaming_target_deliver(prch->prch_st, streaming_msg_create(SMT_EXIT));

   atomic_add(&de->de_thread_shutdown, 1);

Which seems to fix it.

Then, changing the logic to always wake up the thread when we insert a new message on the queue also seems to fix it.

diff --git a/src/profile.c b/src/profile.c
index 00570b9fa..4f710b07d 100644
--- a/src/profile.c
+++ b/src/profile.c
@@ -728,9 +728,8 @@ profile_input_queue(void *opaque, streaming_message_t *sm)
   psm->psm_sm = sm;
   pthread_mutex_lock(&prsh->prsh_queue_mutex);
   if (prsh->prsh_queue_run) {
-    if (TAILQ_FIRST(&prsh->prsh_queue))
-      tvh_cond_signal(&prsh->prsh_queue_cond, 0);
     TAILQ_INSERT_TAIL(&prsh->prsh_queue, psm, psm_link);
+    tvh_cond_signal(&prsh->prsh_queue_cond, 0);
   } else {
     streaming_msg_free(sm);
     free(psm);

Possible fix here:
https://github.com/tvheadend/tvheadend/pull/1198

Does this makes sense?

I'll try do more testing...

#16

Updated by Luis Alves almost 4 years ago

So far so good... no more tvheadend hanging.

I'm wondering how this issue didn't cause more issues...

#17

Updated by M. Bergmann almost 4 years ago

Hello,

I wish this thread would have been opened a week ago....
I experienced the same problems as described by Luis and kodiaq (https://tvheadend.org/issues/4774#note-3): WebGUI of TVheadend didn't show up, Kodi wasn't able to get a connection, but systemctl just showed: Active(running). Killing the service took some time, so I imagine Ubuntu made use of the default timeout of 90 seconds to finally kill the process. After a reboot and and thus restart of TVH, the satip-server (Digibit R1) hasn't been found by TVH, I always had to reboot the Digibit (with perex firmware) to get it recognized.

I did not looked into in more details, because I made several changes to the system (ubuntu) at once: Installed docker, changed kernel from 4.4 to 4.15 and installed ffmpeg and iptv-channels for the first time. So I wasn't aware where to start the search and finally restored a backup.

I will keep an eye on this, but hopefully the failure won't return ;-)

#18

Updated by Jaroslav Kysela almost 4 years ago

  • Status changed from New to Fixed
#19

Updated by Victor S over 2 years ago

Not fixed. + HTS Tvheadend 4.3-1857~g221c29b40

#20

Updated by Flole Systems over 2 years ago

The issue that an SMT_EXIT Message is not being delivered has been solved. If the same solution doesn't work for you the issue you are facing is not the same.

Also available in: Atom PDF