Project

General

Profile

Bug #5659

Bug #5459: Something is slow in 4.3

Descrambler issues in 4.3

Added by Robin Mitra 5 months ago. Updated 4 months ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
Descrambling
Target version:
-
Start date:
2019-06-16
Due date:
% Done:

0%

Estimated time:
Found in version:
4.3 since Dec 25 2017 !!!
Affected Versions:

Description

As described (in a convoluted way) in issue #5459, I've been unable to use 4.3 for auite some time now due to descrambling fails every few seconds.
my setup is a ivy bridge i7 with 2 dual sat tuners from digitaldevices.
Everything works fine in the released 4.2.8-23~g74cdf3f16.

I've taken the time to trace back to where the problems start:
The last working version is
1156bf17439c67d52510cfa0738042de47478f92
Date: Mon Dec 25 2017 18:07:59 GMT+0100 (Central European Standard Time)

the commit after that started fiddling with cwc, newcamd and broke descrambling:
Commit: c69368baef8022563b2d6646082268e59983048c
Parents: 1156bf17439c67d52510cfa0738042de47478f92
Author: Jaroslav Kysela <>
Date: Mon Dec 25 2017 18:17:27 GMT+0100 (Central European Standard Time)
Committer: Jaroslav Kysela
cwc, newcamd: add detection for DVN ECM message, issue #4813

After further commits to fix this commit it got somewaht better, but never to the stability of before.

The way I test this is simple:
watch a scrambled channel, and start the OTA grabber on the remaining 3 tuners... I get scrambling errors (due to packet loss or whatever) evry few seconds as long as the OTA is running.
Even after OTA, using any of teh other tuners to tune to a service will more likely than not create a descrambling issue.

The other problem I mentioned in #5459 (slow OTA grab) seems unrelated to this issue, but still persists.


Files

tvheadend4.3-890~ga274d5dd5.log.7z (73.1 KB) tvheadend4.3-890~ga274d5dd5.log.7z Robin Mitra, 2019-06-22 09:32
tvhworkingDebug.7z (46.7 MB) tvhworkingDebug.7z Robin Mitra, 2019-06-22 09:32
4.3-1795~g771dfd6be-dirty-patched.log.7z (40.3 MB) 4.3-1795~g771dfd6be-dirty-patched.log.7z Robin Mitra, 2019-06-29 10:46
Tvh4.3-1801~g8f1de1621-dirty.log.7z (41.9 MB) Tvh4.3-1801~g8f1de1621-dirty.log.7z Robin Mitra, 2019-07-06 07:52
valgrind.txt (861 KB) valgrind.txt Pablo R., 2019-07-26 21:09

Associated revisions

Revision 6be300c4 (diff)
Added by Jaroslav Kysela 4 months ago

cclient: fix the ECM PID flag for newcamd and cccam, fixes #5659

History

#1

Updated by Jaroslav Kysela 5 months ago

Show traces: '--trace descrambler,cwc,cccam' - https://tvheadend.org/projects/tvheadend/wiki/Traces

#2

Updated by Jaroslav Kysela 5 months ago

  • Status changed from New to Rejected
  • Parent task set to #5459
#3

Updated by Robin Mitra 5 months ago

traces already were provided in #5459

#4

Updated by Jaroslav Kysela 5 months ago

No, the logs do not have any TRACE lines. See my last comment.

#5

Updated by Robin Mitra 5 months ago

hmm I remeber trying to get all possible traces and the log was quite big.
Anyway the system is in use now so I can play with it only coming weekend.
When I do so, please tell me how exactly you want me enable the traces above (if not just in zhe web ui)

Which commit should I test ? IIRC the (channels etc)config I currently have is 4.2 and at some point it wasn't usable anymore in 4.3.
I'd like to avoid having to setup all the channels again.
Also using the latest commit will suffer from a different issue (really slow eit grab).

#6

Updated by Pablo R. 5 months ago

Robin Mitra wrote:

hmm I remeber trying to get all possible traces and the log was quite big.
Anyway the system is in use now so I can play with it only coming weekend.
When I do so, please tell me how exactly you want me enable the traces above (if not just in zhe web ui)

Attach log with traces (commonly huge size) -> descrambler,cwc,cccam

On webui:
Configuration -> Debugging ->
  • Set Filename (including path)
  • Tick Debug trace (low-level)
  • Trace subsystems: descrambler,cwc,cccam
#7

Updated by Robin Mitra 5 months ago

I've add 2 logs .
One working with commit 1156bf17439c67d52510cfa0738042de47478f92
the second one from a commit a few days later when descrambling on single channels seems to work again, but has lots of errors when eit is runuing on other tuners

#8

Updated by Jaroslav Kysela 5 months ago

It looks like that there is no bug in the descrambler at the first look. The ECMs are "fast" ones (the key must be used within 500ms - half a second - when the ECM is received). It seems that the data flow inside the recent tvh is not so fast for a reason (something blocks the ECM processing). The key is required early than the ECM is sent to the key server. Show the log with all traces for the recent tvh with this extra code:

diff --git a/src/descrambler/descrambler.c b/src/descrambler/descrambler.c
index 71cc9ebd6..b83f1d6aa 100644
--- a/src/descrambler/descrambler.c
+++ b/src/descrambler/descrambler.c
@@ -1341,6 +1341,10 @@ descrambler_table_callback
   }
   tvh_mutex_unlock(&mt->mt_mux->mm_descrambler_lock);

+  if (!emm)
+    tvhtrace(LS_DESCRAMBLER, "ECM message %02x:{%02x:%02x}:%02x (len %d, pid %d)",
+             ptr[0], ptr[1], ptr[2], ptr[3], len, mt->mt_pid);
+
   LIST_FOREACH(des, &sections, active_link) {
     if (des->changed == 2) {
       des->callback(des->opaque, mt->mt_pid, ptr, len, emm);
#9

Updated by Robin Mitra 5 months ago

I agree that something is slower. I will try it out coming weekend.

Note that his happened way back in Dec/2017 - Jan/2018 long before you changed to using libdvbsa instead of ffdecsa. So it seems to have nothing to do with that.
Also, the released 4.2 doesn't show this behaviour.

#10

Updated by Robin Mitra 5 months ago

here goes, using current master, applied your patch:
a log for 5 mins with 1 channel being watched and 3 tuners doing an eit grab.
Eit grab is fast again now (
5mins, as opposed to > 30mins a few commits ago)

I don't see the descrambler or any thing else being slow per se.
It's more like the established descrambling pipeline is sharing some resource being locked for an unusually long time when the eit grabber (or anyone else) tries to build a new streaming pipeline.

#11

Updated by Jaroslav Kysela 5 months ago

The patch was not applied. The problem is that ECM is not detected/sent before the key change in some cases.

EDIT: You may verify the patch - there should be 'ECM message' lines in the TRACE log without 'for service' words.

#12

Updated by Robin Mitra 4 months ago

next try.

Again, I don't see why the descrambler is being disturbed at all when tuning on a different tuner. The ECM not being seen is IMHO only the result of something else being blocked for longer than it needs to be.

But that is of course only an observation and analogy (a 'gut feeling') of having programmed other stuff for a living for > 40 yrs.

#13

Updated by Robin Mitra 4 months ago

BTW,
I went back to the builds of dec 25 2017 and had a closer look.
That is not the commit that introducedd this behaviour. I was mistatken due to another bug in that commit.
Right now I'm looking at the time frame from Jan 25. 2018 (working) and Jan 31. 2018 (bad).

Shouldn't be to hard to pinpoint now...

#14

Updated by Jaroslav Kysela 4 months ago

  • Status changed from Rejected to Fixed

Mea culpa. I forgot to flag the ECM PIDs in the common cclient code (newcamd, cccam). So they were evaluated in the slow EMM path. Could you try latest? v4.3-1802-g6be300c43

#15

Updated by Robin Mitra 4 months ago

Yup,
that fixed it.
thanks.

#16

Updated by Pablo R. 4 months ago

Jul  7 11:19:52 dell tvheadend[6642]: CRASH: Signal: 11 in PRG: /usr/bin/tvheadend (4.3-1802~g6be300c43) [921a17bbb26f4ae358b76c511e7122722ccce354] CWD: /
Jul  7 11:19:52 dell tvheadend[6642]: CRASH: Fault address 0xb0 (Address not mapped)
Jul  7 11:19:52 dell tvheadend[6642]: CRASH: Loaded libraries: linux-vdso.so.1 /usr/lib/x86_64-linux-gnu/libdvbcsa.so.1 /usr/lib/x86_64-linux-gnu/libssl.so.1.1 /usr/lib/x86_6$
Jul  7 11:19:52 dell tvheadend[6642]: CRASH: Register dump [23]: 00005599f45b08b000000000000000000000000000000005000000000000000000000000000000a000007f0c575f95100000000000000$
Jul  7 11:19:52 dell tvheadend[6642]: CRASH: STACKTRACE
Jul  7 11:19:52 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/trap.c:176 0x5599f261f53d 0x5599f240a000
Jul  7 11:19:53 dell tvheadend[6642]: CRASH: ??:0 0x7f0caa9bd890 0x7f0caa9ab000
Jul  7 11:19:53 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/descrambler/cclient.c:883 0x5599f270ba81 0x5599f240a000
Jul  7 11:19:53 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/descrambler/descrambler.c:1346 0x5599f2699341 0x5599f240a000
Jul  7 11:19:53 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/input/mpegts/mpegts_table.c:101 0x5599f26b1a03 0x5599f240a000
Jul  7 11:19:53 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/input/mpegts/dvb_psi_lib.c:135 (discriminator 4) 0x5599f26a7dd2 0x5599f240a000
Jul  7 11:19:54 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/input/mpegts/dvb_psi_lib.c:185 0x5599f26a8237 0x5599f240a000
Jul  7 11:19:54 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/input/mpegts/mpegts_input.c:1338 (discriminator 3) 0x5599f26a09fb 0x5599f240a000
Jul  7 11:19:54 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/input/mpegts/mpegts_input.c:1544 0x5599f26a3fac 0x5599f240a000
Jul  7 11:19:54 dell tvheadend[6642]: CRASH: /project/repo/checkout/src/tvh_thread.c:91 0x5599f25df988 0x5599f240a000
Jul  7 11:19:54 dell tvheadend[6642]: CRASH: ??:0 0x7f0caa9b26db 0x7f0caa9ab000
Jul  7 11:19:54 dell tvheadend[6642]: CRASH: clone+0x3f  (/lib/x86_64-linux-gnu/libc.so.6)
Jul  7 11:19:54 dell kernel: [415886.850582] show_signal_msg: 9 callbacks suppressed
Jul  7 11:19:54 dell kernel: [415886.850584] tvh:mi-main[6713]: segfault at b0 ip 00005599f270ba81 sp 00007f0c575f93f0 error 4 in tvheadend[5599f240a000+14be000]
#17

Updated by Pablo R. 4 months ago

Thread 84 "tvh:mi-main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f89511e8700 (LWP 15786)]
cc_table_input (opaque=0x7f898409e560, pid=2000, data=0x7f898402d720 "\200pdpbdX(\370\256.\371'\021\240\[email protected]\273Z3H\356\064\345\370\354p\275\335s\210\[email protected]\020\340\342\033P\221\005\376\027x\302\226aH\335xzD\232|\344V,\200\373\352\362\272\227\216U\207D\215\021\257\\\003\064FU\303ic<\317V\245i\022\273o\362\223Q\231h+ID\217\250\037\232ƌ\034\261v\213", '\377' <repeats 80 times>, len=103, emm=<optimized out>) at src/descrambler/cclient.c:884
warning: Source file is more recent than executable.
884        if(ep->ep_capid == pid) break;

#0  cc_table_input (opaque=0x7f898409e560, pid=2000, data=0x7f898402d720 "\200pdpbdX(\370\256.\371'\021\240\[email protected]\273Z3H\356\064\345\370\354p\275\335s\210\[email protected]\020\340\342\033P\221\005\376\027x\302\226aH\335xzD\232|\344V,\200\373\352\362\272\227\216U\207D\215\021\257\\\003\064FU\303ic<\317V\245i\022\273o\362\223Q\231h+ID\217\250\037\232ƌ\034\261v\213", '\377' <repeats 80 times>, len=103, emm=<optimized out>) at src/descrambler/cclient.c:884
#1  0x0000562407636341 in descrambler_table_callback (mt=<optimized out>, ptr=<optimized out>, len=103, tableid=<optimized out>) at src/descrambler/descrambler.c:1346
#2  0x000056240764ea03 in mpegts_table_dispatch (sec=<optimized out>, r=<optimized out>, aux=0x7f898402d6b0) at src/input/mpegts/mpegts_table.c:101
#3  0x0000562407644dd2 in mpegts_psi_section_reassemble0 ([email protected]=0x7f898402d6b0, [email protected]=0x7f898406b440 "11973V in Astra V+", [email protected]=0x7f893c06a5b6 "\200pdpbdX(\370\256.\371'\021\240\[email protected]\273Z3H\356\064\345\370\354p\275\335s\210\[email protected]\020\340\342\033P\221\005\376\027x\302\226aH\335xzD\232|\344V,\200\373\352\362\272\227\216U\207D\215\021\257\\\003\064FU\303ic<\317V\245i\022\273o\362\223Q\231h+ID\217\250\037\232ƌ\034\261v\213", '\377' <repeats 80 times>, "G\017\253Ւ\n\357s\n\357P\210\r\f\021", <incomplete sequence \326>..., [email protected]=183, [email protected]=64, [email protected]=0, cb=0x56240764e970 <mpegts_table_dispatch>, opaque=0x7f898402d6b0) at src/input/mpegts/dvb_psi_lib.c:135
#4  0x0000562407645237 in mpegts_psi_section_reassemble ([email protected]=0x7f898402d6b0, [email protected]=0x7f898406b440 "11973V in Astra V+", [email protected]=0x7f893c06a5b1 "GG\320\026", crc=0, cb=0x56240764e970 <mpegts_table_dispatch>, [email protected]=0x7f898402d6b0) at src/input/mpegts/dvb_psi_lib.c:185
#5  0x000056240763d9fb in mpegts_input_table_dispatch ([email protected]=0x56240a2b7900, logprefix=0x7f898406b440 "11973V in Astra V+", [email protected]=0x7f893c06a5b1 "GG\320\026", [email protected]=188, [email protected]=1) at src/input/mpegts/mpegts_input.c:1338
#6  0x0000562407640fac in mpegts_input_process (mpkt=<optimized out>, mi=<optimized out>) at src/input/mpegts/mpegts_input.c:1544
#7  mpegts_input_thread (p=<optimized out>) at src/input/mpegts/mpegts_input.c:1743
#8  0x000056240757c988 in thread_wrapper (p=0x7f89840069d0) at src/tvh_thread.c:91
#9  0x00007f89b2e1d6db in start_thread (arg=0x7f89511e8700) at pthread_create.c:463
#10 0x00007f89b1fed88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Detaching from program: /usr/bin/tvheadend, process 15700
#18

Updated by Flole Systems 4 months ago

You can try changing line 884 of src/descrambler/cclient.c to

if(ep && ep->ep_capid == pid) break;

to prevent comparison if ep is null (which seems to be causing the crash at the moment).

#19

Updated by Pablo R. 4 months ago

Flole Systems wrote:

You can try changing line 884 of src/descrambler/cclient.c to
[...]
to prevent comparison if ep is null (which seems to be causing the crash at the moment).

With that change it gets full deadlocked after 2 minutes.

(gdb) bt

#0  0x00007f1948feef85 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7ffc025a80b0, expected=0, futex_word=0x55e02e280d08 <gtimer_cond+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x7ffc025a80b0, mutex=0x55e02e286900 <gtimer_lock>, cond=0x55e02e280ce0 <gtimer_cond>) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait ([email protected]=0x55e02e280ce0 <gtimer_cond>, [email protected]=0x55e02e286900 <gtimer_lock>, [email protected]=0x7ffc025a80b0) at pthread_cond_wait.c:667
#3  0x000055e02dd6f5c8 in tvh_cond_timedwait_ts (cond=0x55e02e280ce0 <gtimer_cond>, mutex=0x55e02e286900 <gtimer_lock>, ts=0x7ffc025a80b0) at src/tvh_thread.c:425
#4  0x000055e02dd5e344 in mainloop () at src/main.c:775
#5  main (argc=<optimized out>, argv=<optimized out>) at src/main.c:1364
#20

Updated by Pablo R. 4 months ago

There is definitely something wrong with 6be300c4 commit,

The unique change is

DESCRAMBLER_ECM_PID(ct->cs_epids.pids[i].pid),
^^^^^^^^^^^^^^^^^^

And if we refer to 883, 884 lines (from my crash).
we see ep that in the same void is:

cc_ecm_pid_t *ep;
^^^^^^^^^^^^

#21

Updated by saen acro 4 months ago

cppcheck suggest this

[cclient.c:136] -> [cclient.c:140]: (warning) Either the condition 'pcard==0' is redundant or there is possible null pointer dereference: pcard.
[cclient.c:137] -> [cclient.c:140]: (warning) Either the condition 'pcard==0' is redundant or there is possible null pointer dereference: pcard.
[cclient.c:399] -> [cclient.c:402]: (warning) Either the condition 'if(es2)' is redundant or there is possible null pointer dereference: es2.
[cclient.c:400] -> [cclient.c:402]: (warning) Either the condition 'if(es2)' is redundant or there is possible null pointer dereference: es2.
[cclient.c:779] -> [cclient.c:780]: (warning) Either the condition 'if(pcard)' is redundant or there is possible null pointer dereference: pcard.
[cclient.c:884] -> [cclient.c:886]: (warning) Either the condition 'ep==0' is redundant or there is possible null pointer dereference: ep.
[cclient.c:954] -> [cclient.c:956]: (warning) Either the condition 'es==0' is redundant or there is possible null pointer dereference: es.
[cclient.c:1112] -> [cclient.c:1114]: (warning) Either the condition 'if(c)' is redundant or there is possible null pointer dereference: c.
[cclient.c:131]: (style) The scope of the variable 'id' can be reduced.
[cclient.c:665]: (style) The scope of the variable 'fd' can be reduced.
[cclient.c:669]: (style) The scope of the variable 'port' can be reduced.
[cclient.c:771]: (style) The scope of the variable 'pcard' can be reduced.
[cclient.c:1206]: (style) The scope of the variable 'emmp' can be reduced.
[cclient.c:1207]: (style) The scope of the variable 'i' can be reduced.
[cclient.c:1243]: (style) The scope of the variable 'tid' can be reduced.
[cclient.c:1244]: (style) The scope of the variable 'cm' can be reduced.
[cclient.c:1245]: (style) The scope of the variable 'tname' can be reduced.

if this can help

#22

Updated by Jaroslav Kysela 4 months ago

The only difference is that ECM PIDs are now marked as fast (MT_FAST), so the locking in mpegts_input.c is different and the global_lock is not active when the table callback is called. There is locking issue somewhere, all cclient code should be reviewed.

#23

Updated by Stefan Dietzel 4 months ago

I have alsoproblems. Descrambling is working but every x minutes is fails for one key then after 10 seconds is works again. I am using oscam with own card. At a normal Enigma2-Receiver which also connects to my oscam I have no problems. TVH is connected to oscam via DVBAPI.

#24

Updated by Flole Systems 4 months ago

@Jaroslav
So should my change be done now to move forward and fix the crash? If yes I can make that a PR (or you modify that single line directly).

@People who want this fixed
Someone with that issues needs to go through the full debugging of mutex issues and figure out which mutex is waiting for a lock where and what is currently holding that mutex. (In Pablo's case that was the mutex at 0x55e02e286900). Somewhere there are more or less complete instructions on how to do this, basically its look at the mutex, then look at the holding thread and see what it does, if it's waiting for a mutex look at that mutex and so on. Do it until you find a thread that is not waiting for a mutex.

#25

Updated by Flole Systems 4 months ago

Totally forgot that we have an automatic deadlock detector now (as mentioned in the wiki):

Buildin deadlock mutex checker (since latest 4.3 version)
Use '--thrdebug 1' as the command line option. The deadlock will be printed to the tvheadend's configuration directory to file mutex-deadlock.txt and to the standard task error output (so you can see it through the systemctl service log for example).

#26

Updated by Pablo R. 4 months ago

Does this help? (happens before crash)

[email protected]:~$ nc -lu -p 7777
thread: mutex 0x557b76f15800 at src/main.c:285 took 28ms
thread: mutex 0x557b76f15880 at src/http.c:274 took 28ms
thread: mutex 0x557b78f02680 at src/input/mpegts/tsdemux.c:164 took 72ms
thread: mutex 0x7f5ad800ab38 at src/input/mpegts/mpegts_input.c:1733 took 72ms
thread: mutex 0x557b78f02680 at src/input/mpegts/tsdemux.c:164 took 24ms
thread: mutex 0x7f5ad800ab38 at src/input/mpegts/mpegts_input.c:1733 took 24ms
thread: mutex 0x7f5ad800ab38 at src/input/mpegts/mpegts_input.c:1733 took 23ms
thread: mutex 0x557b76f15800 at src/main.c:285 took 24ms
thread: mutex 0x557b76f15880 at src/http.c:274 took 24ms
thread: mutex 0x557b76f15800 at src/main.c:285 took 23ms
thread: mutex 0x557b76f15880 at src/http.c:274 took 23ms
thread: mutex 0x557b78ee5a40 at src/input/mpegts/tsdemux.c:164 took 24ms
thread: mutex 0x7f5ad800ab38 at src/input/mpegts/mpegts_input.c:1733 took 24ms
thread: mutex 0x557b76f15800 at src/main.c:285 took 24ms
thread: mutex 0x557b76f15880 at src/http.c:274 took 24ms
thread: mutex 0x7f5ad800ab38 at src/input/mpegts/mpegts_input.c:1733 took 36ms
thread: mutex 0x557b76f15800 at src/main.c:718 took 24ms
thread: mutex 0x557b76f15800 at src/main.c:285 took 24ms
thread: mutex 0x557b76f15880 at src/http.c:274 took 24ms
thread: mutex 0x7f5ad800ab38 at src/input/mpegts/mpegts_input.c:1733 took 23ms
thread: mutex 0x557b78f02680 at src/input/mpegts/tsdemux.c:164 took 24ms
thread: mutex 0x557b76f15880 at src/main.c:703 took 503ms

It just crash without any mutex-deadlock.txt on tvh directory.

#27

Updated by Pablo R. 4 months ago

Hope this can help...
Also full log attached.

==21113== Invalid read of size 8
==21113==    at 0x3874E5: cc_table_input (cclient.c:849)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113== Invalid read of size 8
==21113==    at 0x3874F7: cc_table_input (cclient.c:850)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113== Invalid read of size 4
==21113==    at 0x387550: cc_table_input (cclient.c:869)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113== Invalid read of size 4
==21113==    at 0x387557: cc_table_input (cclient.c:872)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113== Invalid read of size 4
==21113==    at 0x3875A0: cc_table_input (cclient.c:887)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113== Invalid read of size 4
==21113==    at 0x2669C5: tvh_mutex_check_magic (tvh_thread.c:144)
==21113==    by 0x266E59: tvh__mutex_lock (tvh_thread.c:252)
==21113==    by 0x38752A: cc_table_input (cclient.c:866)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113==  Address 0x200 is not stack'd, malloc'd or (recently) free'd
#28

Updated by Flole Systems 4 months ago

Does it crash or deadlock? According to valgrind it crashes, segmentation fault due to mutex corruption, this should be visible in stdout (probably some kind of logfile if running as service) aswell as said txt file. However, that won't help in this case, at least to me it's obvious whats going on (as I had something similar a while ago, it was absolutely difficult to figure out the issue).

Valgrind also figured this one out (basically it confirmed what I already knew based on experience and gdb/stacktrace output):

==21113== Invalid read of size 4
==21113==    at 0x2669C5: tvh_mutex_check_magic (tvh_thread.c:144)
==21113==    by 0x266E59: tvh__mutex_lock (tvh_thread.c:252)
==21113==    by 0x38752A: cc_table_input (cclient.c:866)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113==  Address 0x200 is not stack'd, malloc'd or (recently) free'd

We can see here, that tvheadend tried to read the mutex's magic from an address that's not in use. That can mean 2 things, either it was never allocated or freed before the access.

Looking at other things Valgrind found, I see something that could be related:

==21113== Invalid read of size 8
==21113==    at 0x3874F7: cc_table_input (cclient.c:850)
==21113==    by 0x31EEF0: descrambler_table_callback (descrambler.c:1346)
==21113==    by 0x3375B2: mpegts_table_dispatch (mpegts_table.c:101)
==21113==    by 0x32D981: mpegts_psi_section_reassemble0 (dvb_psi_lib.c:135)
==21113==    by 0x32DDE6: mpegts_psi_section_reassemble (dvb_psi_lib.c:185)
==21113==    by 0x3265AA: mpegts_input_table_dispatch (mpegts_input.c:1338)
==21113==    by 0x329B5B: mpegts_input_process (mpegts_input.c:1544)
==21113==    by 0x329B5B: mpegts_input_thread (mpegts_input.c:1743)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)
==21113==  Address 0xb1397c8 is 72 bytes inside a block of size 152 free'd
==21113==    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21113==    by 0x386757: cc_service_destroy0 (cclient.c:1036)
==21113==    by 0x3867CA: cc_service_destroy (cclient.c:1052)
==21113==    by 0x31FEE5: descrambler_service_stop (descrambler.c:486)
==21113==    by 0x28E89C: service_stop (service.c:261)
==21113==    by 0x28C03B: subscription_unlink_service0 (subscriptions.c:164)
==21113==    by 0x28C6AF: subscription_unsubscribe (subscriptions.c:736)
==21113==    by 0x304C94: http_stream_channel (webui.c:1313)
==21113==    by 0x3052A1: http_stream (webui.c:1378)
==21113==    by 0x273944: http_exec (http.c:1241)
==21113==    by 0x274BE9: http_cmd_get (http.c:1316)
==21113==    by 0x274E24: http_process_request (http.c:1398)
==21113==  Block was alloc'd at
==21113==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21113==    by 0x38978A: cc_service_start (cclient.c:1127)
==21113==    by 0x3234D5: caclient_start (caclient.c:344)
==21113==    by 0x28EB47: service_start (service.c:334)
==21113==    by 0x2909EF: service_find_instance (service.c:535)
==21113==    by 0x28AF88: subscription_start_instance (subscriptions.c:336)
==21113==    by 0x28CA1F: subscription_reschedule (subscriptions.c:409)
==21113==    by 0x258D98: mtimer_thread (main.c:706)
==21113==    by 0x266797: thread_wrapper (tvh_thread.c:91)
==21113==    by 0x66D66DA: start_thread (pthread_create.c:463)
==21113==    by 0x6FB588E: clone (clone.S:95)

That means we have definitely cases where memory get alloced, freed and used afterwards. Chances that the mutex corruption is caused by the exact same problem are pretty high.

While freed memory can still hold data that could even be used (until it's overwritten), it may get overwritten at any time. That is the case here after some time, figuring out why this happens takes a lot of time and you don't know for sure when that memory gets overwritten.

As you might have noticed you did a pretty good job finding the relevant parts of the valgrind output already :)

There's one thing that I need clarification on: As you can see in valgrind the alloc was done in subscription start (obviously) and the free was done in the unsubscribe (also makes sense). So you start a stream, then you stop it, then it crashes? Or do you start it on one device, start the same stream on another one, end on first and see the crash?

#29

Updated by Allan Lee 4 months ago

I'm encountering the same issue here. More details: https://tvheadend.org/issues/5691

After testing for several days, I'm pretty sure that the crash happens when unsubscribing from a channel, i.e. when a streaming or recording subscription stops.
When there's only one or a few subscription ongoing, everything works fine. But when more channels being used (e.g. 6-8 or even 10+), there will be a high chance of crash.

The crash appears consistently - I have fixed daily recording schedules and Tvheadend crashes at the same time everyday.
Tvheadend almost 100% crashes when doing a scan after importing is m3u playlist as "Automatic IPTV Network" to another Tvheadend in the same LAN.

Hope the information helps.

#31

Updated by Pablo R. 4 months ago

Working fine now. Thanks perexg. (and Flole ;) )

Also available in: Atom PDF