
Bug #5651

CPU at 100% after some hours

Added by deganza 11 about 1 year ago. Updated 12 days ago.

Status: Invalid
Priority: Normal
Assignee: -
Category: General
Target version: -
Start date: 2019-06-09
Due date:
% Done: 0%
Estimated time:
Found in version: 4.3-1792
Affected Versions:

Description

I'm using Tvheadend 4.3-1792~g466a01431~cosmic on an Ubuntu machine with an i7 and 12 GB RAM.
Every morning I have the same problem: the CPU is at 100% and I have to restart the TVH server with "systemctl restart tvheadend" to cool down the machine.

Do you know why the machine gets such high CPU usage? After the restart everything is fine until the end of the next day.


Files

tvhhtop100.jpg (319 KB) deganza 11, 2019-06-12 04:18
tvh_CPU.jpg (243 KB) deganza 11, 2019-06-30 09:58

History

#1

Updated by saen acro about 1 year ago

Try with:

top -H

Very strange way to restart the service; normally the command is:

service tvheadend restart 

#2

Updated by deganza 11 about 1 year ago

saen acro wrote:

Try with:
[...]

Very strange way to restart the service; normally the command is:
[...]

What happens if I write top -H?

deganza 11 wrote:

I'm using Tvheadend 4.3-1792~g466a01431~cosmic on an Ubuntu machine with an i7 and 12 GB RAM.
Every morning I have the same problem: the CPU is at 100% and I have to restart the TVH server with "systemctl restart tvheadend" to cool down the machine.

Do you know why the machine gets such high CPU usage? After the restart everything is fine until the end of the next day.

#3

Updated by saen acro about 1 year ago

top -H shows the sub-threads of TVH, which will point to the problematic part.

If you want more detail, use htop and, in its settings, enable showing custom thread names.
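
For example, a minimal way to do that from a shell (the pidof lookup is illustrative and assumes a single tvheadend instance):

top -H -p $(pidof tvheadend)    # show per-thread CPU usage for the tvheadend process

In htop, the equivalent is F2 (Setup) -> Display options -> "Show custom thread names", then filtering for tvheadend.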


#4

Updated by deganza 11 about 1 year ago

saen acro wrote:

top -H shows the sub-threads of TVH, which will point to the problematic part.

If you want more detail, use htop and, in its settings, enable showing custom thread names.


OK, thank you. The most resources are used when TVH is recording programmes.
So if TVH is recording 5 programmes at the same time, the CPU is at 100%.

But after the recordings are finished, the CPU still stays very high, even though there are no recordings anymore.

What are the ideal system requirements if I want to record 15 programmes at the same time?
Is an i7 with 12 GB RAM enough?

#5

Updated by saen acro about 1 year ago

External RAID controller ;)
It depends on the bit-rate of the recorded content.
I'm still guessing what operating system you use.

#6

Updated by deganza 11 about 1 year ago

saen acro wrote:

External RAID controller ;)
It depends on the bit-rate of the recorded content.
I'm still guessing what operating system you use.

I'm using Ubuntu 18.10.

With htop it shows that 11 processes are running, but TVH isn't recording anything anymore.
Any idea? It happens every day.

I'm not able to upload the screenshot; it gives me an error every time.

#7

Updated by saen acro about 1 year ago

So which process is it that has high CPU use?
https://postimages.org

#8

Updated by deganza 11 about 1 year ago

saen acro wrote:

So which process is it that has high CPU use?
https://postimages.org

Here are the processes

#9

Updated by Pablo R. about 1 year ago

Use

top -H

when the problem occurs

#10

Updated by Pablo R. about 1 year ago

But, from your last screenshot, it is a problem with the DVR...

#11

Updated by deganza 11 about 1 year ago

Pablo R. wrote:

But, from your last screenshot, it is a problem with the DVR...

Correct, it's a problem with the DVR. The processes in the screenshot are all terminated but still consume the whole CPU.

#12

Updated by Flole Systems about 1 year ago

Use gdb to check what they are doing. Then you know why they're "locked up".
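
A rough sketch of how that can be done (the PID/thread number are placeholders, and debug symbols should be installed for a useful backtrace):

top -H -p $(pidof tvheadend)    # note the TID of the thread stuck at 100%
sudo gdb -p $(pidof tvheadend)  # attach gdb to the running tvheadend process
(gdb) info threads              # find the gdb thread number matching that TID
(gdb) thread 5                  # switch to it (5 is just an example)
(gdb) bt full                   # print that thread's full backtrace
(gdb) detach
(gdb) quit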

#13

Updated by deganza 11 about 1 year ago

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm sorry but I don't know how to use this command with tvh.

#14

Updated by deganza 11 about 1 year ago

deganza 11 wrote:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm sorry but I don't know how to use this command with tvh.

Here you see what I mean. These are the CPU resources used over the last 24 hours (red).
I have the same pattern every day: the CPU rises and then stays at 90% the whole time.

#16

Updated by bas co 11 months ago

Same here. I did the htop thing: the tvh:dvr process is at 100%, using today's master build.
I tried different settings, but nothing has improved the situation so far.
I'm using mergerfs, so my initial thought was that TVH does not cope well with mergerfs,
but there are no errors in the mergerfs log.

#17

Updated by Flole Systems 11 months ago

Then the same thing applies to you as well:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

#18

Updated by bas co 11 months ago

OK, I did some research.
I'm using mergerfs/FUSE to store my recordings.

This is what I found:

As soon as one of the disks in my JBOD that Tvheadend is recording onto is full, the recording thread stops recording and the CPU goes to 100%.

This is what I did:

I configured mergerfs to always use the disk that has the most free space (category.create=mfs instead of the disk that has the correct directory tree).
So Tvheadend does not encounter full disks anymore, except the one occasion where my entire JBOD is full.
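
For illustration, a mergerfs fstab entry using that policy might look like this (mount points are hypothetical):

/mnt/disk* /mnt/pool fuse.mergerfs defaults,allow_other,category.create=mfs 0 0

With category.create=mfs, new recordings are created on whichever branch currently has the most free space.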

#19

Updated by Jonathan S 5 months ago

This bug also surfaces when the user running Tvheadend does not have write permissions on the recording directory.
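
A quick way to check this (assuming the service runs as user hts and records to /recordings; both are placeholders for your setup):

sudo -u hts test -w /recordings && echo "writable" || echo "no write access"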

#20

Updated by Hugh Phoenix-Hulme 3 months ago

I'm also getting 99% from two instances of tvh:pvr. However, I have no "funny business" in the pipeline, it's just writing to a local disk to which it has permission to write. Two threads are chewing up 100% CPU, IO is quite low. There are also two idle instances of tvh:pvr.

The stream profile is 'pass', and by comparison, ffmpeg streaming h264 1080p video from video cameras to the same disk is hardly taking any CPU time at all. Those processes should be doing roughly the same: copying the data directly onto disk.

I get this when there's one programme recording (I imagine the other is for Timeshift). I'll see what happens with no recordings tomorrow (or else I'll have to wait till 1am).

#21

Updated by Flole Systems 3 months ago

Same thing for you then: Use gdb to find out what exactly is looping so fast/using that much CPU.

#22

Updated by Hugh Phoenix-Hulme 3 months ago

I get it with no recordings. Just two instances of tvh:pvr sitting there at 99% CPU.

Disabling Timeshift makes no difference.

Restarting the service brings it back down.

Enabling Timeshift doesn't make it increase.

Recording doesn't make it increase.

Deleting the recording doesn't make it increase.

#23

Updated by Flole Systems 3 months ago

  • Status changed from New to Invalid

The needed information was not provided. This can be re-opened once gdb traces are provided.

#24

Updated by Hugh Phoenix-Hulme 3 months ago

Unfortunately "use gdb" isn't itself enough information...

#25

Updated by Stephen West 23 days ago

I think I'm getting hit with this issue:

Flole Systems wrote:

Use gdb to check what they are doing. Then you know why they're "locked up".

I'm not sure how to gdb this right now, but I can strace:


(gdb) bt full
#0  0xb7f2bbb5 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb5e62208 in futex_wake (private=-1243144192, processes_to_wake=2147483647, 
    futex_word=0x21f4238) at ../sysdeps/unix/sysv/linux/futex-internal.h:231
        resultvar = <optimized out>
        __ret = <optimized out>
        res = <optimized out>
        res = <optimized out>
        __ret = <optimized out>
        resultvar = <optimized out>
#2  __condvar_confirm_wakeup (cond=cond@entry=0x21f4214, private=private@entry=0)
    at pthread_cond_wait.c:55
No locals.
#3  0xb5e629e3 in __pthread_cond_wait_common (abstime=0x0, mutex=0x21f41d4, 
    cond=0x21f4214) at pthread_cond_wait.c:419
        err = 22
        g = 0
        flags = <optimized out>
        g1_start = <optimized out>
        signals = <optimized out>
        result = 0
        wseq = <optimized out>
        seq = 4384125476
        private = <optimized out>
        maxspin = <optimized out>
        err = <optimized out>
        result = <optimized out>
        wseq = <optimized out>
        g = <optimized out>
        seq = <optimized out>
        flags = <optimized out>
        private = <optimized out>
        signals = <optimized out>
        g1_start = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        spin = <optimized out>
        buffer = <optimized out>
        cbuffer = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        rt = <optimized out>
        __atg101_val = <optimized out>
        __atg100_val = <optimized out>
        __atg102_expected = <optimized out>
        s = <optimized out>
        __atg100_val = <optimized out>
        __atg102_expected = <optimized out>
#4  __pthread_cond_wait (cond=0x21f4214, mutex=0x21f41d4) at pthread_cond_wait.c:655
No locals.
#5  0x004c3286 in ?? ()
No symbol table info available.
#6  0x0055e2f6 in ?? ()
No symbol table info available.
#7  0x004c242a in ?? ()
No symbol table info available.
#8  0xb5e5c3bd in start_thread (arg=0xa54ffb40) at pthread_create.c:463
        pd = 0xa54ffb40
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1243144192, -1521484992, 
                -1243144192, -1521489624, -1199842656, -332536703}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#9  0xb5c66fe6 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:108
No locals.
(gdb) 

I don't think that was helpful; I don't have symbols for the middle part (frames #5, #6, #7).

Is there anything else to try? I'll just restart the service and hope it doesn't come back.

#26

Updated by Stephen West 23 days ago

I forgot to mention that I'm running:
HTS Tvheadend 4.3.0~pre+202007131151-0~built202007140102~gitce09077~ubuntu18.04.1

...maybe I shouldn't be running the unstable branch. I just realized that.

#27

Updated by Flole Systems 23 days ago

That's perfectly fine; the real issue is that you are running a version that contains some kind of modifications. With that it's impossible to find the issue. Also, it seems the debug symbols are missing for the gdb trace.

#28

Updated by Stephen West 23 days ago

I didn't think I modified anything. How would I go about getting an unmodified version? I tried to follow the instructions at https://tvheadend.org/projects/tvheadend/wiki/AptRepositories; I think I used "sudo apt-add-repository ppa:mamarley/tvheadend-git" instead of the Doozer packages. Would it be helpful/dangerous to switch to the official Doozer package?

#29

Updated by Flole Systems 21 days ago

When you use an official version, this would be the right place to ask for support; otherwise you would have to contact the maintainer of the package. You will notice that the version number is completely different in the official repository. There shouldn't be any issues switching the repository, but you should make a config backup just to be safe.
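
A minimal backup sketch, assuming the configuration lives under /home/hts/.hts/tvheadend (the path may differ depending on the package):

sudo systemctl stop tvheadend
sudo tar czf ~/tvheadend-config-backup.tar.gz /home/hts/.hts/tvheadend
sudo systemctl start tvheadend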

#30

Updated by saen acro 21 days ago

Stephen West wrote:

From the attached image it is clear that only one of the two cores is at 100%,
which can mean that the CPU is too slow for the task.

#31

Updated by Martin Klayer 14 days ago

Flole Systems, I also followed the instructions at https://tvheadend.org/projects/tvheadend/wiki/AptRepositories .
What else could I do? What is the correct repo to use?

I am also seeing the 100% CPU utilization once a recording has started. But even after the recording has finished and no other tasks are running on the Tvheadend server, the utilization stays at 100%.

The only workaround is to run a cron job which executes systemctl restart tvheadend.service.
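
For reference, such an entry in root's crontab could look like this (the 04:00 time is arbitrary):

0 4 * * * /bin/systemctl restart tvheadend.service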

#32

Updated by Flole Systems 12 days ago

The correct repo is obviously the one under the official section. Once you have a version from that, you should reproduce the issue, note down the PID, identify the thread that's using the CPU, and use gdb to trace what that thread is doing.
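
A non-interactive way to capture such a trace (assuming debug symbols are installed; the output file name is arbitrary):

sudo gdb -p $(pidof tvheadend) -batch -ex "thread apply all bt full" > tvh-backtrace.txt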
