Feature #4652

Dedup by program id

Added by Em Smith about 5 years ago.

Programmes on many OTA channels, and from many xmltv providers, carry a unique id: the crid or the dd_progid.

So a particular movie will always have the same MV code, and a particular episode the same EP code. Technically a crid can be reused, but crids don't appear to be reused where I am.

This is useful since many daytime programmes don't have unique descriptions or any episode data in OTA, but can be distinguished by the crid.

Next year I'd like to look at maybe adding a new optional de-dup method based on this id for people with good quality guide data.

So currently, if I have a rule for "Simpsons" it will record episodes, but I have to decide which dedup to use. If I use episode, then what happens when the Simpsons movie is on? Is it recorded once, at every repeat showing, or never?

If I use description, then I have to check the guide first, since daytime shows often have the same description every day, with unique episode numbers in xmltv but no episode information in OTA.

I'm hoping the new optional algorithm will just do the right thing in most cases, but only for people with good, consistent guide data. People can of course use the existing dedup for cases where the new algorithm consistently fails for them.

So for every-day cases like "record one copy of any remake of 'Dracula'", "record anything Simpsons", or "record daytime tv", it would just work.

My current thinking for the dedup algorithm, comparing two programmes, is something like:
  1. both have id and id is equal: dup;
  2. both have title+season+episode that are equal: dup;
  3. only one has title+season+episode: not dup;
  4. both have title+season or title+episode: undetermined, so continue on to the subsequent checks (some OTA gives us only season or only episode, but not both);
  5. only one has an id, or both have ids that are not equal: not dup;
  6. title+subtitle+description equal: dup;
  7. else not dup.

We leave the "id not equal" check until relatively late in an attempt to reduce false "not dup" results against existing dvr log entries that don't have an id. Similarly, the title+subtitle+description check is left late since these fields are often the same every day for daytime tv.
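As a rough illustration, the seven checks above could be sketched in Python as follows. This is only a sketch under assumptions: the `Programme` class and its field names are hypothetical and do not correspond to tvh structures, an absent id is represented as `None`, and step 5 is read as "at least one side still has an id at this point". The list does not say what happens when both programmes have full title+season+episode data that differs; the sketch simply falls through to the later checks in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Programme:
    # Hypothetical record; field names are illustrative, not tvh's.
    title: str
    subtitle: str = ""
    description: str = ""
    season: Optional[int] = None
    episode: Optional[int] = None
    prog_id: Optional[str] = None  # crid or dd_progid; None when absent


def has_full_episode(p: Programme) -> bool:
    """True when title, season and episode are all present."""
    return bool(p.title) and p.season is not None and p.episode is not None


def is_duplicate(a: Programme, b: Programme) -> bool:
    # 1. Both have an id and the ids are equal: dup.
    if a.prog_id and b.prog_id and a.prog_id == b.prog_id:
        return True
    full_a, full_b = has_full_episode(a), has_full_episode(b)
    # 2. Both have title+season+episode and they are equal: dup.
    if full_a and full_b and (
        (a.title, a.season, a.episode) == (b.title, b.season, b.episode)
    ):
        return True
    # 3. Only one has title+season+episode: not dup.
    if full_a != full_b:
        return False
    # 4. Partial season/episode data is inconclusive: fall through.
    # 5. At least one id is present (and step 1 did not match): not dup.
    if a.prog_id or b.prog_id:
        return False
    # 6. title+subtitle+description equal: dup. 7. Otherwise not dup.
    return (a.title, a.subtitle, a.description) == (
        b.title, b.subtitle, b.description
    )
```

With this ordering, an old dvr log entry that has no id but the same title, season and episode as a new broadcast is still caught at step 2, before the id check at step 5 can reject it.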

Other considerations: currently tvh replaces SH/SP (shows without detailed description and sports) with its own unique identifier. I'd probably alter that so programmes with a season+episode xmltv_ns just append that to the identifier, allowing it to match against the same programme in the future, but programmes with no season+episode would continue with the existing unique id algorithm.
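A minimal sketch of that idea, in Python, might look like the following. Everything here is an assumption for illustration: the function name, the `-S…E…` suffix format, and the use of a random UUID as a stand-in for tvh's existing unique id algorithm, which is not described in this issue.

```python
import uuid
from typing import Optional


def generic_programme_id(prog_id: str,
                         season: Optional[int],
                         episode: Optional[int]) -> str:
    """Return an id for dedup purposes (hypothetical format)."""
    # MV/EP style ids are already specific to a programme: keep them.
    if not prog_id.startswith(("SH", "SP")):
        return prog_id
    # SH/SP with season+episode: append them so that a future showing
    # of the same episode produces the same id.
    if season is not None and episode is not None:
        return f"{prog_id}-S{season}E{episode}"
    # No season+episode: keep the id unique per broadcast. A random
    # UUID stands in for the existing algorithm here.
    return f"{prog_id}-{uuid.uuid4().hex}"
```

The point of the design is that the id becomes repeatable only when there is enough episode data to justify it; everything else stays unique, as it is today.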

Also, this would only apply to crid and dd_progid. We'd probably have to strip everything up to the last slash off the dd_progid, since we currently store the pathname of the grabber there, which would be annoying if you moved to another machine.
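Stripping the grabber path could be as simple as the Python sketch below. The function name and the example stored value are hypothetical; the exact format tvh stores is not shown in this issue.

```python
def strip_grabber_path(dd_progid: str) -> str:
    """Keep only the text after the last slash.

    A value with no slash is returned unchanged.
    """
    return dd_progid.rsplit("/", 1)[-1]


# Hypothetical stored value: grabber pathname followed by the id.
print(strip_grabber_path("/usr/bin/tv_grab_example/EP012345670001"))
# prints "EP012345670001"
```

Existing dvr log entries would need the same normalisation applied before comparison, otherwise old and new ids would never match.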

If we compare a dd_progid against a tvhid, or a dd_progid against a crid, etc., then they can't be equal, so we'd treat both as null strings in the above algorithm, which means we eventually fall through to the title+subtitle+description check.
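One way to express that rule is a small normalisation step that runs before the comparison, sketched here in Python. The function name and the kind labels (`"crid"`, `"dd_progid"`, `"tvh"`) are assumptions for illustration; how tvh would actually tell the id types apart is not covered here.

```python
from typing import Optional, Tuple


def comparable_ids(kind_a: str, id_a: Optional[str],
                   kind_b: str, id_b: Optional[str]
                   ) -> Tuple[Optional[str], Optional[str]]:
    """Blank out both ids when they come from different id schemes."""
    if kind_a != kind_b:
        # Ids of different kinds can never be equal, so treat both
        # as absent and let the later text checks decide.
        return None, None
    return id_a, id_b
```

Blanking both sides, rather than just one, matters: with only one id left the comparison would stop at the "not dup" id check instead of reaching the title+subtitle+description check.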

Additionally, UK regional channels (ITV/UTV/STV, etc.) sometimes use a different crid, or a different crid authority, for the same programme in the different countries of the UK. That would mean this algorithm might not work for that channel group; the options would be to not use the algorithm there, to have a variant where title+subtitle+description is checked before the final id check, or to just limit recordings to specific channels.

I believe the actual effort is relatively minor: maybe a hundred lines across the js, the dvr db and match function, and xmltv, plus requesting Kodi plugin updates.

So I'm raising this now so that people can either save me time by saying it will never work or shouldn't be implemented, or fix up the algorithm for me.
