TVH cannot do this by default. You can have both OTA and XMLTV EPG sources for a channel, but the last active source will normally completely overwrite the previous one. There is no 'merge' functionality per se.
I have an idea, but it is very convoluted and you will have to write some code.
You need 2 sources of EPG data, OTA and XMLTV, but you need them to end up as a single combined EPG in a third location.
Getting the external XMLTV should be straightforward enough.
In TVH, duplicate your Network/Mux/Service/Channel structures, but with a lower priority. This will give you 2 sets of channels. The first set (the lower priority one) will be used exclusively for OTA EPG grabbing and the second set will be used exclusively for viewing/recording TV with the merged EPG.
For the first set of channels, manually set their mux EPG grabber to your preferred OTA grabber. For the second set of channels, make sure that the mux EPG grabber is 'None'.
Depending upon your client software, you will need to find a way to hide the first set of channels. Maybe Channel Tags could do the job.
In the EPG Grabber Channels screen, make sure that your first set of channels has no entries, because you don't want your new merged XMLTV EPG to overwrite your OTA source.
From TVH, you can extract the EPG in either JSON or XMLTV format. XMLTV is already formatted but JSON has more possible data fields.
https://docs.tvheadend.org/documentation/development/json-api/api-description/epg
https://docs.tvheadend.org/documentation/development/xmltv/output
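To give a rough idea of the JSON route, here is a minimal sketch. It assumes TVH is at http://localhost:9981 (a placeholder, change it), that the grid request is `api/epg/events/grid`, and that each entry has `channelName`, `start`, `stop`, `title` and `description` fields with start/stop as epoch seconds. Check the API page above against what your version actually returns. The function names are my own.

```python
import json
import urllib.parse
import urllib.request

TVH_BASE = "http://localhost:9981"  # assumption: adjust to your own server


def grid_url(base=TVH_BASE, limit=5000):
    """Build the URL for the EPG grid request (endpoint path is an assumption)."""
    query = urllib.parse.urlencode({"limit": limit})
    return f"{base}/api/epg/events/grid?{query}"


def parse_grid(payload):
    """Reduce a grid response to the few fields needed for merging."""
    events = []
    for entry in json.loads(payload).get("entries", []):
        events.append({
            "channel": entry.get("channelName", ""),
            "start": int(entry["start"]),   # assumed to be epoch seconds
            "stop": int(entry["stop"]),
            "title": entry.get("title", ""),
            "desc": entry.get("description", ""),
        })
    return events


def fetch_events(base=TVH_BASE):
    """Fetch and parse the EPG. Only works as-is if no login is required."""
    with urllib.request.urlopen(grid_url(base)) as resp:
        return parse_grid(resp.read().decode("utf-8"))
```

If your TVH requires a login, `fetch_events` will need credentials added. You would then keep only the events belonging to the first (OTA) set of channels.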
Take the external XMLTV data and merge it with your TVH JSON/XMLTV data and create a new XMLTV file with channel numbers matching the second set of channels. Feed that new XMLTV file into TVH.
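Writing the new XMLTV file could look something like the sketch below. It assumes you have already reduced the merged result to a list of dicts with `channel`, `start`, `stop` (epoch seconds), `title` and `desc` keys; that layout and the function names are my own invention, not anything TVH dictates. The channel ids are whatever you choose to map to your second set of channels.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def xmltv_time(epoch):
    """Format epoch seconds as an XMLTV timestamp in UTC."""
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return dt.strftime("%Y%m%d%H%M%S +0000")


def build_xmltv(channels, events):
    """Return an XMLTV document as a string.

    channels: {channel_id: display_name}
    events:   list of dicts with channel, start, stop, title, desc
    """
    tv = ET.Element("tv")
    for chan_id, name in channels.items():
        chan = ET.SubElement(tv, "channel", id=chan_id)
        ET.SubElement(chan, "display-name").text = name
    # Sort so each channel's programmes appear in time order.
    for ev in sorted(events, key=lambda e: (e["channel"], e["start"])):
        prog = ET.SubElement(
            tv, "programme",
            start=xmltv_time(ev["start"]),
            stop=xmltv_time(ev["stop"]),
            channel=ev["channel"],
        )
        ET.SubElement(prog, "title").text = ev["title"]
        if ev.get("desc"):
            ET.SubElement(prog, "desc").text = ev["desc"]
    return ET.tostring(tv, encoding="unicode")
```

Write the returned string to a file (UTF-8) and point your XMLTV grabber at it.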
Merging may need some refining. I have a curated XMLTV feed and often when the OTA EPG has a single event for double episodes, my XMLTV feed will have 2 separate episodes. My XMLTV feed also rounds start/stop times to 5-minute boundaries. Also, OTA sometimes has extra text in the title like "Movie: XYZ (Including news break at 19:37)", so your matching logic may need to be a little 'fuzzy'.
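As an illustration of what 'fuzzy' could mean here, this sketch strips bracketed notes and a leading "Movie:" from titles, then treats two events as the same programme if their times mostly overlap and the cleaned titles are similar. The thresholds (0.8 and 0.5) are guesses to tune against your own feeds, and the event dicts with `start`, `stop` (epoch seconds) and `title` are an assumed layout.

```python
import re
from difflib import SequenceMatcher


def normalise_title(title):
    """Drop bracketed notes and a leading 'Movie:' prefix, then lowercase."""
    title = re.sub(r"\([^)]*\)", "", title)
    title = re.sub(r"^\s*movie\s*:\s*", "", title, flags=re.IGNORECASE)
    return " ".join(title.lower().split())


def overlap_seconds(a, b):
    """Number of seconds during which both events are on air."""
    return max(0, min(a["stop"], b["stop"]) - max(a["start"], b["start"]))


def is_match(ota, ext, min_ratio=0.8, min_overlap=0.5):
    """True if the two events look like the same programme.

    Overlap is measured against the shorter event, so one long OTA
    event can match each of two shorter XMLTV episodes.
    """
    shorter = min(ota["stop"] - ota["start"], ext["stop"] - ext["start"])
    if shorter <= 0:
        return False
    if overlap_seconds(ota, ext) / shorter < min_overlap:
        return False
    ratio = SequenceMatcher(
        None, normalise_title(ota["title"]), normalise_title(ext["title"])
    ).ratio()
    return ratio >= min_ratio
```

Measuring overlap against the shorter event is a deliberate choice: it tolerates the 5-minute rounding and lets a double-episode OTA event pair up with both XMLTV episodes. What you do once a pair matches (which title or description wins) is up to you.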
This seems like a lot of work for little gain. Consider picking the least bad EPG source and using that.
Best of luck.