Poynt...
I would agree with your eloquently-stated argument except for one major detail:
If the claimant comes forth and states specifically, "My box/circuit/DUT will cause a certain kind of load to produce more thermal energy than the same amount of real input power fed directly into the specified load from 'conventional sources'," then my test procedure is the one to use. At least until we actually notice the black box glowing red hot with only 9 watts input and 7 in the load, at which time we might advise the claimant to reword the claim. Unless the DUT is so small that 2 W of dissipation would be likely to make it glow red hot, that is.
So far, in all the claims and schemes I've seen, the DUT exhibits the levels of internal heating that would be expected using "conventional analysis" in "conventional circuitry". Until that heating is apparently anomalous, or until the claimant states as much, I think we can ignore the normal, expected amounts of heat created by the DUT's circuit components. If your proposed test procedure were applied in all cases despite normal-looking losses in the DUT circuits, I think it would unnecessarily complicate initial testing with no particular gain in knowledge.
Sorry, I was writing while you were posting your last post, so it looked like I was ignoring you.
If, like Bedini et al., the claimant will never say exactly which output (thermal, torque, battery charging energy, etc.) represents a COP>1, or insists that the COP computes to overunity only when all things are added together, then your procedure would be correct.
Maybe this is why Bedini and company keep building bigger and bigger Ferris wheels...to make it more impractical to put the whole shebang in a thermally-insulated box with the De Prony brake glowing red hot on the shaft, a gigantic array of incandescent bulbs or heaters across all the batteries being charged, etc.
You see, if it takes 12 years for the DUT test to reach thermal equilibrium, those guys will sell a lot of Ferris wheels and battery swappers in the meantime!
Keep in mind that this thread is in the context of a DUT claimed to heat somewhat inductive load resistors far better than "normal power" directly applied. For things like the JT, it may well be that total-inclusion testing is not only convenient and easy (no more arduous than testing only the load) but possibly superior to boot.
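To put rough numbers on the two approaches, here is a small Python sketch using the 9 W in / 7 W in the load figures from earlier in this post. It is only an illustration: the function name cop and the assumption that every watt not delivered to the load ends up as heat in the DUT are mine, not measured results from any actual test.

```python
def cop(output_watts, input_watts):
    """Coefficient of performance: output power divided by real input power."""
    if input_watts <= 0:
        raise ValueError("input power must be positive")
    return output_watts / input_watts


# Figures from the example in this post.
input_w = 9.0  # real input power to the DUT, in watts
load_w = 7.0   # thermal power measured in the load alone, in watts

# Assumption: whatever does not reach the load is dissipated as heat in the DUT.
dut_w = input_w - load_w

# Load-only test: count only the heat produced in the specified load.
load_only_cop = cop(load_w, input_w)

# Total-inclusion test: count the heat from the load and the DUT together.
total_cop = cop(load_w + dut_w, input_w)

print(f"DUT dissipation:     {dut_w:.1f} W")
print(f"Load-only COP:       {load_only_cop:.3f}")
print(f"Total-inclusion COP: {total_cop:.3f}")
```

With conventional losses like these, the load-only figure comes out below 1 and the total-inclusion figure comes out at exactly 1, so neither test would support a claim. The two procedures only give different verdicts when the DUT's own heating is something other than the ordinary expected loss.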