Initial results with a 4-element cascode (1 driven element, the other three cascading) are promising (but ugly).

The circuit 'works'. With caps across the gates as described in the original post, the configuration causes the lower MOSFET to automatically trigger the rest in very rapid succession.

Notes: (Test performed with 1000 pF caps, IRF840 MOSFETs and an extremely crude configuration. 20 V drive to limit the possibility of gate damage.)
* Total rise+fall time appears to be roughly proportional to the number of FETs in series. (10 ns rise time for one FET means 40 ns with 4 FETs.)
* Switch-off is about 2x the switch-on speed. (Similar performance to driving a single MOSFET with the same driver.)
* Without balancing, the topmost FET takes almost all of the shock. (EEs will see the same problem when using diodes in series.)
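
To put the first note into numbers, here is a minimal sketch of the linear scaling I'm seeing. It only restates the bench observation (about 10 ns per FET, 40 ns for 4); the function name is made up for illustration, and I have not checked whether the trend holds for longer chains or other parts.

```python
def estimated_stack_transition_ns(single_fet_ns: float, n_fets: int) -> float:
    """Rough linear estimate: each FET in the chain adds about one
    single-FET transition time, because each stage triggers the next.

    This is an assumption fitted to one crude test, not a device model.
    """
    if n_fets < 1:
        raise ValueError("n_fets must be at least 1")
    return single_fet_ns * n_fets


if __name__ == "__main__":
    # 10 ns per FET is the figure from the bench test above.
    for n in range(1, 5):
        print(f"{n} FET(s): ~{estimated_stack_transition_ns(10.0, n):.0f} ns")
```

If this holds, a taller stack trades switching speed for voltage directly, so the per-FET time is the number to attack first.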
Conclusion: The concept appears to be viable, but good performance will require better snubbing and filtering circuitry.
* MOVs are absolutely required across all gates, with a clamp voltage below the gate limits of the FETs.
* MOVs are likely required across source-drain as well, clamping well below the source-drain limit.
* Additional source-drain resistance may be needed to balance the load between the FETs.
* GDTs might prove useful both as bypass capacitance and for impulse shock absorption.
* The balance-capacitor values should be greater than the gate capacitance of the FETs.
* The primary driver needs to be as fast and clean as possible. Any oscillations on the low-end will ripple up through the chain.
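
The part-selection rules in the list above can be turned into a quick sanity check before building a stage. This is only a sketch under stated assumptions: the class and function names are made up, the 0.8 margin is my arbitrary reading of "well below", and the example numbers (500 V / 20 V limits and roughly 1300 pF input capacitance for the IRF840, plus the MOV clamp values) are placeholders to be checked against the actual datasheets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageParts:
    """Per-FET parts for one stage of the series stack (hypothetical names)."""
    gate_limit_v: float          # max gate-source voltage of the FET
    drain_source_limit_v: float  # max drain-source voltage of the FET
    gate_capacitance_pf: float   # FET input (gate) capacitance
    gate_mov_clamp_v: float      # clamp voltage of the MOV across the gate
    drain_mov_clamp_v: float     # clamp voltage of the MOV across source-drain
    balance_cap_pf: float        # balance capacitor across the gate


def check_stage(parts: StageParts, ds_margin: float = 0.8) -> List[str]:
    """Return a list of rule violations; an empty list means all rules pass.

    ds_margin is an assumed safety factor for "well below the limit".
    """
    problems = []
    if parts.gate_mov_clamp_v >= parts.gate_limit_v:
        problems.append("gate MOV clamps at or above the gate limit")
    if parts.drain_mov_clamp_v > ds_margin * parts.drain_source_limit_v:
        problems.append("source-drain MOV clamp is too close to the limit")
    if parts.balance_cap_pf <= parts.gate_capacitance_pf:
        problems.append("balance cap is not larger than the gate capacitance")
    return problems


if __name__ == "__main__":
    # Placeholder values; the 1000 pF cap is the one used in the bench test.
    stage = StageParts(
        gate_limit_v=20.0,
        drain_source_limit_v=500.0,
        gate_capacitance_pf=1300.0,
        gate_mov_clamp_v=18.0,
        drain_mov_clamp_v=390.0,
        balance_cap_pf=1000.0,
    )
    for problem in check_stage(stage):
        print("WARNING:", problem)
```

With those placeholder numbers, the 1000 pF caps from the test come out smaller than the assumed gate capacitance, which would be the first thing to change on the next build.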

And of course, snubbers, bypass capacitors and Schottky diodes to help with harmonics and other interference (which is cumulative in circuits like this). The test I performed used none of this, hence the somewhat horrific waveforms.
The next phase, I think, will be to design a proper PCB to deal with the artifacts encountered in this prototype, and to comfortably start testing at higher voltages.

Came across this and was wondering if anyone has had any experience driving FETs in series? (cascode/cascade)
Ideally we would want to drive each stage with a floating, galvanically isolated driver, but I think the incredible simplicity of the attached document makes it an option worth exploring.
I'll update this thread once I've done a few basic bench tests.
Step-recovery diodes and avalanching transistors look to be interesting options as well, but they tend to limit you to 1-2 kilovolts. If we're going HV, we may as well go full-bore. And spark gaps tend to give you much less control.
(edit: added a PNG version of the document)