I'm interested in examining the idea of a programming language that eschews the notion of a callstack and returns to having a single fixed activation record per function. This obviously places a lot of limits on language design, but in return you get a statically-guaranteed upper bound on memory usage, plus weirdo stuff like the ability to hand out references to local data in functions that have already returned (which remains valid as long as you don't call the function again, which I think should be possible to enforce via a borrow checker).
>plus weirdo stuff like the ability to hand out references to local data in functions that have already returned (which remains valid as long as you don't call the function again, which I think should be possible to enforce via a borrow checker).
The C programming language supports this with the static keyword. Further calls may overwrite the pointed-to data. I have played with allocating fixed-size data in static locals, but I always found that allocating in the caller does the same job better. For example, compare strerror() with strerror_s(). (A sensible strerror implementation should return a pointer to static immutable data, but the Standard doesn't specify it.)
A procedural language can achieve a statically bounded call stack by restricting self recursion, mutual recursion, and function pointers. I struggle to imagine how a language without a call stack could perform differently or better.
There's prior art in this, very very old prior art. If you check out the first volume of The Art of Computer Programming, it uses a fictitious architecture called MIX. I understand that while it was fictitious, it was made to be similar enough to contemporary architectures. In it there are no special stack manipulation instructions, because there is no first-class notion of a stack! Functions used global variables for their scratch space. To call a function you would just jump to that address. Of course that function needed to know where to jump back to when complete. To do that, before jumping, the caller would WRITE THE RETURN ADDRESS TO THE JUMP INSTRUCTION AT THE END OF THE FUNCTION. This seems kind of insane in the modern day (function calls requiring self-modifying code!) but it meant you could implement functions without needing even a single extra word of storage space, and all you really gave up was recursion. I believe the original Fortran and some other older languages were originally implemented this way.
There's definitely an advantage with this idea, but also some downsides. While you have a guaranteed upper bound on memory usage, it's completely static and actually going to be worse than doing the equivalent on the stack. Suppose you have 100 functions that each need 100 bytes of scratch space. Statically allocating everything like you describe means you need 10KB of memory reserved. But suppose there is one main function, and the rest are helper functions, and the main function calls each helper function one at a time and the helper functions don't call anything. With the static approach, you still need 10KB. With the stack-based approach, you only need 200 bytes (the stack space for the main function, and the stack space for the helper function it is currently calling). Another advantage to the stack-based approach is due to caching. In the static approach, whether a function's scratch space is in the cache depends on how often that function is called. But in the stack-based approach, the top of the stack is almost always in the L1 cache, giving a speed boost to functions even when they are not called frequently.
That said, I do think it is an interesting idea that is worth exploring.
> Suppose you have 100 functions that each need 100 bytes of scratch space. Statically allocating everything like you describe means you need 10KB of memory reserved. But suppose there is one main function, and the rest are helper functions, and the main function calls each helper function one at a time and the helper functions don't call anything. With the static approach, you still need 10KB. With the stack-based approach, you only need 200 bytes (the stack space for the main function, and the stack space for the helper function it is currently calling).
This is not necessarily the case: if you statically analyze the call graph, you may be able to prove that two functions will never be live at the same time, and thus reuse memory regions for their local variables: https://github.com/llvm-mos/llvm-mos/blob/main/llvm/lib/Targ...
Of course, recursion, function pointers, and dynamic linking will defeat (or at least complicate) this scheme.
That's a good point, you can definitely optimize. I'd be curious how far you could go using static analysis. Assuming no recursion, function pointers, dynamic linking, etc., can you optimize to the point that you use as little memory as a stack approach would? I think you ALMOST can! The only blocker would be specific call patterns that would require a lot of memory, but are unreachable. That's where the halting problem hits.
You just make a call graph where the nodes are functions and directed edges indicate which function calls which. Without recursion, function pointers, and the like, you can compute this exactly, and it forms a DAG. Each node is given a weight, which is the amount of local memory that function needs. Start at the entry point function and compute the longest path by total node weight.
The CDC 6600 of blessed memory had "return jump" instruction that wrote an unconditional jump back to the return address into the word prior to the first instruction of the called procedure. So one would implement a subroutine return by jumping to the word before the entry point.
(Fortran programmers are probably wondering how alternate ENTRY statements worked; yes, they had to allocate a word before the ENTRY, and copy whatever jump instruction was placed there by the hardware into the main subroutine's jump word, so that RETURN or END would work for all entry points. Recursive languages like Pascal had to copy that word to and from the stack. Reentrant code had to avoid using the return jump instruction entirely.)
As others have pointed out, that's pretty much how things originally started. A more recent example I remember is the Action! programming language for Atari 8-bit computers[1], an Algol descendent for 6502 with static activation frames.
But I don't understand the appeal. Many algorithms and data structures are most naturally expressed with recursion. Not allowing it forces the programmer to create and maintain their own manual stack data structures, which puts us back at square one in terms of a statically guaranteed upper bound on memory usage, since those can grow indefinitely. At best, stack overflow errors will be replaced by array-out-of-bounds errors, which is the exact same thing anyway. In fact, it will be even worse: a manual stack implementation comes with its own complexity and sources of bugs.
The old FORTRAN and COBOL language versions were like this.
In ancient FORTRAN programs, the programmer typically computed a maximum possible size for the data and allocated statically in the main program one or more work arrays corresponding to that maximum size, which were either placed in a COMMON block of global variables or they were passed as arguments to the invoked functions and subroutines.
In the functions or subroutines, the work arrays were typically partitioned by allocating in them various local variables.
Overall, this approach was more reliable, because there were no risks of exceeding the memory limits, but it was much more tedious for the programmer, because the worst cases for memory usage had to be accurately predicted.
> Overall, this approach was more reliable, because there were no risks of exceeding the memory limits, but it was much more tedious for the programmer, because the worst cases for memory usage had to be accurately predicted.
Well, most importantly, it fixated the program's limits. Got a bigger machine and wanted to tackle greater data sets? You were out of luck if you didn't have the source code. Old Unix programs were often like that. It was the GNU counterparts that often did away with such arbitrary limits, allowing systems to grow and the software to stay relevant.
> I'm interested in examining the idea of a programming language that eschews the notion of a callstack
Technically, I don't know any language whose spec mentions the notion of a call stack. For example, it's perfectly OK for a conforming C compiler to use a linked list of dynamically allocated activation records (from a standards viewpoint; users of such an implementation may have different opinions).
A conforming C compiler also could choose to only use a call stack for functions that take part in recursive calls or that may get called by multiple threads.
> plus weirdo stuff like the ability to hand out references to local data in functions that have already returned (which remains valid as long as you don't call the function again, which I think should be possible to enforce via a borrow checker).
If you add multi-threading (something that is almost a must have for languages on powerful CPUs nowadays), I don’t think that’s easy to do.
I don’t think that “call stack” implies “contiguous memory”, or “what the operating system might think of a process call stack”, so a linked list would still qualify as a call stack. While the C standard doesn’t use the word “stack”, it explicitly requires support for recursive function calls, and the related semantics it specifies with regard to automatic storage duration effectively describe a stack.
What about statically determining a fixed number of activation records at compile time? Similar in spirit to security focused languages that require loops to have a statically determined finite bound in order to successfully compile.
As to lifetime after returning, do you really hate continuations that much?
> What about statically determining a fixed number of activation records at compile time?
Sure. It may be useful to represent a function's local storage as a first-class concept, and then allow users to instantiate copies of the function at will if they're willing to allocate more storage themselves. That would let users either precisely limit the number of instances of a function, or use dynamic allocation to manually reimplement a call stack if they so choose.
> As to lifetime after returning, do you really hate continuations that much?
This is a language that forbids recursion, the functional programmers have already run screaming for the exits. :P
> I'm interested in examining the idea of a programming language that eschews the notion of a callstack and returns to having a single fixed activation record per function.
Welcome to the old 8-bit C compilers.
Everything was statically allocated, so no recursion. The downside is that you can only have a single thread of execution.
It would be interesting to see what that would look like if extended to multiple threads (they would have to be defined/allocated at compile time).
Stack Clashing is pretty neat, something you should really pay attention to in embedded spaces (it's often exploitable in UEFI land, as most EDK2 builds lack guard pages).
I got to write some exploits for some recently, very fun bug class to exploit.
A colleague and I spent some time last year looking for DoS vulnerabilities caused by recursing on user input [1].
TL;DR: With CodeQL and some manual review, we found several issues resulting in two assigned CVEs, a rustsec advisory, and a handful of fixes implemented in various projects.
We mostly looked at Java projects. It is interesting to see a C vulnerability from around the same time.
It would be cool to see a larger study on how common this issue is across different programming languages.
There's a useful clang-tidy check to warn on this, for when you want to ensure there is no recursion lurking anywhere in a large codebase sensitive to stack overflow issues.
That warning doesn't ensure that there's no recursion, as the caveats point out. Indeed, it's trivial to show that ensuring there's no recursion is impossible as long as you have function pointers. (This is also why languages that claim to be working on a solution to prevent recursion statically are going to fail.)
In general, analysis becomes impossible when you have function pointers produced and used all over the place. But that doesn't have to be the case if the program is written deliberately to avoid the impossible cases. E.g., if you can separate the set of functions into "functions that (indirectly) are exported or have their address taken" and "functions that (indirectly) invoke a function pointer", then you know that there can't be any recursion through function pointers. Basically just a form of escape analysis.
And if you're writing your own language, you can just have explicit 'colored' functions (and corresponding pointers), and enforce that the call graph between different colors has no cycles, and no function invokes a pointer of its own color. Generics would get a bit messy, but it's still far from impossible.
> E.g., if you can separate the set of functions into "functions that (indirectly) are exported or have their address taken" and "functions that (indirectly) invoke a function pointer", then you know that there can't be any recursion through function pointers. Basically just a form of escape analysis.
Sure, in toy examples. But you can't actually do this in a nontrivial codebase that wasn't written to support it from the ground up.
> And if you're writing your own language, you can just have explicit 'colored' functions (and corresponding pointers), and enforce that the call graph between different colors has no cycles, and no function invokes a pointer of its own color. Generics would get a bit messy, but it's still far from impossible.
Everything is easy when you don't spend 5 minutes thinking about the possible problems with it, much less actually implement the idea you're proposing.
He didn't say "easy", he said "far from impossible".
> But you can't actually do this in a nontrivial codebase that wasn't written to support it from the ground up.
In the case where this is considered critical for whatever reason, you absolutely could have a somewhat overzealous detector that failed the compilation if triggered. It doesn't need to support every last edge case. You would then address this just as you would any other compiler error - by remediating the failures one by one.
If management doesn't want to pay you to do that then perhaps the requirement of "absolutely no recursion ever" wasn't actually so critical.
Yeah, if you're starting with, say, a heavily object-oriented program, then it would be nigh impossible. But I don't think it would even be extraordinarily difficult, if you start with a style that isn't suffused with dynamic function calls. E.g., languages like C++ and Rust with heavy monomorphization can have callbacks and other fun things without any vtables or function pointers.
From an architectural standpoint, the main challenge is layering everything properly so you don't have any circular dependencies, even in untaken branches. Again, how hard this is to achieve depends on the original design, but it's not like circular-dependency-avoidance is something that no one's done before.
You could probably restrict function pointer values with something like this:
    if (fptr == x || fptr == y)
        __builtin_unreachable();

Or...

    if (fptr != z && fptr != w)
        __builtin_unreachable();
But I'm not sure how well today's compilers can take advantage of this. You'd need a strict mode where any function pointer is assumed to be the worst case. At that point you might as well go for a real proof assistant.
A more practical way (short of explicit language support) would be to have an enum and a dispatcher function:
    enum { CALL_Z, CALL_W };

    int call_z_or_w(int which, int arg) {
        switch (which) {
        case CALL_Z: return z(arg);
        case CALL_W: return w(arg);
        default: __builtin_unreachable();
        }
    }
Or you could even do the same thing, but switching on the pointer value instead of an enum. Either way, this lets the compiler statically know where the call may possibly lead to.
The fact that call depth is finite should basically kill the use of recursion in every codebase in favor of an explicit stack data structure. They're better in every way except as a way to confuse new CS students by hiding the distinct lack of magic that underpins them.
You don't have stack frame overhead, you can nest arbitrarily, and they'll always be more performant. You can peek into the stack to do things recursive functions can't. The optimizer is better at helping you with loops, and memoization just becomes a normal cache of intermediate results.
We did. All "big" OSes have runtimes that put stacks in isolated (and very large) areas, with a guaranteed guard region at the bottom. An attack on stack bounds in Linux or Windows or OS X requires a very large depth, and will end with a regular process failure (a segfault, basically) and not a memory corruption bug.
But tools like libexpat are often used in embedded contexts, in 32 bit memory spaces, that don't have that freedom. So it's a relatively serious bug regardless.
A fixed-capacity stack isn't growable, and while the capacity is pretty big by default on desktop/server OSes, so are stack frames, so the recursion depth limit is not that big. And if user input can overflow the stack, it's still a DoS.
The problem to me is that recursive functions should be a special case in the ABI, where a compiler can insert a prelude to the callsite to grow/segment the stack if needed. This is a hard problem once you have references to things on the stack, so I can understand why most ecosystems don't do it - but that doesn't mean it can't be done.
What I'm saying is that stack allocation is still dynamic memory management, and in systems where it's critical to avoid OOM conditions due to unbounded dynamic memory management, you need code with O(1) memory complexity, and that includes the stack. A common mistake I've seen people make is to assume that pushing onto the stack isn't memory allocation. It's just cheap memory allocation that can sometimes be assumed to be free; in this case and many others, that's a bad assumption.
Fair enough, but that's getting pretty far into "research language runtimes" and not practical solutions. The tricks most people play when faced with these problems are (1) let the OS protection do its job or (2) just disallow recursion if you really can't manage that.
There's no realistic technology stack out there that does what you want.
Go does this, as do most runtimes with stackful coroutines. Async Rust (kinda) does this where the async call stack is naturally segmented (although this doesn't get away from stack overflows in some cases because the polling call stack is not growable).
Growable stacks have been around for decades, it's far away from research language territory.
The real benefit is not to handle the worst-case/OOM condition that a big enough call stack gives you. It's to make the default stack size much, much smaller for the common case where you don't need it at all (or to shrink it later). Growable stacks use less memory on average than the typical fixed stacks on embedded devices because they can start in the dozens of bytes, instead of requiring kilobytes or megabytes of stack space (allocated, not just reserved). It's kinda the only way you can make a runtime scale to millions of concurrent call stacks.
And hence the "kinda" in async Rust. While boxed futures can be thought of as a segmented stack the literal callstack is not, which can still give you a stack overflow with a bunch of nested poll() calls.
Depends on what you specifically mean by "growable": when a goroutine runs out of stack space, a new one that's larger (iirc double sized) is created, the old one is copied into it, and then execution can resume.
> while stack clashing was considered and is a theoretical possibility — denial of service was considered to be the realistic impact.
In many contexts, regular process failure is still a vulnerability.
And the stack is (usually) tiny compared to other resources. It doesn't take that many nested calls to get to the bottom of the stack. At least compared to trying to exhaust the heap or keep the CPU busy long enough to cause DoS.
This is sort of amusingly backwards. On embedded systems where I live, stacks are huge. Thread stacks of 4-16k are routinely the largest single objects the kernel sees in Zephyr. And yes, lots of RTOS apps disallow recursion (Zephyr doesn't, but does gate its use in the core system behind build-time config that can be turned off) because in that world it's hard to provide the guarantees you can get with a 64 bit mmu environment.
But if you are on a modern 64 bit OS, no: stacks are enormous. Many megabytes of mapping is routine. Obviously that's not all going to be faulted in, and most threads won't ever use more than a few kilobytes. But the region reserved for recursive use is extremely large, and unlikely to overflow except in a well-crafted deliberate attack (and even then it generally requires a few bugs in the code; most recursive algorithms are bounded by e.g. maximum tree height or something that is O(logN) and thus can't overflow).
Explicit recursion is as harmful as goto. We already have a solution - they're called recursion schemes[1].
Using recursion schemes is analogous to using structured programming rather than just constructing loops and branches out of gotos. The problem is the functional programming community got there first and the names are the equivalent of FactoryFactoryFactory and going to be a turn off to normal programmers even though the concepts are dead simple.
1. No citation because linking to a blog post containing Haskell is proving my point.
You throw goto around like our not using gotos is some revolutionary change. Dijkstra's paper was nearly 60 years ago, and it was released almost immediately after high-level languages were born.
Recursion schemes are at least as old as "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" (1991), which is closer to "Go To Statement Considered Harmful" (1968, so 23 years) than it is to today (34 years). Recursion schemes aren't at all new either.
I hope the term "deyodawgification" enters the pantheon of elegant code purification terms of art, alongside the classics like "degotofying", "deCOMtamination", and "outparamdelling".
"Soft lock picking" was the title of a YouTube series about strange ways out of impossible save files (soft locks, as opposed to hard locks where the game freezes) in video games.
Two things come to mind:
- Maximum stack depth greatly varies between targets. If you wanted to support a recursion depth of, say, 50, that may already be too much for some of the hardware this library needs to be able to run on, so the original problem still exists.
- This would add an arbitrary limit to the parsing capabilities of the library. Stack space is very limited to begin with compared to heap space, so any solution that remains recursive would probably require a limit way lower than a heap-based solution could offer.
Or just take the limit as an argument so developers can adjust it based on the platform. Python also has a recursion limit but you can change it if you need to.
Depth is easy to miss when the parser calls a lot of functions that call each other and that have vastly different stack usage.
A more reliable check is to compare an address of a thing on the stack at api boundary with the current address of things on the stack. SpiderMonkey, the JS engine in Firefox, used something like that in past to stop run-away recursion or at least to switch to slower algorithms with bounded stack usage.
That idea works in general but causes false positives: No artificial limit you pick is "right" and the false positives can be avoided by getting rid of the recursion altogether.
PS: It's not one single function; the recursion is indirect, not direct.
Sure, if it's indirect I agree it will get messy fast, with a dozen functions suddenly needing to handle an additional parameter. But unrelated to that... I'd really like to know who needs recursion for this that's deeper than 3 or 4 levels. What's the use case? Such XML would surely be unreadable and unwritable for humans, but if it's used as some form of exchange format between systems, what would that be? How would it end up with such deeply nested entities? It sounds like something you deliberately implement that way to show how "smart" you are, not "hey, that seems like the reasonable thing to do here".
This makes me wonder: do any of the popular XML libs have a sort of safe mode, where custom entities and similar features are disabled, those schema URLs ignored, and namespaces just flattened (and whatever else I forgot or don't even know about)? You know, for when I know I only need to parse simple XML files that should contain a couple of plain tags and attributes, and want to reduce attack surface.
There are parsers that only implement a tiny subset of XML. And Expat has compile time flags to disable some of that machinery where not needed. It's arguably no longer XML then though.
You can think of this as having two base cases, your normal success case and an error case. You could use metaprogramming to do this semi-automatically, like a decorator that took a maximum depth and checked it before calling your inner function.
> Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
This is just nonsense. The issue is doing an unbounded amount of resource consuming work. Don't do an unbounded amount of resource consuming work, regardless of whether that work is expressed in a recursive or iterative form.
Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation.
You know, I got spoiled by Haskell, doing recursion everywhere without a care, and all I had to think was the evaluation order (when things blew up). Now that I'm doing some OCaml, I have to stop and think "am I writing a tail recursive function". It's easy to write multiple levels of recursion and lose track if you're writing a tail recursive function that the compiler will optimize.
I think recursions are really easy to make unbounded by mistake. Maybe not as much as for loops and off-by-ones.
Agreed, and this is why some languages have annotations that ask the compiler to check a function is indeed tail recursive.
However, I don't think that is the case in Expat. If the algorithm is tail recursive but accidentally not expressed in that way, it's a simple (but perhaps tedious to manually apply) program transform to express it in a way that does not consume unbounded memory (heap or stack). From the scant details on the fix in the article, it appears the parsing algorithm itself was completely changed. ("The third variant "Parameter entities" reuses ... the same mechanism of delayed interpretation.") If this is the case, the issue is not recursion, as the original parsing algorithm would consume unbounded resources no matter how it was expressed in C.
I'm aware of the tailcall annotation but I didn't have to rely on it yet. For me the benefit of picking up OCaml is that I can do imperative constructs on a first pass (mutation and for loops), and refactor it after to pure code, when needed.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline)
A computer can chug through millions of loop iterations. I don't think tail recursion will ever result in heap exhaustion. But a stack overflow can be caused with tens of thousands of nested calls, which is tiny in comparison.
Recursion-based stack overflow issues are easier to exploit because a modern computer's stack space is much smaller relative to other resources.
>Sounds like a design issue in particular language implementations rather than a problem with the programming technique.
You say this like these are fundamentally separable things? I find this comment deeply confusing. Every single real software stack ever has a layer where the sausage gets made, so to speak.
Stack overflow should not be a vulnerability for any modern tool chain. As to resource limits, LLVM has supported segmented stacks for something like a decade or maybe longer. Recursion is absolutely not the problem here. Outdated programming practices are.
In the general case? The failure to compile with -fsplit-stack when that's necessary for whatever your requirements are. The failure to enable the stack protector when ... pretty much always.
For this particular CVE? I'm not clear. Possibly none. The writeup didn't provide sufficient detail and I haven't bothered to wade through the code. There may well be a reason recursion won't work here but it certainly isn't general.
I'd be curious to know in this case why resource limits couldn't be enforced for the recursive implementation but could be for the iterative one.
I don't ever have a reason to practice this and can't be bothered to test it, so this might be off, but something like this ought to work?
    fn recurse(tree, acc = Nil):
        tree.children match
            case Cons(head, tail) => recurse(head, tail ++ acc)
            case Nil => acc match
                case Cons(head, tail) => recurse(head, tail)
                case Nil => ()
Ah, yeah, I mixed up my thoughts and thought it was about making recursive functions into non-recursive ones rather than tail-called ones. Specifically this part tripped me up, I guess: "exchanging stack allocation for heap allocation".
Although, technically my "solution" would work for making it tail calls anyway, by replacing the looping part with tail calls. But I digress.
> And it gets more complex when there's a bunch of state that requires you to save the stack frame in your explicit stack.
You are incorrect about tail recursion (look at continuation-passing style; it transforms any program to tail recursive form) but you are getting at the core of the issue here. The issue is either:
* whatever the parser is doing is already tail recursive, but expressed in C in a way that still consumes stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
In other words, you can shift memory usage from the stack to the heap, but if a client can give you an unbounded amount of memory consuming work you are screwed either way. Ultimately the problem has nothing to do with recursion.
CPS transforms all programs into tail calls, not tail recursive. It buys you TCE in trivially tail recursive functions but does not magically transform something that needs to preserve state across recursion into something that is stateless.
But you're right, that's what I'm getting at. Recursion is an inherent semantic and whether you make the stack explicit or implicit doesn't get rid of memory allocation problems. I said in another comment that stack allocation is still dynamic memory management, it's just something many people take for granted.
>Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.
Not every recursion can be transformed into tail recursion. If it's not simple tail recursion (which is the high-level-language compiler's way of implementing the old tried-and-true assembly optimization of making the last function call a JMP instead of a JSR followed by an RTS, a technique I learned from Woz's Apple ][ 6502 monitor ROM code), you still have to manage the state somewhere. There ain't no such thing as a free lunch.
And if not on the C stack, then it has to be somewhere else, like another stack or linked list, and then you still have to apply your own limits to keep it from being unlimited, so you're back where you started (but in a Sisyphusly tail call recursive way, you still have to solve the problem again).
>Greenspun's tenth rule of programming "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
Of course you can always just pass a recursion depth limit parameter down the recursion and stop when it expires, even letting the caller decide how deep a recursion they want to permit, which is easier and safer than using malloc or cons or std::vector to roll your own stack.
Or the compiler could even do that for you, but you still have to figure out how to gracefully bottom out...
So if "Exception Handling" is hopefully failing upward, then "Depthception Handling" is hopelessly failing downward!
Depthception Handling is the vertical reflection of Exception Handling, with "implicit depth" recursion depth passing, like C++'s "implicit this" but deeper, not to be confused with the orthogonal "continuation passing" / "halt passing" dichotomy (passing HALT and CATCHFIRE instructions around but promising to never call them except in fatal emergencies):
"try" / "catch" = An active attempt to solve things.
"giveup" / "bottom" = Passive resignation to inevitability.
"throw" is an emergency exit UP.
"drop" is an emergency slide DOWN.
"try" => "giveup" introduces a futile attempt at unbounded recursion.
"catch" => "bottom" declares a bottoming out handler after a function signature, to guard its body below.
"throw" => "drop" immediately bottoms out from any depth!
fn doomed_recursion()
bottom { println!("Too deep, I give up."); }
{
    giveup {
        if going_too_deep() {
            drop;
        }
        doomed_recursion();
    }
}
And you can even bind the depth for explicit depth monitoring and control:
> Not every recursion can be transformed into tail recursion. If it's not simply tail recursion [...] you still have to manage the state somewhere. There ain't no such thing as a free lunch.
OK: every recursion in a language that supports programmer-managed state can be transformed into a tail recursion. That covers any recursion in all serious programming languages, and most joke programming languages too.
No, still false. Anything that recurses more than once cannot. A Fibonacci series, done by recursion, cannot be done by tail recursion (without memoization). A binary tree walker that visits all the nodes (as opposed to searching for one node) cannot be done with tail recursion. And so on.
Without busting out CPS, you can also pass the current state along as parameters to your recursive calls:
def fib(n, acc1 = 0, acc2 = 1) =
  n match
    case 0 => 0
    case 1 => acc2
    case _ => fib(n - 1, acc2, acc1 + acc2)
You should be able to take my tree walker here[0], add a `visit: A => ()` parameter, and call `visit(tree.value)` (assuming you have a structure Tree[A] = (value: A, children: List[Tree[A]])) before the match.
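A sketch of the same pattern in Python rather than Scala (the `Tree` shape and `walk` name are my own): the "state passed along" is an explicit worklist of nodes still to visit, written here as the loop the tail calls would compile to.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Tree:
    value: int
    children: List["Tree"] = field(default_factory=list)

def walk(tree: Tree, visit: Callable[[int], None]) -> None:
    # Pre-order walk: the worklist carries the pending subtrees that
    # a naive recursion would keep on the call stack.
    pending = [tree]
    while pending:
        node = pending.pop()
        visit(node.value)
        pending.extend(reversed(node.children))  # leftmost child is visited next
```

The worklist still grows with the tree, which is the "no free lunch" point made above.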
And creating continuations for recursion requires memory allocation. How recursive! You can't just recursively move the memory allocation problem around and call it solved or somebody else's problem.
At least start with a rigorous, well-specified, robust, high-performance, complete implementation of Common Lisp, if you're going to recurse infinitely.
I agree that algorithms that run in bounded stack space can still consume arbitrary amounts of memory, and that is the central flaw in the article that I'm responding to.
The issue is either:
* whatever the parser is doing is already trivially tail recursive, but not expressed in C in a way that doesn't consume stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation.
Can't all recursive functions be transformed to stack-based algorithms? And then, generally, isn't there a flatter resource consumption curve for stack-based algorithms, unless the language supports tail recursion?
E.g. Python has sys.getrecursionlimit() == 1000 by default, RecursionError, collections.deque; and "Performance of the Python 3.14 tail-call interpreter":
https://news.ycombinator.com/item?id=43317592
> Can't all recursive functions be transformed to stack-based algorithms?
Yes, you can explicitly manage the stack. You still consume memory at the same rate, but now it is a heap-allocated stack that is programmatically controlled, instead of a stack-allocated stack that is automatically controlled.
The issue is either:
* whatever the parser is doing is already tail recursive, but not expressed in C in a way that doesn't consume stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
It's not clear how the issue was fixed. The only description in the blog post is "It used the same mechanism of delayed interpretation" which suggests they didn't translate the existing algorithm to a different form, but changed the algorithm itself to a different algorithm. (It reads like they went from eager to lazy evaluation.)
Either way, it's not recursion itself that is the problem.
> You still consume memory at the same rate, but now it is a heap-allocated stack that is programmatically controlled, instead of a stack-allocated stack that is automatically controlled
To be precise:
In C (and IIUC now Python, too), a stack frame is created for each function call (unless tail call optimization is applied by the compiler).
To avoid creating a stack frame for each recursive call, you can instead create a collections.deque, append traversed nodes to it, and process them in a loop within a single non-recursive function.
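A minimal sketch of that pattern (the dict-based node shape is assumed for illustration):

```python
from collections import deque

def count_nodes(root: dict) -> int:
    # One non-recursive function: a heap-allocated deque stands in
    # for the call stack, so deep nesting cannot overflow it.
    total = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()  # popleft() = breadth-first; pop() = depth-first
        total += 1
        queue.extend(node.get("children", []))
    return total
```

Nesting 10,000 levels deep is no problem here, where a naive recursive count would hit Python's default recursion limit of 1000.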
Is tail call optimization faster than a collections.deque loop within a single function's stack frame?
Yes, that is basically it. A tail call should be the same jump instruction as a loop, but performance really depends on the language implementation and is hard to make general statements about.
There's a wonderful DDJ interview with James Clark (author of expat and developer of many other open source SGML and XML standards and tools like RELAX NG, and even horrible ones like XSLT ;) called "A Triumph of Simplicity: James Clark on Markup Languages and XML", in which he explains how a standard has failed if everyone just uses the reference implementation, because the point of a standard is to be crisp and simple enough that many different implementations can interoperate perfectly.
A Triumph of Simplicity: James Clark on Markup Languages and XML:
I wrote more about his work in this discussion thread about Ted Nelson on What Modern Programmers Can Learn from the Past, and reading documents from 20 years ago:
>Reading documents from 20 years ago is a mixed bag. Links usually fail horribly, which was something Xanadu was trying to solve, but I'm not convinced they could have solved it so well that 20-year-old links would still actually work in practice. [...]
>In the ideal world we would all be using s-expressions and Lisp, but now XML and JSON fill the need of language-independent data formats.
>Not trying to defend XSLT (which I find to be a mixed bag), but you're aware that its precursor was DSSSL (Scheme), with pretty much a one-to-one correspondence of language constructs and symbol names, aren't you?
>The mighty programmer James Clark wrote the de-facto reference SGML parser and DSSSL implementation, was technical lead of the XML working group, and also helped design and implement XSLT and XPath (not to mention expat, Trex / RELAX NG, etc)! It was totally flexible and incredibly powerful, but massively complicated, and you had to know scheme, which blew a lot of people's minds. But the major factor that killed SGML and DSSSL was the emergence of HTML, XML and XSLT, which were orders of magnitude simpler.
It is silly to make an overly broad statement about recursion killing. On modern "hosted" OSes, there are safeguards against stack overflows, which will quickly kill your process. This _can_ be a problem on embedded systems with limited memory management facilities (e.g., hw with no MMUs), and I do understand that the library author can't control where the library is used; in fact, some safety-critical systems require a maximum stack depth guarantee, which rules out recursion. However, some problems, especially parsing CFGs, are inherently recursive in nature, and I'd argue going the non-recursive route with explicit stacks would result in bugs elsewhere, because the code becomes harder to reason about.
>> denial of service was considered to be the realistic impact
> It is silly to make an overly broad statement about recursion killing. On modern "hosted" OSes, there are safeguards about stack overflows, which will quickly kill your process.
>> denial of service was considered to be the realistic impact
is in the article as justification for why this has low criticality and therefore isn't subject to the 90 day disclosure timeline. I.e. it's _limiting_ the predicted impact.
I assumed GP was referring to the other more critical risk, stack clashing, which I guess could lead to RCE? not being an issue on modern OS's.
The article basically said: "Letting your application get killed this way would practically lead to DoS attacks (a security issue), therefore [conclusion]." The response was basically: "Actually, on modern OSes, your application gets killed, unlike on embedded systems. Therefore, [opposite conclusion]."
This doesn't make sense as a comment, regardless of the particular conclusion.
You can fairly easily recurse with a context argument in which you maintain a depth count.
int recursive_fun(rec_context *ctx, ...)
{
    int res = 0;
    if (++ctx->depth > REC_LIMIT) {
        res = ERROR_RECURSION_LIMIT;
        goto out;  /* undo the increment on the error path too */
    }
    ...
    res = recursive_fun(ctx, ...);
out:
    ctx->depth--;
    return res;
}
However, a problem with recursion might be something other than the maximum depth reached.
If recursion traverses an exponentially sized abstract space, it can chew up a lot of processing time before going anywhere near the protective limit.
The other problem, especially for libraries, is that they know neither the overall stack size nor the size of a single stack frame (the former is specified when linking the executable; the latter is an implementation detail and subject to the optimizer's trickery).
Many "fixes" for recursion effectively re-implement it, but do so with their own data structures that are equivalent to the stack (as originally used) but have precise definitions and therefore can be controlled wrt resources used.
If all the recursive code is under your control (no external libs at the bottom, or, worse, trips through external libs and back into recursion), you can make some generous estimate of the upper bound of a stack frame.
If the OS provides ample stack space, you can inquire about the limit, and subtract a safety margin from that to create your own soft limit that you can test against in the recursion.
Another way is to be absolutely restrictive, rather than track the limit. Create some test cases representing the worst recursive cases you're willing to handle (and document as such). Empirically measure their stack use. Multiply that by 1.5 and use that as the stack check limit.
On startup you can check that this soft limit is not too close to the real one and diagnose the situation.
If your soft limit is many megabytes away from the real one, you're not likely to hit the real one, unless something really stupid happens, like alloca, or a call into some code you don't control that goes rampant.
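As a sketch of deriving such a soft limit (Unix-only `resource` module; the frame estimate and margin are illustrative guesses, not measurements):

```python
import resource

def recursion_budget(frame_estimate=4096, margin=256 * 1024):
    # Derive a soft recursion-depth limit from the OS stack rlimit.
    # frame_estimate is a generous assumed worst case per frame.
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        soft = 8 * 1024 * 1024  # assume a typical 8 MiB default
    return max(0, (soft - margin) // frame_estimate)
```

The recursive code then checks its depth counter against `recursion_budget()` instead of a hard-coded constant.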
The submitted article, by the way, could use more nuance like this. Careless recursion in C, without reasoning about the consequences (what happens for certain inputs) is obviously bad, not recursion itself.
Keeping the focus specifically on this bug: Do you think it was "careless recursion" in libexpat? That library was started in 1997, and the recursion bug wasn't found until 2022.
> Another way is to be absolutely restrictive, rather than track the limit. Create some test cases representing the worst recursive cases you're willing to handle (and document as such). Empirically measure their stack use. Multiply that by 1.5 and use that as the stack check limit.
We look forward to your patches for libexpat to add new unit tests.
Unixes give us that. You have to fork the computation to have it contained in its own process, whose run-time, virtual memory limit, and stack depth you can control.
Doing it all in your VM/runtime, so you can bail the computation with an exception, is more challenging.
> Sending a deeply nested JSON object as part of a request to some API should not crash the server that handles the request.
But the API should have limits. For example, respond 413 Payload Too Large if the request is beyond any normal size for that API. Document the limits to API users. You now have an upper bound on how much nesting can be in the user input.
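A sketch of that check (the limit value is hypothetical); once size is bounded, nesting depth is bounded too, because each level of nesting costs at least one byte of input:

```python
from typing import Optional

MAX_BODY = 64 * 1024  # pick whatever is "normal" for this API

def check_request(body: bytes) -> Optional[int]:
    # Reject oversized payloads up front. Every nesting level costs
    # at least one byte ('[' or '{'), so accepting the body also
    # bounds how deep any later recursive parse can go.
    if len(body) > MAX_BODY:
        return 413  # Payload Too Large
    return None
```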
There's still not much reason to recurse using the program's own call stack rather than having your own stack structure that can live in dynamic memory and be handled in a context-appropriate way (e.g. returning an error at some depth limit rather than being killed by the OS).
Recursive parsing of CFGs is only better when they're LL grammars, but LR grammars (which are the most common grammar used in programming languages) are definitely better with an explicit stack due to the repeated way the state machine needs to run. You might be able to do a nicer LALR parser recursively but I personally haven't seen one.
Deeply nested instances of right-recursive rules will blow the stack. If the LALR parser has an unlimited stack due to dynamic allocation, that will enable a DoS.
Table-driven LALR(1) with an explicit stack does not make the recursion issue go away. The tooling may provide built-in handling for it which translates excessive depth into a syntax error.
Recursive parsing using the native stack can take steps to protect itself, like by keeping track of the depth, and bailing upon hitting a limit.
Techniques involving obtaining the stack pointer (or close estimate thereof), and comparing it to a limit, are also possible.
I would argue that the title is misleading and overly alarmist here. This particular bug may have involved recursion and a stack overflow, but that's like saying "malloc kills" in the title of an article about a heap overflow bug. The existence of stack overflow bugs does not imply that recursion is bad any more than the existence of heap overflow bugs implies that malloc is bad. Recursion and malloc are tools that both have pretty well understood resource limitations, and one must take those limitations into account when employing those tools.
Did you see the article references [1][2] from 2006 and 2017 that already argue that recursion is a security problem? It's not new, just not well-known.
>> I would argue that the title is misleading and overly alarmist here. This particular bug may have involved recursion and a stack overflow, but that's like saying "malloc kills" in the title of an article about a heap overflow bug.
Let's see what the article[1] you cited says:
Rule 3: Do not use dynamic memory allocation after initialization.
Rationale: This rule appears in most coding guidelines for safety-critical software. The reason is simple: Memory allocators, such as malloc, and garbage collectors often have unpredictable behavior that can significantly impact performance.
If you think recursion is a known security problem, do you also think using the heap is a known security problem?
Arguably, Stack Clash is just a compiler bug--recursive code shouldn't be able to jump the guard pages. This was fixed in Clang in 2021 [1], in GCC even earlier, and in MSVC earlier than that.
Recursion per se isn't an issue; unbounded stack use is. If you either know your input size is bounded (e.g. it's not user-generated) or use tail-recursion (which should get compiled to a loop), it's fine.
If your algorithm does unbounded heap allocations instead, you're still going to get oomkilled. The actual vulnerability is not enforcing request resource limits. Things like xml bombs can then exacerbate this by expanding a highly compressed request (so a small amount of attacker work can generate a large amount of receiver work).
Exactly. The article would have been much more informative if it had detailed why the usual approaches to limiting resource usage wouldn't work to prevent DoS here.
The problem, in practice, is that the limit for malloc on most systems is a few GB, while the default stack size on Windows is 1 MB, a stupidly small size.
I love recursion, so I will spawn a thread to do it in with a decent sized stack, but it’s very easy to break if you use defaults, and the defaults are configured differently in every OS.
Parsing anything from a potential adversary needs to account for failure. Unbounded recursion is secure (ie fails safely) if the compiler is working properly.
As to DoS, without looking at the code I'm unclear why various approaches to bounding resource consumption wouldn't have worked. I assume something specific to this library and how it is used must have prevented the obvious approaches. Still, not an issue in the general case.
Guarding against unbounded recursion requires both compiler support and runtime environment support: you have to use enough resources to handle legitimate queries, but small enough memory constraints that a "query of death" doesn't kill nodes that are expensive to reactivate. Even then, by their very nature queries-of-death are usually hard to detect and a much simpler solution is something you can do in the static space, such as put an arbitrary hard-bound on recursion depth far below your resource constraints so you can fail the query without losing the whole processing node.
Google protobuffers have buried deep within at least their C++ parser an arbitrary hard limit for nesting depth (I think it may be 32). It's annoying when you hit it, but it's there for a reason.
> Guarding against unbounded recursion requires both compiler support and runtime environment support
I feel like this is similar in spirit to saying "guarding against infinite loops requires both ...".
Where resource consumption is concerned, as you pointed out you can track that manually. Presumably you have to do that anyway, since the iterative case will also need to give up and fail the task at some point.
I really don't see where recursion itself introduces an issue here. I guess if you expect to pass through a great many nodes that don't otherwise trigger resource allocation except for the stack frame, and the compiler can't optimize the activation records away, it could be problematic. That's pretty specific though. Is that really the case for a typical parser?
Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
Is the whole (tail-recursion optimized) Lisp language family subject to DOS? Just check that you terminate at some point, right? "Recursion kills" is just too broad and not helpful.
The whole point of this CVE is what do you do when the input you're parsing is so large that you run out of space before you can terminate. Tail recursion helps in the sense that the issue will happen later, but it isn't a fix when we're talking about arbitrary data like an XML parser.
My apologies, I interpreted "running out of space" as meaning running out of memory, rather than meaning overflowing the stack.
The tail recursion part isn't right though. If it isn't optimised into a loop, then tail recursion makes no difference at all, you will overflow the stack just as rapidly, not later. If it is optimised into a loop, then you will not overflow the stack at all.
An algorithm that uses an amount of stack proportional to the input N is said to require O(N) space. If it is rewritten to explicitly allocate the storage instead of using stack, it still uses O(N) space.
Not necessarily. If you are computing an aggregation, for instance, the recursive, non-tail-call-optimized version may overflow the stack, while the fixed version will not use additional memory for each iteration.
Otherwise, indeed, stack is memory, but the heap is usually far less limited than the stack, and running out of memory doesn't have the same security implications as overflowing an unprotected stack.
And, unless you manage to encode everything in the stack, your iterative version will probably take the same amount of memory as your non-optimized recursive version, minus the space used for the stack.
Reasonable compilers translate tail-recursive functions into loops/jumps, so the stack does not grow. Tail recursion is easier to do in a garbage collected functional language though (e.g. if you need to use continuation passing).
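Where the compiler won't do that translation, a trampoline gives the same constant-stack behavior; a toy Python sketch (names are my own):

```python
def trampoline(step, *args):
    # Drive a "tail call" that is returned as a thunk instead of
    # being made directly, so the native stack never grows.
    result = step(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)  # describe the tail call; don't make it
```

Running `trampoline(countdown, 100000)` completes in constant stack space, where a direct `countdown` recursion would overflow Python's recursion limit.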
Even in C, the recursive solution is usually simpler, so it makes sense to use it for unit tests to validate the more manual solution.
That really depends. Segmented stack makes it equivalent to heap. Heap might or might not fail gracefully. If the OS permits overcommit and you've mapped a large region, then an arbitrary process on the machine could trigger the OOM when writing to a supposedly allocated (but not previously written) piece of memory.
Presumably you configure resource limits in production but that just means the "correct" process gets unexpectedly killed instead of an arbitrary one.
Code handling arbitrary input needs to carefully limit resource usage. There's no avoiding it.
I think the hate on recursion is too strong. For one thing, some languages have tail call optimization, which can turn recursion into a loop without using up the stack. For another, recursion can be bounded, so that if a malicious document tries to use a lot of recursion, it just results in an error reporting the recursion was too deep.
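A sketch of the bounded variant (the limit is arbitrary, and `measure` is a made-up example function):

```python
MAX_DEPTH = 64  # arbitrary cap, far below any real stack limit

class TooDeep(ValueError):
    """Raised instead of letting a hostile document overflow the stack."""

def measure(node, depth=0):
    # Walk a parsed JSON-like structure, reporting its nesting depth
    # but erroring out on absurdly deep input.
    if depth > MAX_DEPTH:
        raise TooDeep(f"nesting exceeds {MAX_DEPTH}")
    if isinstance(node, list):
        return 1 + max((measure(c, depth + 1) for c in node), default=0)
    if isinstance(node, dict):
        return 1 + max((measure(v, depth + 1) for v in node.values()), default=0)
    return 0
```

The caller gets a catchable exception and a clear error report, rather than a process killed by the OS.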
I don’t understand the obsession people have with recursion. Sure, it’s a cute math trick that makes you feel smart, but it is an undebuggable mess with a ton of gotchas that lead to stack overflows.
Leave the math to the math guys who never ship a product.
I use recursion a fair bit just because it's the easiest solution that requires the least thought. If I have a tree structure from, e.g., a JSON parser, or a directory tree, or an HTML node tree, what more debuggable options are there, realistically?
Re-writing recursive algorithms to be non-recursive typically requires less obvious code, where you make your own ad-hoc stack to keep track of where you are; essentially implementing the call stack manually.
In contexts where DoS or stack smashing is a concern because the input is attacker-controlled, it's often way easier to add a depth parameter and error at a certain depth than to rewrite the naturally recursive algorithm into iterative style. But tree structures are so naturally recursive that it's easy to end up with accidental unbounded recursion in complex real-world situations.
You can manage memory and compute budget so that your cute algo does not go rogue. On top of that very often you don’t have to explore the entire tree, and there is custom logic at each step to decide whether you want to proceed or not with exploration.
Programmers like recursion because some algorithms are much, much more pleasant to write this way. Others are easier to write iteratively. Both are easy to do wrong.
Example: depth-first tree walking algorithms. Implicit stack makes it trivial to express as recursion.
If your data structures are recursive, it makes sense your algorithms are too. It makes the code tidy and simpler to reason about. Plenty of times your code becomes "obviously correct".
I write a genealogy app (family history). Recursion is the foundation of the code, and permeates everywhere. The call stack can go 200 deep or more (~4,000 yrs). The code has been successfully running on millions of desktops for thirty years, and has never caused a crash or infinite loop because of recursion.
The secret is to check at every step that there is no "he's his own grandpa" loops in the user's tree, where (s)he inadvertently makes one of a person's descendants, also his ancestor. This happens sometimes because re-using the same names can cause confusion.
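That check can be sketched like this (the data shape is hypothetical); testing membership in the current path, rather than a global visited set, still permits the legitimate case of the same ancestor appearing via two lines (pedigree collapse):

```python
def check_no_cycles(person, parents_of, path=()):
    # Someone appearing twice on a single ancestral path means the
    # user has made a descendant into their own ancestor.
    if person in path:
        raise ValueError(f"{person} appears in their own ancestry")
    for parent in parents_of.get(person, []):
        check_no_cycles(parent, parents_of, path + (person,))
```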
Recursion is super useful for dealing with certain data types, notably nested grammar parsing. Sure, it has gotchas, but it can be extremely readable.
I don't think we should ban recursion altogether, but we should remember the associated risks and consider them.
I'm interested in examining the idea of a programming language that eschews the notion of a callstack and returns to having a single fixed activation record per function. This obviously places a lot of limits on language design, but in return you get a statically-guaranteed upper bound on memory usage, plus weirdo stuff like the ability to hand out references to local data in functions that have already returned (which remains valid as long as you don't call the function again, which I think should be possible to enforce via a borrow checker).
>plus weirdo stuff like the ability to hand out references to local data in functions that have already returned (which remains valid as long as you don't call the function again, which I think should be possible to enforce via a borrow checker).
The C programming language supports this with the static keyword. Further calls may overwrite the pointed data. I have played with allocating fixed-size data in static locals, but I always found that allocating in the caller does the same job better. For example, compare strerror() with strerror_s(). (A sensible strerror implementation should return a pointer to static immutable data, but the Standard doesn't specify it.)
A procedural language can achieve a statically bounded call stack by restricting self recursion, mutual recursion, and function pointers. I struggle to imagine how a language without a call stack could perform differently or better.
There's prior art in this, very very old prior art. If you check out the first volume of The Art of Computer Programming, it uses a fictitious architecture called MIX. I understand that while it was fictitious, it was made to be similar enough to contemporary architectures. In it there are no special stack manipulation instructions, because there is no first-class notion of a stack! Functions used global variables for their scratch space. To call a function you would just jump to that address. Of course that function needed to know where to jump back to when complete. To do that, before jumping, the caller would WRITE THE RETURN ADDRESS TO THE JUMP INSTRUCTION AT THE END OF THE FUNCTION. This seems kind of insane in the modern day (function calls requiring self-modifying code!) but it meant you could implement functions without needing even a single extra word of storage space, and all you really gave up was recursion. I believe the original Fortran and some other older languages were originally implemented this way.
There's definitely an advantage with this idea, but also some downsides. While you have a guaranteed upper bound on memory usage, it's completely static and actually going to be worse than doing the equivalent on the stack. Suppose you have 100 functions that each need 100 bytes of scratch space. Statically allocating everything like you describe means you need 10KB of memory reserved. But suppose there is one main function, and the rest are helper functions, and the main function calls each helper function one at a time and the helper functions don't call anything. With the static approach, you still need 10KB. With the stack-based approach, you only need 200 bytes (the stack space for the main function, and the stack space for the helper function it is currently calling). Another advantage to the stack-based approach is due to caching. In the static approach, whether or not a function's scratch space is in the cache depends on how often that function is called. But in the stack-based approach, the top of the stack is almost always in the L1 cache, giving a speed boost to functions even when they are not called frequently.
That said, I do think it is an interesting idea that is worth exploring.
> Suppose you have 100 functions that each need 100 bytes of scratch space. Statically allocating everything like you describe means you need 10KB of memory reserved. But suppose there is one main function, and the rest are helper functions, and the main function calls each helper function one at a time and the helper functions don't call anything. With the static approach, you still need 10KB. With the stack-based approach, you only need 200 bytes (the stack space for the main function, and the stack space for the helper function it is currently calling).
This is not necessarily the case: if you statically analyze the call graph, you may be able to prove that two functions will never be live at the same time, and thus reuse memory regions for their local variables: https://github.com/llvm-mos/llvm-mos/blob/main/llvm/lib/Targ...
Of course, recursion, function pointers, and dynamic linking will defeat (or at least complicate) this scheme.
That's a good point, you can definitely optimize. I'd be curious how far you could go using static analysis. Assuming no recursion, function pointers, dynamic linking, etc; can you optimize to the point that you use as little memory as a stack approach would? I think you ALMOST can! The only blocker would be specific call patterns that would require a lot of memory, but are unreachable. That's where the halting problem hits.
You just make a call graph where the nodes are functions and directed edges indicate which function calls which. Without recursion, function pointers, and the like, you can compute this exactly, and it forms a DAG. Each node is given a weight: the amount of local memory that function needs. Start at the entry-point function and compute the longest path, summing the node weights along it.
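That computation is a few lines; a sketch over a hypothetical call graph (function names and sizes are made up):

```python
# Hypothetical call graph: function -> (bytes of locals, callees).
GRAPH = {
    "main":     (100, ["helper_a", "helper_b"]),
    "helper_a": (100, []),
    "helper_b": (100, ["leaf"]),
    "leaf":     (100, []),
}

def peak_memory(fn):
    # Longest weighted path from fn through the acyclic call graph:
    # the worst-case total of simultaneously live activation records.
    size, callees = GRAPH[fn]
    return size + max((peak_memory(c) for c in callees), default=0)
```

Here the static-allocation scheme would reserve 400 bytes, but the longest live chain is main -> helper_b -> leaf, so 300 bytes suffice if non-overlapping functions share their regions.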
Original Fortran did not modify the code. Rather, each function had a global variable storing the return address, which the caller had to set.
The CDC 6600 of blessed memory had a "return jump" instruction that wrote an unconditional jump back to the return address into the word prior to the first instruction of the called procedure. So one would implement a subroutine return by jumping to the word before the entry point.
(Fortran programmers are probably wondering how alternate ENTRY statements worked; yes, they had to allocate a word before the ENTRY, and copy whatever jump instruction was placed there by the hardware into the main subroutine's jump word, so that RETURN or END would work for all entry points. Recursive languages like Pascal had to copy that word to and from the stack. Reentrant code had to avoid using the return jump instruction entirely.)
As others have pointed out, that's pretty much how things originally started. A more recent example I remember is the Action! programming language for Atari 8-bit computers[1], an Algol descendant for 6502 with static activation frames.
But I don't understand the appeal. Many algorithms and data structures are most naturally expressed with recursion. Not allowing it forces the programmer to create and maintain their own manual stack data structures, which brings us back to square one in terms of statically-guaranteed upper bounds on memory usage, since those can grow indefinitely. At best, stack overflow errors will be replaced by array out-of-bounds errors, which is the exact same thing anyway. In fact, it will be even worse: a manual stack implementation will come with its own complexity and sources of bugs.
[1] https://en.wikipedia.org/wiki/Action!_(programming_language)
That programming language exists.
The old FORTRAN and COBOL language versions were like this.
In ancient FORTRAN programs, the programmer typically computed a maximum possible size for the data and allocated statically in the main program one or more work arrays corresponding to that maximum size, which were either placed in a COMMON block of global variables or they were passed as arguments to the invoked functions and subroutines.
In the functions or subroutines, the work arrays were typically partitioned by allocating in them various local variables.
Overall, this approach was more reliable, because there were no risks of exceeding the memory limits, but it was much more tedious for the programmer, because the worst cases for memory usage had to be accurately predicted.
> Overall, this approach was more reliable, because there were no risks of exceeding the memory limits, but it was much more tedious for the programmer, because the worst cases for memory usage had to be accurately predicted.
Well, most importantly, it fixed the program's limits. Got a bigger machine and wanted to tackle greater data sets? You were out of luck if you didn't have the source code. Old Unix programs were often like that. It was the GNU counterparts that often did away with such arbitrary limits, allowing systems to grow and the software to stay relevant.
> I'm interested in examining the idea of a programming language that eschews the notion of a callstack
Technically, I don’t know any language whose spec mentions the notion of a call stack. For example, it’s perfectly OK for a conforming C compiler to use a linked list of dynamically allocated activation records (from a standards viewpoint; users of such an implementation may have different opinions).
A conforming C compiler also could choose to only use a call stack for functions that take part in recursive calls or that may get called by multiple threads.
> plus weirdo stuff like the ability to hand out references to local data in functions that have already returned (which remains valid as long as you don't call the function again, which I think should be possible to enforce via a borrow checker).
If you add multi-threading (something that is almost a must have for languages on powerful CPUs nowadays), I don’t think that’s easy to do.
I don’t think that “call stack” implies “contiguous memory”, or “what the operating system might think of a process call stack”, so a linked list would still qualify as a call stack. While the C standard doesn’t use the word “stack”, it explicitly requires support for recursive function calls, and the related semantics it specifies with regard to automatic storage duration effectively describe a stack.
> single fixed activation record per function
What about statically determining a fixed number of activation records at compile time? Similar in spirit to security focused languages that require loops to have a statically determined finite bound in order to successfully compile.
As to lifetime after returning, do you really hate continuations that much?
> What about statically determining a fixed number of activation records at compile time?
Sure. It may be useful to represent a function's local storage as a first-class concept, and then allow users to instantiate copies of the function at will if they're willing to allocate more storage themselves. That would let users either precisely limit the number of instances of a function, or use dynamic allocation to manually reimplement a callstack if they so choose.
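A hedged C sketch of that idea (the names and struct layout are invented): instead of the compiler giving the function one hidden static frame, the caller owns the activation record and may allocate as many copies as it likes.

```c
#include <assert.h>

/* The function's "local storage" as a first-class value: this struct
 * plays the role of a static activation record. */
typedef struct {
    int value;              /* the would-be local that outlives calls */
} counter_frame;

/* The "function" itself has no hidden state; all of its locals live in
 * the caller-provided frame. */
static int counter_next(counter_frame *frame) {
    return frame->value++;
}
```

Two frames give you two independent instances of the function; a dynamic array of frames would be the caller manually reimplementing a call stack, on their own memory budget.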
> As to lifetime after returning, do you really hate continuations that much?
This is a language that forbids recursion, the functional programmers have already run screaming for the exits. :P
related thoughts perhaps? http://lambda-the-ultimate.org/node/5555
(be warned that the old site links are slow, one has to wait for the unarchiving to run, or something.)
> I'm interested in examining the idea of a programming language that eschews the notion of a callstack and returns to having a single fixed activation record per function.
Welcome to the old 8-bit C compilers.
Everything was statically allocated so no recursion. The downside is that you can only have a single thread of execution.
It would be interesting to see what that would look like if extended to multiple threads (they would have to be defined/allocated at compile time).
I think this was a really brave call for help from the writer. They needed help and they asked for it, from strangers!
Thank you!
Stack Clashing is pretty neat, something you should really pay attention to in embedded spaces (it's often exploitable in UEFI land as most EDK2 builds lack guard pages).
I got to write some exploits for some recently, very fun bug class to exploit.
This is a neat bug!
A colleague and I spent some time last year looking for DoS vulnerabilities caused by recursing on user input [1].
TL;DR: With CodeQL and some manual review, we found several issues resulting in two assigned CVEs, a rustsec advisory, and a handful of fixes implemented in various projects.
We mostly looked at Java projects. It is interesting to see a C vulnerability from around the same time.
It would be cool to see a larger study on how common this issue is across different programming languages.
[1]: https://resources.trailofbits.com/input-driven-recursion-whi...
Thanks for sharing that research!
There's a useful clang-tidy check to warn on this, for when you want to ensure there is no recursion lurking anywhere in a large codebase sensitive to stack overflow issues:
https://clang.llvm.org/extra/clang-tidy/checks/misc/no-recur...
That warning doesn't ensure that there's no recursion, as the caveats point out. Indeed, it's trivial to show that ensuring there's no recursion is impossible as long as you have function pointers. (This is also why languages that claim to be working on a solution to prevent recursion statically are going to fail.)
In general, analysis becomes impossible when you have function pointers produced and used all over the place. But that doesn't have to be the case if the program is written deliberately to avoid the impossible cases. E.g., if you can separate the set of functions into "functions that (indirectly) are exported or have their address taken" and "functions that (indirectly) invoke a function pointer", then you know that there can't be any recursion through function pointers. Basically just a form of escape analysis.
And if you're writing your own language, you can just have explicit 'colored' functions (and corresponding pointers), and enforce that the call graph between different colors has no cycles, and no function invokes a pointer of its own color. Generics would get a bit messy, but it's still far from impossible.
> E.g., if you can separate the set of functions into "functions that (indirectly) are exported or have their address taken" and "functions that (indirectly) invoke a function pointer", then you know that there can't be any recursion through function pointers. Basically just a form of escape analysis.
Sure, in toy examples. But you can't actually do this in a nontrivial codebase that wasn't written to support it from the ground up.
> And if you're writing your own language, you can just have explicit 'colored' functions (and corresponding pointers), and enforce that the call graph between different colors has no cycles, and no function invokes a pointer of its own color. Generics would get a bit messy, but it's still far from impossible.
Everything is easy when you don't spend 5 minutes thinking about the possible problems with it, much less actually implement the idea you're proposing.
He didn't say "easy", he said "far from impossible".
> But you can't actually do this in a nontrivial codebase that wasn't written to support it from the ground up.
In the case where this is considered critical for whatever reason, you absolutely could have a somewhat overzealous detector that failed the compilation if triggered. It doesn't need to support every last edge case. You would then address this just as you would any other compiler error - by remediating the failures one by one.
If management doesn't want to pay you to do that then perhaps the requirement of "absolutely no recursion ever" wasn't actually so critical.
Yeah, if you're starting with, say, a heavily object-oriented program, then it would be nigh impossible. But I don't think it would even be extraordinarily difficult, if you start with a style that isn't suffused with dynamic function calls. E.g., languages like C++ and Rust with heavy monomorphization can have callbacks and other fun things without any vtables or function pointers.
From an architectural standpoint, the main challenge is layering everything properly so you don't have any circular dependencies, even in untaken branches. Again, how hard this is to achieve depends on the original design, but it's not like circular-dependency-avoidance is something that no one's done before.
You could probably restrict function pointer values with something like this:
Or... But I'm not sure how well today's compilers can take advantage of this. You'd need a strict mode, where any function pointer is assumed to be the worst case. At that point you might as well go for a real proof assistant.

A more practical way (short of explicit language support) would be to have an enum and a dispatcher function:
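A minimal C sketch of that enum-plus-dispatcher idea (function names invented): callers hold a tag instead of an arbitrary pointer, and every call goes through one switch, so the compiler and a human auditor can see every possible target in one place.

```c
#include <assert.h>

typedef enum { OP_DOUBLE, OP_NEGATE } op_id;

static int op_double(int x) { return 2 * x; }
static int op_negate(int x) { return -x; }

/* The only "indirect" call site in the program: the full set of
 * reachable targets is spelled out statically. */
static int dispatch(op_id op, int x) {
    switch (op) {
    case OP_DOUBLE: return op_double(x);
    case OP_NEGATE: return op_negate(x);
    }
    return x;  /* unreachable if every tag is handled above */
}
```

With `-Wswitch`, the compiler will even complain if a new tag is added without a corresponding case, which keeps the call graph enumeration honest.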
Or you could even do the same thing, but switching on the pointer value instead of an enum. Either way, this lets the compiler statically know where the call may possibly lead.

If we had standardized growable call stacks then this wouldn't happen.
The fact that call depth is finite should basically kill the use of recursion in every codebase in favor of an explicit stack data structure. They're better in every way, except as a way to confuse new CS students by hiding the fact that there's no magic underpinning them.
You don't have stack frame overhead, you can nest arbitrarily, they'll always be more performant. You can peek into the stack to do things recursive functions can't. The optimizer is better at helping you with loops, and memoization just becomes a normal cache of intermediate results.
We did. All "big" OSes have runtimes that put stacks in isolated (and very large) areas, with a guaranteed guard region at the bottom. An attack on stack bounds in Linux or Windows or OS X requires a very large depth, and will end with a regular process failure (a segfault, basically) and not a memory corruption bug.
But tools like libexpat are often used in embedded contexts, in 32 bit memory spaces, that don't have that freedom. So it's a relatively serious bug regardless.
A fixed-capacity stack isn't growable, and while the capacity is pretty big by default on desktop/server OSes, so are stack frames, so the recursion depth limit is not that big. And if user input can overflow the stack it's still a DoS.
The problem to me is that recursive functions should be a special case in the ABI, where a compiler can insert a prelude to the callsite to grow/segment the stack if needed. This is a hard problem once you have references to things on the stack, so I can understand why most ecosystems don't do it - but that doesn't mean it can't be done.
What I'm saying is that stack allocation is still dynamic memory management, and in systems where it's critical to avoid OOM conditions because of unbounded dynamic memory management you need code with O(1) memory complexity, and that includes the stack. A common mistake I've seen people make is assuming that pushing onto the stack isn't memory allocation. It's just cheap memory allocation that can sometimes be assumed to be free; in this case and many others that's a bad assumption.
Fair enough, but that's getting pretty far into "research language runtimes" and not practical solutions. The tricks most people play when faced with these problems are (1) let the OS protection do its job or (2) just disallow recursion if you really can't manage that.
There's no realistic technology stack out there that does what you want.
Go does this, as do most runtimes with stackful coroutines. Async Rust (kinda) does this where the async call stack is naturally segmented (although this doesn't get away from stack overflows in some cases because the polling call stack is not growable).
Growable stacks have been around for decades, it's far away from research language territory.
The real benefit is not to handle the worst case/OOM condition that a big enough callstack gives you. It's to make the default stack size much, much smaller for the common case where you don't need it at all (or want to shrink it later). Growable stacks use less memory on average than your typical embedded stack because they can start in the dozens of bytes, instead of requiring kilobytes or megabytes of stack space (allocated, not just reserved). It's kinda the only way you can make a runtime scale to millions of concurrent call stacks.
Both Go and Rust removed segmented stacks, for the same reasons: they can really kill performance.
Go now copies stacks, and Rust only has segments if you deliberately box up a recursive call.
Go stacks are still growable, no?
And hence the "kinda" in async Rust. While boxed futures can be thought of as a segmented stack the literal callstack is not, which can still give you a stack overflow with a bunch of nested poll() calls.
> Go stacks are still growable, no?
Depends on what you specifically mean by "growable": when a goroutine runs out of stack space, a new one that's larger (iirc double sized) is created, the old one is copied into it, and then execution can resume.
I can see the argument both ways.
> which can still give you a stack overflow
For sure.
> while stack clashing was considered and is a theoretical possibility — denial of service was considered to be the realistic impact.
In many contexts, regular process failure is still a vulnerability.
And the stack is (usually) tiny compared to other resources. It doesn't take that many nested calls to get to the bottom of the stack. At least compared to trying to exhaust the heap or keep the CPU busy long enough to cause DoS.
> And the stack is (usually) tiny
This is sort of amusingly backwards. On embedded systems where I live, stacks are huge. Thread stacks of 4-16k are routinely the largest single objects the kernel sees in Zephyr. And yes, lots of RTOS apps disallow recursion (Zephyr doesn't, but does gate its use in the core system behind build-time config that can be turned off) because in that world it's hard to provide the guarantees you can get with a 64 bit mmu environment.
But if you are on a modern 64 bit OS, no: stacks are enormous. Many megabytes of mapping is routine. Obviously that's not all going to be faulted in, and most threads won't ever use more than a few kilobytes. But the region reserved for recursive use is extremely large, and unlikely to overflow except in a well-crafted deliberate attack (and even then it generally requires a few bugs in the code; most recursive algorithms are bounded by e.g. maximum tree height or something that is O(logN) and thus can't overflow).
Explicit recursion is as harmful as goto. We already have a solution - they're called recursion schemes[1].
Using recursion schemes is analogous to using structured programming rather than just constructing loops and branches out of gotos. The problem is the functional programming community got there first and the names are the equivalent of FactoryFactoryFactory and going to be a turn off to normal programmers even though the concepts are dead simple.
1. No citation because linking to a blog post containing Haskell is proving my point.
I think focusing on the technicals is missing the forest for the trees.
Security vulnerabilities and limitations of languages are an inevitability. You won't fix them all, you will always find faults in code.
Now are we not seeing the structural problem with these organizations?
And yet we don't use goto today because of the bugs and we're phasing out manual memory management for the same reason.
You cannot org change yourself out of all technical problems.
You throw goto around like abandoning it was some revolutionary change. Dijkstra's paper was like 70 years ago, and it was released almost immediately after languages were being born.
Recursion schemes are at least as old as "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" (1991), which is closer to "Go To Statement Considered Harmful" (1968, so 23 years) than it is to today (34 years). Recursion schemes aren't at all new either.
I hope the term "deyodawgification" enters the pantheon of elegant code purification terms of art, alongside the classics like "degotofying", "deCOMtamination", and "outparamdelling".
https://knowyourmeme.com/memes/xzibit-yo-dawg
https://wiki.mozilla.org/Gecko:DeCOMtamination_Algorithm
https://bugzilla.mozilla.org/show_bug.cgi?id=455943
https://news.ycombinator.com/item?id=22708241
Who knows (or can invent) any other good ones:
Deadlocksmithing, JSONcology, YAMLectomy, XMLimination, DeDTDification, DeXMLEntitification, ExCDATAration, AntiPOJOification, SOAPerfluousness Scrubbing, WSDLectomy, Delogorrhea, DeK8ification, HelmholtzianRechartification, DevOpsDevangelicalism, FactoryFactoryDefactoringDefactoring, Antiquasimonolithicmicrofragmentedmacrodecontainerizationarianism...
"Soft lock picking" was the title of a YouTube series about strange ways out of impossible save files (soft locks, as opposed to hard locks where the game freezes) in video games.
DTDetox
I wonder what would be wrong with changing the recursive function to take a depth and then bailing out if the depth is too big.
Two things come to mind:

- Maximum stack depth greatly varies between targets. If you wanted to support a recursion depth of, say, 50, that may already be too much for some of the hardware this library needs to be able to run on, so the original problem still exists.

- This would add an arbitrary limit to the parsing capabilities of the library. Stack space is very limited to begin with compared to heap space, so any solution that remains recursive would probably require a limit way lower than a heap-based solution could offer.
Or just take the limit as an argument so developers can adjust it based on the platform. Python also has a recursion limit but you can change it if you need to.
Depth is easy to miss when the parser calls a lot of functions that call each other and that have vastly different stack usage.
A more reliable check is to compare the address of a thing on the stack at the API boundary with the current address of things on the stack. SpiderMonkey, the JS engine in Firefox, used something like that in the past to stop run-away recursion, or at least to switch to slower algorithms with bounded stack usage.
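A hedged C sketch of that address-comparison check (not SpiderMonkey's actual code; arithmetic on addresses of distinct stack objects is technically unspecified in C, but the idiom is common in practice):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recorded at the API boundary: an address near the top of the stack
 * when the library was entered. */
static uintptr_t g_stack_base;

/* How far the current stack has drifted from the boundary.  Growth
 * direction is platform-specific, so take the absolute difference. */
static size_t stack_used(void) {
    char probe;
    uintptr_t here = (uintptr_t)&probe;
    return here > g_stack_base ? here - g_stack_base
                               : g_stack_base - here;
}

/* Deeply nested code checks the budget and bails out gracefully
 * instead of overflowing. */
static int nest(int depth, size_t limit) {
    volatile char pad[128];          /* force a real stack frame */
    pad[0] = 0;
    if (stack_used() > limit)
        return -1;                   /* out of stack budget */
    if (depth == 0)
        return 0;
    int r = nest(depth - 1, limit);
    return r + pad[0];               /* keep pad live; pad[0] is 0 */
}
```

The caller seeds `g_stack_base` with the address of a local at the boundary (e.g. `char base; g_stack_base = (uintptr_t)&base;`), and the nested code never gets anywhere near the real guard page.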
That idea works in general but causes false positives: no artificial limit you pick is "right", and the false positives can be avoided by getting rid of the recursion altogether.
PS: It's not one single function, not direct but indirect recursion.
Sure if it's indirect I agree it will get messy fast with a dozen functions suddenly needing to handle an additional parameter, but unrelated to that... I'd really like to know who needs recursion for this that's deeper than 3 or 4 levels. What's the use case? Such xml surely would be unreadable and unwritable to humans, but if it's used as some form of exchange format between systems, what would that be? How would it end up with such deeply nested entities? It sounds like something you deliberately implement that way to show how "smart" you are, but not "hey that seems the reasonable thing to do here".
This makes me wonder: does any of the popular xml libs have a sort of safe mode, where custom entities and similar features are disabled, those schema urls ignored, and namespaces just flattened (and whatever else I forgot or don't even know about)? You know for when I know I only need to parse simple xml files that should contain a couple plain tags and attributes, and want to reduce attack surface.
There are parsers that only implement a tiny subset of XML. And Expat has compile time flags to disable some of that machinery where not needed. It's arguably no longer XML then though.
It's not ambitious enough. This 'solution' would be someone else's problem to be avoided at all cost.
You can think of this as having two base cases, your normal success case and an error case. You could use metaprogramming to do this semi-automatically, like a decorator that took a maximum depth and checked it before calling your inner function.
How awful. Did anyone call The Recurse Center?
https://news.ycombinator.com/item?id=43361773
Great article! Is the code or technique used to fix this easily available somewhere?
Here's the relevant pull request: https://github.com/libexpat/libexpat/pull/973
> Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
This is just nonsense. The issue is doing an unbounded amount of resource consuming work. Don't do an unbounded amount of resource consuming work, regardless of whether that work is expressed in a recursive or iterative form.
Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.
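A small C illustration of that chain of transformations, using factorial as the toy case (real parser state is of course messier):

```c
#include <assert.h>
#include <stdint.h>

/* 1. Plain recursion: the pending multiply keeps each frame alive. */
static uint64_t fact_rec(uint64_t n) {
    return n <= 1 ? 1 : n * fact_rec(n - 1);
}

/* 2. Tail-recursive form: the pending work moves into an accumulator,
 * so the recursive call is the very last thing that happens. */
static uint64_t fact_tail(uint64_t n, uint64_t acc) {
    return n <= 1 ? acc : fact_tail(n - 1, n * acc);
}

/* 3. The mechanical loop version of (2): rebind the parameters and
 * jump back to the top instead of calling. */
static uint64_t fact_loop(uint64_t n) {
    uint64_t acc = 1;
    while (n > 1) {
        acc *= n;
        n--;
    }
    return acc;
}
```

Step 2→3 is always mechanical; it's step 1→2 that, for non-trivial recursion, requires materializing the pending work somewhere (an accumulator, an explicit stack, or heap-allocated continuations), which is the point being argued below.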
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation.
You know, I got spoiled by Haskell, doing recursion everywhere without a care, and all I had to think about was the evaluation order (when things blew up). Now that I'm doing some OCaml, I have to stop and think "am I writing a tail recursive function?" It's easy to write multiple levels of recursion and lose track of whether you're writing a tail recursive function that the compiler will optimize.
I think recursion is really easy to make unbounded by mistake. Maybe not as much as for loops and off-by-ones.
Agreed, and this is why some languages have annotations that ask the compiler to check a function is indeed tail recursive.
However I don't think that is the case in Expat. If the algorithm is tail recursive, but accidentally not expressed in that way, it's a simple (but perhaps tedious to manually apply) program transform to express it in a way that does not consume unbounded memory (heap or stack). From the scant details on the fix in the article it appears the parsing algorithm itself was completely changed. ("The third variant "Parameter entities" reuses ... the same mechanism of delayed interpretation.") If this is the case the issue is not recursion, as the original parsing algorithm would consume unbounded resources no matter how it was expressed in C.
Ah, I find myself in similar waters. In your experience does the [@tailcall] annotation not cover enough of the cases?
I'm aware of the tailcall annotation but I haven't had to rely on it yet. For me the benefit of picking up OCaml is that I can do imperative constructs on a first pass (mutation and for loops), and refactor it after to pure code, when needed.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline)
A computer can chug through millions of loop iterations. I don't think tail recursion will ever result in heap exhaustion. But a stack overflow can be caused with tens of thousands of nested calls, which is tiny in comparison.
Recursion based stack overflow issues are easier to exploit because modern computer's stack space is much smaller relative to other resources.
> Recursion based stack overflow issues are easier to exploit because modern computer's stack space is much smaller relative to other resources.
Sounds like a design issue in particular language implementations rather than a problem with the programming technique.
>Sounds like a design issue in particular language implementations rather than a problem with the programming technique.
You say this like these are fundamentally separable things? I find this comment deeply confusing. Every single real software stack ever has a layer where the sausage gets made so to speak.
Stack overflow should not be a vulnerability for any modern tool chain. As to resource limits, LLVM has supported segmented stacks for something like a decade or maybe longer. Recursion is absolutely not the problem here. Outdated programming practices are.
> Outdated programming practices are.
What is the outdated programming practice at fault here?
In the general case? The failure to compile with -fsplit-stack when that's necessary for whatever your requirements are. The failure to enable the stack protector when ... pretty much always.
For this particular CVE? I'm not clear. Possibly none. The writeup didn't provide sufficient detail and I haven't bothered to wade through the code. There may well be a reason recursion won't work here but it certainly isn't general.
I'd be curious to know in this case why resource limits couldn't be enforced for the recursive implementation but could be for the iterative one.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation
how do you make this tail recursive?

I don't ever have a reason to practice this and can't be bothered to test it, so this might be off, but something like this ought to work?
Put the current node's children onto a stack in reverse order, pop the top, and do it again for that one. And then just loop until done.
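That description, as a C sketch with an invented node type and a fixed-capacity explicit stack (which is where the explicit bound comes back in):

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *left, *right;
} node;

/* Preorder walk with an explicit stack: push the root, then in the loop
 * pop one node, visit it, and push its children in reverse order so the
 * left child comes off the stack first. */
static int sum_preorder(node *root) {
    node *stack[64];             /* capacity = max depth we allow */
    int top = 0, sum = 0;
    if (root)
        stack[top++] = root;
    while (top > 0) {
        node *n = stack[--top];
        sum += n->value;         /* the "visit" */
        if (n->right) stack[top++] = n->right;  /* pushed first... */
        if (n->left)  stack[top++] = n->left;   /* ...popped first */
    }
    return sum;
}
```

Note this is the iterative (explicit-stack) transformation rather than a tail-recursive one, which matches the mix-up acknowledged a couple of comments down.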
That's not tail recursion.
And it gets more complex when there's a bunch of state that requires you to save the stack frame in your explicit stack.
If the language runtime has growable stacks you're just making the code more complex.
Ah, yeah, I mixed up my thoughts and thought it was about making recursive functions into non-recursive ones rather than TC'd ones. Specifically this part tripped me up, I guess: "exchanging stack allocation for heap allocation".
Although, technically my "solution" would work for making it tail calls anyway, by replacing the looping part with tail calls. But I digress.
> And it gets more complex when there's a bunch of state that requires you to save the stack frame in your explicit stack.
You are incorrect about tail recursion (look at continuation-passing style; it transforms any program to tail recursive form) but you are getting at the core of the issue here. The issue is either:
* whatever the parser is doing is already tail recursive, but not expressed in C in a way that doesn't consume stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
In other words, you can shift memory usage from the stack to the heap, but if a client can give you an unbounded amount of memory consuming work you are screwed either way. Ultimately the problem has nothing to do with recursion.
CPS transforms all programs into tail calls, not tail recursive. It buys you TCE in trivially tail recursive functions but does not magically transform something that needs to preserve state across recursion into something that is stateless.
But you're right, that's what I'm getting at. Recursion is an inherent semantic and whether you make the stack explicit or implicit doesn't get rid of memory allocation problems. I said in another comment that stack allocation is still dynamic memory management, it's just something many people take for granted.
>Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.
Not every recursion can be transformed into tail recursion. If it's not simply tail recursion (which is the high level language compiler way of implementing the old tried and true assembly language optimization technique of making the last function call be a JMP instead of a JSR followed by an RTS, that I learned from WOZ's Apple ][ 6502 monitor ROM code), you still have to manage the state somewhere. There ain't no such thing as a free lunch.
And if not on the C stack, then it has to be somewhere else, like another stack or linked list, and then you still have to apply your own limits to keep it from being unlimited, so you're back where you started (but in a Sisyphusly tail call recursive way, you still have to solve the problem again).
https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule
>Greenspun's tenth rule of programming "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
Of course you can always just pass a recursion depth limit parameter down the recursion and stop when it expires, even letting the caller decide how deep a recursion they want to permit, which is easier and safer than using malloc or cons or std::vector to roll your own stack.
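A toy C sketch of that depth-parameter pattern, on an invented parenthesis-nesting "grammar" standing in for entity expansion: the caller picks the budget, and the recursion bottoms out gracefully instead of overflowing.

```c
#include <assert.h>

/* Returns the maximum nesting depth of '(' ')' in *s, or -1 if the
 * input nests deeper than the caller-chosen budget allows.  The budget
 * rides down the recursion as an ordinary parameter. */
static int max_nesting(const char **s, int budget) {
    int deepest = 0;
    while (**s) {
        if (**s == '(') {
            if (budget == 0)
                return -1;          /* bottom out gracefully */
            ++*s;                   /* consume '(' */
            int inner = max_nesting(s, budget - 1);
            if (inner < 0)
                return -1;          /* propagate the failure */
            if (inner + 1 > deepest)
                deepest = inner + 1;
        } else if (**s == ')') {
            ++*s;                   /* consume ')', return to caller */
            return deepest;
        } else {
            ++*s;                   /* skip ordinary characters */
        }
    }
    return deepest;
}
```

The graceful-bottoming-out question is exactly the `if (inner < 0)` branch: every caller up the chain has to decide what a failed child means for it.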
Or the compiler could even do that for you, but you still have to figure out how to gracefully bottom out...
So if "Exception Handling" is hopefully failing upward, then "Depthception Handling" is hopelessly failing downward!
Depthception Handling is the vertical reflection of Exception Handling, with "implicit depth" recursion depth passing, like C++'s "implicit this" but deeper, not to be confused with the orthogonal "continuation passing" / "halt passing" dichotomy (passing HALT and CATCHFIRE instructions around but promising to never call them except in fatal emergencies):
"try" / "catch" = An active attempt to solve things.
"giveup" / "bottom" = Passive resignation to inevitability.
"throw" is an emergency exit UP.
"drop" is an emergency slide DOWN.
"try" => "giveup" introduces a futile attempt at unbounded recursion.
"catch" => "bottom" declares a bottoming out handler after a function signature, to guard its body below.
"throw" => "drop" immediately bottoms out from any depth!
And you can even bind the depth for explicit depth monitoring and control:

> Not every recursion can be transformed into tail recursion. If it's not simply tail recursion [...] you still have to manage the state somewhere. There ain't no such thing as a free lunch.
Ok, every recursion in a language that supports programmer-managed state can be transformed into a tail recursion. That still means any recursion in all serious programming languages and most joke programming languages too.
No, still false. Anything that recurses more than once cannot. A Fibonacci series, done by recursion, cannot be done by tail recursion (without memoization). A binary tree walker that visits all the nodes (as opposed to searching for one node) cannot be done with tail recursion. And so on.
Without busting out CPS, you can also pass the current state along as parameters to your recursive calls:
You should be able to take my tree walker here[0], add a `visit: A => ()` parameter, and call `visit(tree.value)` (assuming you have a structure Tree[A] = (value: A, children: List[Tree[A]])) before the match.
[0] https://news.ycombinator.com/item?id=43365880
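For concreteness, a minimal Python sketch of that callback-passing idea (the tuple-based `Tree` shape and the `walk` name are my own assumptions, not the linked code):

```python
# Hedged sketch: threading a visit callback through a recursive tree
# walk, assuming a tree is a (value, children) tuple.
def walk(tree, visit):
    value, children = tree
    visit(value)              # handle the node before descending
    for child in children:
        walk(child, visit)

collected = []
tree = (1, [(2, []), (3, [(4, [])])])
walk(tree, collected.append)
# collected is now [1, 2, 3, 4] (pre-order)
```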
https://en.wikipedia.org/wiki/Continuation-passing_style
"Programs can be automatically transformed ... to CPS. ... Every call in CPS is a tail call"
And creating continuations for recursion requires memory allocation. How recursive! You can't just recursively move the memory allocation problem around and call it solved or somebody else's problem.
At least start with a rigorous, well-specified, robust, high-performance, complete implementation of Common Lisp, if you're going to recurse infinitely.
I agree that algorithms that run in bounded stack space can still consume arbitrary amounts of memory, and that is the central flaw in the article that I'm responding to.
The issue is either:
* whatever the parser is doing is already trivially tail recursive, but expressed in C in a way that consumes stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
Either way it is not the fault of recursion.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation.
Can't all recursive functions be transformed to stack-based algorithms? And then, generally, isn't there a flatter resource consumption curve for stack-based algorithms, unless the language supports tail recursion?
E.g. Python has sys.getrecursionlimit() == 1000 by default, RecursionError, collections.deque; and "Performance of the Python 3.14 tail-call interpreter": https://news.ycombinator.com/item?id=43317592
> Can't all recursive functions be transformed to stack-based algorithms?
Yes, you can explicitly manage the stack. You still consume memory at the same rate, but now it is a heap-allocated stack that is programmatically controlled, instead of a stack-allocated stack that is automatically controlled.
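A hedged sketch of what that looks like in Python, with a plain list standing in for the heap-allocated, programmatically controlled stack (names and the tuple tree shape are illustrative assumptions):

```python
# Hedged sketch: replacing call-stack recursion with an explicit,
# heap-allocated stack (a Python list used as a LIFO).
def sum_tree(tree):
    total = 0
    stack = [tree]              # explicit stack on the heap
    while stack:
        value, children = stack.pop()
        total += value
        stack.extend(children)  # children get processed later
    return total

tree = (1, [(2, []), (3, [(4, [])])])
# sum_tree(tree) == 10
```

Memory consumption still grows with the depth of pending work, exactly as the comment says; the difference is you can measure and cap it yourself.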
The issue is either:
* whatever the parser is doing is already tail recursive, but expressed in C in a way that consumes stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
It's not clear how the issue was fixed. The only description in the blog post is "It used the same mechanism of delayed interpretation" which suggests they didn't translate the existing algorithm to a different form, but changed the algorithm itself to a different algorithm. (It reads like they went from eager to lazy evaluation.)
Either way, it's not recursion itself that is the problem.
> You still consume memory at the same rate, but now it is a heap-allocated stack that is programmatically controlled, instead of a stack-allocated stack that is automatically controlled
To be precise:
In C (and IIUC now Python, too), a stack frame is created for each function call (unless tail call optimization is applied by the compiler).
To avoid creating an unnecessary stack frame for each recursive function call, instead create a collections.deque, enqueue traversed nodes at the beginning or end, and process them in a loop within one non-recursive function.
Is Tail Call Optimization faster than collections.deque in a loop in one function scope stack frame?
Yes, that is basically it. A tail call should be the same jump instruction as a loop, but performance really depends on the language implementation and is hard to make general statements about.
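For illustration, a minimal sketch of the deque-in-a-loop approach described above (the tuple tree shape is an assumption; popleft gives level-order traversal):

```python
from collections import deque

# Hedged sketch: breadth-first traversal with an explicit queue
# instead of recursive calls -- one stack frame total.
def bfs_values(tree):
    out = []
    queue = deque([tree])
    while queue:
        value, children = queue.popleft()  # FIFO: level order
        out.append(value)
        queue.extend(children)
    return out

tree = (1, [(2, []), (3, [(4, [])])])
# bfs_values(tree) == [1, 2, 3, 4]
```

Using `pop()` instead of `popleft()` would give depth-first order with the same single-frame structure.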
Yet another article linking to the '10 rules of safety critical development' as a way to bash recursion.
If you're going to cite this, at least make sure you're not allocating any memory after startup.
There's a wonderful DDJ interview with James Clark (author of expat and developer of many other open source SGML and XML standards and tools like RELAX NG, and even horrible ones like XSLT ;) called "A Triumph of Simplicity: James Clark on Markup Languages and XML", in which he explains how a standard has failed if everyone just uses the reference implementation, because the point of a standard is to be crisp and simple enough that many different implementations can interoperate perfectly.
A Triumph of Simplicity: James Clark on Markup Languages and XML:
https://www.drdobbs.com/a-triumph-of-simplicity-james-clark-...
I wrote more about his work in this discussion thread about Ted Nelson on What Modern Programmers Can Learn from the Past, and reading documents from 20 years ago:
https://news.ycombinator.com/item?id=16226209
>Reading documents from 20 years ago is a mixed bag. Links usually fail horribly, which was something Xanadu was trying to solve, but I'm not convinced they could have solved it so well that 20-year-old links would still actually work in practice. [...]
>In the ideal world we would all be using s-expressions and Lisp, but now XML and JSON fill the need of language-independent data formats.
>Not trying to defend XSLT (which I find to be a mixed bag), but you're aware that its precursor was DSSSL (Scheme), with pretty much a one-to-one correspondence of language constructs and symbol names, aren't you?
>The mighty programmer James Clark wrote the de-facto reference SGML parser and DSSSL implementation, was technical lead of the XML working group, and also helped design and implement XSLT and XPath (not to mention expat, Trex / RELAX NG, etc)! It was totally flexible and incredibly powerful, but massively complicated, and you had to know scheme, which blew a lot of people's minds. But the major factor that killed SGML and DSSSL was the emergence of HTML, XML and XSLT, which were orders of magnitude simpler.
James Clark:
http://www.jclark.com/
https://en.wikipedia.org/wiki/James_Clark_(programmer)
https://en.wikipedia.org/wiki/Tragedy_of_the_commons
It's crazy how most companies just mindlessly fish in the commons and cannot even respond to whoever produces the common good.
Shoutouts to the 2 companies that responded by sharing resources (money or engineer time) when asked to.
Shame that the other 40 companies basically get to enjoy the benefits while playing the clueless fool.
Hopefully these 2 companies find a competitive advantage in being supply chain aware. No sympathy to the rest of the companies if they get hacked.
EDIT: https://en.wikipedia.org/wiki/Free-rider_problem
This seems like a much more specific name for the problem
It is silly to make an overly broad statement about recursion killing. On modern "hosted" OSes, there are safeguards about stack overflows, which will quickly kill your process. This _can_ be a problem on embedded systems with limited memory management facilities (e.g., hw with no MMUs) and I do understand that the library author can't control where the library is used, and in fact some safety critical systems require a maximum stack depth guarantee which rules out recursion. However, some problems, especially parsing CFGs, are inherently recursive in nature and I'd argue going the non-recursive route with explicit stacks would result in bugs elsewhere because the code becomes hard to reason about.
>> denial of service was considered to be the realistic impact
> It is silly to make an overly broad statement about recursion killing. On modern "hosted" OSes, there are safeguards about stack overflows, which will quickly kill your process.
Something doesn't make sense here.
I might not entirely understand, but
>> denial of service was considered to be the realistic impact
is in the article as justification for why this has low criticality and therefore isn't subject to the 90 day disclosure timeline. I.e. it's _limiting_ the predicted impact.
I assumed GP was referring to the other more critical risk, stack clashing, which I guess could lead to RCE? not being an issue on modern OS's.
The article basically said: "Letting your application get killed this way would practically lead to DoS attacks (a security issue), therefore [conclusion]." The response was basically: "Actually, on modern OSes, your application gets killed, unlike on embedded systems. Therefore, [opposite conclusion]."
This doesn't make sense as a comment, regardless of the particular conclusion.
> On modern "hosted" OSes, there are safeguards about stack overflows, which will quickly kill your process.
There are lots of contexts where a processing being killed is bad.
Sending a deeply nested JSON object as part of a request to some API should not crash the server that handles the request.
In contexts where availability matters, recursing on user supplied input is dangerous.
You can fairly easily recurse with a context argument in which you maintain a depth count.
However, a problem with recursion might be something other than the maximum depth reached. If recursion traverses an exponentially sized abstract space, it can chew up a lot of processing time before going anywhere near the protective limit.
The other problem, especially for libraries, is that they know neither the overall stack size nor the size of a single stack frame (the former is specified when linking the executable, the latter is an implementation detail and subject to the optimizer's trickery).
Many "fixes" for recursion effectively re-implement it, but do so with their own data structures that are equivalent to the stack (as originally used) but have precise definitions and therefore can be controlled wrt resources used.
If all the recursive code is under your control (no external libs at the bottom, or, worse, trips through external libs and back into recursion), you can make some generous estimate of the upper bound of a stack frame.
If the OS provides ample stack space, you can inquire about the limit, and subtract a safety margin from that to create your own soft limit that you can test against in the recursion.
Another way is to be absolutely restrictive, rather than track the limit. Create some test cases representing the worst recursive cases you're willing to handle (and document as such). Empirically measure their stack use. Multiply that by 1.5 and use that as the stack check limit.
On startup you can check that this soft limit is not too close to the real one and diagnose the situation.
If your soft limit is many megabytes away from the real one, you're not likely to hit the real one, unless something really stupid happens, like alloca, or a call into some code you don't control that goes rampant.
The submitted article, by the way, could use more nuance like this. Careless recursion in C, without reasoning about the consequences (what happens for certain inputs) is obviously bad, not recursion itself.
Which is why it's reasonable to have configurable limits for both processing space and time in anything handling untrusted data.
Unixes give us that. You have to fork the computation to have it contained in its own process, whose run-time, virtual memory limit, and stack depth you can control.
Doing it all in your VM/runtime, so you can bail the computation with an exception, is more challenging.
> Unixes give us that. You have to fork the computation to have it contained in its own process
Is forking a new process on each call to a recursive function practical?
> Sending a deeply nested JSON object as part of a request to some API should not crash the server that handles the request.
But the API should have limits. For example, respond 413 Payload Too Large if the request is beyond any normal size for that API. Document the limits to API users. You now have an upper bound of how much nesting can be in the user input.
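A rough sketch of that kind of pre-parse limit check in Python (the size and depth limits are made-up examples, and the bracket scan is deliberately crude -- it over-counts brackets that appear inside string literals):

```python
MAX_BODY_BYTES = 64 * 1024  # assumed limit; document it to API users
MAX_DEPTH = 100             # assumed nesting limit

def check_request(body: bytes):
    """Return (status, message) before handing body to the real parser."""
    if len(body) > MAX_BODY_BYTES:
        return (413, "Payload Too Large")
    depth = max_depth = 0
    for byte in body:           # crude scan: count bracket nesting
        if byte in b"[{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif byte in b"]}":
            depth -= 1
    if max_depth > MAX_DEPTH:
        return (413, "Payload Too Large")
    return (200, "OK")
```

The point is the one the comment makes: once a documented size limit exists, the nesting depth of user input is bounded by construction, and a cheap rejection happens before any recursive parser runs.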
I agree. I use the word dangerous to mean there are risks that need to be considered, not that recursion should never be used under any circumstances.
In the general case though, recursion can be tricky to think through, the stack is small, and malicious inputs can be very creative.
There's still not much reason to recurse using the program's own call stack rather than having your own stack structure that can live in dynamic memory and be handled in a context-appropriate way (e.g. returning an error at some depth limit rather than being killed by the OS).
"Quickly kill the process" is still a denial of service security problem.
Recursive parsing of CFGs is only better when they're LL grammars, but LR grammars (which are the most common grammar used in programming languages) are definitely better with an explicit stack due to the repeated way the state machine needs to run. You might be able to do a nicer LALR parser recursively but I personally haven't seen one.
Deeply nested instances of right-recursive rules will blow the stack. If the LALR parser has an unlimited stack due to dynamic allocation, that will enable a DoS.
Table-driven LALR(1) with an explicit stack does not make the recursion issue go away. The tooling may provide built-in handling for it which translates excessive depth into a syntax error.
Recursive parsing using the native stack can take steps to protect itself, like by keeping track of the depth, and bailing upon hitting a limit.
Techniques involving obtaining the stack pointer (or close estimate thereof), and comparing it to a limit, are also possible.
I would argue that the title is misleading and overly alarmist here. This particular bug may have involved recursion and a stack overflow, but that's like saying "malloc kills" in the title of an article about a heap overflow bug. The existence of stack overflow bugs does not imply that recursion is bad any more than the existence of heap overflow bugs implies that malloc is bad. Recursion and malloc are tools that both have pretty well understood resource limitations, and one must take those limitations into account when employing those tools.
Did you see the article references [1][2] from 2006 and 2017 that already argue that recursion is a security problem? It's not new, just not well-known.
[1] https://www.researchgate.net/publication/220477862_The_Power...
[2] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.tx...
You might be agreeing without realising it.
>> I would argue that the title is misleading and overly alarmist here. This particular bug may have involved recursion and a stack overflow, but that's like saying "malloc kills" in the title of an article about a heap overflow bug.
Let's see what the article[1] you cited says:
If you think recursion is a known security problem, do you also think using the heap is a known security problem?
Arguably, Stack Clash is just a compiler bug--recursive code shouldn't be able to jump the guard pages. This was fixed in Clang in 2021 [1], in GCC even earlier, and in MSVC earlier than that.
[1]: https://blog.llvm.org/posts/2021-01-05-stack-clash-protectio...
Recursion per se isn't an issue; unbounded stack use is. If you either know your input size is bounded (e.g. it's not user-generated) or use tail-recursion (which should get compiled to a loop), it's fine.
If your algorithm does unbounded heap allocations instead, you're still going to get oomkilled. The actual vulnerability is not enforcing request resource limits. Things like xml bombs can then exacerbate this by expanding a highly compressed request (so a small amount of attacker work can generate a large amount of receiver work).
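To illustrate the tail-recursion point above: a tail-recursive function and its mechanical loop form, sketched in Python (CPython does not perform tail-call optimization, so the loop version is what a TCO-capable compiler would effectively emit):

```python
# Hedged sketch: a tail-recursive sum and its loop equivalent.
def total_rec(xs, acc=0):
    if not xs:
        return acc
    return total_rec(xs[1:], acc + xs[0])  # tail call: nothing happens after it

def total_loop(xs, acc=0):
    while xs:                              # the tail call becomes a jump
        xs, acc = xs[1:], acc + xs[0]
    return acc
```

`total_rec` will hit CPython's recursion limit on long inputs; `total_loop` runs in constant stack space, which is exactly the transformation TCO performs.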
Exactly. The article would have been much more informative if it had detailed why the usual approaches to limiting resource usage wouldn't work to prevent DoS here.
The problem, in practice, is the limit for malloc on most systems is a few GB, while the default stack size on Windows is 1 MB, a stupidly small size.
I love recursion, so I will spawn a thread to do it in with a decent sized stack, but it’s very easy to break if you use defaults, and the defaults are configured differently in every OS.
Using recursive techniques to parse potentially hostile inputs kills.
Parsing anything from a potential adversary needs to account for failure. Unbounded recursion is secure (ie fails safely) if the compiler is working properly.
As to DoS, without looking at the code I'm unclear why various approaches to bounding resource consumption wouldn't have worked. I assume something specific to this library and how it is used must have prevented the obvious approaches. Still, not an issue in the general case.
Guarding against unbounded recursion requires both compiler support and runtime environment support: you have to use enough resources to handle legitimate queries, but small enough memory constraints that a "query of death" doesn't kill nodes that are expensive to reactivate. Even then, by their very nature queries-of-death are usually hard to detect and a much simpler solution is something you can do in the static space, such as put an arbitrary hard-bound on recursion depth far below your resource constraints so you can fail the query without losing the whole processing node.
Google protobuffers have buried deep within at least their C++ parser an arbitrary hard limit for nesting depth (I think it may be 32). It's annoying when you hit it, but it's there for a reason.
> Guarding against unbounded recursion requires both compiler support and runtime environment support
I feel like this is similar in spirit to saying "guarding against infinite loops requires both ...".
Where resource consumption is concerned, as you pointed out you can track that manually. Presumably you have to do that anyway, since the iterative case will also need to give up and fail the task at some point.
I really don't see where recursion itself introduces an issue here. I guess if you expect to pass through a great many nodes that don't otherwise trigger resource allocation except for the stack frame, and the compiler can't optimize the activation records away, it could be problematic. That's pretty specific though. Is that really the case for a typical parser?
It was for the proto buffer c++ parser; couldn't say for typical.
Nobody should use recursion in production code, period.
And no, it's not like malloc. If you don't understand why then you definitely shouldn't be putting recursive calls in your codebase.
Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
Is the whole (tail-recursion optimized) Lisp language family subject to DoS? Just check that you terminate at some point, right? "Recursion kills" is just too broad and not helpful.
The whole point of this CVE is what do you do when the input you're parsing is so large that you run out of space before you can terminate. Tail recursion helps in the sense that the issue will happen later, but it isn't a fix when we're talking about arbitrary data like an XML parser.
No, this CVE is explicitly about recursive calls overflowing the stack, not running out of memory on large inputs.
The point of tail recursion is that it can be converted into a loop instead of actually recursing.
Exactly. A stack overflow caused by recursion more likely converts to an endless loop. Nothing saves a bad code design.
I did not mention memory once anywhere in my comment?
My apologies, I interpreted "running out of space" as meaning running out of memory, rather than meaning overflowing the stack.
The tail recursion part isn't right though. If it isn't optimised into a loop, then tail recursion makes no difference at all, you will overflow the stack just as rapidly, not later. If it is optimised into a loop, then you will not overflow the stack at all.
Someone who doesn't believe that stack is "space" will not believe that it's "memory" either. :)
Tail call optimization uses no stack at all
I think "space" (i.e. in "run out of space before you can terminate") is most naturally interpreted as referring to memory.
Stack is memory!
An algorithm that uses an amount of stack proportional to the input N is said to require O(N) space. If it is rewritten to explicitly allocate the storage instead of using stack, it still uses O(N) space.
Not necessarily. If you are computing an aggregation for instance, if your computation is recursive and not tail call optimized, it may overflow the stack but the fixed version will not use additional memory for each iteration.
Otherwise, indeed stack is memory, but the memory in general is usually way less limited than the stack and also running out of memory doesn't have the same security implications as overflowing an unprotected stack.
And, unless you manage to encode everything in the stack, your iterative version will probably take the same amount of memory as your non-optimized recursive version, minus the space used for the stack.
True, I'd just tend to call exceeding the stack limit a stack overflow, rather than the more generic running out of space.
Correct.
Reasonable compilers translate tail-recursive functions into loops/jumps, so the stack does not grow. Tail recursion is easier to do in a garbage collected functional language though (e.g. if you need to use continuation passing).
Even in C, the recursive solution is usually simpler, so it makes sense to use it for unit tests to validate the more manual solution.
The point of termination is beyond stack overflow here, that's the problem. And unlike heap, stack does not tell you gently that it's running out.
That really depends. Segmented stack makes it equivalent to heap. Heap might or might not fail gracefully. If the OS permits overcommit and you've mapped a large region, then an arbitrary process on the machine could trigger the OOM when writing to a supposedly allocated (but not previously written) piece of memory.
Presumably you configure resource limits in production but that just means the "correct" process gets unexpectedly killed instead of an arbitrary one.
Code handling arbitrary input needs to carefully limit resource usage. There's no avoiding it.
I think the hate on recursion is too strong. For one thing, some languages have tail call optimization, which can turn recursion into a loop without using up the stack. For another, recursion can be bounded, so that if a malicious document tries to use a lot of recursion, it just results in an error reporting the recursion was too deep.
> Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
Consider my jimmies rustled
I don’t understand the obsession of people with recursion. Sure it’s a cute math trick, it makes you feel smart, but it is an undebuggable mess with a ton of gotchas that lead to stack overflows.
Leave the math to the math guys who never ship a product.
I use recursion a fair bit just because it's the easiest solution that requires the least thought. If I have a tree structure from e.g a JSON parser, or a directory tree, or an HTML node tree, what more debuggable options are there, realistically?
Re-writing recursive algorithms to be non-recursive typically requires less obvious code, where you make your own ad-hoc stack to keep track of where you are; essentially implementing the call stack manually.
In contexts where DoS or stack smashing is a concern because the input is attacker-controlled, it's often way easier to add a depth parameter and error at a certain depth than to rewrite the naturally recursive algorithm into iterative style. But tree structures are so naturally recursive that it's easy to end up with accidental unbounded recursion in complex real-world situations.
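A sketch of the explicit-stack rewrite those comments describe, with a depth limit folded in so hostile input fails cleanly (the limit of 64 and the names are illustrative assumptions):

```python
MAX_DEPTH = 64  # assumed limit for attacker-controlled input

# Hedged sketch: iterative JSON-tree walk with an explicit stack
# that carries (node, depth) pairs, erroring out on deep nesting
# instead of being killed by the OS.
def walk_json(doc):
    seen = []
    stack = [(doc, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > MAX_DEPTH:
            raise ValueError("nesting too deep")
        if isinstance(node, dict):
            stack.extend((v, depth + 1) for v in node.values())
        elif isinstance(node, list):
            stack.extend((v, depth + 1) for v in node)
        else:
            seen.append(node)   # leaf value
    return seen
```

Compared with the recursive version plus a depth parameter, this trades a little readability for full control over where the "stack" lives and how failure is reported.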
Keeping track of where you are is THE feature.
You can manage memory and compute budget so that your cute algo does not go rogue. On top of that very often you don’t have to explore the entire tree, and there is custom logic at each step to decide whether you want to proceed or not with exploration.
Programmers like recursion because some algorithms are much, much more pleasant to write this way. Others are easier to write iteratively. Both are easy to do wrong.
Example: depth-first tree walking algorithms. Implicit stack makes it trivial to express as recursion.
It is not smart, or special, or something.
If your data structures are recursive, it makes sense your algorithms are too. It makes the code tidy and simpler to reason about. Plenty of times your code becomes "obviously correct".
I write a genealogy app (family history). Recursion is the foundation of the code, and permeates everywhere. The call stack can go 200 deep or more (~4,000 yrs). The code has been successfully running on millions of desktops for thirty years, and has never caused a crash or infinite loop because of recursion.
The secret is to check at every step that there is no "he's his own grandpa" loops in the user's tree, where (s)he inadvertently makes one of a person's descendants, also his ancestor. This happens sometimes because re-using the same names can cause confusion.
Recursion is like magic :o)
Recursions are super useful for dealing with certain data types, notably nested grammar parsing. Sure, it has gotcha, but that can be extremely readable.
I don't think we should ban recursions altogether, but remember that there exist associated risks, and consider them.
The alternative is just recursion with more steps