Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: manage off-heap memory allocations #70224

Open
ti-mo opened this issue Nov 6, 2024 · 12 comments
Open

proposal: runtime: manage off-heap memory allocations #70224

ti-mo opened this issue Nov 6, 2024 · 12 comments
Labels
Milestone

Comments

@ti-mo
Copy link

ti-mo commented Nov 6, 2024

Proposal Details

Hi folks, I'd like to explore the possibility for the runtime to 'adopt' externally-allocated memory by tracking pointers to the span and unmapping the underlying memory if there are no more references. This opens up many interesting use cases for folks that need to interact with C or other FFI memory, or maybe even for writing custom allocators.

API

// AddForeignCleanup attaches a cleanup function to a non-Go memory region described
// by ptr and size. It behaves similarly to [AddCleanup], but must be used on memory that
// was not created by Go.
//
// AddForeignCleanup may panic if the given memory region overlaps, but this is not
// guaranteed.
func AddForeignCleanup(ptr unsafe.Pointer, size int, cleanup func(ptr uintptr, size int))

To be used as:

ptr, _ := unix.MmapPtr(fd, 0, nil, size, unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)

runtime.AddForeignCleanup(ptr, size, func(p uintptr, s int) {
_ = unix.MunmapPtr(p, s)
})

Implementation

Incorporating early feedback on this proposal, it would be ideal if this would support memory ranges of arbitrary sizes. As far as I can tell, the runtime doesn't have a facility for representing ranges smaller than a page, so @mknyszek suggested that a new data structure may need to be introduced.

The Linux kernel 'recently' switched its memory management over to the Maple Tree, a B-tree variant optimized for storing non-overlapping ranges, which happens to be exactly what we need as well. The kernel implementation is completely general-purpose and very complex, but we might be able to get away with a subset of it.

If the runtime has active manual memory mappings, the garbage collector would first try to find a pointer's mspan, falling through to querying the Maple Tree if that turns up nothing. During a cycle, Maple Tree nodes would be considered heap allocations, scheduling a cleanup if all references are gone.


I originally got this idea from https://pkg.go.dev/github.com/Jille/gcmmap, a package that mmaps over the Go heap using MAP_FIXED. It uses runtime.mallocgc() but allocating a byte slice works just as well. This approach works, but makes several hard assumptions:

  • there's no moving garbage collector (although we have runtime.heapObjectsCanMove now)
  • the heap is always mapped using PROT_READ|PROT_WRITE and MAP_ANON|MAP_PRIVATE
  • there are no other protections like mseal()

Please let me know what you think. Thank you!

@ti-mo ti-mo added the Proposal label Nov 6, 2024
@gopherbot gopherbot added this to the Proposal milestone Nov 6, 2024
@randall77
Copy link
Contributor

I'm not entirely sure why this wouldn't work for you:

type int32holder struct {
    p *atomic.Int32
}
func (h *int32holder) Load() int32 {
    v := h.p.Load()
    runtime.KeepAlive(h)
    return v
}
// Same for store
var variable1 *int32holder
func init() {
    p := (*atomic.Int32)(mmap(...))
    h := &int32holder{p}
    runtime.SetFinalizer(h, func(h *int32holder) { munmap(h.p) })
    variable1 = h
    // or pass h around wherever
}

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Nov 6, 2024
@ianlancetaylor
Copy link
Member

CC @golang/runtime

@mknyszek
Copy link
Contributor

mknyszek commented Nov 6, 2024

FWIW this has come up between a few of us before (CC @dr2chase) and there are some benefits to direct support vs. @randall77's approach (which would also work). Mainly, it just smooths over some friction with interacting with non-Go managed memory, including C values, and makes them a little less error-prone to work with.

Aside from that, I don't like the name TrackPointers very much because it might make the caller think it tracks pointers inside the provided memory, which it very much does not. You absolutely cannot write Go pointers into that memory. What about runtime.AddForeignCleanup ("foreign" in the FFI sense), which matches the new proposal-accepted runtime.AddCleanup?

Lastly, I wonder if this is something that should go in the unsafe package? I am concerned about giving any user a false sense of security about the memory passed to this function.

@prattmic
Copy link
Member

prattmic commented Nov 6, 2024

Leaving aside the question of whether this should go in unsafe for sense-of-security reasons, is there some reason that this couldn't just be part of AddCleanup itself? i.e., AddCleanup would allow passing non-Go pointers?

@mknyszek
Copy link
Contributor

mknyszek commented Nov 6, 2024

@prattmic It almost could, but AddCleanup isn't sufficiently general enough. To pass a "size" parameter you'd need to model your value in the type system, otherwise the runtime has no idea what the actual span of memory it needs to cover is. (Slices also won't be supported to begin with, and it seems a little weird to use the slice bounds as the cleanup bounds.)

@mejedi
Copy link

mejedi commented Nov 7, 2024

tracking pointers to the [mmapped memory] and unmapping the underlying memory if there are no more references

I believe that proposed feature is valuable beyond ebpf use cases as well.

The challenge with Mmap/MmapPtr is that it is inherently unsafe. A dangling pointer left behind faults on access if we are lucky. If the address space was reused in meantime, we may end up corrupting something else.

It is true that Golang has unsafe anyway and there are many ways to sabotage oneself. Unsafe requires extreme caution though. It is easier to use correctly if the context we have to reason about is small. That's why packages normally offer safe interfaces, even if leveraging unsafe internally.

What if a package needs to expose data in mmap-ed memory? Making copies is rarely an option (overhead). Wrappers do not interface well with existing packages. If it was possible to expose *T or []T backed by mmap safely, it would be a tremendous improvement.

(For a concrete example, consider memory-mapped AF_PACKET and AF_XDP sockets in Linux. It is essentially a ring buffer shared between user space and the kernel. If we build a reusable package for interfacing with this ring buffer, we aren't going to handle queued packets internally, but rather expose them to the package user.)

@ti-mo
Copy link
Author

ti-mo commented Nov 7, 2024

@randall77 Thanks for the suggestion, but as you'd expect I glossed over a lot of the details that led us here. We've prototyped nearly every API we could come up with, but we keep running into potential memory safety issues. Check the two PRs I linked from if you're interested.

My first attempt looked a little like what you proposed, but since this is library code, it's hard to build something flexible enough to be useful to the majority of users. The underlying types represented by this memory can be any C type, including structs, including ones with fields accessed atomically, etc. Initially we had a bunch of functions returning atomic.(U)int* types embedded into some unexported types (to prevent the caller coaxing out the real pointers), but this got messy fast and still had the potential for use-after-free.

Also, a mapping represents a datasec, so potentially contains n amount of variables. We could implement a refcount mechanism on the *Memory managed by finalizers, but then the caller must be very careful not to copy any of the pointers, which is equally dangerous. In the end, we concluded that the only way to make this both safe and ergonomic and fast was to rely on the GC to free the underlying mapping, so we ended up with something like this:

// Memory is just a []byte and a bool as a readonly flag.
func MemoryPointer[T any](mm *Memory, off uint64) (*T, error)

// A Variable is carved out of a Memory at a given offset.
func VariablePointer[T any](v *Variable) (*T, error)

Another thing I tried was returning a **T so I could set a finalizer holding a *Memory reference, but I mean...

u32, _ := VariablePointer[uint32](v)
(**u32) += 1

// or

a64, _ := VariablePointer[atomic.Uint64](v)
(*a64).Add(1)

This technically works, but it's clunky and requires more unnecessary pointer chasing. If we ever end up in a situation where we can return *T instead of **T, we'd need to break the API, so I'd rather hold off.


@mknyszek Thanks for the input!

Aside from that, I don't like the name TrackPointers very much [...] What about runtime.AddForeignCleanup ("foreign" in the FFI sense), which matches the new proposal-accepted runtime.AddCleanup?

This is still a rough proposal, naming should probably be decided by someone more knowledgeable about the runtime/allocator/GC. 😉 runtime.AddForeignCleanup() sounds good, but it doesn't accurately describe 'what' that 'Foreign' is supposed to represent. (memory, not Go objects) runtime.AddForeignMemoryCleanup()? That's getting long. runtime.AddPointerCleanup()? Let's take some time to mull this over. 🙂

because it might make the caller think it tracks pointers inside the provided memory, which it very much does not. You absolutely cannot write Go pointers into that memory.

Actually... 😉 Linux' bpf uapi is now frozen, which means no new map types will be introduced solely for bringing new data types to bpf. Going forward, new shared user/kernel datatypes will need to be implemented on top of a so-called 'bpf arena', which is a 4GiB range of memory that can be mapped into user space, exactly like the array map I demonstrated above. These contain pages allocated dynamically by bpf programs. Either side can write pointers into this arena, and as long as they point to somewhere inside the arena itself, the kernel will translate the pointers between kernel and user address space. (Note: it knows which values are pointers, but that's another story) Any pointers pointing outside the arena cause an exception when dereferenced from within a bpf program.

Just to add some nuance to your statement, 'cannot' should really be 'should not'. The user indeed shouldn't expect the GC to follow an off-heap pointer into the Go heap when scanning for references, so when designing these data types, care needs to be taken not to take a caller-provided Go pointer and stuff it somewhere off-heap. This property is definitely something we should document if/when working on implementing this proposal. As I understand it, the Go Arenas concept is stalled for similar reasons, though in reverse. (Go pointers into an arena can become dangling when the arena is freed)

Lastly, I wonder if this is something that should go in the unsafe package? I am concerned about giving any user a false sense of security about the memory passed to this function.

Makes sense, but nothing in the unsafe package currently modifies runtime state, so I'm not sure it fits. I'd argue there's nothing inherently unsafe about this API, since it doesn't need to dereference the given pointer and doesn't necessarily sidestep Go's type system. If anything, it aims to enhance memory safety.

@mejedi floated making this an implicit part of unix.Mmap, which made me think of a potentially-unintended interaction. unix.Mmap maintains a strong (unsafe.Pointer) reference to the underlying memory in some sort of internal cache. This isn't really documented and requires some spelunking through sys.mmapper. It would act as a permanent reference if used as follows, causing the finalizer to never run:

b, _ := unix.Mmap(...)
runtime.TrackPointers(unsafe.SliceData(b), len(b), func (ptr unsafe.Pointer) {
  unix.Munmap(unsafe.Slice(ptr, len(b)))
})

Using unix.MmapPtr instead of unix.Mmap would be necessary to make this work. Integrating this as part of unix.Mmap would make sense, but would technically be a breaking change. Although one could argue that if you lose the reference to b, you clearly have a memory leak. (barring bugs in uintptr() conversions, though this is a well-known footgun of unsafe)

@mknyszek
Copy link
Contributor

mknyszek commented Nov 7, 2024

This is still a rough proposal, naming should probably be decided by someone more knowledgeable about the runtime/allocator/GC. 😉 runtime.AddForeignCleanup() sounds good, but it doesn't accurately describe 'what' that 'Foreign' is supposed to represent. (memory, not Go objects) runtime.AddForeignMemoryCleanup()? That's getting long. runtime.AddPointerCleanup()? Let's take some time to mull this over. 🙂

AddForeignMemoryCleanup is not bad. I think it's OK for it to be long. While useful, this is a fairly niche concept that I suspect most Go developers will not need to interact with, so being clear in the API seems more important than being concise here.

Actually... 😉 Linux' bpf uapi is now frozen, which means no new map types will be introduced solely for bringing new data types to bpf. Going forward, new shared user/kernel datatypes will need to be implemented on top of a so-called 'bpf arena', which is a 4GiB range of memory that can be mapped into user space, exactly like the array map I demonstrated above. These contain pages allocated dynamically by bpf programs. Either side can write pointers into this arena, and as long as they point to somewhere inside the arena itself, the kernel will translate the pointers between kernel and user address space. (Note: it knows which values are pointers, but that's another story) Any pointers pointing outside the arena cause an exception when dereferenced from within a bpf program.

I mean that's fine. That's just writing a non-Go pointer into that memory. Maybe I misunderstand your point.

Just to add some nuance to your statement, 'cannot' should really be 'should not'.

Er, sure. You can pin the memory. This is closely related to the cgo pointer rules, so what I really meant to say was you absolutely cannot write unpinned Go pointers into C memory.

Lastly, I wonder if this is something that should go in the unsafe package? I am concerned about giving any user a false sense of security about the memory passed to this function.

Makes sense, but nothing in the unsafe package currently modifies runtime state, so I'm not sure it fits. I'd argue there's nothing inherently unsafe about this API, since it doesn't need to dereference the given pointer and doesn't necessarily sidestep Go's type system. If anything, it aims to enhance memory safety.

That is a fair point; it's just an idea. Mainly what I'm trying to get across is that there should be alarm bells ringing inside anyone's head when they see this function being used that something subtle is happening. The runtime APIs all tend to be more memory safe than this. One option is to put it in runtime/cgo even though it doesn't strictly require cgo, it does certainly help. (And I think it fits OK alongside something like runtime/cgo.Handle too.)

@mejedi floated making this an implicit part of unix.Mmap, which made me think of a potentially-unintended interaction.

Yeah, that would be a more specific subset of this functionality. I'm pretty sure this has also come up before a bunch of times, though I'm not certain there's an issue open for it (wouldn't be surprised if there is).

Thing is, if we have mechanism in the runtime for unix.Mmap, it's really trivial to extend that to the API proposed here. EDIT: This is actually a substantially easier problem, because the minimum size is a 4 KiB aligned region. This is much easier to represent in the runtime's existing data structures.

One thing not discussed in this issue yet is the actual implementation of this functionality. That part is easy if we place restrictions on the size and shape of these regions (must be at least one physical page in size). If we want to support arbitrarily-sized things, like anything that comes from malloc, that becomes substantially harder, because we need full on byte-level tracking (although this would certainly be the right thing to do).

@apparentlymart
Copy link

For whatever this one opinion is worth, the fact that the first argument is an unsafe.Pointer already hinted to me that this was a pretty niche and subtle function, regardless of what package the function itself is placed in.

@ti-mo ti-mo changed the title proposal: runtime: manage off-heap memory lifecycle using the garbage collector proposal: runtime: manage off-heap memory allocations Feb 27, 2025
@ti-mo
Copy link
Author

ti-mo commented Feb 27, 2025

@mknyszek I've updated the original proposal and cut out all of the cruft. I've added a few paragraphs on high-level implementation based on what we've discussed offline. I landed on AddManualMemoryCleanup after reading parts of the runtime package. Not sure if 'manual memory' is an overloaded term since comments refer to manual memory when describing stack memory as well, so this is still up for discussion.

How can we get this proposal to be formally accepted?

@mknyszek
Copy link
Contributor

mknyszek commented Feb 27, 2025

A few notes on the API.

  • Minor nit on the name. I think "manual memory" isn't quite right and I don't think we use that anywhere. I've always been partial to "foreign memory" for this concept, but perhaps someone else has a better idea. I do not have a strong opinion on the name.
  • I'd like to propose a few amendments to the API.
    • First, I think checking if the region overlaps with any memory owned by the Go runtime is probably too prohibitive for the implementation. We currently have no tracking for all memory owned by the runtime other than a raw count, and I think actually tracking all of it would be complicated and expensive. This is already an unsafe API, so I think we should make it best-effort. If it's obviously goroutine stack or heap memory, sure. If it's some random sysAlloc in the runtime, I think we should just leave it alone.
    • Second, I think we should pass the pointer and size to the cleanup, so the cleanup doesn't need a heap-allocated closure value.
    • Third, and this is subtle, I think we need to pass ptr as a uintptr to the cleanup. There's a danger here of "reviving" a "dead" the foreign region (from the GC's perspective) if it's passed as an unsafe.Pointer, since that makes it visible to the GC again.

Below is a modified API with these updates.

// AddForeignCleanup attaches a cleanup function to a non-Go memory region described
// by ptr and size. It behaves similarly to [AddCleanup], but must be used on memory that
// was not created by Go.
//
// AddForeignCleanup may panic if the given memory region overlaps, but this is not
// guaranteed.
func AddForeignCleanup(ptr unsafe.Pointer, size int, cleanup func(ptr uintptr, size int))

The Linux kernel 'recently' switched its memory management over to the Maple Tree, a B-tree variant optimized for storing non-overlapping ranges, which happens to be exactly what we need as well. The kernel implementation is completely general-purpose and very complex, but we might be able to get away with a subset of it.

A single global tree is not a bad idea, but it's not clear to me that the Maple Tree would be the right data structure for us. There'd be frequent random access from the GC (so the fact that prev and succ are cache-efficient is somewhat pointless for us) and I'm a bit concerned about writers having to acquire a lock. A lock for the whole tree? Or just a subset? I'm concerned about adding a new scalability issue if it's the former.

For this sort of thing we tend to prefer a radix-style index over the heap, which is how things are currently structured. Could we do both and reuse the heapArena structure instead, and just put an optional concurrent tree in there directly instead? Heap arenas can't overlap with manually managed memory regions because we enforce that a heap arena corresponds to a single contiguous mapping, and that the mapping is aligned. If there's some other memory mapped there, there's no way the heap can get mapped there (unless it gets unmapped, but at that point it'll be cleaned up, from the runtime's perspective). Each heap arena's tree could be much smaller, and we would get an O(1) lookup to narrow down to a 64 MiB region of address space. This design would also allow us to short-circuit manually-managed pointers in the GC when we look up an arena and see there isn't one there at all.

(We have an advantage over Linux by having a GC, in that reclaiming memory from a manually managed tree structure in this case is much easier for us than in the Linux world. We can reap dead nodes after the sweep phase ends, since until the next sweep phase, no more nodes can die. No need for a bespoke RCU system.)

@mknyszek
Copy link
Contributor

How can we get this proposal to be formally accepted?

The next steps are to get feedback and iterate on the proposal. Once a consensus starts to form, it would be helpful to have a prototype and learn how well it works and see if we can find some unanticipated issues with the design or API.

Thanks for your efforts here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

8 participants