Source file src/runtime/mgc.go
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector (GC).
//
// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows
// multiple GC threads to run in parallel. It is a concurrent mark and sweep that uses a write
// barrier. It is non-generational and non-compacting. Allocation is done using size segregated
// per P allocation areas to minimize fragmentation while eliminating locks in the common case.
//
// The algorithm decomposes into several steps.
// This is a high level description of the algorithm being used. For an overview of GC a good
// place to start is Richard Jones' gchandbook.org.
//
// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
// 966-975.
// For journal quality proofs that these steps are complete, correct, and terminate see
// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
//
// 1. GC performs sweep termination.
//
//	a. Stop the world. This causes all Ps to reach a GC safe-point.
//
//	b. Sweep any unswept spans. There will only be unswept spans if
//	this GC cycle was forced before the expected time.
//
// 2. GC performs the mark phase.
//
//	a. Prepare for the mark phase by setting gcphase to _GCmark
//	(from _GCoff), enabling the write barrier, enabling mutator
//	assists, and enqueueing root mark jobs. No objects may be
//	scanned until all Ps have enabled the write barrier, which is
//	accomplished using STW.
//
//	b. Start the world. From this point, GC work is done by mark
//	workers started by the scheduler and by assists performed as
//	part of allocation. The write barrier shades both the
//	overwritten pointer and the new pointer value for any pointer
//	writes (see mbarrier.go for details). Newly allocated objects
//	are immediately marked black.
//
//	c. GC performs root marking jobs. This includes scanning all
//	stacks, shading all globals, and shading any heap pointers in
//	off-heap runtime data structures. Scanning a stack stops a
//	goroutine, shades any pointers found on its stack, and then
//	resumes the goroutine.
//
//	d. GC drains the work queue of grey objects, scanning each grey
//	object to black and shading all pointers found in the object
//	(which in turn may add those pointers to the work queue).
//
//	e. Because GC work is spread across local caches, GC uses a
//	distributed termination algorithm to detect when there are no
//	more root marking jobs or grey objects (see gcMarkDone). At this
//	point, GC transitions to mark termination.
//
// 3. GC performs mark termination.
//
//	a. Stop the world.
//
//	b. Set gcphase to _GCmarktermination, and disable workers and
//	assists.
//
//	c. Perform housekeeping like flushing mcaches.
//
// 4. GC performs the sweep phase.
//
//	a. Prepare for the sweep phase by setting gcphase to _GCoff,
//	setting up sweep state and disabling the write barrier.
//
//	b. Start the world. From this point on, newly allocated objects
//	are white, and allocating sweeps spans before use if necessary.
//
//	c. GC does concurrent sweeping in the background and in response
//	to allocation. See description below.
//
// 5. When sufficient allocation has taken place, replay the sequence
// starting with 1 above. See discussion of GC rate below.
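
// To make step 2.b concrete, the effect of the write barrier described above
// can be sketched as follows. This is an illustration only, not the runtime's
// actual barrier (see mbarrier.go for the real, buffered implementation); the
// names obj, shade, and writePointer are hypothetical placeholders:
//
//	type obj struct{ fields []*obj }
//
//	func shade(o *obj) {
//		// Grey o if it is currently white (stub for illustration).
//	}
//
//	// writePointer performs "*slot = ptr" with both sides shaded, as in
//	// the description above: the overwritten pointer (Yuasa-style deletion
//	// barrier) and the new value (Dijkstra-style insertion barrier).
//	func writePointer(slot **obj, ptr *obj) {
//		shade(*slot)
//		shade(ptr)
//		*slot = ptr
//	}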

// Concurrent sweep.
//
// The sweep phase proceeds concurrently with normal program execution.
// The heap is swept span-by-span both lazily (when a goroutine needs another span)
// and concurrently in a background goroutine (this helps programs that are not CPU bound).
// At the end of STW mark termination all spans are marked as "needs sweeping".
//
// The background sweeper goroutine simply sweeps spans one-by-one.
//
// To avoid requesting more OS memory while there are unswept spans, when a
// goroutine needs another span, it first attempts to reclaim that much memory
// by sweeping. When a goroutine needs to allocate a new small-object span, it
// sweeps small-object spans for the same object size until it frees at least
// one object. When a goroutine needs to allocate a large-object span from the
// heap, it sweeps spans until it frees at least that many pages into the heap.
// There is one case where this may not suffice: if a goroutine sweeps and
// frees two nonadjacent one-page spans to the heap, it will allocate a new
// two-page span, but there can still be other one-page unswept spans which
// could be combined into a two-page span.
//
// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
// so they are empty. When a goroutine grabs a new span into its mcache, it sweeps it.
// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
// The finalizer goroutine is kicked off only when all spans are swept.
// When the next GC starts, it sweeps all not-yet-swept spans (if any).

// GC rate.
// The next GC happens after we've allocated an extra amount of memory proportional to
// the amount already in use. The proportion is controlled by the GOGC environment variable
// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
// (this mark is computed by the gcController.heapGoal method). This keeps the GC cost in
// linear proportion to the allocation cost. Adjusting GOGC just changes the linear constant
// (and also the amount of extra memory used).
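
// As a concrete reading of the proportional rule above, the arithmetic can be
// sketched as below. This is an illustration only: the real goal is computed
// by gcController.heapGoal and also accounts for stack and global scan work
// and for GOMEMLIMIT. sketchHeapGoal is a hypothetical helper, not part of
// the runtime:
//
//	func sketchHeapGoal(heapMarked, gcPercent uint64) uint64 {
//		// GOGC=100 with 4 MiB of live heap gives a goal near 8 MiB.
//		return heapMarked + heapMarked*gcPercent/100
//	}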

// Oblets
//
// In order to prevent long pauses while scanning large objects and to
// improve parallelism, the garbage collector breaks up scan jobs for
// objects larger than maxObletBytes into "oblets" of at most
// maxObletBytes. When scanning encounters the beginning of a large
// object, it scans only the first oblet and enqueues the remaining
// oblets as new scan jobs.

package runtime

import (
	"internal/cpu"
	"runtime/internal/atomic"
	"unsafe"
)

const (
	_DebugGC         = 0
	_ConcurrentSweep = true
	_FinBlockSize    = 4 * 1024

	// debugScanConservative enables debug logging for stack
	// frames that are scanned conservatively.
	debugScanConservative = false

	// sweepMinHeapDistance is a lower bound on the heap distance
	// (in bytes) reserved for concurrent sweeping between GC
	// cycles.
	sweepMinHeapDistance = 1024 * 1024
)

// heapObjectsCanMove always returns false in the current garbage collector.
// It exists for go4.org/unsafe/assume-no-moving-gc, which is an
// unfortunate idea that had an even more unfortunate implementation.
// Every time a new Go release happened, the package stopped building,
// and the authors had to add a new file with a new //go:build line, and
// then the entire ecosystem of packages with that as a dependency had to
// explicitly update to the new version. Many packages depend on
// assume-no-moving-gc transitively, through paths like
// inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc.
// This was causing a significant amount of friction around each new
// release, so we added this bool for the package to //go:linkname
// instead. The bool is still unfortunate, but it's not as bad as
// breaking the ecosystem on every new release.
//
// If the Go garbage collector ever does move heap objects, we can set
// this to true to break all the programs using assume-no-moving-gc.
//
//go:linkname heapObjectsCanMove
func heapObjectsCanMove() bool {
	return false
}

func gcinit() {
	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
		throw("size of Workbuf is suboptimal")
	}
	// No sweep on the first cycle.
	sweep.active.state.Store(sweepDrainedMask)

	// Initialize GC pacer state.
	// Use the environment variable GOGC for the initial gcPercent value.
	// Use the environment variable GOMEMLIMIT for the initial memoryLimit value.
	gcController.init(readGOGC(), readGOMEMLIMIT())

	work.startSema = 1
	work.markDoneSema = 1
	lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
	lockInit(&work.assistQueue.lock, lockRankAssistQueue)
	lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
}

// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine, the background
// scavenger goroutine, and enables GC.
func gcenable() {
	// Kick off sweeping and scavenging.
	c := make(chan int, 2)
	go bgsweep(c)
	go bgscavenge(c)
	<-c
	<-c
	memstats.enablegc = true // now that runtime is initialized, GC is okay
}

// Garbage collector phase.
// Indicates which write barrier and synchronization tasks to perform.
var gcphase uint32

// The compiler knows about this variable.
// If you change it, you must change builtin/runtime.go, too.
// If you change the first four bytes, you must also change the write
// barrier insertion code.
var writeBarrier struct {
	enabled bool    // compiler emits a check of this before calling write barrier
	pad     [3]byte // compiler uses 32-bit load for "enabled" field
	needed  bool    // identical to enabled, for now (TODO: dedup)
	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
}
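
// For illustration, the check that the compiler emits around a pointer write
// has roughly the following shape. This is a sketch only; the real sequence is
// generated by the compiler and funnels into the buffered write barrier (see
// mbarrier.go and mwbbuf.go). guardedStore and writeBarrierHook are
// hypothetical names standing in for the compiled code and the barrier call:
//
//	func guardedStore(slot **int, ptr *int) {
//		if writeBarrier.enabled {
//			writeBarrierHook(slot, ptr) // record/shade the pointers, then write
//		} else {
//			*slot = ptr
//		}
//	}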

// gcBlackenEnabled is 1 if mutator assists and background mark
// workers are allowed to blacken objects. This must only be set when
// gcphase == _GCmark.
var gcBlackenEnabled uint32

const (
	_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
	_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
	_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
)

//go:nosplit
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed
}

// gcMarkWorkerMode represents the mode that a concurrent mark worker
// should operate in.
//
// Concurrent marking happens through four different mechanisms. One
// is mutator assists, which happen in response to allocations and are
// not scheduled. The other three are variations in the per-P mark
// workers and are distinguished by gcMarkWorkerMode.
type gcMarkWorkerMode int

const (
	// gcMarkWorkerNotWorker indicates that the next scheduled G is not
	// starting work and the mode should be ignored.
	gcMarkWorkerNotWorker gcMarkWorkerMode = iota

	// gcMarkWorkerDedicatedMode indicates that the P of a mark
	// worker is dedicated to running that mark worker. The mark
	// worker should run without preemption.
	gcMarkWorkerDedicatedMode

	// gcMarkWorkerFractionalMode indicates that a P is currently
	// running the "fractional" mark worker. The fractional worker
	// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
	// an integer and using only dedicated workers would result in
	// utilization too far from the target of gcBackgroundUtilization.
	// The fractional worker should run until it is preempted and
	// will be scheduled to pick up the fractional part of
	// GOMAXPROCS*gcBackgroundUtilization.
	gcMarkWorkerFractionalMode

	// gcMarkWorkerIdleMode indicates that a P is running the mark
	// worker because it has nothing else to do. The idle worker
	// should run until it is preempted and account its time
	// against gcController.idleMarkTime.
	gcMarkWorkerIdleMode
)

// gcMarkWorkerModeStrings are the string labels of gcMarkWorkerModes
// to use in execution traces.
var gcMarkWorkerModeStrings = [...]string{
	"Not worker",
	"GC (dedicated)",
	"GC (fractional)",
	"GC (idle)",
}

// pollFractionalWorkerExit reports whether a fractional mark worker
// should self-preempt. It assumes it is called from the fractional
// worker.
func pollFractionalWorkerExit() bool {
	// This should be kept in sync with the fractional worker
	// scheduler logic in findRunnableGCWorker.
	now := nanotime()
	delta := now - gcController.markStartTime
	if delta <= 0 {
		return true
	}
	p := getg().m.p.ptr()
	selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
	// Add some slack to the utilization goal so that the
	// fractional worker isn't behind again the instant it exits.
300 return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal 301 } 302 303 var work workType 304 305 type workType struct { 306 full lfstack // lock-free list of full blocks workbuf 307 _ cpu.CacheLinePad // prevents false-sharing between full and empty 308 empty lfstack // lock-free list of empty blocks workbuf 309 _ cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait 310 311 wbufSpans struct { 312 lock mutex 313 // free is a list of spans dedicated to workbufs, but 314 // that don't currently contain any workbufs. 315 free mSpanList 316 // busy is a list of all spans containing workbufs on 317 // one of the workbuf lists. 318 busy mSpanList 319 } 320 321 // Restore 64-bit alignment on 32-bit. 322 _ uint32 323 324 // bytesMarked is the number of bytes marked this cycle. This 325 // includes bytes blackened in scanned objects, noscan objects 326 // that go straight to black, and permagrey objects scanned by 327 // markroot during the concurrent scan phase. This is updated 328 // atomically during the cycle. Updates may be batched 329 // arbitrarily, since the value is only read at the end of the 330 // cycle. 331 // 332 // Because of benign races during marking, this number may not 333 // be the exact number of marked bytes, but it should be very 334 // close. 335 // 336 // Put this field here because it needs 64-bit atomic access 337 // (and thus 8-byte alignment even on 32-bit architectures). 338 bytesMarked uint64 339 340 markrootNext uint32 // next markroot job 341 markrootJobs uint32 // number of markroot jobs 342 343 nproc uint32 344 tstart int64 345 nwait uint32 346 347 // Number of roots of various root types. Set by gcMarkRootPrepare. 348 // 349 // nStackRoots == len(stackRoots), but we have nStackRoots for 350 // consistency. 351 nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int 352 353 // Base indexes of each root type. Set by gcMarkRootPrepare. 354 baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32 355 356 // stackRoots is a snapshot of all of the Gs that existed 357 // before the beginning of concurrent marking. The backing 358 // store of this must not be modified because it might be 359 // shared with allgs. 360 stackRoots []*g 361 362 // Each type of GC state transition is protected by a lock. 363 // Since multiple threads can simultaneously detect the state 364 // transition condition, any thread that detects a transition 365 // condition must acquire the appropriate transition lock, 366 // re-check the transition condition and return if it no 367 // longer holds or perform the transition if it does. 368 // Likewise, any transition must invalidate the transition 369 // condition before releasing the lock. This ensures that each 370 // transition is performed by exactly one thread and threads 371 // that need the transition to happen block until it has 372 // happened. 373 // 374 // startSema protects the transition from "off" to mark or 375 // mark termination. 376 startSema uint32 377 // markDoneSema protects transitions from mark to mark termination. 378 markDoneSema uint32 379 380 bgMarkReady note // signal background mark worker has started 381 bgMarkDone uint32 // cas to 1 when at a background mark completion point 382 // Background mark completion signaling 383 384 // mode is the concurrency mode of the current GC cycle. 385 mode gcMode 386 387 // userForced indicates the current GC cycle was forced by an 388 // explicit user call. 
389 userForced bool 390 391 // initialHeapLive is the value of gcController.heapLive at the 392 // beginning of this GC cycle. 393 initialHeapLive uint64 394 395 // assistQueue is a queue of assists that are blocked because 396 // there was neither enough credit to steal or enough work to 397 // do. 398 assistQueue struct { 399 lock mutex 400 q gQueue 401 } 402 403 // sweepWaiters is a list of blocked goroutines to wake when 404 // we transition from mark termination to sweep. 405 sweepWaiters struct { 406 lock mutex 407 list gList 408 } 409 410 // cycles is the number of completed GC cycles, where a GC 411 // cycle is sweep termination, mark, mark termination, and 412 // sweep. This differs from memstats.numgc, which is 413 // incremented at mark termination. 414 cycles atomic.Uint32 415 416 // Timing/utilization stats for this cycle. 417 stwprocs, maxprocs int32 418 tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start 419 420 pauseNS int64 // total STW time this cycle 421 pauseStart int64 // nanotime() of last STW 422 423 // debug.gctrace heap sizes for this cycle. 424 heap0, heap1, heap2 uint64 425 426 // Cumulative estimated CPU usage. 427 cpuStats 428 } 429 430 // GC runs a garbage collection and blocks the caller until the 431 // garbage collection is complete. It may also block the entire 432 // program. 433 func GC() { 434 // We consider a cycle to be: sweep termination, mark, mark 435 // termination, and sweep. This function shouldn't return 436 // until a full cycle has been completed, from beginning to 437 // end. Hence, we always want to finish up the current cycle 438 // and start a new one. That means: 439 // 440 // 1. In sweep termination, mark, or mark termination of cycle 441 // N, wait until mark termination N completes and transitions 442 // to sweep N. 443 // 444 // 2. In sweep N, help with sweep N. 445 // 446 // At this point we can begin a full cycle N+1. 447 // 448 // 3. Trigger cycle N+1 by starting sweep termination N+1. 449 // 450 // 4. Wait for mark termination N+1 to complete. 451 // 452 // 5. Help with sweep N+1 until it's done. 453 // 454 // This all has to be written to deal with the fact that the 455 // GC may move ahead on its own. For example, when we block 456 // until mark termination N, we may wake up in cycle N+2. 457 458 // Wait until the current sweep termination, mark, and mark 459 // termination complete. 460 n := work.cycles.Load() 461 gcWaitOnMark(n) 462 463 // We're now in sweep N or later. Trigger GC cycle N+1, which 464 // will first finish sweep N if necessary and then enter sweep 465 // termination N+1. 466 gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1}) 467 468 // Wait for mark termination N+1 to complete. 469 gcWaitOnMark(n + 1) 470 471 // Finish sweep N+1 before returning. We do this both to 472 // complete the cycle and because runtime.GC() is often used 473 // as part of tests and benchmarks to get the system into a 474 // relatively stable and isolated state. 475 for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) { 476 sweep.nbgsweep++ 477 Gosched() 478 } 479 480 // Callers may assume that the heap profile reflects the 481 // just-completed cycle when this returns (historically this 482 // happened because this was a STW GC), but right now the 483 // profile still reflects mark termination N, not N+1. 484 // 485 // As soon as all of the sweep frees from cycle N+1 are done, 486 // we can go ahead and publish the heap profile. 487 // 488 // First, wait for sweeping to finish. 
(We know there are no 489 // more spans on the sweep queue, but we may be concurrently 490 // sweeping spans, so we have to wait.) 491 for work.cycles.Load() == n+1 && !isSweepDone() { 492 Gosched() 493 } 494 495 // Now we're really done with sweeping, so we can publish the 496 // stable heap profile. Only do this if we haven't already hit 497 // another mark termination. 498 mp := acquirem() 499 cycle := work.cycles.Load() 500 if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) { 501 mProf_PostSweep() 502 } 503 releasem(mp) 504 } 505 506 // gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has 507 // already completed this mark phase, it returns immediately. 508 func gcWaitOnMark(n uint32) { 509 for { 510 // Disable phase transitions. 511 lock(&work.sweepWaiters.lock) 512 nMarks := work.cycles.Load() 513 if gcphase != _GCmark { 514 // We've already completed this cycle's mark. 515 nMarks++ 516 } 517 if nMarks > n { 518 // We're done. 519 unlock(&work.sweepWaiters.lock) 520 return 521 } 522 523 // Wait until sweep termination, mark, and mark 524 // termination of cycle N complete. 525 work.sweepWaiters.list.push(getg()) 526 goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceBlockUntilGCEnds, 1) 527 } 528 } 529 530 // gcMode indicates how concurrent a GC cycle should be. 531 type gcMode int 532 533 const ( 534 gcBackgroundMode gcMode = iota // concurrent GC and sweep 535 gcForceMode // stop-the-world GC now, concurrent sweep 536 gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user) 537 ) 538 539 // A gcTrigger is a predicate for starting a GC cycle. Specifically, 540 // it is an exit condition for the _GCoff phase. 541 type gcTrigger struct { 542 kind gcTriggerKind 543 now int64 // gcTriggerTime: current time 544 n uint32 // gcTriggerCycle: cycle number to start 545 } 546 547 type gcTriggerKind int 548 549 const ( 550 // gcTriggerHeap indicates that a cycle should be started when 551 // the heap size reaches the trigger heap size computed by the 552 // controller. 553 gcTriggerHeap gcTriggerKind = iota 554 555 // gcTriggerTime indicates that a cycle should be started when 556 // it's been more than forcegcperiod nanoseconds since the 557 // previous GC cycle. 558 gcTriggerTime 559 560 // gcTriggerCycle indicates that a cycle should be started if 561 // we have not yet started cycle number gcTrigger.n (relative 562 // to work.cycles). 563 gcTriggerCycle 564 ) 565 566 // test reports whether the trigger condition is satisfied, meaning 567 // that the exit condition for the _GCoff phase has been met. The exit 568 // condition should be tested when allocating. 569 func (t gcTrigger) test() bool { 570 if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff { 571 return false 572 } 573 switch t.kind { 574 case gcTriggerHeap: 575 // Non-atomic access to gcController.heapLive for performance. If 576 // we are going to trigger on this, this thread just 577 // atomically wrote gcController.heapLive anyway and we'll see our 578 // own write. 579 trigger, _ := gcController.trigger() 580 return gcController.heapLive.Load() >= trigger 581 case gcTriggerTime: 582 if gcController.gcPercent.Load() < 0 { 583 return false 584 } 585 lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime)) 586 return lastgc != 0 && t.now-lastgc > forcegcperiod 587 case gcTriggerCycle: 588 // t.n > work.cycles, but accounting for wraparound. 589 return int32(t.n-work.cycles.Load()) > 0 590 } 591 return true 592 } 593 594 // gcStart starts the GC. 
It transitions from _GCoff to _GCmark (if 595 // debug.gcstoptheworld == 0) or performs all of GC (if 596 // debug.gcstoptheworld != 0). 597 // 598 // This may return without performing this transition in some cases, 599 // such as when called on a system stack or with locks held. 600 func gcStart(trigger gcTrigger) { 601 // Since this is called from malloc and malloc is called in 602 // the guts of a number of libraries that might be holding 603 // locks, don't attempt to start GC in non-preemptible or 604 // potentially unstable situations. 605 mp := acquirem() 606 if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" { 607 releasem(mp) 608 return 609 } 610 releasem(mp) 611 mp = nil 612 613 // Pick up the remaining unswept/not being swept spans concurrently 614 // 615 // This shouldn't happen if we're being invoked in background 616 // mode since proportional sweep should have just finished 617 // sweeping everything, but rounding errors, etc, may leave a 618 // few spans unswept. In forced mode, this is necessary since 619 // GC can be forced at any point in the sweeping cycle. 620 // 621 // We check the transition condition continuously here in case 622 // this G gets delayed in to the next GC cycle. 623 for trigger.test() && sweepone() != ^uintptr(0) { 624 sweep.nbgsweep++ 625 } 626 627 // Perform GC initialization and the sweep termination 628 // transition. 629 semacquire(&work.startSema) 630 // Re-check transition condition under transition lock. 631 if !trigger.test() { 632 semrelease(&work.startSema) 633 return 634 } 635 636 // In gcstoptheworld debug mode, upgrade the mode accordingly. 637 // We do this after re-checking the transition condition so 638 // that multiple goroutines that detect the heap trigger don't 639 // start multiple STW GCs. 640 mode := gcBackgroundMode 641 if debug.gcstoptheworld == 1 { 642 mode = gcForceMode 643 } else if debug.gcstoptheworld == 2 { 644 mode = gcForceBlockMode 645 } 646 647 // Ok, we're doing it! Stop everybody else 648 semacquire(&gcsema) 649 semacquire(&worldsema) 650 651 // For stats, check if this GC was forced by the user. 652 // Update it under gcsema to avoid gctrace getting wrong values. 653 work.userForced = trigger.kind == gcTriggerCycle 654 655 if traceEnabled() { 656 traceGCStart() 657 } 658 659 // Check that all Ps have finished deferred mcache flushes. 660 for _, p := range allp { 661 if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen { 662 println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen) 663 throw("p mcache not flushed") 664 } 665 } 666 667 gcBgMarkStartWorkers() 668 669 systemstack(gcResetMarkState) 670 671 work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs 672 if work.stwprocs > ncpu { 673 // This is used to compute CPU time of the STW phases, 674 // so it can't be more than ncpu, even if GOMAXPROCS is. 675 work.stwprocs = ncpu 676 } 677 work.heap0 = gcController.heapLive.Load() 678 work.pauseNS = 0 679 work.mode = mode 680 681 now := nanotime() 682 work.tSweepTerm = now 683 work.pauseStart = now 684 systemstack(func() { stopTheWorldWithSema(stwGCSweepTerm) }) 685 // Finish sweep before we start concurrent scan. 686 systemstack(func() { 687 finishsweep_m() 688 }) 689 690 // clearpools before we start the GC. If we wait they memory will not be 691 // reclaimed until the next GC cycle. 692 clearpools() 693 694 work.cycles.Add(1) 695 696 // Assists and workers can start the moment we start 697 // the world. 
698 gcController.startCycle(now, int(gomaxprocs), trigger) 699 700 // Notify the CPU limiter that assists may begin. 701 gcCPULimiter.startGCTransition(true, now) 702 703 // In STW mode, disable scheduling of user Gs. This may also 704 // disable scheduling of this goroutine, so it may block as 705 // soon as we start the world again. 706 if mode != gcBackgroundMode { 707 schedEnableUser(false) 708 } 709 710 // Enter concurrent mark phase and enable 711 // write barriers. 712 // 713 // Because the world is stopped, all Ps will 714 // observe that write barriers are enabled by 715 // the time we start the world and begin 716 // scanning. 717 // 718 // Write barriers must be enabled before assists are 719 // enabled because they must be enabled before 720 // any non-leaf heap objects are marked. Since 721 // allocations are blocked until assists can 722 // happen, we want enable assists as early as 723 // possible. 724 setGCPhase(_GCmark) 725 726 gcBgMarkPrepare() // Must happen before assist enable. 727 gcMarkRootPrepare() 728 729 // Mark all active tinyalloc blocks. Since we're 730 // allocating from these, they need to be black like 731 // other allocations. The alternative is to blacken 732 // the tiny block on every allocation from it, which 733 // would slow down the tiny allocator. 734 gcMarkTinyAllocs() 735 736 // At this point all Ps have enabled the write 737 // barrier, thus maintaining the no white to 738 // black invariant. Enable mutator assists to 739 // put back-pressure on fast allocating 740 // mutators. 741 atomic.Store(&gcBlackenEnabled, 1) 742 743 // In STW mode, we could block the instant systemstack 744 // returns, so make sure we're not preemptible. 745 mp = acquirem() 746 747 // Concurrent mark. 748 systemstack(func() { 749 now = startTheWorldWithSema() 750 work.pauseNS += now - work.pauseStart 751 work.tMark = now 752 memstats.gcPauseDist.record(now - work.pauseStart) 753 754 sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm) 755 work.cpuStats.gcPauseTime += sweepTermCpu 756 work.cpuStats.gcTotalTime += sweepTermCpu 757 758 // Release the CPU limiter. 759 gcCPULimiter.finishGCTransition(now) 760 }) 761 762 // Release the world sema before Gosched() in STW mode 763 // because we will need to reacquire it later but before 764 // this goroutine becomes runnable again, and we could 765 // self-deadlock otherwise. 766 semrelease(&worldsema) 767 releasem(mp) 768 769 // Make sure we block instead of returning to user code 770 // in STW mode. 771 if mode != gcBackgroundMode { 772 Gosched() 773 } 774 775 semrelease(&work.startSema) 776 } 777 778 // gcMarkDoneFlushed counts the number of P's with flushed work. 779 // 780 // Ideally this would be a captured local in gcMarkDone, but forEachP 781 // escapes its callback closure, so it can't capture anything. 782 // 783 // This is protected by markDoneSema. 784 var gcMarkDoneFlushed uint32 785 786 // gcMarkDone transitions the GC from mark to mark termination if all 787 // reachable objects have been marked (that is, there are no grey 788 // objects and can be no more in the future). Otherwise, it flushes 789 // all local work to the global queues where it can be discovered by 790 // other workers. 791 // 792 // This should be called when all local mark work has been drained and 793 // there are no remaining workers. Specifically, when 794 // 795 // work.nwait == work.nproc && !gcMarkWorkAvailable(p) 796 // 797 // The calling context must be preemptible. 
798 // 799 // Flushing local work is important because idle Ps may have local 800 // work queued. This is the only way to make that work visible and 801 // drive GC to completion. 802 // 803 // It is explicitly okay to have write barriers in this function. If 804 // it does transition to mark termination, then all reachable objects 805 // have been marked, so the write barrier cannot shade any more 806 // objects. 807 func gcMarkDone() { 808 // Ensure only one thread is running the ragged barrier at a 809 // time. 810 semacquire(&work.markDoneSema) 811 812 top: 813 // Re-check transition condition under transition lock. 814 // 815 // It's critical that this checks the global work queues are 816 // empty before performing the ragged barrier. Otherwise, 817 // there could be global work that a P could take after the P 818 // has passed the ragged barrier. 819 if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) { 820 semrelease(&work.markDoneSema) 821 return 822 } 823 824 // forEachP needs worldsema to execute, and we'll need it to 825 // stop the world later, so acquire worldsema now. 826 semacquire(&worldsema) 827 828 // Flush all local buffers and collect flushedWork flags. 829 gcMarkDoneFlushed = 0 830 systemstack(func() { 831 gp := getg().m.curg 832 // Mark the user stack as preemptible so that it may be scanned. 833 // Otherwise, our attempt to force all P's to a safepoint could 834 // result in a deadlock as we attempt to preempt a worker that's 835 // trying to preempt us (e.g. for a stack scan). 836 casGToWaiting(gp, _Grunning, waitReasonGCMarkTermination) 837 forEachP(func(pp *p) { 838 // Flush the write barrier buffer, since this may add 839 // work to the gcWork. 840 wbBufFlush1(pp) 841 842 // Flush the gcWork, since this may create global work 843 // and set the flushedWork flag. 844 // 845 // TODO(austin): Break up these workbufs to 846 // better distribute work. 847 pp.gcw.dispose() 848 // Collect the flushedWork flag. 849 if pp.gcw.flushedWork { 850 atomic.Xadd(&gcMarkDoneFlushed, 1) 851 pp.gcw.flushedWork = false 852 } 853 }) 854 casgstatus(gp, _Gwaiting, _Grunning) 855 }) 856 857 if gcMarkDoneFlushed != 0 { 858 // More grey objects were discovered since the 859 // previous termination check, so there may be more 860 // work to do. Keep going. It's possible the 861 // transition condition became true again during the 862 // ragged barrier, so re-check it. 863 semrelease(&worldsema) 864 goto top 865 } 866 867 // There was no global work, no local work, and no Ps 868 // communicated work since we took markDoneSema. Therefore 869 // there are no grey objects and no more objects can be 870 // shaded. Transition to mark termination. 871 now := nanotime() 872 work.tMarkTerm = now 873 work.pauseStart = now 874 getg().m.preemptoff = "gcing" 875 systemstack(func() { stopTheWorldWithSema(stwGCMarkTerm) }) 876 // The gcphase is _GCmark, it will transition to _GCmarktermination 877 // below. The important thing is that the wb remains active until 878 // all marking is complete. This includes writes made by the GC. 879 880 // There is sometimes work left over when we enter mark termination due 881 // to write barriers performed after the completion barrier above. 882 // Detect this and resume concurrent mark. This is obviously 883 // unfortunate. 884 // 885 // See issue #27993 for details. 886 // 887 // Switch to the system stack to call wbBufFlush1, though in this case 888 // it doesn't matter because we're non-preemptible anyway. 
889 restart := false 890 systemstack(func() { 891 for _, p := range allp { 892 wbBufFlush1(p) 893 if !p.gcw.empty() { 894 restart = true 895 break 896 } 897 } 898 }) 899 if restart { 900 getg().m.preemptoff = "" 901 systemstack(func() { 902 now := startTheWorldWithSema() 903 work.pauseNS += now - work.pauseStart 904 memstats.gcPauseDist.record(now - work.pauseStart) 905 }) 906 semrelease(&worldsema) 907 goto top 908 } 909 910 gcComputeStartingStackSize() 911 912 // Disable assists and background workers. We must do 913 // this before waking blocked assists. 914 atomic.Store(&gcBlackenEnabled, 0) 915 916 // Notify the CPU limiter that GC assists will now cease. 917 gcCPULimiter.startGCTransition(false, now) 918 919 // Wake all blocked assists. These will run when we 920 // start the world again. 921 gcWakeAllAssists() 922 923 // Likewise, release the transition lock. Blocked 924 // workers and assists will run when we start the 925 // world again. 926 semrelease(&work.markDoneSema) 927 928 // In STW mode, re-enable user goroutines. These will be 929 // queued to run after we start the world. 930 schedEnableUser(true) 931 932 // endCycle depends on all gcWork cache stats being flushed. 933 // The termination algorithm above ensured that up to 934 // allocations since the ragged barrier. 935 gcController.endCycle(now, int(gomaxprocs), work.userForced) 936 937 // Perform mark termination. This will restart the world. 938 gcMarkTermination() 939 } 940 941 // World must be stopped and mark assists and background workers must be 942 // disabled. 943 func gcMarkTermination() { 944 // Start marktermination (write barrier remains enabled for now). 945 setGCPhase(_GCmarktermination) 946 947 work.heap1 = gcController.heapLive.Load() 948 startTime := nanotime() 949 950 mp := acquirem() 951 mp.preemptoff = "gcing" 952 mp.traceback = 2 953 curgp := mp.curg 954 casGToWaiting(curgp, _Grunning, waitReasonGarbageCollection) 955 956 // Run gc on the g0 stack. We do this so that the g stack 957 // we're currently running on will no longer change. Cuts 958 // the root set down a bit (g0 stacks are not scanned, and 959 // we don't need to scan gc's internal state). We also 960 // need to switch to g0 so we can shrink the stack. 961 systemstack(func() { 962 gcMark(startTime) 963 // Must return immediately. 964 // The outer function's stack may have moved 965 // during gcMark (it shrinks stacks, including the 966 // outer function's stack), so we must not refer 967 // to any of its variables. Return back to the 968 // non-system stack to pick up the new addresses 969 // before continuing. 970 }) 971 972 systemstack(func() { 973 work.heap2 = work.bytesMarked 974 if debug.gccheckmark > 0 { 975 // Run a full non-parallel, stop-the-world 976 // mark using checkmark bits, to check that we 977 // didn't forget to mark anything during the 978 // concurrent mark process. 979 startCheckmarks() 980 gcResetMarkState() 981 gcw := &getg().m.p.ptr().gcw 982 gcDrain(gcw, 0) 983 wbBufFlush1(getg().m.p.ptr()) 984 gcw.dispose() 985 endCheckmarks() 986 } 987 988 // marking is complete so we can turn the write barrier off 989 setGCPhase(_GCoff) 990 gcSweep(work.mode) 991 }) 992 993 mp.traceback = 0 994 casgstatus(curgp, _Gwaiting, _Grunning) 995 996 if traceEnabled() { 997 traceGCDone() 998 } 999 1000 // all done 1001 mp.preemptoff = "" 1002 1003 if gcphase != _GCoff { 1004 throw("gc done but gcphase != _GCoff") 1005 } 1006 1007 // Record heapInUse for scavenger. 
1008 memstats.lastHeapInUse = gcController.heapInUse.load() 1009 1010 // Update GC trigger and pacing, as well as downstream consumers 1011 // of this pacing information, for the next cycle. 1012 systemstack(gcControllerCommit) 1013 1014 // Update timing memstats 1015 now := nanotime() 1016 sec, nsec, _ := time_now() 1017 unixNow := sec*1e9 + int64(nsec) 1018 work.pauseNS += now - work.pauseStart 1019 work.tEnd = now 1020 memstats.gcPauseDist.record(now - work.pauseStart) 1021 atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user 1022 atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us 1023 memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS) 1024 memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow) 1025 memstats.pause_total_ns += uint64(work.pauseNS) 1026 1027 markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm) 1028 work.cpuStats.gcPauseTime += markTermCpu 1029 work.cpuStats.gcTotalTime += markTermCpu 1030 1031 // Accumulate CPU stats. 1032 // 1033 // Pass gcMarkPhase=true so we can get all the latest GC CPU stats in there too. 1034 work.cpuStats.accumulate(now, true) 1035 1036 // Compute overall GC CPU utilization. 1037 // Omit idle marking time from the overall utilization here since it's "free". 1038 memstats.gc_cpu_fraction = float64(work.cpuStats.gcTotalTime-work.cpuStats.gcIdleTime) / float64(work.cpuStats.totalTime) 1039 1040 // Reset assist time and background time stats. 1041 // 1042 // Do this now, instead of at the start of the next GC cycle, because 1043 // these two may keep accumulating even if the GC is not active. 1044 scavenge.assistTime.Store(0) 1045 scavenge.backgroundTime.Store(0) 1046 1047 // Reset idle time stat. 1048 sched.idleTime.Store(0) 1049 1050 // Reset sweep state. 1051 sweep.nbgsweep = 0 1052 sweep.npausesweep = 0 1053 1054 if work.userForced { 1055 memstats.numforcedgc++ 1056 } 1057 1058 // Bump GC cycle count and wake goroutines waiting on sweep. 1059 lock(&work.sweepWaiters.lock) 1060 memstats.numgc++ 1061 injectglist(&work.sweepWaiters.list) 1062 unlock(&work.sweepWaiters.lock) 1063 1064 // Increment the scavenge generation now. 1065 // 1066 // This moment represents peak heap in use because we're 1067 // about to start sweeping. 1068 mheap_.pages.scav.index.nextGen() 1069 1070 // Release the CPU limiter. 1071 gcCPULimiter.finishGCTransition(now) 1072 1073 // Finish the current heap profiling cycle and start a new 1074 // heap profiling cycle. We do this before starting the world 1075 // so events don't leak into the wrong cycle. 1076 mProf_NextCycle() 1077 1078 // There may be stale spans in mcaches that need to be swept. 1079 // Those aren't tracked in any sweep lists, so we need to 1080 // count them against sweep completion until we ensure all 1081 // those spans have been forced out. 1082 sl := sweep.active.begin() 1083 if !sl.valid { 1084 throw("failed to set sweep barrier") 1085 } 1086 1087 systemstack(func() { startTheWorldWithSema() }) 1088 1089 // Flush the heap profile so we can start a new cycle next GC. 1090 // This is relatively expensive, so we don't do it with the 1091 // world stopped. 1092 mProf_Flush() 1093 1094 // Prepare workbufs for freeing by the sweeper. We do this 1095 // asynchronously because it can take non-trivial time. 1096 prepareFreeWorkbufs() 1097 1098 // Free stack spans. This must be done between GC cycles. 
1099 systemstack(freeStackSpans) 1100 1101 // Ensure all mcaches are flushed. Each P will flush its own 1102 // mcache before allocating, but idle Ps may not. Since this 1103 // is necessary to sweep all spans, we need to ensure all 1104 // mcaches are flushed before we start the next GC cycle. 1105 // 1106 // While we're here, flush the page cache for idle Ps to avoid 1107 // having pages get stuck on them. These pages are hidden from 1108 // the scavenger, so in small idle heaps a significant amount 1109 // of additional memory might be held onto. 1110 // 1111 // Also, flush the pinner cache, to avoid leaking that memory 1112 // indefinitely. 1113 systemstack(func() { 1114 forEachP(func(pp *p) { 1115 pp.mcache.prepareForSweep() 1116 if pp.status == _Pidle { 1117 systemstack(func() { 1118 lock(&mheap_.lock) 1119 pp.pcache.flush(&mheap_.pages) 1120 unlock(&mheap_.lock) 1121 }) 1122 } 1123 pp.pinnerCache = nil 1124 }) 1125 }) 1126 // Now that we've swept stale spans in mcaches, they don't 1127 // count against unswept spans. 1128 sweep.active.end(sl) 1129 1130 // Print gctrace before dropping worldsema. As soon as we drop 1131 // worldsema another cycle could start and smash the stats 1132 // we're trying to print. 1133 if debug.gctrace > 0 { 1134 util := int(memstats.gc_cpu_fraction * 100) 1135 1136 var sbuf [24]byte 1137 printlock() 1138 print("gc ", memstats.numgc, 1139 " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ", 1140 util, "%: ") 1141 prev := work.tSweepTerm 1142 for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} { 1143 if i != 0 { 1144 print("+") 1145 } 1146 print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev)))) 1147 prev = ns 1148 } 1149 print(" ms clock, ") 1150 for i, ns := range []int64{ 1151 int64(work.stwprocs) * (work.tMark - work.tSweepTerm), 1152 gcController.assistTime.Load(), 1153 gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(), 1154 gcController.idleMarkTime.Load(), 1155 markTermCpu, 1156 } { 1157 if i == 2 || i == 3 { 1158 // Separate mark time components with /. 1159 print("/") 1160 } else if i != 0 { 1161 print("+") 1162 } 1163 print(string(fmtNSAsMS(sbuf[:], uint64(ns)))) 1164 } 1165 print(" ms cpu, ", 1166 work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ", 1167 gcController.lastHeapGoal>>20, " MB goal, ", 1168 gcController.lastStackScan.Load()>>20, " MB stacks, ", 1169 gcController.globalsScan.Load()>>20, " MB globals, ", 1170 work.maxprocs, " P") 1171 if work.userForced { 1172 print(" (forced)") 1173 } 1174 print("\n") 1175 printunlock() 1176 } 1177 1178 // Set any arena chunks that were deferred to fault. 1179 lock(&userArenaState.lock) 1180 faultList := userArenaState.fault 1181 userArenaState.fault = nil 1182 unlock(&userArenaState.lock) 1183 for _, lc := range faultList { 1184 lc.mspan.setUserArenaChunkToFault() 1185 } 1186 1187 // Enable huge pages on some metadata if we cross a heap threshold. 1188 if gcController.heapGoal() > minHeapForMetadataHugePages { 1189 systemstack(func() { 1190 mheap_.enableMetadataHugePages() 1191 }) 1192 } 1193 1194 semrelease(&worldsema) 1195 semrelease(&gcsema) 1196 // Careful: another GC cycle may start now. 1197 1198 releasem(mp) 1199 mp = nil 1200 1201 // now that gc is done, kick off finalizer thread if needed 1202 if !concurrentSweep { 1203 // give the queued finalizers, if any, a chance to run 1204 Gosched() 1205 } 1206 } 1207 1208 // gcBgMarkStartWorkers prepares background mark worker goroutines. 
These 1209 // goroutines will not run until the mark phase, but they must be started while 1210 // the work is not stopped and from a regular G stack. The caller must hold 1211 // worldsema. 1212 func gcBgMarkStartWorkers() { 1213 // Background marking is performed by per-P G's. Ensure that each P has 1214 // a background GC G. 1215 // 1216 // Worker Gs don't exit if gomaxprocs is reduced. If it is raised 1217 // again, we can reuse the old workers; no need to create new workers. 1218 for gcBgMarkWorkerCount < gomaxprocs { 1219 go gcBgMarkWorker() 1220 1221 notetsleepg(&work.bgMarkReady, -1) 1222 noteclear(&work.bgMarkReady) 1223 // The worker is now guaranteed to be added to the pool before 1224 // its P's next findRunnableGCWorker. 1225 1226 gcBgMarkWorkerCount++ 1227 } 1228 } 1229 1230 // gcBgMarkPrepare sets up state for background marking. 1231 // Mutator assists must not yet be enabled. 1232 func gcBgMarkPrepare() { 1233 // Background marking will stop when the work queues are empty 1234 // and there are no more workers (note that, since this is 1235 // concurrent, this may be a transient state, but mark 1236 // termination will clean it up). Between background workers 1237 // and assists, we don't really know how many workers there 1238 // will be, so we pretend to have an arbitrarily large number 1239 // of workers, almost all of which are "waiting". While a 1240 // worker is working it decrements nwait. If nproc == nwait, 1241 // there are no workers. 1242 work.nproc = ^uint32(0) 1243 work.nwait = ^uint32(0) 1244 } 1245 1246 // gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single 1247 // gcBgMarkWorker goroutine. 1248 type gcBgMarkWorkerNode struct { 1249 // Unused workers are managed in a lock-free stack. This field must be first. 1250 node lfnode 1251 1252 // The g of this worker. 1253 gp guintptr 1254 1255 // Release this m on park. This is used to communicate with the unlock 1256 // function, which cannot access the G's stack. It is unused outside of 1257 // gcBgMarkWorker(). 1258 m muintptr 1259 } 1260 1261 func gcBgMarkWorker() { 1262 gp := getg() 1263 1264 // We pass node to a gopark unlock function, so it can't be on 1265 // the stack (see gopark). Prevent deadlock from recursively 1266 // starting GC by disabling preemption. 1267 gp.m.preemptoff = "GC worker init" 1268 node := new(gcBgMarkWorkerNode) 1269 gp.m.preemptoff = "" 1270 1271 node.gp.set(gp) 1272 1273 node.m.set(acquirem()) 1274 notewakeup(&work.bgMarkReady) 1275 // After this point, the background mark worker is generally scheduled 1276 // cooperatively by gcController.findRunnableGCWorker. While performing 1277 // work on the P, preemption is disabled because we are working on 1278 // P-local work buffers. When the preempt flag is set, this puts itself 1279 // into _Gwaiting to be woken up by gcController.findRunnableGCWorker 1280 // at the appropriate time. 1281 // 1282 // When preemption is enabled (e.g., while in gcMarkDone), this worker 1283 // may be preempted and schedule as a _Grunnable G from a runq. That is 1284 // fine; it will eventually gopark again for further scheduling via 1285 // findRunnableGCWorker. 1286 // 1287 // Since we disable preemption before notifying bgMarkReady, we 1288 // guarantee that this G will be in the worker pool for the next 1289 // findRunnableGCWorker. This isn't strictly necessary, but it reduces 1290 // latency between _GCmark starting and the workers starting. 
1291 1292 for { 1293 // Go to sleep until woken by 1294 // gcController.findRunnableGCWorker. 1295 gopark(func(g *g, nodep unsafe.Pointer) bool { 1296 node := (*gcBgMarkWorkerNode)(nodep) 1297 1298 if mp := node.m.ptr(); mp != nil { 1299 // The worker G is no longer running; release 1300 // the M. 1301 // 1302 // N.B. it is _safe_ to release the M as soon 1303 // as we are no longer performing P-local mark 1304 // work. 1305 // 1306 // However, since we cooperatively stop work 1307 // when gp.preempt is set, if we releasem in 1308 // the loop then the following call to gopark 1309 // would immediately preempt the G. This is 1310 // also safe, but inefficient: the G must 1311 // schedule again only to enter gopark and park 1312 // again. Thus, we defer the release until 1313 // after parking the G. 1314 releasem(mp) 1315 } 1316 1317 // Release this G to the pool. 1318 gcBgMarkWorkerPool.push(&node.node) 1319 // Note that at this point, the G may immediately be 1320 // rescheduled and may be running. 1321 return true 1322 }, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceBlockSystemGoroutine, 0) 1323 1324 // Preemption must not occur here, or another G might see 1325 // p.gcMarkWorkerMode. 1326 1327 // Disable preemption so we can use the gcw. If the 1328 // scheduler wants to preempt us, we'll stop draining, 1329 // dispose the gcw, and then preempt. 1330 node.m.set(acquirem()) 1331 pp := gp.m.p.ptr() // P can't change with preemption disabled. 1332 1333 if gcBlackenEnabled == 0 { 1334 println("worker mode", pp.gcMarkWorkerMode) 1335 throw("gcBgMarkWorker: blackening not enabled") 1336 } 1337 1338 if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker { 1339 throw("gcBgMarkWorker: mode not set") 1340 } 1341 1342 startTime := nanotime() 1343 pp.gcMarkWorkerStartTime = startTime 1344 var trackLimiterEvent bool 1345 if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode { 1346 trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime) 1347 } 1348 1349 decnwait := atomic.Xadd(&work.nwait, -1) 1350 if decnwait == work.nproc { 1351 println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc) 1352 throw("work.nwait was > work.nproc") 1353 } 1354 1355 systemstack(func() { 1356 // Mark our goroutine preemptible so its stack 1357 // can be scanned. This lets two mark workers 1358 // scan each other (otherwise, they would 1359 // deadlock). We must not modify anything on 1360 // the G stack. However, stack shrinking is 1361 // disabled for mark workers, so it is safe to 1362 // read from the G stack. 1363 casGToWaiting(gp, _Grunning, waitReasonGCWorkerActive) 1364 switch pp.gcMarkWorkerMode { 1365 default: 1366 throw("gcBgMarkWorker: unexpected gcMarkWorkerMode") 1367 case gcMarkWorkerDedicatedMode: 1368 gcDrain(&pp.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit) 1369 if gp.preempt { 1370 // We were preempted. This is 1371 // a useful signal to kick 1372 // everything out of the run 1373 // queue so it can run 1374 // somewhere else. 1375 if drainQ, n := runqdrain(pp); n > 0 { 1376 lock(&sched.lock) 1377 globrunqputbatch(&drainQ, int32(n)) 1378 unlock(&sched.lock) 1379 } 1380 } 1381 // Go back to draining, this time 1382 // without preemption. 
1383 gcDrain(&pp.gcw, gcDrainFlushBgCredit) 1384 case gcMarkWorkerFractionalMode: 1385 gcDrain(&pp.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1386 case gcMarkWorkerIdleMode: 1387 gcDrain(&pp.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1388 } 1389 casgstatus(gp, _Gwaiting, _Grunning) 1390 }) 1391 1392 // Account for time and mark us as stopped. 1393 now := nanotime() 1394 duration := now - startTime 1395 gcController.markWorkerStop(pp.gcMarkWorkerMode, duration) 1396 if trackLimiterEvent { 1397 pp.limiterEvent.stop(limiterEventIdleMarkWork, now) 1398 } 1399 if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode { 1400 atomic.Xaddint64(&pp.gcFractionalMarkTime, duration) 1401 } 1402 1403 // Was this the last worker and did we run out 1404 // of work? 1405 incnwait := atomic.Xadd(&work.nwait, +1) 1406 if incnwait > work.nproc { 1407 println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode, 1408 "work.nwait=", incnwait, "work.nproc=", work.nproc) 1409 throw("work.nwait > work.nproc") 1410 } 1411 1412 // We'll releasem after this point and thus this P may run 1413 // something else. We must clear the worker mode to avoid 1414 // attributing the mode to a different (non-worker) G in 1415 // traceGoStart. 1416 pp.gcMarkWorkerMode = gcMarkWorkerNotWorker 1417 1418 // If this worker reached a background mark completion 1419 // point, signal the main GC goroutine. 1420 if incnwait == work.nproc && !gcMarkWorkAvailable(nil) { 1421 // We don't need the P-local buffers here, allow 1422 // preemption because we may schedule like a regular 1423 // goroutine in gcMarkDone (block on locks, etc). 1424 releasem(node.m.ptr()) 1425 node.m.set(nil) 1426 1427 gcMarkDone() 1428 } 1429 } 1430 } 1431 1432 // gcMarkWorkAvailable reports whether executing a mark worker 1433 // on p is potentially useful. p may be nil, in which case it only 1434 // checks the global sources of work. 1435 func gcMarkWorkAvailable(p *p) bool { 1436 if p != nil && !p.gcw.empty() { 1437 return true 1438 } 1439 if !work.full.empty() { 1440 return true // global work available 1441 } 1442 if work.markrootNext < work.markrootJobs { 1443 return true // root scan work available 1444 } 1445 return false 1446 } 1447 1448 // gcMark runs the mark (or, for concurrent GC, mark termination) 1449 // All gcWork caches must be empty. 1450 // STW is in effect at this point. 1451 func gcMark(startTime int64) { 1452 if debug.allocfreetrace > 0 { 1453 tracegc() 1454 } 1455 1456 if gcphase != _GCmarktermination { 1457 throw("in gcMark expecting to see gcphase as _GCmarktermination") 1458 } 1459 work.tstart = startTime 1460 1461 // Check that there's no marking work remaining. 1462 if work.full != 0 || work.markrootNext < work.markrootJobs { 1463 print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n") 1464 panic("non-empty mark queue after concurrent mark") 1465 } 1466 1467 if debug.gccheckmark > 0 { 1468 // This is expensive when there's a large number of 1469 // Gs, so only do it if checkmark is also enabled. 1470 gcMarkRootCheck() 1471 } 1472 1473 // Drop allg snapshot. allgs may have grown, in which case 1474 // this is the only reference to the old backing store and 1475 // there's no need to keep it around. 1476 work.stackRoots = nil 1477 1478 // Clear out buffers and double-check that all gcWork caches 1479 // are empty. 
This should be ensured by gcMarkDone before we 1480 // enter mark termination. 1481 // 1482 // TODO: We could clear out buffers just before mark if this 1483 // has a non-negligible impact on STW time. 1484 for _, p := range allp { 1485 // The write barrier may have buffered pointers since 1486 // the gcMarkDone barrier. However, since the barrier 1487 // ensured all reachable objects were marked, all of 1488 // these must be pointers to black objects. Hence we 1489 // can just discard the write barrier buffer. 1490 if debug.gccheckmark > 0 { 1491 // For debugging, flush the buffer and make 1492 // sure it really was all marked. 1493 wbBufFlush1(p) 1494 } else { 1495 p.wbBuf.reset() 1496 } 1497 1498 gcw := &p.gcw 1499 if !gcw.empty() { 1500 printlock() 1501 print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork) 1502 if gcw.wbuf1 == nil { 1503 print(" wbuf1=<nil>") 1504 } else { 1505 print(" wbuf1.n=", gcw.wbuf1.nobj) 1506 } 1507 if gcw.wbuf2 == nil { 1508 print(" wbuf2=<nil>") 1509 } else { 1510 print(" wbuf2.n=", gcw.wbuf2.nobj) 1511 } 1512 print("\n") 1513 throw("P has cached GC work at end of mark termination") 1514 } 1515 // There may still be cached empty buffers, which we 1516 // need to flush since we're going to free them. Also, 1517 // there may be non-zero stats because we allocated 1518 // black after the gcMarkDone barrier. 1519 gcw.dispose() 1520 } 1521 1522 // Flush scanAlloc from each mcache since we're about to modify 1523 // heapScan directly. If we were to flush this later, then scanAlloc 1524 // might have incorrect information. 1525 // 1526 // Note that it's not important to retain this information; we know 1527 // exactly what heapScan is at this point via scanWork. 1528 for _, p := range allp { 1529 c := p.mcache 1530 if c == nil { 1531 continue 1532 } 1533 c.scanAlloc = 0 1534 } 1535 1536 // Reset controller state. 1537 gcController.resetLive(work.bytesMarked) 1538 } 1539 1540 // gcSweep must be called on the system stack because it acquires the heap 1541 // lock. See mheap for details. 1542 // 1543 // The world must be stopped. 1544 // 1545 //go:systemstack 1546 func gcSweep(mode gcMode) { 1547 assertWorldStopped() 1548 1549 if gcphase != _GCoff { 1550 throw("gcSweep being done but phase is not GCoff") 1551 } 1552 1553 lock(&mheap_.lock) 1554 mheap_.sweepgen += 2 1555 sweep.active.reset() 1556 mheap_.pagesSwept.Store(0) 1557 mheap_.sweepArenas = mheap_.allArenas 1558 mheap_.reclaimIndex.Store(0) 1559 mheap_.reclaimCredit.Store(0) 1560 unlock(&mheap_.lock) 1561 1562 sweep.centralIndex.clear() 1563 1564 if !_ConcurrentSweep || mode == gcForceBlockMode { 1565 // Special case synchronous sweep. 1566 // Record that no proportional sweeping has to happen. 1567 lock(&mheap_.lock) 1568 mheap_.sweepPagesPerByte = 0 1569 unlock(&mheap_.lock) 1570 // Sweep all spans eagerly. 1571 for sweepone() != ^uintptr(0) { 1572 sweep.npausesweep++ 1573 } 1574 // Free workbufs eagerly. 1575 prepareFreeWorkbufs() 1576 for freeSomeWbufs(false) { 1577 } 1578 // All "free" events for this mark/sweep cycle have 1579 // now happened, so we can make this profile cycle 1580 // available immediately. 1581 mProf_NextCycle() 1582 mProf_Flush() 1583 return 1584 } 1585 1586 // Background sweep. 1587 lock(&sweep.lock) 1588 if sweep.parked { 1589 sweep.parked = false 1590 ready(sweep.g, 0, true) 1591 } 1592 unlock(&sweep.lock) 1593 } 1594 1595 // gcResetMarkState resets global state prior to marking (concurrent 1596 // or STW) and resets the stack scan state of all Gs. 

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so lock to make sure
	// allgs doesn't change.
	forEachG(func(gp *g) {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	})

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		for i := range ha.pageMarks {
			ha.pageMarks[i] = 0
		}
	}

	work.bytesMarked = 0
	work.initialHeapLive = gcController.heapLive.Load()
}

// Hooks for other packages

var poolcleanup func()
var boringCaches []unsafe.Pointer // for crypto/internal/boring

//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
func boring_registerCache(p unsafe.Pointer) {
	boringCaches = append(boringCaches, p)
}

func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// clear boringcrypto caches
	for _, p := range boringCaches {
		atomicstorep(p, nil)
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pool.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}
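
// Worked examples of the formatters above (illustrative only; they assume
// a caller-provided buffer large enough for the digits and the decimal
// point):
//
//	itoaDiv(buf, 12345, 3)  // -> "12.345" (12345 / 10^3)
//	fmtNSAsMS(buf, 12e6)    // -> "12"     (>= 10ms, whole milliseconds)
//	fmtNSAsMS(buf, 1234567) // -> "1.2"    (two digits of precision)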

// Helpers for testing GC.

// gcTestMoveStackOnNextCall causes the stack to be moved on a call
// immediately following the call to this. It may not work correctly
// if any other work appears after this call (such as returning).
// Typically the following call should be marked go:noinline so it
// performs a stack check.
//
// In rare cases this may not cause the stack to move, specifically if
// there's a preemption between this call and the next.
func gcTestMoveStackOnNextCall() {
	gp := getg()
	gp.stackguard0 = stackForceMove
}

// gcTestIsReachable performs a GC and returns a bit set where bit i
// is set if ptrs[i] is reachable.
func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
	// This takes the pointers as unsafe.Pointers in order to keep
	// them live long enough for us to attach specials. After
	// that, we drop our references to them.

	if len(ptrs) > 64 {
		panic("too many pointers for uint64 mask")
	}

	// Block GC while we attach specials and drop our references
	// to ptrs. Otherwise, if a GC is in progress, it could mark
	// them reachable via this function before we have a chance to
	// drop them.
	semacquire(&gcsema)

	// Create reachability specials for ptrs.
	specials := make([]*specialReachable, len(ptrs))
	for i, p := range ptrs {
		lock(&mheap_.speciallock)
		s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
		unlock(&mheap_.speciallock)
		s.special.kind = _KindSpecialReachable
		if !addspecial(p, &s.special) {
			throw("already have a reachable special (duplicate pointer?)")
		}
		specials[i] = s
		// Make sure we don't retain ptrs.
		ptrs[i] = nil
	}

	semrelease(&gcsema)

	// Force a full GC and sweep.
	GC()

	// Process specials.
	for i, s := range specials {
		if !s.done {
			printlock()
			println("runtime: object", i, "was not swept")
			throw("IsReachable failed")
		}
		if s.reachable {
			mask |= 1 << i
		}
		lock(&mheap_.speciallock)
		mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
		unlock(&mheap_.speciallock)
	}

	return mask
}

// gcTestPointerClass returns the category of what p points to, one of:
// "heap", "stack", "data", "bss", "other". This is useful for checking
// that a test is doing what it's intended to do.
//
// This is nosplit simply to avoid extra pointer shuffling that may
// complicate a test.
//
//go:nosplit
func gcTestPointerClass(p unsafe.Pointer) string {
	p2 := uintptr(noescape(p))
	gp := getg()
	if gp.stack.lo <= p2 && p2 < gp.stack.hi {
		return "stack"
	}
	if base, _, _ := findObject(p2, 0, 0); base != 0 {
		return "heap"
	}
	for _, datap := range activeModules() {
		if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
			return "data"
		}
		if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 <= datap.enoptrbss {
			return "bss"
		}
	}
	KeepAlive(p)
	return "other"
}
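
// These helpers are meant to be driven from the runtime tests (they are
// typically re-exported through export_test.go). A hedged sketch of how a
// test might use gcTestIsReachable; the variable names are illustrative:
//
//	live := new(int)
//	dead := new(int)
//	mask := gcTestIsReachable(unsafe.Pointer(dead), unsafe.Pointer(live))
//	KeepAlive(live) // "live" stays referenced across the forced GC
//	// Expect bit 1 (live) to be set. Bit 0 (dead) should normally be
//	// clear, though objects can occasionally be retained conservatively,
//	// so tests should tolerate a few extra bits.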