Intel Patents Software Defined Cores to Boost Single Thread Performance

Quick Report

Intel's published patent EP4579444A1 (“Software defined super cores”) describes a technique called Software Defined Cores (SDC) that lets two or more physical cores be virtually fused to act as a single higher-IPC core for single-thread workloads. The approach splits a single-threaded program into instruction segments that run concurrently on multiple cores while preserving in-order retirement and architectural correctness.

Key points from the patent:

  • SDC splits a single-threaded program into instruction segments (chunks) and steers different segments to different physical cores so they execute in parallel while appearing as a single core to the OS or VM.

  • Hardware support includes a small SDC interface per core (memory execution cluster enhancements), a wormhole/shared address space for exchanging live-in/live-out register data, shadow store buffers, and mechanisms for low-overhead core-to-core communication.

  • Software support can come from static/JIT compilers or can be applied to legacy binaries with inserted flow-control markers; the split/steer can be performed at compile time or dynamically.

  • The patent describes techniques for handling branches, loops, speculative execution, register prediction/validation, and corrective recovery with low overhead (<~5% in examples).

  • A thread runtime telemetry unit (hardware guide scheduler / Thread Director-like circuitry) can recommend when to enter/exit SDC mode based on IPC, voltage/frequency, and workload characteristics.

  • Benefits: higher IPC for single-thread bursts without needing larger monolithic cores, improved energy efficiency versus raising frequency, and flexible use of neighboring cores to scale performance.

  • The patent covers a range of implementations, including homogeneous and heterogeneous cores, and describes entry/exit, throttling, and state-merge mechanisms to preserve architectural semantics.

SDC is a pragmatic way to boost single-thread performance by leveraging existing core counts rather than increasing core complexity or frequency. If realized in silicon + OS/compiler support, it could improve bursty single-thread workloads and gaming/legacy apps without larger die area or higher sustained power.

Written using GitHub Copilot GPT-4.1 in agentic mode instructed to follow current codebase style and conventions for writing articles.

Source(s)

  • Software defined super cores
  • EP patent PDF
  • TPU