This isn’t really a deep dive into the tech, it’s just an SEO placeholder for a company that sells memory modules.
That aside, people usually learn about RDIMMs the hard way: they need ECC unbuffered RAM for something like an HEDT/workstation or tower-server Xeon/i3/Ryzen with ECC support, buy the cheaper RDIMMs instead of ECC UDIMM modules, and only then figure out they don't work!
HEDT has disappeared. It's just server platforms with RDIMM now.
Not true, the W790 chipset is purely for HEDT/workstations, distinct from the Sapphire Rapids server line. AMD also has an entirely separate HEDT socket, sTR5, for Threadripper CPUs. HEDT is alive and well!
HEDT as a consumer product segment distinct from the workstation segment is pretty close to dead. Intel hasn't introduced a new HEDT socket since 2017 and hasn't launched any new CPUs for that platform since 2019.
AMD's Threadripper line has been inconsistent; the 5000 series was all Threadripper PRO parts, then last year's 7000 series brought back the non-PRO Threadripper options. But even the entry-level Threadripper CPUs and TRX50 motherboards available today are less affordable than HEDT systems were, e.g., 15 years ago. The high core counts available in mainstream desktop sockets have shifted the boundary between that segment and HEDT, and as a result there aren't good options to step up to a platform with more I/O capability without also stepping up to really high CPU core counts.
Worked on building a new desktop this month. Most AMD consumer CPUs support unbuffered ECC RAM, which helps protect against in-flight bit flips. The DDR5 standard has chip-level built-in ECC that protects against bit flips within the RAM itself, but not in flight. Went with Kingston again because CORSAIR only sells quad-kits. Dual-kits should be a thing with CORSAIR because most ITX motherboards only have two RAM slots. Kingston marketing and sales understand this.
My POV: in-flight ECC should be standard on all desktops. The majority of desktop use (business and education) is content creation rather than pure content consumption. And ECC needs to ditch single-bit correction and move to masking two or more flipped bits.
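For context on what "1 bit" versus "2 bit" means here: conventional ECC DIMMs use a SECDED code (single-error-correct, double-error-detect), typically an extended Hamming code over 64 data bits plus 8 check bits. A minimal Python sketch on a 4-bit word, purely illustrative, shows why one flipped bit can be repaired while two can only be flagged:

    # Minimal SECDED sketch (extended Hamming code) on a 4-bit word, purely to
    # illustrate "correct 1 flipped bit, detect 2" -- real ECC DIMMs apply the
    # same idea to 64 data bits with 8 check bits.

    def hamming_encode(nibble):
        """Encode 4 data bits into an 8-bit extended Hamming codeword:
        positions 1..7 form Hamming(7,4) (parity at 1, 2, 4, data at 3, 5, 6, 7),
        position 0 is an overall parity bit over the other seven."""
        code = [0] * 8
        for k, pos in enumerate((3, 5, 6, 7)):          # place the data bits
            code[pos] = (nibble >> k) & 1
        for p in (1, 2, 4):                             # parity bit p covers every
            for i in range(1, 8):                       # position sharing bit p
                if i != p and (i & p):
                    code[p] ^= code[i]
        for i in range(1, 8):                           # overall parity bit
            code[0] ^= code[i]
        return code

    def hamming_decode(code):
        """Return (status, decoded nibble or None)."""
        syndrome = 0
        for i in range(1, 8):                           # XOR of set-bit positions;
            if code[i]:                                 # zero for a valid codeword
                syndrome ^= i
        overall = 0
        for b in code:                                  # parity over all 8 bits
            overall ^= b
        if syndrome and not overall:
            return "double-bit error detected", None    # detectable, not correctable
        if overall:                                     # odd number of flips seen:
            code = code.copy()                          # treat as a single-bit error
            if syndrome:
                code[syndrome] ^= 1                     # flip the bad bit back
            status = "corrected"                        # (syndrome 0 => p0 itself flipped)
        else:
            status = "ok"
        nibble = 0
        for k, pos in enumerate((3, 5, 6, 7)):
            nibble |= code[pos] << k
        return status, nibble

    cw = hamming_encode(0b1011)
    cw[5] ^= 1                          # one flipped bit
    print(hamming_decode(cw))           # ('corrected', 11)
    cw[6] ^= 1                          # a second flipped bit
    print(hamming_decode(cw))           # ('double-bit error detected', None)

Correcting two or more arbitrary flips per word needs a stronger code with more check bits, which is roughly what chipkill-style symbol-based ECC on servers provides.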
> Went with Kingston again because CORSAIR only sells quad-kits.
What?
https://www.corsair.com/us/en/c/memory/ddr5-ram
Plenty of two-stick kits there. Were you just trying to make up an excuse to pick Kingston over Corsair?
Related:
The Difference Between a Standard DIMM and a CUDIMM or CSODIMM [Crucial/Micron] (16.11.2024)
https://news.ycombinator.com/item?id=42102076
It just dawned on me how trivially simple it would be for memory controllers to implement ECC in UDIMMs: for every N words, reserve one word for parity. You gain ECC for a small decrease in capacity. Since the memory controller is on the CPU, it can easily abstract this away.
Indeed. Intel has recently implemented it in a low-cost CPU SoC: "in-band ECC".
https://news.ycombinator.com/item?id=41090956
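To make the "reserve one word in N" idea concrete, here is a toy Python sketch of the kind of address translation such a memory controller might perform. The 8:1 ratio and the layout are assumptions for illustration only, not how Intel's in-band ECC actually arranges things:

    # Toy sketch of the address translation an in-band-ECC memory controller
    # might do. Assumptions for illustration only: one ECC byte per 8-byte data
    # word, with all ECC bytes packed at the top of DRAM.

    DRAM_SIZE = 1 << 30                          # 1 GiB of raw DRAM behind the controller
    WORD = 8                                     # data word size in bytes
    ECC_BYTES = 1                                # assumed check bytes per word

    DATA_WORDS = DRAM_SIZE // (WORD + ECC_BYTES)
    ECC_BASE = DATA_WORDS * WORD                 # ECC region starts after the data region

    def translate(host_addr):
        """Map a host address to (DRAM data address, DRAM ECC-byte address)."""
        word = host_addr // WORD
        if word >= DATA_WORDS:
            raise ValueError("beyond the capacity exposed to the host")
        return word * WORD + host_addr % WORD, ECC_BASE + word * ECC_BYTES

    # The host only ever sees DATA_WORDS * WORD bytes (~8/9 of the DRAM),
    # and every access implies a potential second access for the check byte.
    print(f"host-visible capacity: {DATA_WORDS * WORD / DRAM_SIZE:.1%} of DRAM")
    print([hex(a) for a in translate(0x1000)])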
But you not only lose some capacity. Some bandwidth is also lost, and perhaps even some CPU cycles, since in-band ECC likely hasn't been implemented purely in a hard IP block.
I think the bigger performance problem is that a read burst from one channel of RAM is no longer matched to the CPU cache line size when doing in-band ECC.
The chips with in-band ECC have a separate dedicated cache for the ECC codes, which are kept in another part of the memory chip, not inline with the corresponding cache line that holds the data.
So the burst transfers have the same size as when ECC is disabled.
Without the special cache, the number of memory accesses would double, for data and for the extra ECC bits, which would not be acceptable. With the ECC cache, in many cases the reading and writing of the extra ECC bits can be avoided.
A few benchmarks for inline ECC have been published. The performance loss depends on the cache hit rates, so it varies a lot from program to program. In some cases the speed is lower by only a couple of percent, but for some applications the performance loss can be as high as 20% or 30%.
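A toy simulation of why that ECC-code cache, and the workload's hit rate in it, matter so much: if the check bytes for many neighbouring data lines are packed into one DRAM line, a sequential stream reuses each fetched ECC line dozens of times, while a scattered access pattern misses almost every time. The sizes and ratios below are made-up assumptions, not any vendor's real geometry:

    from collections import OrderedDict
    import random

    LINE = 64                                   # data cache-line size in bytes
    ECC_PER_LINE = 2                            # assumed check bytes per data line
    LINES_PER_ECC_LINE = LINE // ECC_PER_LINE   # 32 data lines share one ECC line

    def extra_ecc_accesses(addresses, ecc_cache_lines=8):
        """Count extra DRAM accesses needed for check bytes, given a tiny LRU
        cache of recently used ECC lines inside the memory controller."""
        cache = OrderedDict()                   # ECC-line index -> None, in LRU order
        extra = 0
        for addr in addresses:
            ecc_line = (addr // LINE) // LINES_PER_ECC_LINE
            if ecc_line in cache:
                cache.move_to_end(ecc_line)     # hit: no extra DRAM access
            else:
                extra += 1                      # miss: fetch the ECC line too
                cache[ecc_line] = None
                if len(cache) > ecc_cache_lines:
                    cache.popitem(last=False)   # evict the least recently used
        return extra

    sequential = [i * LINE for i in range(10_000)]
    rng = random.Random(0)
    scattered = [rng.randrange(1 << 30) & ~(LINE - 1) for _ in range(10_000)]
    print("sequential:", extra_ecc_accesses(sequential), "extra accesses per 10k reads")
    print("scattered: ", extra_ecc_accesses(scattered), "extra accesses per 10k reads")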
This is true; however, with the readahead CPUs usually do anyway, I don't even think it's that bad. There is definitely a performance and capacity cost, but technically that capacity cost is also present in ECC memory: the extra memory is still there, it's just not printed on the label, and instead the stick is more expensive.
The CPU cache won't be mismatched though, since the memory controller can mask this. The performance hit will come from the memory controller having to do the extra reads for parity.
There will still be a tiny mismatch, and I wonder whether the performance impact won't be more or less equal to the difference we already have between buffered and unbuffered memory (roughly the same, except the "extra work" has moved from inside the DIMM to the memory controller).
Nvidia GPUs that support ECC do this. I believe it's called inline ECC, and it costs latency, bandwidth, and memory capacity.
This helps, but ideally the entire path from CPU to DIMMs is wider and covers not just what is being read or written but also the address it's being written to. After all, writing the correct bits to the wrong address is a serious failure.
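One way to get that property is to fold the target address into the check bits, so that a write landing at the wrong address fails verification when that location is read back. A hypothetical sketch, using a CRC for brevity rather than any real memory-protection scheme:

    import zlib

    def protect(addr, data):
        """Return the payload plus a CRC computed over the address *and* the data."""
        crc = zlib.crc32(addr.to_bytes(8, "little") + data)
        return data + crc.to_bytes(4, "little")

    def check(addr, stored):
        """Verify the stored CRC against the address we actually read from."""
        data, crc = stored[:-4], int.from_bytes(stored[-4:], "little")
        if zlib.crc32(addr.to_bytes(8, "little") + data) != crc:
            raise ValueError("data/address mismatch: misdirected or corrupted write")
        return data

    memory = {}
    memory[0x2000] = protect(0x1000, b"payload")    # write meant for 0x1000 landed at 0x2000
    try:
        check(0x2000, memory[0x2000])
    except ValueError as e:
        print(e)                                    # the misdirected write is caught on readback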
Yeah, but what about CAMM2? I need MORE ACRONYMS!! ;)