First, noise is simply the variation in measurement between pixels in an image. When neighboring pixels that should look the same have different brightness levels, that is what we perceive as noise. In photographs this pixel-to-pixel variation comes from two primary sources - light itself and the electronics that measure that light.
Light is inherently noisy because it arrives in discrete, random packets (photons), so each pixel in an exposure sees a slightly different amount of light. As with all random events, the more events you can count in a given amount of time, the less relative variation you'll see from one count to the next, i.e. from pixel to pixel (photon arrivals follow a Poisson distribution). This is why you want a sensor to capture as much light as possible.
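To put rough numbers on this, here is a minimal sketch assuming ideal Poisson statistics, where the standard deviation of a count equals the square root of its mean. The function name and the photon counts are made up for illustration.

```python
import math

def relative_shot_noise(mean_photons: float) -> float:
    """Noise as a fraction of the signal for a Poisson-distributed photon count."""
    # For a Poisson process, standard deviation = sqrt(mean),
    # so noise / signal = sqrt(mean) / mean = 1 / sqrt(mean).
    return 1.0 / math.sqrt(mean_photons)

for photons in (100, 10_000):
    print(f"{photons:>6} photons -> {relative_shot_noise(photons):.1%} relative noise")
```

Under this model, collecting 100 times more light cuts the relative noise by a factor of 10.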
The light captured on a sensor is represented as an electrical charge - the random photon packets arriving at the sensor are converted to photo-electrons, which are later measured and converted into digital brightness values. There is a ratio between the number of photons captured and the resulting electrical charge they produce - this is based on the "conversion gain" of the pixel. When gain is low, a lot of light is required to produce a given electrical charge - when gain is high, the same electrical charge is achieved with less light. A pixel has a limit to how much charge it can hold, so the gain must be carefully selected so that it does not overflow that capacity for a given shooting situation.
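As a toy model of the paragraph above (following this post's simplified framing, with made-up numbers in arbitrary units), the charge a pixel records is the photon count times the conversion gain, clipped at the pixel's holding capacity:

```python
def recorded_charge(photons: float, gain: float, capacity: float) -> float:
    """Charge recorded by a pixel: photons * gain, clipped at the holding capacity."""
    return min(photons * gain, capacity)

def photons_to_fill(gain: float, capacity: float) -> float:
    """How many photons it takes to reach the holding capacity at a given gain."""
    return capacity / gain

CAPACITY = 1000.0  # hypothetical charge limit, arbitrary units

print(photons_to_fill(gain=0.1, capacity=CAPACITY))        # low gain: lots of light needed
print(photons_to_fill(gain=1.0, capacity=CAPACITY))        # high gain: fills with less light
print(recorded_charge(5000, gain=1.0, capacity=CAPACITY))  # overflow is clipped at capacity
```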
When light is abundant or you have the luxury of long shutter speeds, you want a pixel with a low conversion gain. That way it can capture a lot of photons before exceeding its electrical holding capacity.
When light is scarce, a large holding capacity doesn't provide a benefit because the pixels are not going to receive enough light anyway to fill that capacity.
So why not just always use a large holding capacity for both situations, with one providing a benefit and the other being neutral? It's because there's a cost associated with a large holding capacity - less precision in counting the exact number of photons, which means more noise due to the rounding errors in counting fewer photons. But didn't I say above that more light means less noise? Yes, but when there's a lot of light, the additional noise from imprecise counting of photons is outweighed by the greater reduction of noise from lowering the variation in the number of arriving photons. As to why measuring a larger capacity is less precise, think of it like measuring rain water in a large bucket vs a measuring cup - which offers more precision, especially for smaller amounts?
The break-even point, where the noise from the precision loss of measuring a large holding capacity outweighs the benefit of the lower variance from counting more total photons, is a function of the total light available to be captured in a scene. When available light is low you want a lower holding capacity, i.e. greater conversion gain between photons and charge - otherwise the noise from the precision loss of counting a large holding capacity adds insult to the injury of the extra noise you already get from counting fewer total photons.
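The trade-off can be sketched numerically. This assumes the usual model in which shot noise and counting (read) noise add in quadrature, and it uses two hypothetical pixel modes whose capacities and read-noise figures are invented for illustration, not measured from any sensor:

```python
import math

# Hypothetical pixel modes: (capacity in photo-electrons, read noise in electrons)
LOW_GAIN = (60_000, 6.0)    # big bucket, coarse counting
HIGH_GAIN = (15_000, 1.5)   # small bucket, fine counting

def snr(available_photons, mode):
    """Signal-to-noise ratio for the light available, in the given mode."""
    capacity, read_noise = mode
    captured = min(available_photons, capacity)  # the pixel cannot hold more
    # Shot-noise variance equals the signal; read noise adds in quadrature.
    return captured / math.sqrt(captured + read_noise ** 2)

for photons in (100, 1_000, 15_000, 60_000):
    print(photons, round(snr(photons, LOW_GAIN), 1), round(snr(photons, HIGH_GAIN), 1))
```

With these numbers the high-gain mode gives the better ratio while the light fits in its smaller capacity, and the low-gain mode pulls ahead once the scene offers clearly more light than the high-gain mode can hold.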
So you want higher conversion gain in low-light situations, i.e. high ISO.
But real-life scenes are not binary between bright and dark. Many scenes have both bright and dark areas - in other words, high dynamic range. In that situation the same rules apply, but selectively to specific tones in the image. For the midtones and higher, the number of photons counted matters more (total charge capacity to hold more stops of light). For the deep shadows, the precision of the count matters more (to distinguish the small amounts of light captured from the counting noise).
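One common way to put a number on this is the engineering definition of dynamic range: the ratio of the largest signal the pixel can hold to the read-noise floor, expressed in stops. A minimal sketch, with hypothetical capacity and read-noise values:

```python
import math

def dynamic_range_stops(capacity_e: float, read_noise_e: float) -> float:
    """Stops between the pixel's holding capacity and its read-noise floor."""
    return math.log2(capacity_e / read_noise_e)

# Hypothetical figures: same read noise, two different capacities.
print(round(dynamic_range_stops(64_000, 4.0), 2))
print(round(dynamic_range_stops(16_000, 4.0), 2))
# Hypothetical figures: same capacity, lower read noise.
print(round(dynamic_range_stops(16_000, 2.0), 2))
```

In this model, quadrupling the capacity adds two stops at the bright end, while halving the read noise adds one stop at the shadow end.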
Here's a chart to help visualize the balance between these two noise sources - lower "photon shot noise" for total light captured vs lower "read noise" for more precise counting of photons in the shadows. The x-axis represents the brightness level in the scene, left being pure black and right being pure white. The y-axis represents the noise, with the contribution from shot noise in red and read noise in blue.
Shot noise vs read noise for a base ISO image on the Sony A7 III sensor
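The shape of those two curves can be reproduced with a short sketch. It assumes shot noise equal to the square root of the signal and a constant read noise; the capacity and read-noise values are hypothetical and are not measurements of the Sony A7 III.

```python
import math

FULL_CAPACITY = 50_000  # hypothetical capacity in photo-electrons
READ_NOISE = 4.0        # hypothetical read noise in electrons

def noise_at(signal_e: float, read_noise_e: float = READ_NOISE):
    """Return (shot noise, read noise, total noise) in electrons for a signal level."""
    shot = math.sqrt(signal_e)
    total = math.sqrt(shot ** 2 + read_noise_e ** 2)
    return shot, read_noise_e, total

def crossover_signal(read_noise_e: float = READ_NOISE) -> float:
    """Signal level where shot noise equals read noise (sqrt(S) = r, so S = r^2)."""
    return read_noise_e ** 2

# Walk down from pure white two stops at a time.
for stops_below_white in range(0, 15, 2):
    signal = FULL_CAPACITY / 2 ** stops_below_white
    shot, read, total = noise_at(signal)
    print(f"-{stops_below_white:>2} stops: shot={shot:7.1f} read={read:4.1f} total={total:7.1f}")

print("read noise dominates below", crossover_signal(), "electrons")
```

Shot noise dominates across most of the tonal range; read noise only takes over in the deepest shadows, below the crossover level.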
The solution is a pixel that can function in both modes - low conversion gain (high capacity) when light is abundant, and high conversion gain (low capacity) when light is scarce. That is what a dual-gain sensor provides.