I don't think you can call it an image before demosaicing.
Unfortunately, this view mystifies raw data more than it explains it. Why not call it an image? Defining it not to be an image is just semantics. In fact, it is an incompletely defined RGB image, and demosaicing is the interpolation of the missing color information.
Here is an image processed in RawTherapee with the demosaicing mode "none". This basically assigns the value of each pixel to the channel it represents, while the other channels remain zero.
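To make the "none" idea concrete, here is a minimal sketch in plain Python. It is not RawTherapee's code: the function name `mosaic_to_rgb_none` is made up for illustration, and the RGGB filter layout is an assumption (real sensors use different layouts).

```python
# Assumed 2x2 Bayer layout (RGGB); channel index 0=R, 1=G, 2=B.
# Real cameras may use a different layout, so treat this as illustrative only.
BAYER_RGGB = ((0, 1),
              (1, 2))

def mosaic_to_rgb_none(raw, pattern=BAYER_RGGB):
    """Put each raw value into the channel its filter covers; others stay 0."""
    rgb = []
    for y, row in enumerate(raw):
        out_row = []
        for x, value in enumerate(row):
            pixel = [0, 0, 0]
            pixel[pattern[y % 2][x % 2]] = value  # only one channel is known
            out_row.append(tuple(pixel))
        rgb.append(out_row)
    return rgb

# Tiny made-up 2x4 mosaic of raw sensor values.
raw = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
]
rgb = mosaic_to_rgb_none(raw)
print(rgb[0][:2])  # red site, then green site
print(rgb[1][:2])  # green site, then blue site
```

Every output pixel has exactly one non-zero channel, which is why such a picture looks like a dark pattern of red, green and blue dots.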
Another representation would be to assign the intensity value of the raw pixel to all RGB channels, resulting in a gray scale image:
Both versions may also be considered a very specific but trivial kind of interpolation, since in both cases all RGB values are set. However, the gray scale version in particular can be seen as a direct representation of the essentially unaltered intensity values saved by the sensor. And this is actually the input of a demosaicing algorithm.
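The gray scale representation is even simpler to sketch. The helper name `mosaic_to_gray` and the sample values below are made up for illustration; the point is only that no knowledge of the filter layout is needed.

```python
def mosaic_to_gray(raw):
    """Copy each raw intensity into all three channels (R = G = B)."""
    return [[(value, value, value) for value in row] for row in raw]

# Tiny made-up 2x4 mosaic of raw sensor values.
raw = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
]
gray = mosaic_to_gray(raw)
print(gray[0][0])  # (10, 10, 10)
print(gray[1][3])  # (80, 80, 80)
```

Because the raw value is carried over unchanged, the original mosaic can be read back from any single channel of the result.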