I had a look at the details of both images. It is a bit difficult to decide which factors are at play, but I don't think hand-holding had much effect on either. If you examine the small antennas on the building tops near the focal points of the images, they appear as clean single lines at 1:1. That would seem to rule out camera movement as a factor, for me anyway.

The first image was shot at a lower ISO but shows more noise at 1:1, which contributes to the 'not sharp' feeling, although there is still ample detail visible at the focus point. I am assuming the increased noise comes from processing, since that image was shot later than the second and so likely in less light.

I think the f/1.8 aperture used for both, and the shallow depth of field at that aperture, are large contributors to the 'not sharp' feeling. That lens is also not known for being stellar for sharpness, though, so I am going to go out on a limb and say it may be the largest contributor to the lack of critical sharpness. Still, I like the atmosphere and content of both images. I am not sure there is any visible reason to choose one camera over the other from this pair of images.

Having shot with both the EM1 and the EM5 (original versions), I greatly prefer the ergonomics of the EM1. I can fully appreciate choosing to have two bodies that are the same, especially if you have a reason to prefer one over the other. Given your inclination for street shooting, I think I would go with the Pen F over the other body myself.
Andrew