In practice, you won't find a processor whose pipeline stages take different amounts of time; that would be a nightmare to implement. Every pipeline stage takes one clock cycle.
"Latency" is the time from the start of an instruction to the point where its result can be used. For example, once execution of an instruction x = y * z has started, some time must pass before an instruction a = b + x can start, because the result of the first instruction must be available first. That time is not necessarily the full execution time of the first instruction. Moving the result of the multiplication into the register x may take additional time, and a processor that is clever enough can let a = b + x use the output of the multiplier as soon as it is available, instead of insisting on reading the register x.
Latency is important when you have a long sequence of instructions where each one depends on the result of the previous one, and you don't have any other instructions that can be executed at the same time.
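
To make the dependency-chain point concrete, here is a minimal sketch in C. The function names product_chain and product_split are made up for this illustration. The first function forms one long chain in which every multiply must wait for the previous result, so it is limited by multiply latency. The second keeps four independent chains, which gives the processor other multiplies to start while one is still in flight. Whether that is actually faster depends on the processor and on what the compiler does with the loop, so treat it as an illustration rather than a benchmark.

```c
#include <stddef.h>
#include <stdint.h>

/* One long dependency chain: each multiply needs the previous value of p,
   so the next multiply cannot start until the current one has produced
   its result. */
uint64_t product_chain(const uint64_t *v, size_t n) {
    uint64_t p = 1;
    for (size_t i = 0; i < n; i++)
        p *= v[i];
    return p;
}

/* Four independent chains: p0..p3 do not depend on each other, so the
   multiplies within one loop iteration can overlap in the pipeline. */
uint64_t product_split(const uint64_t *v, size_t n) {
    uint64_t p0 = 1, p1 = 1, p2 = 1, p3 = 1;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        p0 *= v[i];
        p1 *= v[i + 1];
        p2 *= v[i + 2];
        p3 *= v[i + 3];
    }
    /* Handle the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        p0 *= v[i];
    /* Combine the partial products at the end. */
    return (p0 * p1) * (p2 * p3);
}
```

Both functions return the same value, because unsigned multiplication wraps modulo 2^64 and can therefore be regrouped freely. With floating-point values, regrouping can change the rounding, so the same transformation is not automatically safe there.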