Why float division is faster than integer division

来自WHY42

Floating point number division is faster than integer division because of the exponent part in floating point number representation. To divide one exponent by another one plain subtraction is used.

int32_t division requires fast division of 31-bit numbers, whereas float division requires fast division of 24-bit mantissas (the leading one in mantissa is implied and not stored in a floating point number) and faster subtraction of 8-bit exponents.

See an excellent detailed explanation how division is performed in CPU.

It may be worth mentioning that SSE and AVX instructions only provide floating point division, but no integer division. SSE instructions/intrinsincs can be used to quadruple the speed of your float calculation easily.

If you look into Agner Fog's instruction tables, for example, for Skylake, the latency of the 32-bit integer division is 26 CPU cycles, whereas the latency of the SSE scalar float division is 11 CPU cycles (and, surprisingly, it takes the same time to divide four packed floats).

Also note, in C and C++ there is no division on numbers shorter that int, so that uint8_t and uint16_t are first promoted to int and then the division of ints happens. uint8_t division looks faster than int because it has fewer bits set when converted to int which causes the division to complete faster.[1] [2]