Step 5 — Lowering arithmetic and comparisons

Binary integer operations

The mapping is essentially one-to-one with TAC:

TAC Op    LLVM instruction
Add       add i64 a, b
Sub       sub i64 a, b
Mul       mul i64 a, b
Div       sdiv i64 a, b
Mod       srem i64 a, b
And       and i64 a, b
Or        or i64 a, b
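
Because the table is a plain lookup, the lowering for binary operations can be sketched in a few lines. The following is a minimal Python illustration, not the tutorial's actual emitter: the names BINOP_TO_LLVM and lower_binop are hypothetical, and it assumes operands arrive already rendered as LLVM value names such as %a.

```python
# Hypothetical table: TAC opcode name -> LLVM integer opcode (taken from the table above).
BINOP_TO_LLVM = {
    "Add": "add",
    "Sub": "sub",
    "Mul": "mul",
    "Div": "sdiv",  # signed division
    "Mod": "srem",  # signed remainder
    "And": "and",
    "Or": "or",
}


def lower_binop(op: str, dest: str, lhs: str, rhs: str) -> str:
    """Return one line of LLVM IR text for a binary TAC instruction.

    Raises KeyError for an opcode that is not in the table.
    """
    return f"{dest} = {BINOP_TO_LLVM[op]} i64 {lhs}, {rhs}"


print(lower_binop("Div", "%v0", "%a", "%b"))  # %v0 = sdiv i64 %a, %b
```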

Signed vs unsigned: we use sdiv and srem (signed) because MiniLang's Number is treated as a signed integer in this lowering. The s/u/f prefix on LLVM arithmetic is a frequent source of bugs:

  • add — no prefix; signedness doesn't matter (two's complement).
  • mul — no prefix; same reason.
  • sdiv / udiv — different result for negative operands.
  • srem / urem — likewise.
  • fadd / fmul / fdiv — floating point.
  • shl — no prefix; lshr (logical) / ashr (arithmetic) for right shift.
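
To see why the prefix matters for division but not for addition, the 64-bit behaviour can be modelled in Python. This is only a sketch: the helper names (to_signed, add, udiv, sdiv) are made up for the illustration, values are represented as unsigned 64-bit bit patterns, and the cases that are undefined in LLVM (division by zero, and the minimum i64 divided by -1) are not modelled.

```python
MASK = (1 << 64) - 1  # all 64 bits set


def to_signed(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def add(a: int, b: int) -> int:
    """Wrapping 64-bit add: the same bits whether the inputs are read as signed or unsigned."""
    return (a + b) & MASK


def udiv(a: int, b: int) -> int:
    """Unsigned division on the raw bit patterns."""
    return (a & MASK) // (b & MASK)


def sdiv(a: int, b: int) -> int:
    """Signed division, rounding toward zero as LLVM's sdiv does."""
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return q & MASK


minus_six = -6 & MASK  # bit pattern of -6 as an i64

print(to_signed(sdiv(minus_six, 2)))  # -3
print(udiv(minus_six, 2))             # 9223372036854775805
print(to_signed(add(minus_six, 2)))   # -4
```

The same bit pattern divided by 2 gives two very different answers, while the add result is a single bit pattern that reads correctly under either interpretation.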

The nsw / nuw flags (no signed wrap / no unsigned wrap) on arithmetic tell the optimiser that overflow does not happen: if it does, the result is a poison value rather than a wrapped one. We don't emit them, which is the conservative choice, but a real frontend should derive them from the source language's overflow semantics.

Unary

  • Neg becomes sub i64 0, %a. There is no dedicated neg instruction.
  • Not (boolean negation) becomes icmp eq i64 %a, 0 followed by zext i1 ... to i64.
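
Both unary lowerings can be sketched as a small function that returns the emitted lines. This is an illustrative Python sketch only: lower_unary and make_fresh are hypothetical names, and the temporary-naming scheme (%t0, %t1, ...) is an assumption, not the tutorial's.

```python
import itertools


def make_fresh(prefix: str = "%t"):
    """Return a function that hands out fresh temporary names: %t0, %t1, ..."""
    counter = itertools.count()
    return lambda: f"{prefix}{next(counter)}"


def lower_unary(op: str, dest: str, src: str, fresh) -> list[str]:
    """Return the LLVM IR lines for a unary TAC instruction."""
    if op == "Neg":
        # No integer neg instruction exists, so subtract from zero.
        return [f"{dest} = sub i64 0, {src}"]
    if op == "Not":
        # Compare against zero (i1 result), then widen back to the uniform i64.
        tmp = fresh()
        return [
            f"{tmp} = icmp eq i64 {src}, 0",
            f"{dest} = zext i1 {tmp} to i64",
        ]
    raise ValueError(f"unknown unary op: {op}")


fresh = make_fresh()
for line in lower_unary("Not", "%v1", "%a", fresh):
    print(line)
```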

Comparisons

%v0 = icmp slt i64 %a, %b       ; signed less-than
%v1 = zext i1   %v0 to i64

icmp returns i1. To use the result as our uniform i64 value, we zext (zero-extend) it to i64. If we stored booleans as i1 throughout we wouldn't need the zext, but any operation that consumes a boolean as a number would then have to widen it to i64 first.

The condition mnemonics:

TAC    LLVM icmp cond
Eq     eq
Ne     ne
Lt     slt
Le     sle
Gt     sgt
Ge     sge

The s prefix is for signed comparison. ult, ule, etc. are unsigned. eq and ne don't have a sign because they don't need one — bitwise equality is the same either way.
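
Putting the predicate table and the zext together, a comparison lowers to exactly two instructions. The sketch below is a hypothetical Python emitter (CMP_TO_PRED and lower_cmp are illustrative names); it assumes the caller supplies a fresh temporary name for the intermediate i1.

```python
# Hypothetical table: TAC comparison -> icmp predicate (signed where it matters).
CMP_TO_PRED = {
    "Eq": "eq",
    "Ne": "ne",
    "Lt": "slt",
    "Le": "sle",
    "Gt": "sgt",
    "Ge": "sge",
}


def lower_cmp(op: str, dest: str, lhs: str, rhs: str, tmp: str) -> list[str]:
    """Return the icmp + zext pair for a TAC comparison.

    tmp names the intermediate i1; dest receives the i64 result (0 or 1).
    """
    pred = CMP_TO_PRED[op]
    return [
        f"{tmp} = icmp {pred} i64 {lhs}, {rhs}",
        f"{dest} = zext i1 {tmp} to i64",
    ]


for line in lower_cmp("Lt", "%v1", "%a", "%b", "%v0"):
    print(line)
# %v0 = icmp slt i64 %a, %b
# %v1 = zext i1 %v0 to i64
```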

The zext / trunc dance

icmp always produces i1. Stores and arithmetic want i64, because that is our uniform value type. A conditional br wants i1 again.

%v0 = icmp slt i64 %a, %b      ; i1
%v1 = zext i1   %v0 to i64     ; i64

; ... later, used as a branch condition: ...
%v2 = icmp ne i64 %v1, 0       ; back to i1
br i1 %v2, label %T, label %F

This back-and-forth is what you pay for using i64 as the uniform value type. LLVM's instcombine cleans most of it up:

icmp ne (zext T to i64), 0   →   T

So after opt -O1 the i64 round-trip typically disappears wherever the optimiser can see the zext feeding the icmp ne.
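
The shape of that fold is easy to see on a toy expression tree. The following Python sketch is not how instcombine is implemented; it only pattern-matches the one rewrite shown above, using a made-up tuple encoding: ("icmp", pred, lhs, rhs), ("zext", inner), ("const", n), and ("val", name).

```python
def fold_bool_roundtrip(expr):
    """Rewrite icmp ne (zext T), 0 into T; return anything else unchanged.

    Assumes every ("zext", inner) node widens an i1 to i64, as in our lowering.
    """
    if (
        isinstance(expr, tuple)
        and len(expr) == 4
        and expr[0] == "icmp"
        and expr[1] == "ne"
        and isinstance(expr[2], tuple)
        and len(expr[2]) == 2
        and expr[2][0] == "zext"
        and expr[3] == ("const", 0)
    ):
        return expr[2][1]  # the original i1 value
    return expr


cond = ("icmp", "slt", ("val", "%a"), ("val", "%b"))  # i1
roundtrip = ("icmp", "ne", ("zext", cond), ("const", 0))

print(fold_bool_roundtrip(roundtrip) == cond)  # True
```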

Why we don't use fadd / fmul

MiniLang numbers are doubles in the interpreter, but we lower to i64 for simplicity. To handle floats properly:

  • Pick double as the uniform type instead of i64.
  • Replace add → fadd, sdiv → fdiv, icmp → fcmp.
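  • fcmp predicates have an ordered/unordered distinction (oeq, ueq, olt, ult, ...) because of NaN: an ordered predicate is false if either operand is NaN, an unordered one is true. Note that the u in fcmp ult means "unordered", not "unsigned".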
  • Print with %g or %lf format.
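
The ordered/unordered distinction is the part most likely to surprise, so here is a small Python model of two fcmp predicates. The function names fcmp_olt and fcmp_ult are invented for this sketch; the behaviour follows the rule that ordered predicates are false, and unordered predicates true, whenever either operand is NaN.

```python
import math


def is_unordered(a: float, b: float) -> bool:
    """Two values are unordered when at least one of them is NaN."""
    return math.isnan(a) or math.isnan(b)


def fcmp_olt(a: float, b: float) -> bool:
    """Ordered less-than: false if either operand is NaN."""
    return not is_unordered(a, b) and a < b


def fcmp_ult(a: float, b: float) -> bool:
    """Unordered-or-less-than: true if either operand is NaN."""
    return is_unordered(a, b) or a < b


nan = float("nan")
print(fcmp_olt(1.0, 2.0), fcmp_ult(1.0, 2.0))  # True True
print(fcmp_olt(nan, 2.0), fcmp_ult(nan, 2.0))  # False True
```

A frontend that wants the usual "comparisons with NaN are false" behaviour would pick the ordered predicates for less-than and equality.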

cp-14 will introduce a tagged value type that handles both i64 and double, with a runtime dispatch on the tag bits.