Clang’s -O0 output: branch displacement and size increase

userbinator • 1 year ago

This reminds me of fasm, the only assembler immediately coming to mind that will do multi-pass branch optimisation by default. Most other assemblers either choose the long form always unless specified explicitly as "jmps" or "jmp short" (and then complain when the target turns out to be too far away), or the short form only if the destination is known when it's encountered (backwards jump).
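For context, the stakes on x86 are small but real: a short JMP (EB cb) or Jcc (7x cb) is 2 bytes with a signed 8-bit displacement, while the near forms are 5 bytes (E9 cd) and 6 bytes (0F 8x cd) with a 32-bit displacement. Below is a minimal Python sketch of the decision a single-pass assembler without relaxation can make. It assumes the hypothetical helper names shown and covers only the "short form only for backward jumps" behaviour described above; it is not any real assembler's logic.

```python
# Hypothetical encoding-size tables for x86 branches (bytes).
SHORT = {"jmp": 2, "jcc": 2}  # EB cb / 7x cb   (rel8)
NEAR = {"jmp": 5, "jcc": 6}   # E9 cd / 0F 8x cd (rel32)


def fits_rel8(disp: int) -> bool:
    """True if the displacement fits in a signed 8-bit field."""
    return -128 <= disp <= 127


def single_pass_size(kind: str, insn_offset: int, target_offset) -> int:
    """Size a one-pass assembler could safely pick for a branch.

    target_offset is None when the label has not been seen yet (a forward
    jump), so the assembler must assume the worst and emit the near form.
    """
    if target_offset is not None and target_offset <= insn_offset:
        # The displacement is relative to the end of the (short) instruction.
        disp = target_offset - (insn_offset + SHORT[kind])
        if fits_rel8(disp):
            return SHORT[kind]
    return NEAR[kind]


print(single_pass_size("jmp", 100, 0))    # backward, in range
print(single_pass_size("jcc", 0, None))   # forward, target unknown
```

A multi-pass assembler such as fasm avoids the pessimistic forward-jump case by revisiting the choice once every label's offset is known.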

I've long held the opinion that -O0 on all the major compilers should be considered more like an -O-1, because of the glaring stupidities it leaves in its output: it almost looks like pessimising rather than merely not optimising.

This article is also only the 2nd time I've seen "relaxation" used in this context. The first was https://news.ycombinator.com/item?id=10219007 over 8 years ago.

o11c • 1 year ago

Despite the fact that you say "all the major compilers", GCC and Clang make very different decisions for each optimization level.

In particular, GCC generates fairly debuggable code at all optimization levels, so there is less motivation for -O0 in the first place.

MaskRay • 1 year ago

There is ongoing work to improve debuggability for optimized code. https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-ne...

```
Mode | Execution Time | Debuggability | Compile Time
O0   | 1.0000         | 1.0000        | 1.0000
Og   | 0.3439         | 0.5357        | 1.8630
O1   | 0.3082         | 0.4241        | 1.7880
O2g  | 0.2823         | 0.4845        | 3.0420
O2   | 0.2514         | 0.3908        | 2.9380
```
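The table is easier to read as trade-offs against -O2 than against -O0. Below is a small Python sketch that re-bases the rows. It assumes each column is normalized so that O0 = 1.0, as the first row suggests, and that a higher Debuggability value is better. The names `RESULTS`, `COLUMNS` and `relative_to` are illustrative only.

```python
# Values copied from the table above (each column normalized so O0 = 1.0).
RESULTS = {
    "O0": (1.0000, 1.0000, 1.0000),
    "Og": (0.3439, 0.5357, 1.8630),
    "O1": (0.3082, 0.4241, 1.7880),
    "O2g": (0.2823, 0.4845, 3.0420),
    "O2": (0.2514, 0.3908, 2.9380),
}
COLUMNS = ("execution time", "debuggability", "compile time")


def relative_to(base: str) -> dict:
    """Re-express every row as ratios against the `base` mode's row."""
    ref = RESULTS[base]
    return {
        mode: tuple(value / r for value, r in zip(row, ref))
        for mode, row in RESULTS.items()
    }


for mode, ratios in relative_to("O2").items():
    print(mode, ", ".join(f"{c} x{r:.3f}" for c, r in zip(COLUMNS, ratios)))
```

Against -O2, O2g comes out roughly 12% slower to run and about 3.5% slower to compile, in exchange for a debuggability score about 24% higher.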

pcwalton • 1 year ago

> This reminds me of fasm, the only assembler immediately coming to mind that will do multi-pass branch optimisation by default. Most other assemblers either choose the long form always unless specified explicitly as "jmps" or "jmp short" (and then complain when the target turns out to be too far away), or the short form only if the destination is known when it's encountered (backwards jump).

Not true, gas will do the relaxation by default.

> This article is also only the 2nd time I've seen "relaxation" used in this context.

It's been the standard term among toolchain developers for quite a while. I remember seeing it all over the linker in 2007 when working on ARM stuff.

ryukoposting • 1 year ago

I'll second this. As a firmware dev, I almost never encounter situations where O0 gets me anything in terms of asm readability/debugging that O1 didn't already give me.

mkup • 1 year ago

NASM has an option (-Ox) to specify how many passes it should make when trying to turn near jumps into short jumps. I usually specify -O9.

MaskRay • 1 year ago

Thanks for mentioning nasm.

Both the GNU assembler and the LLVM integrated assembler parse and match instructions only once. They then store an internal representation in memory and perform fixed-point iteration. The section/fragment representation gives a lot of flexibility.
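Below is a minimal Python sketch of fixed-point branch relaxation over an in-memory representation. Every branch starts in the short (rel8) form. Each pass recomputes all offsets from the current sizes and promotes any branch whose displacement no longer fits. A branch only ever grows, so the loop terminates. The item model (plain byte counts, label strings and a hypothetical `Branch` class) is an assumption for illustration and is far simpler than the real fragment classes in GNU as or LLVM MC.

```python
from dataclasses import dataclass

SHORT = {"jmp": 2, "jcc": 2}  # rel8 forms (bytes)
NEAR = {"jmp": 5, "jcc": 6}   # rel32 forms (bytes)


@dataclass
class Branch:  # hypothetical stand-in for a relaxable fragment
    kind: str            # "jmp" or "jcc"
    target: str          # label name
    relaxed: bool = False  # True once promoted to the near form

    def size(self) -> int:
        return NEAR[self.kind] if self.relaxed else SHORT[self.kind]


def relax(items):
    """items: int (fixed bytes), str (label) or Branch.

    Returns (label offsets, total size, number of passes).
    """
    passes = 0
    while True:
        passes += 1
        # 1. Lay out everything using the current branch sizes.
        labels, offset, branch_ends = {}, 0, []
        for item in items:
            if isinstance(item, str):
                labels[item] = offset
            elif isinstance(item, Branch):
                offset += item.size()
                branch_ends.append((item, offset))
            else:
                offset += item
        # 2. Grow any short branch whose displacement is out of rel8 range.
        changed = False
        for br, end in branch_ends:
            disp = labels[br.target] - end  # relative to instruction end
            if not br.relaxed and not -128 <= disp <= 127:
                br.relaxed = True
                changed = True
        # 3. Fixed point reached: no branch changed size this pass.
        if not changed:
            return labels, offset, passes


# Growing the forward jcc pushes the backward jmp out of range (a cascade).
program = ["top", Branch("jcc", "out"), 122, Branch("jmp", "top"), 10, "out"]
labels, size, passes = relax(program)
print(labels, size, passes)  # {'top': 0, 'out': 143} 143 3
```

Starting from the short form and only growing is what makes the result minimal here. Starting every branch in the near form, roughly what "relax all" does, skips the iteration but pays 3 to 4 extra bytes for every branch that would have fit.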

In contrast, nasm parses and matches instructions multiple times depending on the optimization level. It also assigns addresses during parsing and uses an ad-hoc method for JMP/JCC instructions. The end conditions of the fixed-point iteration algorithm (global_offset_changed and stall_count) seem unconventional. -O0 does not "relax all" short jumps to near jumps.

mati365 • 1 year ago

Not the only one. My assembler (and C compiler), written in TypeScript, does the same, despite being painfully slow and useless.

https://github.com/Mati365/ts-c-compiler

dataflow • 1 year ago

Are you aware of -Og? It might be what you want.

usefulcat • 1 year ago

I believe -Og is only meaningful for gcc. IIRC it’s the same as O1 for clang.

zhouzhouyi • 1 year ago

When I debug a program, the first thing I do is compile with "-O0", so it is very nice to see "-mrelax-all" removed as the default for -O0, because "-mrelax-all" increases both the VM size and the file size.

MaskRay • 1 year ago

Thanks for posting:)

ezekiel68 • 1 year ago

> people generally care less about -O0 code size.

Right. A ~5% additional increase in debug artifact size is really not a high tax.