Hello LLVM

Contents

What is LLVM?

Ingredients

Write Assembly Code

Build the Executable

Debrief

Conclusion

Extra Credit

Resources

What is LLVM?

LLVM [1] is a compiler infrastructure built around a lightweight, type-safe, platform-independent assembly language and an optimizer for it. Front ends exist for many high-level languages, including Ada, C, C++, Python, Ruby, and Haskell; they emit LLVM code, which LLVM then converts to an executable. According to the LLVM Project Blog [2], LLVM has reduced Haskell program runtimes by 30%. GHC has gained an LLVM backend, and Clang, the LLVM-based C compiler, is positioned as an alternative to GCC. You have to see it to believe it.

Ingredients

You can write your first LLVM assembly program today. But first you need to install LLVM.

Mac OS X

  1. Install Xcode [3].
  2. Install Homebrew [4].
  3. Run brew install llvm.

Windows

  1. Install MinGW [5].
  2. Open Start -> Programs -> MinGW -> MinGW Shell.
  3. Run mingw-get install binutils.
  4. Run mingw-get install gcc.
  5. Download LLVM Binaries for Mingw32/x86 [6].
  6. Move llvm-2.9-x86-mingw32.tar.bz2 to C:\MinGW\bin\.
  7. Run cd c:/mingw/bin/.
  8. Run bunzip2 llvm-2.9-x86-mingw32.tar.bz2.
  9. Run tar xvf llvm-2.9-x86-mingw32.tar.

Linux

If you're not using Ubuntu or Debian, substitute your package manager.

  1. Run sudo apt-get install build-essential.
  2. Run sudo apt-get install llvm.

Write Assembly Code

Fire up your favorite text editor and save two files: Makefile (no extension) and hello.ll. If you download them instead, some browsers will automatically add .txt to the filename; simply rename the files.

The Makefile [7] assembles hello.ll into bitcode, compiles the bitcode into native assembly, links that into an executable, and runs it.

Makefiles require hard tabs, not spaces. If you get an error like Makefile:4: *** missing separator. Stop., re-indent the command lines with hard tabs instead of spaces.

EXECUTABLE=hello

all: hello.ll
	llvm-as hello.ll
	llc hello.bc
	gcc -o $(EXECUTABLE) hello.s
	./$(EXECUTABLE)

clean:
	rm $(EXECUTABLE)
	rm hello.s
	rm hello.bc

The program hello.ll [8] will print the message "Hello World!" to the terminal.

@msg = internal constant [13 x i8] c"Hello World!\00"

declare i32 @puts(i8*)

define i32 @main() {
	call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @msg, i32 0, i32 0))
	ret i32 0
}

Build the Executable

If you're using Windows, open the MinGW Shell. Hello must be compiled inside a MinGW terminal, but the finished executable can be run normally from the Command Prompt.

Mac OS X and Linux users can compile from an ordinary terminal.

Replace HELLO-FILES with the directory containing Makefile and hello.ll.

$ cd HELLO-FILES
$ make
llvm-as hello.ll
llc hello.bc
gcc -o hello hello.s
./hello
Hello World!

The resulting binary is small and fast.

$ ls -l hello
-rwxr-xr-x  1 andrew  staff  8720 Mar  4 01:04 hello

$ time ./hello
Hello World!

real	0m0.003s
user	0m0.000s
sys	0m0.002s

The Mac OS X executable is 8720 bytes, or 8.5KB.

And you read the zeroes correctly. On a MacBook Pro 5,1, hello used user + sys = 0.002 seconds of CPU time, or 2 milliseconds. The remaining real - (user + sys) = 0.001 seconds is time hello spent off the CPU, for example while the operating system ran other processes: the terminal, the web browser, and anything else running on the computer. That's a different class of speed, in a world where many programming environments take noticeably longer just to boot up. Hello LLVM, Goodbye JVM!
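The arithmetic above can be written out as a minimal C sketch. The function names cpu_time_ms and overhead_ms are made up for illustration, and the times are assumed to be whole milliseconds read off the time output by hand.

```c
/* All times are whole milliseconds, copied by hand from the `time` output. */

/* CPU time the process itself consumed: user-mode plus kernel-mode. */
long cpu_time_ms(long user_ms, long sys_ms) {
    return user_ms + sys_ms;
}

/* Wall-clock time not accounted for by the process's own CPU time. */
long overhead_ms(long real_ms, long user_ms, long sys_ms) {
    return real_ms - cpu_time_ms(user_ms, sys_ms);
}
```

With the numbers from the run above, cpu_time_ms(0, 2) gives 2 and overhead_ms(3, 0, 2) gives 1.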

Debrief

Let's examine hello.ll in detail.

Local variables begin with %, and global variables begin with @, so @msg is a global variable. Here "global" means a symbol that is local to the object file, because @msg is declared as an internal constant. The docs [9] compare this to the static keyword in C.

LLVM has syntax for zero-, one-, two-, and multi-dimensional arrays. [13 x i8] is a one-dimensional array of 13 8-bit integers. An 8-bit integer is also a byte, or an ASCII character. That makes an array of 13 characters. The array is initialized with c"Hello World!\00", a C string with 13 characters, including the \00 null byte at the end. The null byte is there to signal string functions to stop processing; otherwise they would continue looking for bytes past the string and interpret any random data there as more characters. It's written \00 because LLVM string escapes are a backslash followed by two hexadecimal digits. \0A would be a newline.
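For comparison, here is a rough C counterpart of @msg, offered as a sketch rather than the code LLVM would generate. The helper names msg_size, msg_length, and newline_byte are invented for this example. It shows where the 13 comes from: 12 visible characters plus the null byte.

```c
#include <stddef.h>
#include <string.h>

/* `static` gives msg internal linkage, much like `internal` in hello.ll. */
static const char msg[] = "Hello World!";

/* Bytes in the array, including the trailing null byte. */
size_t msg_size(void) { return sizeof msg; }

/* Characters before the null byte, as counted by strlen. */
size_t msg_length(void) { return strlen(msg); }

/* The byte LLVM writes as \0A is C's '\n' (hexadecimal 0A). */
int newline_byte(void) { return '\x0A'; }
```

msg_size() is one larger than msg_length() because the C compiler appends the null byte automatically, whereas in LLVM you write \00 yourself and count it in the array type.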

That's a lot of code just to set the string "Hello World!". LLVM is verbose, but no more verbose than any assembly language. LLVM is type safe; it won't let you compile a program that sends the wrong type of data to functions, because that could crash. Specifying types takes a little time, but it makes programs more robust. And more readable too: assembly languages typically have long [10] series of uniform, ambiguous instructions like:

mov edx, len
mov ecx, msg
mov ebx, 1
mov eax, 4

If you don't know what eax, ebx, ecx, and edx are, it's hard to guess. Not so in a typed language; at least you'd know whether a variable was an i8 (byte), i32 (int), or i1942652 (integer with over 1M bits). Yeah, LLVM allows integers of crazy-long bit widths [11].

puts is not defined in hello.ll. It's an external function defined in C's stdio library [12], referenced by the syntax declare i32 @puts(i8*). puts takes a byte pointer as input, prints the string followed by a newline to the terminal, and returns a 32-bit integer. If the string could not be printed, puts returns EOF.
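The same contract is visible from C. The wrapper below is a small sketch; print_line is a made-up name, and mapping the result to 0 or -1 is just one way to report it.

```c
#include <stdio.h>

/* Print s followed by a newline.
 * Returns 0 on success, -1 if puts reports failure. */
int print_line(const char *s) {
    int result = puts(s);   /* non-negative on success, EOF on failure */
    return (result == EOF) ? -1 : 0;
}
```

hello.ll ignores the return value of puts entirely, which is fine for a first program but hides write errors.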

define i32 @main() { ... } defines the main function to be called when the program is run. Main takes no input and returns a 32-bit integer to be described shortly.

call i32 @puts executes the puts function. If we wanted to save the return value of puts, we could write %output = call i32 @puts(...), with the same argument as before.

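i8* getelementptr inbounds ([13 x i8]* @msg, i32 0, i32 0) is somewhat difficult to read. The getelementptr instruction (used here in its constant-expression form) is notoriously dense, so often misunderstood that the LLVM docs have a manual [13] to unravel its mysteries. It computes an address without reading any memory. i8* is the type of the resulting pointer, which is what puts expects. inbounds adds some restrictions to computing addresses; it does not protect against array-index-out-of-bounds errors. i32 0, i32 0 are the indices: the first steps through the pointer @msg to the array itself, and the second selects element 0 of that array.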
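If the two indices still look odd, a C analogy may help. This is only a sketch of the same address computation, not what LLVM emits; first_byte and seventh_byte are hypothetical helpers.

```c
/* Hypothetical C counterpart of @msg: an array of 13 bytes. */
static const char msg[13] = "Hello World!";

/* Mirrors getelementptr ([13 x i8]* @msg, i32 0, i32 0). */
const char *first_byte(void) {
    const char (*array_ptr)[13] = &msg;  /* like [13 x i8]* @msg */
    /* First 0: the array array_ptr points at. Second 0: element 0 of it. */
    return &array_ptr[0][0];
}

/* Changing the second index selects another element, here index 6. */
const char *seventh_byte(void) {
    const char (*array_ptr)[13] = &msg;
    return &array_ptr[0][6];
}
```

In both functions only an address is computed; nothing is read from the array until the caller dereferences the result.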

LLVM's type system [14] allows us to declare our own types. Here is hello.ll with more C-like syntax:

%char = type i8
%int = type i32

@msg = internal constant [13 x %char] c"Hello World!\00"

declare %int @puts(%char*)

define %int @main() {
	call %int @puts(%char* getelementptr inbounds ([13 x %char]* @msg, %int 0, %int 0))
	ret %int 0
}

More complicated types can be declared, including arrays, vectors, unions, and structs [15].

Back to hello.ll. The last thing main() does is ret i32 0, equivalent to return 0 in a C main function, which has the same effect as exit(0). This tells the shell that the program succeeded. If we wanted to signal an error, we would write ret i32 1. Try it, and rerun make. You'll see:

$ make
llvm-as hello.ll
llc hello.bc
gcc -o hello hello.s
./hello
Hello World!
make: *** [all] Error 1

Conclusion

You've now installed LLVM, written an LLVM assembly program, built it into an executable, and examined how it works.

You used more time and effort than a Python Hello World, print "Hello World!", would require, but you learned about programming internals. And you made an executable that runs 1250% faster. A stitch in time saves runtime.

Extra Credit

If you want to learn more, start by writing C programs and reading the resulting assembly.

$ cat hello.c
#include <stdio.h>

int main() {
   printf("Hello World\n");

   return 0;
}
$ llvm-gcc -emit-llvm -S hello.c
$ cat hello.s
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-apple-darwin10.6"

@.str = private constant [12 x i8] c"Hello World\00", align 1 ; <[12 x i8]*> [#uses=1]

define i32 @main() nounwind ssp {
entry:
  %retval = alloca i32                            ; <i32*> [#uses=2]
  %0 = alloca i32                                 ; <i32*> [#uses=2]
  %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
  %1 = call i32 @puts(i8* getelementptr inbounds ([12 x i8]* @.str, i64 0, i64 0)) nounwind ; <i32> [#uses=0]
  store i32 0, i32* %0, align 4
  %2 = load i32* %0, align 4                      ; <i32> [#uses=1]
  store i32 %2, i32* %retval, align 4
  br label %return

return:                                           ; preds = %entry
  %retval1 = load i32* %retval                    ; <i32> [#uses=1]
  ret i32 %retval1
}

declare i32 @puts(i8*)

Resources

  1. http://llvm.org/
  2. http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html
  3. http://developer.apple.com/TOOLS/Xcode/
  4. http://mxcl.github.com/homebrew/
  5. http://chocolatey.org/packages/mingw
  6. http://llvm.org/releases/download.html
  7. https://github.com/mcandre/hello-llvm/raw/master/Makefile
  8. https://github.com/mcandre/hello-llvm/raw/master/hello.ll
  9. http://llvm.org/docs/LangRef.html#linkage_internal
  10. http://asm.sourceforge.net/intro/hello.html
  11. http://llvm.org/docs/LangRef.html#t_integer
  12. http://www.elook.org/programming/c/puts.html
  13. http://llvm.org/docs/GetElementPtr.html
  14. http://llvm.org/docs/LangRef.html#namedtypes
  15. http://llvm.org/docs/LangRef.html#t_struct