Re-added imcc optimizations

Summary

The -O1 and -O2 optimization levels had been disabled, along with several other imcc options. I re-enabled and fixed all of them. Even -O2 is now stable.

While refactoring imcc (the internally used PIR and PASM compiler) to a better API, and switching from the old parrot frontend to the winxed-based parrot, we lost most of the previous imcc command-line arguments, and especially the imcc optimizations -O1 and -O2.
At that time, -O1 was considered stable, and -O2 unstable and pretty broken.

I added an API call to set the old imcc debugging and optimizer flags again (imcc can still use them), and started fixing the optimizer last week.
As it turned out, -O1 failed one test, and -O2 had four major problems.

The -O1 failure was related only to NCI calls, where a strange side effect of get_global affected the branch_cond_loop_swap optimization. See https://github.com/parrot/parrot/issues/1037

-O2 does more dynamic optimizations, and was broken in used_once elimination and constant propagation, which I eventually fixed.

used_once elimination is now only allowed if the register that is used only once in a basic block belongs to a pure functional op. Any side-effecting op still needs its register, even if that register is used only once. So the parser now adds a new op type, ITPUREFUNC, to the functional ops, which are basically all arithmetic and logical unary and binary ops. See https://github.com/parrot/parrot/issues/1036
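The rule can be illustrated with a toy model. This is a minimal sketch in Python, not imcc's C code: the tuple-based instruction format, the `PURE_OPS` set (a stand-in for the ops flagged ITPUREFUNC), and the function names are all assumptions made for illustration, and the sketch only counts register occurrences within the one block it is given.

```python
from collections import Counter

# Hypothetical stand-in for the ops the parser would flag as ITPUREFUNC.
PURE_OPS = {"add", "sub", "mul", "neg", "and", "or", "not"}


def is_register(arg):
    """Registers in this toy IR are strings such as "$I0" or "$P1"."""
    return isinstance(arg, str) and arg.startswith("$")


def eliminate_used_once(block):
    """Drop pure ops whose destination register appears nowhere else.

    Each instruction is (op, dest, srcs); dest is None for ops
    that produce no result.
    """
    counts = Counter()
    for op, dest, srcs in block:
        if dest is not None:
            counts[dest] += 1
        counts.update(s for s in srcs if is_register(s))

    kept = []
    for op, dest, srcs in block:
        if op in PURE_OPS and dest is not None and counts[dest] == 1:
            continue  # result is never read and the op has no side effects
        kept.append((op, dest, srcs))
    return kept


block = [
    ("set", "$I0", (1,)),
    ("add", "$I2", ("$I0", 2)),       # $I2 appears once, pure op: removable
    ("get_global", "$P0", ("foo",)),  # $P0 appears once, not a pure op: kept
    ("say", None, ("$I0",)),
]
optimized = eliminate_used_once(block)
```

Here only the `add` is dropped. The `get_global` stays even though `$P0` is never read, because the op is not in the pure set.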

Fixing constant propagation was trickier. The bigger issues were missing type checks (I vs N) in the setters (https://github.com/parrot/parrot/issues/1042), and especially non-local side effects caused by exception handlers.
Effectively, push_eh can store the value of a const register, and pop_eh will revert any later changes. Because push_eh/pop_eh can occur inside function calls, we need to stop propagating constants across all yield or invokecc calls. See https://github.com/parrot/parrot/issues/1044.
Note that "constants" in this context are just literal values, not compile-time read-only values.
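The two fixes described above can be sketched on a toy instruction list. This is an illustrative Python model under assumptions, not the actual imcc implementation: the `(op, dest, srcs)` tuples, the `BARRIER_OPS` set, and the helper names are hypothetical. A literal is only recorded when its type matches the register type (I vs N), and everything known so far is forgotten at a yield or invokecc.

```python
# Ops across which no literal value may be propagated (toy model).
BARRIER_OPS = {"invokecc", "yield"}


def is_register(arg):
    """Registers in this toy IR are strings such as "$I0" or "$N1"."""
    return isinstance(arg, str) and arg.startswith("$")


def literal_matches(reg, value):
    """Type check: int literals for I registers, float literals for N."""
    kind = reg[1]
    if kind == "I":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "N":
        return isinstance(value, float)
    return False


def propagate_constants(block):
    """Replace register reads by known literals within one basic block."""
    known = {}  # register name -> literal value
    out = []
    for op, dest, srcs in block:
        if op in BARRIER_OPS:
            known.clear()  # a callee may push_eh/pop_eh and change registers
            out.append((op, dest, srcs))
            continue
        new_srcs = tuple(known.get(s, s) if is_register(s) else s for s in srcs)
        out.append((op, dest, new_srcs))
        if dest is not None:
            known.pop(dest, None)  # any write invalidates the old value
            if (op == "set" and len(new_srcs) == 1
                    and not is_register(new_srcs[0])
                    and literal_matches(dest, new_srcs[0])):
                known[dest] = new_srcs[0]
    return out


block = [
    ("set", "$I0", (5,)),
    ("add", "$I1", ("$I0", 1)),    # $I0 is known here
    ("invokecc", None, ("$P0",)),  # barrier: forget everything
    ("add", "$I2", ("$I0", 2)),    # $I0 must not be replaced
    ("set", "$N0", (1,)),          # int literal into N register: not recorded
    ("mul", "$N1", ("$N0", 2.0)),
]
result = propagate_constants(block)
```

In `result`, only the first `add` has its `$I0` argument replaced by `5`. The `add` after the call and the `mul` keep their register arguments.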

I also improved some debug functions, see --help-debug.

What is still missing is better constant folding: replacing ops that have only constant arguments with a constant.
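As a rough illustration of what such folding means, here is a minimal Python sketch on a toy instruction list. The `(op, dest, srcs)` format, the `FOLDABLE` table, and the function name are assumptions for illustration only. Division is left out on purpose, since a real pass would have to handle division by zero and the VM's numeric semantics.

```python
import operator

# Toy table of binary ops that are safe to evaluate at compile time.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def is_number(arg):
    """True for int or float literals (bool is excluded on purpose)."""
    return isinstance(arg, (int, float)) and not isinstance(arg, bool)


def fold_constants(block):
    """Rewrite `op dest, c1, c2` into `set dest, c` when both args are literals."""
    out = []
    for op, dest, srcs in block:
        fn = FOLDABLE.get(op)
        if fn is not None and len(srcs) == 2 and all(is_number(s) for s in srcs):
            out.append(("set", dest, (fn(*srcs),)))
        else:
            out.append((op, dest, srcs))
    return out


folded = fold_constants([
    ("add", "$I1", (5, 1)),      # both args literal: becomes set $I1, 6
    ("add", "$I2", ("$I0", 2)),  # register argument: left alone
])
```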

Also missing is storing the results as .pasm with -o file.pasm. Currently only -d10 prints the ops in PASM-like fashion.

Benchmarks:

https://github.com/parrot/parrot/issues/1037#issuecomment-36274524 contains my current benchmarks.
The speedup is not big, about 3-5%. Improving method calls and re-adding the JIT to the run-time will bring larger improvements.

-O1 benchmarks / time `make test` vs `make testO1`

This includes the longer compile times:

time perl t/harness t/benchmark/\*.t; time perl t/harness -O1 t/benchmark/\*.t;

0m33.498s (without -O1) vs. 0m32.306s (with -O1)

Without compile-time:

    for t in t/benchmark/*.pir; do
      ./parrot -O2 -o $t.O2.pbc $t;
      ./parrot -O1 -o $t.O1.pbc $t;
      ./parrot -o $t.O0.pbc $t;
    done
    $ time for t in t/benchmark/*.O0.pbc; do ./parrot $t >/dev/null; done
    real    0m17.428s
    $ time for t in t/benchmark/*.O1.pbc; do ./parrot $t >/dev/null; done
    real    0m16.269s
    $ time for t in t/benchmark/*.O2.pbc; do ./parrot $t >/dev/null; done
    real    0m16.235s
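To turn the wall-clock times above into relative speedups, a small Python helper is enough. The timings are copied from the runs above. The helper name is made up for this sketch, and treating 0m33.498s as the unoptimized harness run is an assumption based on the order of the two commands.

```python
def speedup_percent(baseline_s, optimized_s):
    """Relative time saved versus the baseline, in percent."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


# Precompiled .pbc runs (no compile time), seconds of real time.
baseline_o0 = 17.428
runs = {"-O1": 16.269, "-O2": 16.235}

for flag, seconds in runs.items():
    print(f"{flag}: {speedup_percent(baseline_o0, seconds):.2f}% less time than -O0")

# Harness runs including compile time (assumed order: plain first, then -O1).
print(f"harness -O1: {speedup_percent(33.498, 32.306):.2f}% less time")
```

Run-time only, both levels save just under 7% on this benchmark set. With compile time included, the -O1 harness run saves about 3.6%.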

perl t/harness --gc-debug --runcore-tests
1m17.267s

perl t/harness --gc-debug -O1 --runcore-tests
1m18.012s

perl t/harness -f --runcore-tests
1m16.640s

perl t/harness -f -O1 --runcore-tests
1m17.022s

perl t/harness -f -O2 --runcore-tests
1m16.902s