Re-added imcc optimizations


-O1 and -O2 were previously disabled, along with some other imcc options. I re-enabled and fixed all of them. Even -O2 is now stable.

During the refactoring of imcc (the internally used pir and pasm compiler) to a better API, and the switch from the old parrot frontend to the winxed-based parrot, we lost most of the previous imcc command-line arguments, and especially the imcc optimizations -O1 and -O2.
-O1 was considered stable, and -O2 unstable and pretty broken.

I added an API call to set the old imcc debugging and optimizer flags again (imcc can still use them), and started fixing the optimizer last week.
As it turned out, -O1 failed one test, and -O2 had 4 major problems.

The -O1 failure was only related to nci calls with a strange side-effect of get_global affecting the branch_cond_loop_swap optimization. See

-O2 does more dynamic optimizations, and was broken in used_once elimination and constant propagation, both of which I eventually fixed.

used_once elimination is now only allowed if the register that is used only once in a basic block belongs to a pure functional op. Any side-effecting op still needs this register, even if it is used only once. So the parser now adds a new op type, ITPUREFUNC, to the functional ops, which are basically all arithmetic and logical unary and binary ops. See
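To make the rule concrete, here is a minimal Python sketch of the idea on a toy instruction list. It is not imcc's code: the tuple layout (op, dest, srcs), the PURE_OPS set (standing in for whatever the parser flags as ITPUREFUNC) and the function name are all made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the ops the parser would flag as ITPUREFUNC.
PURE_OPS = {"add", "sub", "mul", "neg", "not", "band", "bor"}


def eliminate_used_once(block):
    """Drop pure ops whose destination register appears only once in the block.

    Each instruction is a toy tuple (op, dest, srcs). Registers are strings,
    literals are plain Python numbers. Ops outside PURE_OPS are always kept,
    because they may have side effects beyond writing their destination.
    """
    counts = Counter()
    for op, dest, srcs in block:
        for operand in (dest, *srcs):
            if isinstance(operand, str):
                counts[operand] += 1
    return [
        (op, dest, srcs)
        for op, dest, srcs in block
        if not (op in PURE_OPS and dest is not None and counts[dest] == 1)
    ]


block = [
    ("add", "I0", ("I1", 2)),        # I0 is never read again: removable
    ("shift", "I4", ("P0",)),        # I4 is used once too, but shift also mutates P0
    ("mul", "I2", ("I1", "I1")),     # I2 is read by print below: kept
    ("print", None, ("I2",)),
]
optimized = eliminate_used_once(block)
```

Only the add is dropped here. The shift stays even though its result register is used once, which is exactly the case that went wrong before. The sketch does a single pass; removing one op could make another register used-once, which it does not follow up on.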

Fixing constant propagation was trickier. The bigger issues were missing type checks (I vs N) in the setters, and especially non-local side effects caused by exception handlers.
Effectively, push_eh can store the value of a const register, and pop_eh will revert any later changes. As push_eh/pop_eh can occur inside function calls, we need to stop propagating constants across all yield or invokecc calls. See
Note that "constants" in this context are just literal values, not compile-time readonly values.
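A rough Python sketch of the two rules (type check in the setter, and forgetting everything at a call) on a toy instruction list. Again this is only a model, not the imcc implementation: the (op, dest, srcs) tuples, BARRIER_OPS and the helper names are assumptions of the sketch.

```python
# Ops after which no literal may be assumed to survive (toy model).
BARRIER_OPS = {"invokecc", "yield"}


def literal_matches_register(reg, value):
    """Toy I-vs-N check: int literals for I registers, floats for N registers."""
    if reg.startswith("I"):
        return isinstance(value, int) and not isinstance(value, bool)
    if reg.startswith("N"):
        return isinstance(value, float)
    return False


def propagate_constants(block):
    """Replace register reads by known literals, forgetting all at barriers."""
    known = {}
    out = []
    for op, dest, srcs in block:
        if op in BARRIER_OPS:
            known.clear()                      # a callee may push_eh/pop_eh
            out.append((op, dest, srcs))
            continue
        new_srcs = tuple(
            known.get(s, s) if isinstance(s, str) else s for s in srcs
        )
        out.append((op, dest, new_srcs))
        if dest is not None:
            known.pop(dest, None)              # dest is overwritten
            if (op == "set" and len(new_srcs) == 1
                    and not isinstance(new_srcs[0], str)
                    and literal_matches_register(dest, new_srcs[0])):
                known[dest] = new_srcs[0]
    return out


block = [
    ("set", "I0", (5,)),
    ("add", "I1", ("I0", 1)),      # I0 is known here
    ("invokecc", None, ("P0",)),   # barrier: forget I0
    ("add", "I2", ("I0", 1)),      # I0 must be read from the register
]
propagated = propagate_constants(block)
```

The first add gets the literal 5 in place of I0, the second one after the invokecc does not. A set of an int literal into an N register is not recorded at all in this model.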

I also improved some debug functions, see --help-debug.

What is still missing is better constant folding: replacing ops that have only constant args with a constant.
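What I mean by that, as a small Python sketch on a toy instruction list. This is not existing imcc code; the (op, dest, srcs) tuples, the FOLDABLE table and the restriction to integer literals are assumptions made to keep the example short.

```python
import operator

# Toy table of foldable binary ops (integer literals only in this sketch).
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def is_int_literal(value):
    return isinstance(value, int) and not isinstance(value, bool)


def fold_constants(block):
    """Replace a binary op on two integer literals with a set of the result."""
    out = []
    for op, dest, srcs in block:
        if op in FOLDABLE and len(srcs) == 2 and all(map(is_int_literal, srcs)):
            out.append(("set", dest, (FOLDABLE[op](*srcs),)))
        else:
            out.append((op, dest, srcs))
    return out


block = [
    ("add", "I1", (5, 1)),        # only literal args: becomes set I1, 6
    ("mul", "I2", ("I1", 2)),     # reads a register: left alone
]
folded = fold_constants(block)
```

Run after constant propagation, a pass like this would turn an add of two literals into a plain set, which propagation could then pick up again on a following round.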

Also missing is storing the results as .pasm, via -o file.pasm. Currently only -d10 prints the ops in a pasm-like fashion.

Benchmarks: contains my current benchmarks.
The speedup is not big, about 3-5%. Improving method calls and re-adding the jit to the run-time should bring larger gains.

-O1 benchmarks / time `make test` vs `make testO1`

This includes the longer compile times:

time perl t/harness t/benchmark/*.t; time perl t/harness -O1 t/benchmark/*.t;

0m33.498s (without -O1) vs. 0m32.306s (with -O1)

Without compile-time:

# Precompile each benchmark to bytecode at every optimization level
for t in t/benchmark/*.pir; do
  ./parrot -O2 -o $t.O2.pbc $t;
  ./parrot -O1 -o $t.O1.pbc $t;
  ./parrot -o $t.O0.pbc $t;
done
$ time for t in t/benchmark/*.O0.pbc; do ./parrot $t >/dev/null; done
real    0m17.428s
$ time for t in t/benchmark/*.O1.pbc; do ./parrot $t >/dev/null; done
real    0m16.269s
$ time for t in t/benchmark/*.O2.pbc; do ./parrot $t >/dev/null; done
real    0m16.235s
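
For reference, the relative change can be computed from the wall-clock numbers above with a few lines of Python. This assumes the first number of the compile-time pair is the run without -O1; the function name is made up for the example.

```python
def speedup_percent(baseline_s, optimized_s):
    """Relative time saved versus the baseline, in percent."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


# Wall-clock seconds copied from the timings in this post.
with_compile_o1 = speedup_percent(33.498, 32.306)
run_only_o1 = speedup_percent(17.428, 16.269)
run_only_o2 = speedup_percent(17.428, 16.235)

print(f"-O1 incl. compile: {with_compile_o1:.1f}%")
print(f"-O1 run only:      {run_only_o1:.1f}%")
print(f"-O2 run only:      {run_only_o2:.1f}%")
```

That comes out at roughly 3.6% including compile time and a bit under 7% for the precompiled runs. These are single runs on one machine, so take the decimals with a grain of salt.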

perl t/harness --gc-debug --runcore-tests

perl t/harness --gc-debug -O1 --runcore-tests

perl t/harness -f --runcore-tests

perl t/harness -f -O1 --runcore-tests

perl t/harness -f -O2 --runcore-tests