You can have Dynamic C generate a .LST file showing the assembly it produced for those statements, and the Instruction Set Reference in the Help menu will show you how many clock cycles each opcode requires.
With the BitWrPortI() function, you're loading a byte from memory (the shadow register), updating a bit in it, saving the byte back, and then writing it out to the I/O register, so each call costs several instructions rather than a single I/O write.
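To make those four steps concrete, here's a rough sketch in plain C. It is not the library's actual source: shadow_bit_write, fake_io_reg and fake_shadow are made-up names for illustration, and the "I/O register" is just a variable so the code can be compiled on a PC. The argument order simply mirrors BitWrPortI(port, shadow, value, bit).

```c
#include <stdint.h>

/* Stand-in for a write-only I/O register (hypothetical; on real
   hardware this would be the port's data register). */
static volatile uint8_t fake_io_reg;

/* Shadow copy kept in RAM so the last value written can be read back. */
static uint8_t fake_shadow;

/* Hypothetical helper that follows the load / update / save / write
   sequence described above. */
static void shadow_bit_write(volatile uint8_t *io_reg, uint8_t *shadow,
                             int value, int bit)
{
    uint8_t mask = (uint8_t)(1u << bit);
    uint8_t tmp = *shadow;          /* 1. load the byte from the shadow */

    if (value)
        tmp |= mask;                /* 2. set the bit ...               */
    else
        tmp &= (uint8_t)~mask;      /*    ... or clear it               */

    *shadow = tmp;                  /* 3. save the updated shadow       */
    *io_reg = tmp;                  /* 4. write the byte to the port    */
}
```

Calling shadow_bit_write(&fake_io_reg, &fake_shadow, 1, 1) and then the same with value 0 toggles bit 1 while leaving the other seven bits as they were, which is the whole point of keeping the shadow around.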
If you're running within the Dynamic C debugger, the gaps in activity could be due to communication between the debug kernel and Dynamic C. There's also a periodic interrupt that updates MS_TIMER and SEC_TIMER and hits the watchdog for you.
You could move your test code to a function labeled "__nodebug" and see how that improves performance.
I'm not sure about the electrical issues, other than to suggest looking at PA1CR, which controls drive strength, output slew rate, and whether the pin is configured with a pullup or pulldown resistor.
I assume you're just doing some benchmarks at this point, but if you actually need fast digital I/O you should consider using bare assembly, or configuring an I/O pin for PWM output.