Are you also running the STM at 40 MHz?
What I normally do is write a while(1) loop that sets the pin high, executes 2 NOPs, sets the pin low, and executes 2 NOPs again.
With only this in the loop you can more or less see what is actually going on. You cannot measure the exact performance this way, though, because the NOPs and the code that toggles the pin also take time.
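Roughly what I mean, as a sketch in plain C. The names here (pin_high, pin_low, toggle_once, the port pointer and the mask) are placeholders of mine, not real register names; on the target you would pass the address of your device's own output register. The NOP uses GCC-style inline asm, which is an assumption about your compiler.

```c
#include <stdint.h>

/* One NOP instruction. Assumes a GCC/Clang-style compiler; other
   toolchains usually provide their own NOP intrinsic instead. */
#define NOP() __asm__ volatile ("nop")

/* Drive the selected pin high. 'port' is a hypothetical stand-in for
   the device's GPIO output register, 'mask' selects the pin bit. */
static inline void pin_high(volatile uint32_t *port, uint32_t mask)
{
    *port |= mask;
}

/* Drive the selected pin low, leaving the other bits untouched. */
static inline void pin_low(volatile uint32_t *port, uint32_t mask)
{
    *port &= ~mask;
}

/* One full period of the test signal: high, 2 NOPs, low, 2 NOPs. */
static inline void toggle_once(volatile uint32_t *port, uint32_t mask)
{
    pin_high(port, mask);
    NOP();
    NOP();
    pin_low(port, mask);
    NOP();
    NOP();
}

/* On the target, the benchmark is then nothing more than:
 *
 *     while (1) {
 *         toggle_once(&YOUR_PORT_REGISTER, YOUR_PIN_MASK);
 *     }
 *
 * YOUR_PORT_REGISTER and YOUR_PIN_MASK are placeholders for whatever
 * your device header defines.
 */
```

On the scope you would expect the low time to be a bit longer than the high time, since the jump back to the top of the loop falls in the low phase. Compare the same loop on both chips with the same compiler optimisation settings.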
Use plain C, or even assembler, rather than Flowcode components if you want to compare performance. Search the MCP forum on this topic and you will get some good info.
I am quite sure that at the same clock speed the STM will not outperform the MIPS core.
