Making ByteArray faster

The ActionScript bytecode generated by FlasCC (previously known as Alchemy) generally performs better than what the ActionScript compiler produces. Apart from better use of data types and instructions, one of the primary reasons was FlasCC's use of domain memory, which gives faster reads and writes on a memory buffer. The Flash and AIR runtimes already understood the special memory opcodes that operate on domain memory and provide this faster access.

Starting with AIR 3.6, the ActionScript compiler (ASC2) can also generate these fast memory opcodes directly from AS3 code; previously they were only available through FlasCC.

To make a ByteArray faster, you just need to assign it to the application domain's domainMemory property.

 ApplicationDomain.currentDomain.domainMemory = myByteArray; 

A new package, avm2.intrinsics.memory, has been added to the compiler. It provides instructions for loading data from and storing data to the fast ByteArray.

package avm2.intrinsics.memory {

        public function li8(addr:int):int; // load 8 bit int
        public function li16(addr:int):int; // load 16 bit int
        public function li32(addr:int):int; // load 32 bit int
        public function lf32(addr:int):Number; // load 32 bit float
        public function lf64(addr:int):Number; // load 64 bit float

        public function si8(value:int, addr:int):void; // store 8 bit integer
        public function si16(value:int, addr:int):void; // store 16 bit integer
        public function si32(value:int, addr:int):void; // store 32 bit integer
        public function sf32(value:Number, addr:int):void; // store 32 bit float
        public function sf64(value:Number, addr:int):void; // store 64 bit float

        public function sxi1(value:int):int;
                  // sign extend a 1 bit value to 32 bits
        public function sxi8(value:int):int;
                  // sign extend an 8 bit value to 32 bits
        public function sxi16(value:int):int;
                  // sign extend a 16 bit value to 32 bits
}
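
As a rough illustration of how these intrinsics fit together, the sketch below stores a 32-bit integer and a 64-bit float at hand-picked byte offsets and reads them back. The class name IntrinsicsSketch, the 4096-byte buffer size, and the offsets are assumptions made for this example. It needs to be compiled with ASC2, and each address is a plain byte offset into the ByteArray assigned to domain memory.

```actionscript
package
{
	import flash.display.Sprite;
	import flash.system.ApplicationDomain;
	import flash.utils.ByteArray;

	import avm2.intrinsics.memory.lf64;
	import avm2.intrinsics.memory.li32;
	import avm2.intrinsics.memory.sf64;
	import avm2.intrinsics.memory.si32;

	// Hypothetical class name, used only for this sketch
	public class IntrinsicsSketch extends Sprite
	{
		public function IntrinsicsSketch()
		{
			var buffer:ByteArray = new ByteArray();
			buffer.length = 4096; // assumed size, chosen for illustration
			ApplicationDomain.currentDomain.domainMemory = buffer;

			// Stores take the value first, then the byte offset
			si32(123456, 0); // occupies offsets 0..3
			sf64(3.25, 8);   // occupies offsets 8..15

			// Loads take only the byte offset; keep the result in a variable
			var storedInt:int = li32(0);
			var storedFloat:Number = lf64(8);
			trace(storedInt, storedFloat);
		}
	}
}
```

Note that the caller is responsible for keeping offsets from overlapping: a 32-bit store covers four bytes and a 64-bit store covers eight.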

Some important points to note

  • Only one ByteArray can be assigned to domain memory at a time.
  • You need to set the length of the ByteArray before assigning it to domain memory.
  • For the maximum performance gain, call the load/store instructions directly instead of wrapping them in ActionScript functions.
  • Always assign the result of a load instruction to a variable. Not doing so results in a verify error, which is a known bug at the moment.
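
The points above can be sketched as follows. This is a minimal example under stated assumptions: the class name DomainMemoryNotesSketch and the variable names are hypothetical, and the buffers are sized with ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH because, to my knowledge, the runtime rejects a ByteArray shorter than that value.

```actionscript
package
{
	import flash.display.Sprite;
	import flash.system.ApplicationDomain;
	import flash.utils.ByteArray;

	import avm2.intrinsics.memory.li8;
	import avm2.intrinsics.memory.si8;

	// Hypothetical class name, used only for this sketch
	public class DomainMemoryNotesSketch extends Sprite
	{
		public function DomainMemoryNotesSketch()
		{
			var domain:ApplicationDomain = ApplicationDomain.currentDomain;

			var first:ByteArray = new ByteArray();
			var second:ByteArray = new ByteArray();

			// Set the length before assigning to domain memory
			first.length = ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH;
			second.length = ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH;

			// Only one ByteArray is active at a time: this store goes to 'first'
			domain.domainMemory = first;
			si8(1, 0);

			// After reassigning, stores and loads target 'second' instead
			domain.domainMemory = second;
			si8(2, 0);

			// Call the intrinsic directly (no wrapper function) and
			// always assign the loaded value to a variable
			var loaded:int = li8(0);
			trace(loaded);

			// Both buffers remain readable through the regular ByteArray API
			trace(first[0], second[0]);
		}
	}
}
```

Switching buffers is an ordinary property assignment, so it is best kept out of tight loops, with the intrinsics themselves doing the per-element work.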

The following sample code shows how a regular byte array is used, and how one could make it faster.

package
{
	import flash.display.Sprite;
	import flash.system.ApplicationDomain;
	import flash.text.TextField;
	import flash.text.TextFieldAutoSize;
	import flash.utils.ByteArray;
	import flash.utils.getTimer;

	import avm2.intrinsics.memory.li8;
	import avm2.intrinsics.memory.si8;

	public class DomainMemoryTest extends Sprite
	{
		private const BYTE_ARRAY_SIZE:int = 10000000;
		private var resultsDisplay:TextField = new TextField();
		private var normalByteArrayTime:uint = 0;
		private var fastByteArrayTime:uint = 0;

		public function DomainMemoryTest()
		{
			super();	
			setupDisplay();
			timeNormalByteArray();
			timeFastByteArray();

			resultsDisplay.appendText("\nFast ByteArray is " + (normalByteArrayTime - fastByteArrayTime).toString() + " milliseconds faster.");
		}

		// Write BYTE_ARRAY_SIZE bytes to the ByteArray, then read them back

		private function timeNormalByteArray() : void
		{
			var ba:ByteArray = new ByteArray();

			var i:int = 0;

			var startTimer:int = getTimer();

			for(i=0; i<BYTE_ARRAY_SIZE; i++)
			{
				ba.writeByte(100);
			}

			ba.position = 0;

			for(i=0; i<BYTE_ARRAY_SIZE; i++)
			{
				ba.readByte();
			}

			var endTimer:int = getTimer();

			normalByteArrayTime = endTimer - startTimer;
			resultsDisplay.appendText("Time to write and read normal ByteArray: " + normalByteArrayTime.toString());
		}

		private function timeFastByteArray () : void
		{
			var ba:ByteArray = new ByteArray();

			// Set the size of bytearray
			ba.length = BYTE_ARRAY_SIZE;

			ApplicationDomain.currentDomain.domainMemory = ba;

			var i:int = 0;
			var valueLoadedFromByteArray:int = 0;			

			var startTimer:int = getTimer();

			for(i=0; i<BYTE_ARRAY_SIZE; i++)
			{
				si8(100, i);
			}
			for(i=0; i<BYTE_ARRAY_SIZE; i++)
			{
				valueLoadedFromByteArray = li8(i);
			}

			var endTimer:int = getTimer();

			fastByteArrayTime = endTimer - startTimer;
			resultsDisplay.appendText("\nTime to write and read fast ByteArray: " + fastByteArrayTime.toString());
		}

		private function setupDisplay():void
		{
			resultsDisplay.x = 10;
			resultsDisplay.y = 10;
			resultsDisplay.border = true;
			resultsDisplay.multiline = true;
			resultsDisplay.wordWrap = true;
			resultsDisplay.autoSize = TextFieldAutoSize.LEFT;
			resultsDisplay.width = 400;
			addChild(resultsDisplay);
		}

	}
}

Running the above code to compare performance gave the following results:

Device                          Normal ByteArray    Fast ByteArray
MacBook Pro (2.6 GHz, 16 GB)    3510 ms             1755 ms
iPhone 3GS                      53600 ms            23800 ms

Overall, the results indicate that domain memory is about twice as fast as a regular ByteArray: 2.0x on the MacBook Pro and roughly 2.25x on the iPhone 3GS.

24 comments

  1. Woowoo!

    Thanks for sharing! Is the overhead of setting the current working memory significant, or is that pretty quick too? I don't expect you should change it between every operation, but could you swap it over a few hundred times a frame? A few thousand? What I am trying to work out is whether it can be abused for smaller data structures like a matrix. It will be great for 3D data buffers though; ByteArrays upload faster, I believe, so this will allow for real-time manipulation on the CPU. Any ideas on a speed comparison with a Vector?

    Thanks again for the post. Look forward to having a play in as3.

  2. @cuentapruebas01 are you using ASC2? I don't think FlashDevelop supports it officially yet, but I think you can still work around it.
    @phendrax I doubt you can set a socket as the working memory, but you can probably use the writeBytes function and pass in the fast ByteArray. You would have to benchmark it to check whether it is worth it.

  3. >> [@ben w] Are the overheads of setting the current working memory significant or are they pretty quick too?

    Not significant, but still not as low as the load/store instructions themselves. That's because setting the domain memory is a runtime API call, while the load/store instructions compile directly to ABC opcodes.

  5. @sHTiF I know, I've done a few Flash libraries using fastmem that are still in production thanks to Haxe. However, my understanding is that if your Flash version is greater than 15, you cannot use Alchemy 1 opcodes. It's been some time since I last used Haxe, so I don't know whether Haxe sets a version lower than 16, or whether this restriction was removed.

    I’ve never seen an Alchemy 1 vs “2” benchmark either… would love to see it. The same goes for MXMLC vs ASC 2 vs Haxe over several scenarios…

  6. I used haXe fastmem before without problems, and I have no problem using it now with the latest Flash Players requiring versions above 16. So I think it's actually the same opcodes, or they changed it 🙂

  7. Oh noes, it was for workers: “When compiled using the default SWF version of 18 FlasCC will attempt to run the code in a background worker”
