pON: Penguin's Object Notation (Developer Release)

What is it?
pON was born out of my frustration with existing encoders, which I found to be too complicated for their own good, inefficient, or simply error-prone and unable to properly represent many data structures. With pON I attempted to fix all of this. pON is a new encoder capable of representing almost any Lua data structure. It even supports pointers: if you have two references to the same string or table, it is encoded only once, and a pointer is encoded the second time. It is NOT designed to be human readable, though it does avoid binary characters, which makes it MySQL compatible.

FEATURES

  • Fast: pON is faster than both JSON and vON when encoding the test case provided in vercas’ thread, as well as a few test cases I designed myself (including worst-case scenarios).
  • Short: pON uses significantly less code than vON, making it less of a hassle to include in your project. The code is carefully structured for readability and simplicity, so it should be fairly easy to understand if you know what you are doing.
  • Pointers: pON can handle pointers. If it finds multiple references to the same value, the value is encoded only once and a pointer is put in its place the second time. This is immensely advantageous for long table keys, since they are shortened automatically. It can also be used to create extremely complex data structures with table references.
  • Tables for keys: well, I say tables, but I really mean anything. pON can encode ANY Lua data type as the key of a key/value pair.
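
The pointer feature can be illustrated with a short sketch. This is not pON's actual code: the function name encodeStrings is made up for the example, it only handles a flat list of strings, and it does no string escaping. It assumes ids are handed out in first-seen order and uses the "text"; and (id) forms listed in the specifications.

```lua
-- Hypothetical helper, not part of pON: encodes a list of strings and
-- replaces repeats with a pointer to the first occurrence.
local function encodeStrings(list)
    local seen = {}   -- string -> id assigned on first sight (assumed ordering)
    local nextId = 0
    local out = {}
    for _, s in ipairs(list) do
        local id = seen[s]
        if id then
            -- Already written once: emit a pointer instead of the text.
            out[#out + 1] = "(" .. id .. ")"
        else
            nextId = nextId + 1
            seen[s] = nextId
            out[#out + 1] = '"' .. s .. '";'
        end
    end
    return table.concat(out)
end

print(encodeStrings({ "are", "mere", "are" }))
```

The longer the repeated string, the bigger the saving, which is why long table keys benefit most.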

SPECIFICATIONS
All tables are wrapped with {} brackets. The { marks the beginning and the } marks the end.
Strings begin with the " symbol and end with ";. The algorithm is actually scanning for the ; and the quote is just there for string escaping purposes and because it looks nice.
Pointers are formatted as (id), e.g. (1) would be the pointer to the first object.
Booleans (true/false) are represented by two separate data types, one for true and one for false. This allows a single char to be used for each state: t = true, f = false.
Numbers are formatted as nvalue; e.g. n1000;
Angles are formatted as apitch,yaw,roll; e.g. a0,0,0;
Vectors are formatted as vx,y,z; e.g. v0,0,0;
Entities are formatted as Eentindex; e.g. E1; Null entities are denoted as E#
Players are formatted as Pentindex; e.g. P1; is the player with ent index 1.
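
As a rough illustration of these rules, here is a minimal sketch of an encoder for numbers, booleans, strings, and tables only. It is not pON's implementation: the function name encode is chosen for the example, pointers and string escaping are left out, and the ~ separator between the array part and the key/value part is an assumption inferred from the sample output in the comparison section rather than something the specification states.

```lua
-- Illustrative sketch only; covers a subset of the format described above.
local function encode(value)
    local kind = type(value)
    if kind == "number" then
        return "n" .. tostring(value) .. ";"
    elseif kind == "boolean" then
        return value and "t" or "f"
    elseif kind == "string" then
        return '"' .. value .. '";'   -- no escaping in this sketch
    elseif kind == "table" then
        local out = { "{" }
        local n = #value
        -- Array part first, in order.
        for i = 1, n do
            out[#out + 1] = encode(value[i])
        end
        -- Then every remaining key/value pair, preceded by a single ~
        -- (assumed separator, taken from the sample output).
        local wroteSeparator = false
        for k, v in pairs(value) do
            local inArray = type(k) == "number" and k >= 1 and k <= n and k % 1 == 0
            if not inArray then
                if not wroteSeparator then
                    out[#out + 1] = "~"
                    wroteSeparator = true
                end
                out[#out + 1] = encode(k)
                out[#out + 1] = encode(v)
            end
        end
        out[#out + 1] = "}"
        return table.concat(out)
    end
    error("unsupported type in this sketch: " .. kind)
end

print(encode({ 1, -1337, "are", mara = true }))
```

Because pairs does not guarantee an order, tables with several non-array keys may come out with their pairs in any order.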

DATA TYPES SUPPORTED
TABLES - keys and values [pointers supported]
STRINGS - keys and values [pointers supported]
NUMBERS - keys and values
BOOLEANS - keys and values
ANGLES - keys and values
VECTORS - keys and values
ENTITIES - keys and values
PLAYERS - keys and values

USAGE

VIEW ON GITHUB HERE

COMPARISON
Running the following test case (the same one used by vercas in his benchmarking)



local test6 = {
	1, -1337, -99.99, 2, 3, 100, 101, 121, 143, 144, "ma\"ra", "are", "mere",
	{
		500,600,700,800,900,9001,
		TROLOLOLOLOLOOOO = 666,
		[true] = false,
		[false] = "lol?",
		pere = true,
		[1997] = "vasile",
		[{ [true] = false, [false] = true }] = { [true] = "true", ["false"] = false }
	},
	true, false, false, true, false, true, true, false, true,
	[1337] = 1338,
	mara = "are",
	mere = false,
	[true] = false,
	[{ [true] = false, [false] = true }] = { [true] = "true", ["false"] = false }
}

-- Round-trip test6 through pON 5000 times, feeding each decoded table back in.
local last, start, length = test6, SysTime(), 0;
for i = 1, 5000 do
	local s = pon.encode(last)
	length = string.len( s );
	last = pon.decode(s)
end
local finish = SysTime( );
print("pON encoded/decoded test6 5000 times in "..( finish - start ).." seconds.");
print(" ENCODED LENGTH: "..length );

-- Same loop for vON, starting again from the original table.
local last, start, length = test6, SysTime(), 0;
for i = 1, 5000 do
	local s = von.encode(last)
	length = string.len( s );
	last = von.decode(s)
end
local finish = SysTime( );
print("vON encoded/decoded test6 5000 times in "..( finish - start ).." seconds.");
print(" ENCODED LENGTH: "..length );


the result…



pON encoded/decoded test6 5000 times in 0.26037518128624 seconds.
 ENCODED LENGTH: 248
vON encoded/decoded test6 5000 times in 0.30716586786775 seconds.
 ENCODED LENGTH: 287


showing a speed improvement of approximately 0.047 seconds, or about 15%, over vON, which was the fastest alternative I knew of to test against.
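
For anyone who wants to check the arithmetic, the figures can be recomputed from the printed results. The variable names below are made up for the example; the numbers are copied from the output above.

```lua
-- Timings and lengths copied from the benchmark output above.
local ponSeconds, vonSeconds = 0.26037518128624, 0.30716586786775
local ponLength, vonLength = 248, 287

-- Absolute and relative time saved over the 5000 round trips.
local saved = vonSeconds - ponSeconds
local percent = saved / vonSeconds * 100

-- Relative reduction in encoded length.
local lengthPercent = (vonLength - ponLength) / vonLength * 100

print(string.format("time saved: %.3f s (%.1f%% of vON's time)", saved, percent))
print(string.format("encoded length: %.1f%% shorter", lengthPercent))
```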

Encoded using pON, the actual output string looks something like this…


{n1;n-1337;n-99.99;n2;n3;n100;n101;n121;n143;n144;"ma"ra";"are";"mere";{n500;n600;n700;n800;n900;n9001;~f"lol?";tf"pere";tn1997;"vasile";{~fttf}{~"false";ft"true";}"TROLOLOLOLOLOOOO";n666;}tfftfttft~tfn1337;n1338;{~fttf}{~(11)ft(12)}(4)f"mara";(3)}

DOWNLOAD RECOMMENDED VERSION HERE
DOWNLOAD DEVELOPMENTAL VERSION HERE

If you use this in a project, please post about it so I can check it out. If you have any suggestions for optimizations, general improvements, or data types I should support, let me know!
Btw, if there is enough demand for it I could make a binary version of this with number compression to base 256.

Gorgeous, simply gorgeous! I will however need to stick with JSON, since PHP can decode JSON into an array.

Keep up the good work!

If you use any of the BLOB datatypes in MySQL you can store binary characters, just not in VARCHAR/CHAR/TEXT datatypes. This is what I did for saving GLON values in MySQL.

Other than that, good work! Might have to swap out vON for this if you do decide to make the swap to binary.

A baby dies every time you store encoded information in a MySQL database.

I could do this… Would you like to see binary implemented as a separate lib, maybe bON (binary object notation), using the same general structure in terms of the way it handles encoding but taking advantage of binary base 256 for greater data efficiency? (I could do stuff like store the types for both the key and the value in a single char: 4 bits for the first and 4 bits for the second.) I’ll also implement binary number encoding (the biggest feature I imagine you’re looking for). I’m thinking this would be best implemented as a set of data types somewhat like the ones used in MySQL, with different formats for int, tinyint, float, and double. Doing this would allow me to predict the exact number of chars used to represent each value and thereby avoid unnecessary string iteration.
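
The idea of packing two type tags into one char could look roughly like this. It is only a sketch of the proposal, since bON does not exist yet: the function names and the type ids are invented for the example, and plain arithmetic is used instead of a bit library so it runs on any Lua version.

```lua
-- Hypothetical sketch: pack a key type and a value type (each 0..15)
-- into a single byte, high 4 bits for the key, low 4 bits for the value.
local function packTypes(keyType, valueType)
    assert(keyType >= 0 and keyType < 16, "key type must fit in 4 bits")
    assert(valueType >= 0 and valueType < 16, "value type must fit in 4 bits")
    return string.char(keyType * 16 + valueType)
end

-- Reverse of packTypes: returns keyType, valueType.
local function unpackTypes(ch)
    local byte = string.byte(ch)
    return math.floor(byte / 16), byte % 16
end

-- Made-up type ids, purely for illustration.
local TYPE_STRING, TYPE_NUMBER = 3, 10
local packed = packTypes(TYPE_STRING, TYPE_NUMBER)
print(#packed, unpackTypes(packed))
```

This caps the format at 16 distinct type tags, which is the trade-off for saving one char per key/value pair.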

If you guys have any ideas for further optimizations I can make, let me know. I want to make this the fastest and most robust serialization library available for Gmod Lua. Also, if you like it, please comment with your thoughts, as it encourages my development and helps others find this.

Touché.
You get a ‘Winner’ 'til I manage to beat your performance.

I hope you tested with the latest version of vON, by the way. :wink:

Ran the same tests on the most recent version (literally just downloaded it xD):



pON encoded/decoded test6 5000 times in 0.26008640942746 seconds.
 ENCODED LENGTH: 248
vON encoded/decoded test6 5000 times in 0.30657166423771 seconds.
 ENCODED LENGTH: 256


pON is still in the lead. Your most recent version is shorter than before, but still 8 chars longer.
Let the encoder competitions begin! xD

I am very interested in this… This sort of “competition” is awesome.

Can’t wait for the info relating to the latest version of vON.

good thing facepunch isn’t communist…

Tbh a lot of the european userbase actually are

You may want to add support for other entity types like vON has.
Bear in mind all entities, including players, should be able to go through the same encode/decode function since they all use Entity.

I wasn’t aware I would need to do anything more than simply encoding the entity index and then retrieving the entity with the same index at a later point? I think that process should work for any entity or am I missing something here?

EDIT:
Nvm realized what I was missing. I’ll add support for those extra types in a commit to the SVN later tonight. Thanks for pointing this out to me!

[editline]23rd February 2014[/editline]

UPDATE

  • added support for extra entity types as requested.
  • removed unnecessary ~ character when k/v pair section is empty.

You shouldn’t need the player decode now. There also seems to be an error decoding since the last update.



[ERROR] lua/includes/modules/pon.lua:179: bad argument #2 to 'sub' (number expected, got table)


Sorry about that, wow that was a stupid mistake on my part. It is now fixed in the most recent version. (Forgot to return the current index pointer, missed it in testing somehow)

Also though the player decoder isn’t needed I’m leaving it in there for backwards compatibility. Maybe I’ll remove it a few versions down the line.

Update:

  • improved string serialization adding a separate data type that is used when string escaping isn’t needed (saves one char, it’s rather significant on single letter keys/values)
  • added better number support by removing type prefix on numbers and instead recognizing them by the first character of the number (0-9 or -)
  • added separate data type for handling of tables with key value pair component only (no array part)
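
Recognizing a number by its first character might look something like the sketch below. It is an assumption-laden illustration, not the code from the update: the name readNumber is invented, and it assumes numbers are still terminated by ; once the n prefix is gone.

```lua
-- Hypothetical reader: if the character at `index` is a digit or a minus
-- sign, parse up to the next ';' and return the number plus the index of
-- the character after the terminator. Otherwise return nil and the
-- unchanged index so another reader can try.
local function readNumber(encoded, index)
    local first = encoded:sub(index, index)
    if not first:match("^[%d%-]$") then
        return nil, index
    end
    local stop = encoded:find(";", index, true)   -- plain find, no patterns
    if not stop then
        error("unterminated number at position " .. index)
    end
    return tonumber(encoded:sub(index, stop - 1)), stop + 1
end

print(readNumber("-1337;t", 1))
```

Returning the next index along with the value is what lets the caller continue decoding from the right position.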

Now approximately 4/5ths the length of vON and still 15% or more faster (heading towards 20%, though I haven’t done exact benchmark comparisons recently).

This is brilliant, but unless someone makes a PHP version it is kinda useless to me. :frowning: