Hello all,
Just wanted to show off something I'm working on.
I've been creating a custom virtual machine which runs inside Unity (inspired by the Gameboy emulator someone made a while back).
One issue with it, particularly when testing, is that the machine code is non-standard (meaning I can't repurpose existing compilers), and of course it's very hard to program in raw assembly.
Therefore, I've been designing a custom language and compiler to go with it (I'm writing it in C#/.NET). The language is called Kaleedee (for now). Syntactically it looks a lot like JavaScript, but it only has one data type (32-bit integers) and also has memory manipulation features (reading and writing; I'm also toying with the idea of writing a heap implementation).
Anyway, at the moment I'm working on a parser. It's a Top Down Operator Precedence "Pratt" parser (Pratt parsers, by the way, are GODDAMN COOL). This is the test code I have it parsing:
[code]
import( "lib_mem.kld" ); // this doesn't do anything yet, but is meant as a test for the parser and will eventually serve as a "compiler hint" to tell it to fetch the given file, parse it, and substitute that resulting syntax tree.
import( "lib_gfx.kld" );
var player_spriteID = 0;
var player_x = 0;
var player_y = 0;
set_hw_sprite( 0, player_spriteID );
update_hw_sprite( 0, player_x, player_y );
var testResID = get_hw_sprite_res( 0 );
var ptr_test = memalloc( 4 ); // allocate a 4-byte block of memory
// ... do something with it ...
memfree( ptr_test, 4 ); // de-allocate the 4-byte block
// Initialize the given hardware sprite
function set_hw_sprite( hwSpriteNum, resID )
{
var ptr_sprite = VRAM_SPRITE_0 + ( hwSpriteNum * 6 );
memwrite( ptr_sprite, resID, 4 );
}
// Update the given hardware sprite's position
function update_hw_sprite( hwSpriteNum, posX, posY )
{
var ptr_sprite = VRAM_SPRITE_0 + ( hwSpriteNum * 6 );
memwrite( ptr_sprite + 4, posX + 8, 1 );
memwrite( ptr_sprite + 8, posY + 8, 1 );
}
// Clear the given hardware sprite
function clear_hw_sprite( hwSpriteNum )
{
set_hw_sprite( hwSpriteNum, 0, 0, 0 );
}
// Get the resource ID of the given hardware sprite
function get_hw_sprite_res( hwSpriteNum )
{
var ptr_sprite = VRAM_SPRITE_0 + ( hwSpriteNum * 6 );
return memread( ptr_sprite );
}
[/code]
And this is the result of calling ToString on the syntax tree my parser generates:
[code]
import("lib_mem.kld", )
import("lib_gfx.kld", )
var player_spriteID = 0
var player_x = 0
var player_y = 0
set_hw_sprite(0, player_spriteID, )
update_hw_sprite(0, player_x, player_y, )
var testResID = get_hw_sprite_res(0, )
var ptr_test = memalloc(4, )
memfree(ptr_test, 4, )
function set_hw_sprite( hwSpriteNum, resID, ){
var ptr_sprite = (VRAM_SPRITE_0 ADD (hwSpriteNum MULTIPLY 6))
memwrite(ptr_sprite, resID, 4, )
}
function update_hw_sprite( hwSpriteNum, posX, posY, ){
var ptr_sprite = (VRAM_SPRITE_0 ADD (hwSpriteNum MULTIPLY 6))
memwrite((ptr_sprite ADD 4), (posX ADD 8), 1, )
memwrite((ptr_sprite ADD 8), (posY ADD 8), 1, )
}
function clear_hw_sprite( hwSpriteNum, ){
set_hw_sprite(hwSpriteNum, 0, 0, 0, )
}
function get_hw_sprite_res( hwSpriteNum, ){
var ptr_sprite = (VRAM_SPRITE_0 ADD (hwSpriteNum MULTIPLY 6))
RETURN memread(ptr_sprite, )
}
[/code]
Which, really, just looks like a harder-to-read version of the source code, but it was to prove to myself that it was indeed parsing correctly.
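For anyone who hasn't seen a Pratt parser before, here is a minimal Python sketch of the core loop. It is not the actual C# parser: the binding powers and the names `BINARY_OPS`, `Parser`, and `parse` are assumptions made for illustration, and it uses a plain lookup table where the real thing registers plugin classes with the grammar. It only handles identifiers, integers, parentheses, and three binary operators, and it prints binary expressions in the same bracketed style as the output above.

```python
import re

# Hypothetical binding powers and names; the real Kaleedee grammar may differ.
BINARY_OPS = {"+": (10, "ADD"), "-": (10, "SUBTRACT"), "*": (20, "MULTIPLY")}


def tokenize(src):
    # Identifiers, integer literals, and single-character symbols only.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*()]", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expression(self, min_power=0):
        # Prefix position: a parenthesised group or a plain atom.
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            left = self.parse_expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        else:
            left = tok
        # Infix loop: consume operators that bind tighter than min_power.
        while self.peek() in BINARY_OPS and BINARY_OPS[self.peek()][0] > min_power:
            power, name = BINARY_OPS[self.advance()]
            right = self.parse_expression(power)  # same power => left-associative
            left = f"({left} {name} {right})"
        return left


def parse(src):
    return Parser(tokenize(src)).parse_expression()


print(parse("VRAM_SPRITE_0 + ( hwSpriteNum * 6 )"))
```

Adding a new operator in this sketch is just one more entry in the table, which is roughly why extending a Pratt parser tends to be painless.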
It also has minimal error reporting. It will only report one syntax error at a time and then stop parsing, and I'm feeling lazy so I don't think I'll work on error recovery.
So, for instance, if I add this to the end of the above source code:
[code]
this will surely drive the parser crazy ;) but don't worry, it will tell me exactly on which line and column the syntax error is...
[/code]
The parser generates the following error message:
[code]
Expected SEMICOLON, got IDENTIFIER <line: 48, column: 6>
[/code]
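A rough Python sketch of how a message like that can be produced, assuming the lexer stamps every token with a 1-based line and column. The names `Token`, `ParseError`, `describe_mismatch`, and `expect` are hypothetical, and this toy lexer only classifies identifiers and semicolons (everything else becomes a generic SYMBOL token).

```python
from collections import namedtuple

Token = namedtuple("Token", "kind text line column")


class ParseError(Exception):
    pass


def tokenize(src):
    """Split src into tokens, recording a 1-based line and column for each."""
    tokens = []
    line, column, i = 1, 1, 0
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            line, column, i = line + 1, 1, i + 1
        elif ch.isspace():
            column, i = column + 1, i + 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(Token("IDENTIFIER", src[start:i], line, column))
            column += i - start
        else:
            kind = "SEMICOLON" if ch == ";" else "SYMBOL"
            tokens.append(Token(kind, ch, line, column))
            column, i = column + 1, i + 1
    return tokens


def describe_mismatch(expected_kind, token):
    return (f"Expected {expected_kind}, got {token.kind} "
            f"<line: {token.line}, column: {token.column}>")


def expect(tokens, index, expected_kind):
    """Return tokens[index] if it has the expected kind, else stop with one error."""
    token = tokens[index]
    if token.kind != expected_kind:
        raise ParseError(describe_mismatch(expected_kind, token))
    return token


tokens = tokenize("foo;\nthis will")
print(describe_mismatch("SEMICOLON", tokens[3]))
```

Raising on the first mismatch matches the "one error, then stop" behaviour described above; error recovery would need a way to resynchronise instead of raising.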
There are still a lot of things missing. I'd like to add if statements, plus for and while loops. Luckily, this is quite a simple matter with Pratt parsers, as I just have to write some new plugin classes for those constructs and register them with my grammar.
I also still have to compile the damn thing. My custom instruction set is very x86-like, so there should be a wealth of information on how things are done in x86 that should be easy enough to translate over. Syntax trees also simplify this a lot, thanks to their recursive, modular nature.
Anyway, thoughts? Comments?
[editline]16th August 2014[/editline]
Just added support for if statements. This involved adding new token types for the >, >=, <, <=, and == symbols to my lexer, as well as plugging two new classes into my grammar for parsing if expressions.
So now it can parse this code:
[code]
if( val <= 10 )
{
// stub
}
else if( val < 20 )
{
// stub
}
else if( val >= 100 )
{
// stub
}
else if( val < 50 )
{
// stub
}
else if( val == 65 )
{
// stub
}
else
{
// stub
}
[/code]
The output of the Abstract Syntax Tree for this is:
[code]
if( (val LESSTHANOREQUALTO 10) ) {
} else if( (val LESSTHAN 20) ) {
} else if( (val GREATERTHANOREQUALTO 100) ) {
} else if( (val LESSTHAN 50) ) {
} else if( (val EQUALTO 65) ) {
} else {
}
[/code]
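One detail worth spelling out with those new token types: the lexer has to try the two-character symbols before the one-character ones, or `<=` comes out as `<` followed by `=`. A small Python sketch of that longest-match rule, assuming a simple lookup table (the function name `lex_operator` and the token names GREATERTHAN and ASSIGN are guesses; the other names are taken from the output above):

```python
# Token names for the comparison operators. GREATERTHAN and ASSIGN are
# assumed names; the rest appear in the syntax tree output above.
OPERATORS = {
    "<=": "LESSTHANOREQUALTO",
    ">=": "GREATERTHANOREQUALTO",
    "==": "EQUALTO",
    "<": "LESSTHAN",
    ">": "GREATERTHAN",
    "=": "ASSIGN",
}


def lex_operator(src, pos):
    """Return (token_name, length) for the operator at src[pos], or None."""
    for length in (2, 1):  # longest match first
        text = src[pos:pos + length]
        if text in OPERATORS:
            return OPERATORS[text], len(text)
    return None


print(lex_operator("val <= 10", 4))
print(lex_operator("val < 20", 4))
```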
EDIT: Just added for loop support. Also appear to have accidentally wiped the "Programming King" rating I got, oh noes!
Also added while loop support.
EDIT 2: Come on somebody respond... this post is getting huge! lol
Working on compiling my code to a custom assembler now.
Legend:
[] = value at memory address (think pointer dereference)
%xxx = CPU register 'xxx'
-y(%xxx) = Value of register 'xxx' minus y
Currently I can compile this code:
[code]
var i = 0;
var x = 10;
var c = x;
i = c;
[/code]
To this:
[code]
push 0
push 10
push [-8(%ebp)]
mov [-4(%ebp)] [-12(%ebp)]
[/code]
EDIT
Hm, that third push operation bugs me. I think I'll have to change it so it first loads the value into a register, then pushes the register value onto the stack. Same with the mov: it should first mov memory to register, then register back to memory (I don't think it's common for CPUs to be able to move memory directly to memory).
It's late and I'm tired though, so I'll tackle it in the morning.
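A minimal Python sketch of that load-then-push idea, not the actual C# compiler. It assumes every variable takes one 4-byte stack slot below %ebp, that names are never redeclared, and that %eax is free to use as scratch; `compile_var_decls` and `compile_assign` are hypothetical names.

```python
def compile_var_decls(decls):
    """decls: list of (name, initializer); initializer is an int or a variable name."""
    offsets = {}  # variable name -> offset from %ebp
    out = []
    for name, init in decls:
        if isinstance(init, int):
            out.append(f"push {init}")
        else:
            # Load from memory into a register first, then push the register.
            out.append(f"mov %eax [{offsets[init]}(%ebp)]")
            out.append("push %eax")
        offsets[name] = -4 * (len(offsets) + 1)
    return out, offsets


def compile_assign(offsets, target, source):
    """Copy one variable to another via %eax instead of memory-to-memory."""
    return [
        f"mov %eax [{offsets[source]}(%ebp)]",
        f"mov [{offsets[target]}(%ebp)] %eax",
    ]


code, offsets = compile_var_decls([("i", 0), ("x", 10), ("c", "x")])
code += compile_assign(offsets, "i", "c")
print("\n".join(code))
```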
========
OK, so it's morning and I've been tackling this again.
I've now got it compiling this:
[code]
var x = 5;
var i = 10 + (x * 2);
{
var c = 0;
i = c + 1;
}
[/code]
Which compiles to:
[code]
push 5
push 10
mov %eax [-4(%ebp)]
push %eax
push 2
pop %eax
mul %eax [%esp]
mov [%esp] %eax
pop %eax
add %eax [%esp]
mov [%esp] %eax
pop %eax
push %eax
push 0
mov %eax [-12(%ebp)]
push %eax
push 1
pop %eax
add %eax [%esp]
mov [%esp] %eax
pop %eax
mov [-8(%ebp)] %eax
add %esp 4
[/code]
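The expression part of that output follows a simple stack pattern: emit the left side, emit the right side, pop the right side into %eax, combine it with the value on top of the stack, and write the result back there. Here is a rough Python sketch of that recursion. The tuple-based tree format and the name `emit_expr` are made up for illustration, and only add and mul are covered, since the listing above doesn't show how operand order is handled for non-commutative operators.

```python
def emit_expr(node, offsets, out):
    """Append stack-machine style instructions for an expression tree to out.

    node is ("num", value), ("var", name), or (op, left, right) with op in
    {"add", "mul"}; this tree shape is an assumption of the sketch.
    """
    kind = node[0]
    if kind == "num":
        out.append(f"push {node[1]}")
    elif kind == "var":
        out.append(f"mov %eax [{offsets[node[1]]}(%ebp)]")
        out.append("push %eax")
    elif kind in ("add", "mul"):
        emit_expr(node[1], offsets, out)
        emit_expr(node[2], offsets, out)
        out.append("pop %eax")              # right operand
        out.append(f"{kind} %eax [%esp]")   # combine with left operand
        out.append("mov [%esp] %eax")       # result replaces left operand
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return out


# 10 + (x * 2), with x stored at -4(%ebp)
tree = ("add", ("num", 10), ("mul", ("var", "x"), ("num", 2)))
print("\n".join(emit_expr(tree, {"x": -4}, [])))
```

The result is far from optimal (lots of push/pop traffic), but the recursion mirrors the syntax tree one-to-one, which keeps the code generator small.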
It's got support for some simple scoping rules. For example, a code block like { ... } pushes a nested scope onto the symbol table's scope stack (so it inherits whatever symbols were defined in the enclosing scope) and cleans up after itself by deallocating any local variables. You can see this in the very last instruction above: it adds 4 bytes to the stack pointer, essentially "erasing" the c variable it pushed onto the stack.
So this is valid code:
[code]
{
var c = 0;
var i = c + 5;
}
[/code]
But this is not:
[code]
{
var c = 0;
}
var i = c + 5;
[/code]
And will throw the following error: "Symbol not defined in this scope: c <line: 4, column: 1>"
This should prove useful for functions, as the code block { ... } will automatically clean up any local variables (and then the calling code will clean up the function parameters, x86-style).
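A minimal Python sketch of a scope stack along those lines, assuming 4-byte locals addressed relative to %ebp. The class and method names (`SymbolTable`, `push_scope`, `pop_scope`, `declare`, `lookup`) are hypothetical, and the line/column part of the error message is left out.

```python
class ScopeError(Exception):
    pass


class SymbolTable:
    def __init__(self):
        self.scopes = [{}]     # innermost scope is last
        self.next_offset = -4  # next free slot relative to %ebp

    def push_scope(self):
        self.scopes.append({})

    def pop_scope(self):
        """Leave the innermost scope and return its cleanup instructions."""
        scope = self.scopes.pop()
        size = 4 * len(scope)
        self.next_offset += size
        return [f"add %esp {size}"] if size else []

    def declare(self, name):
        offset = self.next_offset
        self.scopes[-1][name] = offset
        self.next_offset -= 4
        return offset

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise ScopeError(f"Symbol not defined in this scope: {name}")


table = SymbolTable()
table.declare("x")
table.declare("i")
table.push_scope()
table.declare("c")
print(table.lookup("x"), table.lookup("c"))
print(table.pop_scope())
```

After `pop_scope()`, a `lookup("c")` raises `ScopeError`, which corresponds to the second, invalid snippet above.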
========
I've now got it compiling functions, function calls, and constants (actually, it doesn't compile constants at all; that happens during the parsing phase).
So now it's compiling this code:
[code]
const var TEST = 30;
const var TEST2 = -( TEST + 1 );
function foo( x, y, z )
{
var c = x + y + z;
return c;
}
var i;
i = foo( 10, 20, TEST2 );
[/code]
Into this monstrosity:
[code]
_FUNC_foo
push %ebp
mov %ebp %esp
mov %eax [8(%ebp)]
push %eax
mov %eax [12(%ebp)]
push %eax
pop %eax
add %eax [%esp]
mov [%esp] %eax
mov %eax [16(%ebp)]
push %eax
pop %eax
add %eax [%esp]
mov [%esp] %eax
pop %eax
mov %eax %eax
push %eax
mov %eax [-4(%ebp)]
add %esp 4
pop %ebp
ret %eax
add %esp 4
pop %ebp
ret 0
push 0
push -31
push 20
push 10
call _FUNC_foo
add %esp 12
mov [-4(%ebp)] %eax
[/code]
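The `push -31` in that listing is `TEST2` already evaluated: since constants are resolved during parsing, `-( TEST + 1 )` never reaches the code generator. A small Python sketch of that kind of folding, assuming a tuple-based expression tree (the node shapes and the name `fold` are made up here, and 32-bit wraparound is not modelled):

```python
def fold(node, constants):
    """Evaluate a constant expression tree to an int at parse time."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "const":
        if node[1] not in constants:
            raise KeyError(f"unknown constant: {node[1]}")
        return constants[node[1]]
    if kind == "neg":
        return -fold(node[1], constants)
    if kind == "add":
        return fold(node[1], constants) + fold(node[2], constants)
    if kind == "mul":
        return fold(node[1], constants) * fold(node[2], constants)
    raise ValueError(f"not a constant expression: {kind}")


constants = {"TEST": 30}
# const var TEST2 = -( TEST + 1 );
constants["TEST2"] = fold(("neg", ("add", ("const", "TEST"), ("num", 1))), constants)
print(constants["TEST2"])
```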
========
Just added BEGINASM/ENDASM keywords. In between those keywords you can hand-write assembly, and it will essentially be "pasted" directly into the output assembly. For instance, if I wanted memory access functions for reading from and writing to locations in memory:
[code]
BEGINASM
; function which writes a value to memory
_FUNC_memwrite:
push %ebp
mov %ebp %esp
mov %eax [8(%ebp)] ;load the pointer (first argument)
mov %ebx [12(%ebp)] ;load the value (second argument)
mov [%eax] %ebx ;copy value to memory location
pop %ebp
ret 0
; function which reads a value from memory
_FUNC_memread:
push %ebp
mov %ebp %esp
mov %ebx [8(%ebp)] ;load the pointer (first argument)
mov %eax [%ebx] ;copy value from memory to register
pop %ebp
ret %eax
ENDASM
var i = 10;
var ptr_test = 0;
memwrite( ptr_test, i );
var x = memread( ptr_test );
var i_equals_x = ( x == i ); // if i was written to memory correctly, and x was read from memory correctly, this will be TRUE, otherwise FALSE
[/code]
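One simple way to get that "paste it straight through" behaviour is to split the source into assembly chunks and ordinary code chunks before parsing. This is only a Python sketch under the assumption that BEGINASM and ENDASM each sit on their own line; the real compiler may well handle them in the lexer instead, and `split_inline_asm` is a hypothetical name.

```python
def split_inline_asm(source):
    """Split source into ("asm", text) and ("code", text) chunks, in order."""
    chunks, buffer, mode = [], [], "code"
    for line in source.splitlines():
        keyword = line.strip()
        if (keyword, mode) in (("BEGINASM", "code"), ("ENDASM", "asm")):
            # Close off whatever was collected so far, then switch modes.
            if buffer:
                chunks.append((mode, "\n".join(buffer)))
                buffer = []
            mode = "asm" if keyword == "BEGINASM" else "code"
        else:
            buffer.append(line)
    if mode == "asm":
        raise SyntaxError("BEGINASM without matching ENDASM")
    if buffer:
        chunks.append((mode, "\n".join(buffer)))
    return chunks


source = "BEGINASM\npush %ebp\nret 0\nENDASM\nvar i = 10;"
for kind, text in split_inline_asm(source):
    print(kind, repr(text))
```

The "asm" chunks would be written to the output untouched, while the "code" chunks go through the normal parse and compile steps.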
Very interesting, this is pretty cool.
Haha, forgot I even made this thread.
I actually made a crapton of progress on this before sort of abandoning it. I modified the final version to spit out C code instead of assembly, so it could be used with SGDK for Sega Genesis homebrew. It also had a ton of extra features like classes and structs, variable type information, modules (which sort of act like namespaces, kinda), and more.
So for example the syntax now looks like this:
[code]
include( "genesis.h" );
// structs can store variables, they're like simple containers for data
struct Vector2
{
var x : fix32;
var y : fix32;
}
// classes are like structs, but they can also have methods.
// additionally, they are allocated via 'new' syntax, and are allocated on the heap rather than the stack (internally, this is done via MEM_alloc function)
class TestClass
{
var position : Vector2;
// Initialize game entity
function virtual Init() : void
{
// do stuff
}
}
// classes can inherit from other classes
class TestSubclass : TestClass
{
function override Init() : void
{
super(); // <-- this is used to call the parent method implementation
this->position.x = 50;
}
}
// modules look a lot like static classes - you can put fields and functions in here and access them via Module.[..] syntax.
// at the moment, modules are more like classes than namespaces - so for instance you cannot put structs or classes inside of them (only fields and functions).
module SomeModule
{
var someField : u32;
}
function main() : int
{
var testSubclass : TestSubclass* = new TestSubclass();
testSubclass->Init();
// this is how you access module fields and functions
SomeModule.someField = 42;
while( true )
{
VDP_waitVSync();
}
return 0;
}
[/code]
And "compiles" into this:
[code]
#include <genesis.h>
typedef struct Vector2_
{
fix32 x;
fix32 y;
} Vector2;
typedef struct TestClass_
{
Vector2 position;
void (*Init) ( struct TestClass_* this );
} TestClass;
void TestClass_Init( TestClass* this )
{
}
void TestClass_AssignFuncPointers( TestClass* this )
{
this->Init = TestClass_Init;
}
TestClass* new_TestClass()
{
TestClass* newObj = MEM_alloc( sizeof( TestClass ) );
TestClass_AssignFuncPointers( newObj );
return newObj;
}
typedef struct TestSubclass_
{
TestClass base;
} TestSubclass;
void TestSubclass_Init( TestSubclass* this )
{
TestClass_Init(this);
this->base.position.x = 50; // <-- note that it automatically inserts "base." when accessing position. The compiler understands and keeps track of inheritance, unlike C.
}
void TestSubclass_AssignFuncPointers( TestSubclass* this )
{
TestClass_AssignFuncPointers( this );
this->Init = TestSubclass_Init;
}
TestSubclass* new_TestSubclass()
{
TestSubclass* newObj = MEM_alloc( sizeof( TestSubclass ) );
TestSubclass_AssignFuncPointers( newObj );
return newObj;
}
// "module" was sort of meant to be used as an alternative to naming conventions. So for instance if you were to rewrite the VDP_* functions in this language, instead of, say,
// VDP_waitVSync(), you'd have a module named "VDP", and a function inside "waitVSync()", and you'd call it via "VDP.waitVSync();", which internally just translates to VDP_waitVSync();
u32 SomeModule_someField;
int main( )
{
TestSubclass* testSubclass = new_TestSubclass();
testSubclass->Init(testSubclass);
SomeModule_someField = 42;
while( true ){
VDP_waitVSync();
}
return 0;
}
[/code]
EDIT: Facepunch is being weird, ignore the ="keyword"> stuff in there.
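To give an idea of the two name-rewriting tricks mentioned in the comments above (inserting "base." for inherited fields, and turning Module.member into Module_member), here is a rough Python sketch. It is not the real compiler: the `CLASSES` table layout and the function names `member_path` and `module_symbol` are invented for illustration, and only data fields are modelled.

```python
# Hypothetical class table: each class lists its own fields and its parent.
CLASSES = {
    "TestClass": {"parent": None, "fields": ["position"]},
    "TestSubclass": {"parent": "TestClass", "fields": []},
}


def member_path(class_name, member, classes):
    """Return the C expression for this->member, adding 'base.' per inheritance level."""
    prefix = ""
    current = class_name
    while current is not None:
        if member in classes[current]["fields"]:
            return f"this->{prefix}{member}"
        prefix += "base."
        current = classes[current]["parent"]
    raise KeyError(f"{class_name} has no member named {member}")


def module_symbol(module, member):
    """Module.member in the source becomes Module_member in the emitted C."""
    return f"{module}_{member}"


print(member_path("TestSubclass", "position", CLASSES))
print(module_symbol("SomeModule", "someField"))
```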
That's very cool! Are you gonna release it as open source one day? I'm kinda interested in making this compile to Lua bytecode.