Har, a new toy!

S. P. Gardebiter · Aug 2, 2008

Ahh great :3
Seems hard though D:
Will this thing be able to compile Z80 or 68k code too?

RuneLancer · Aug 3, 2008

Edit 1: screenshot.
Edit 2: release.
Edit 3: updaet lol.

ANYTHIIING. o_O

Seriously.

I haven't posted about this in a while, and things have been quickly changing. This project has now become a universal editor. I'm putting together a scripting language that allows you to easily define the format the data is in (be it assembly, some obscure packed binary format, or even image data) and how to represent it.

Currently I'm laying down the official syntax for this scripting language. I expect to be more or less done by tonight, and then I can write the single parser that powers this thing (as opposed to the 6 different parsers I would've had to write before: a compiler and a decompiler each for assembly, scripts, and data.)

BASICALLY, it goes a little like this. Pretty much all steps are optional, but if you don't write a certain script, you can't use that functionality. There are no "default" scripts.

EDIT: I'm done writing down the official syntax. That was faster than expected. Progress is good. Time to write the parser! Anyway. I had an example of each section, but that's no longer valid. Here's the manual.

Code:

FORMAT

Syntax:	"format"
	"{"
		{ type name }
	"}"

A format is used to define what data is going to be extracted or inserted
into the file. Format variables are static and must be prefixed by a star
("*") when referenced.

The following types are available.

	byte	1 byte
	word	2 bytes
	long	4 bytes

The order of the entries is respected. The first item will be output in the
file first and the last item will be output in the file last.

	Ex:	format
		{
			byte health
			byte max_health
			word experience
		}

		This structure contains a byte (health), another byte
		(max health), and a word (experience.)

Formats are generally used by data-based scripts. They have little use in
the other modes.
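To make the "order is respected" rule concrete, here is a minimal Python sketch (not Injector's code) that lays out the example format: each variable's offset is the sum of the sizes declared before it, with no padding. It assumes little-endian storage, which the manual doesn't specify; `TYPES`, `FORMAT`, `layout` and `unpack_entry` are hypothetical names.

```python
import struct

# Type sizes from the table above, paired with struct codes.
# "<" = little-endian, no alignment padding (an assumption, not from the manual).
TYPES = {"byte": (1, "B"), "word": (2, "H"), "long": (4, "I")}

# The example format, in declaration order (hypothetical Python form).
FORMAT = [("byte", "health"), ("byte", "max_health"), ("word", "experience")]

def layout(fmt):
    """Return ({name: offset}, total_size); entries are laid out first to last."""
    offsets, pos = {}, 0
    for type_name, name in fmt:
        offsets[name] = pos
        pos += TYPES[type_name][0]
    return offsets, pos

def unpack_entry(fmt, data):
    """Decode one entry's bytes into a dict of format variables."""
    codes = "<" + "".join(TYPES[t][1] for t, _ in fmt)
    values = struct.unpack(codes, data[: struct.calcsize(codes)])
    return {name: v for (_, name), v in zip(fmt, values)}

offsets, size = layout(FORMAT)
print(offsets, size)  # {'health': 0, 'max_health': 1, 'experience': 2} 4
print(unpack_entry(FORMAT, bytes([50, 99, 0x10, 0x27])))
# {'health': 50, 'max_health': 99, 'experience': 10000}
```

Since nothing is padded, the example format is 4 bytes long, which is also what you'd put in entry_size for it.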



HEADER

Syntax:	"header"
	"{"
		{ parameter value }
	"}"

The header section is used to define certain script parameters.

	base_offset	The suggested starting offset, if the script has a
			specific use.
	comment_char	Which character to use for script comments.
			Defaults to ";"
	definition	A definition file to import. Multiple definitions can
			be specified.
	entry_count	How many entries are editable. If 0 or absent,
			no limit is set.
	entry_size	The size of an entry, in bytes.
	entry_label	A label to display instead of an entry's index
			(ex, "Dagger" instead of 0x4C.)
	ptr_table	The address of a pointer table. See below.
	ptr_size	How big a pointer is, in bytes. Defaults to 4.
	ptr_start	A base offset for pointers. If specified, every
			pointer read is added to it.

If entry_size is specified, Injector calculates an entry's offset by
multiplying its index by the size and adding the result to the starting
offset (ex, index 4 of a 10 byte structure starting at 001150A8 is at
001150A8 + 4 x 10 = 001150D0; the offsets are hexadecimal, the index and
size decimal.) If entry_size is unspecified, other mechanisms are used:
entries are located through a pointer table (ptr_table) if its address is
provided, and otherwise have to be parsed one by one.
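Here's a rough Python sketch of the two lookups described above. It's an illustration under assumptions, not how Injector actually does it: fixed-size entries are taken to start at base_offset, pointer i is assumed to sit at ptr_table + i x ptr_size, and pointers are assumed little-endian. `entry_offset` is a hypothetical helper.

```python
def entry_offset(index, base_offset=0, entry_size=None,
                 ptr_table=None, ptr_size=4, ptr_start=0, rom=b""):
    """Locate entry `index` (hypothetical helper, not Injector's API)."""
    if entry_size is not None:
        # Fixed-size entries: start + index * size.
        return base_offset + index * entry_size
    if ptr_table is not None:
        # Pointer table: read pointer `index` (little-endian assumed),
        # then add ptr_start to it.
        pos = ptr_table + index * ptr_size
        pointer = int.from_bytes(rom[pos:pos + ptr_size], "little")
        return ptr_start + pointer
    # Neither is known: entries would have to be walked one by one.
    raise ValueError("no entry_size or ptr_table; entries must be parsed one by one")

print(hex(entry_offset(4, base_offset=0x001150A8, entry_size=10)))  # 0x1150d0
```

The first call reproduces the worked example: 0x001150A8 + 40 = 0x001150D0.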

Most parameters can be accessed in other scripts by prefixing their names
with an &. Definition is the exception to this.

	Ex:	header
		{
			definition		"Final Fantasy V.def"
			base_offset		001158B0
			entry_count		22
			entry_size		4
			entry_label		job_names
		}

While headers are optional, their use is highly suggested.



LABELS

Syntax:	"labels"
	"{"
		{ name "{" [{ parameter value }] [{ value ":" label }] "}" }
	"}"

A label is a mapping between a value and a string. The chief use of this is
to label indexes (ex, "entry 0 is 'Fire 1', entry 1 is 'Ice 1', entry 2 is
'Bolt 1'...")

When using a label, the value being matched is looked up in the table. If
an entry for the value exists, the corresponding string is output.

The following optional parameters exist.

	lbl_offset	If specified, labels are loaded directly from the file at this
			offset.
	lbl_length	If lbl_offset was specified, this indicates how long
			each string is, in bytes.
	lbl_table	Which table to decode the strings with, if any.

If loading the labels from a file, it is not necessary to specify value:label pairs.

	Ex:	labels
		{
			names
			{
				00: "Bob"
				01: "John"
				02: "Rick"
				03: "Steve"
			}

			job_names
			{
				lbl_offset	00115800
				lbl_length	8
				lbl_table	main
			}
		}

		This creates two labels. The first, names, maps 0x00
		to "Bob", 0x01 to "John", and so forth. The second,
		job_names, loads the strings from the file at
		0x00115800. Each string is 8 bytes long and is
		decoded using the "main" table.

		The first label needs no parameters, just as the
		second needs no value:label pairs.

	Labels can be used in a variety of places. Most string-displaying
	routines have some means of outputting them.
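For a feel of how the two kinds of labels could behave, here's a small Python sketch. It assumes the number of strings to load is known up front (the manual doesn't say where that count comes from) and, when a table is given, simply drops bytes the table doesn't map. `load_labels` and `label` are hypothetical names.

```python
def load_labels(rom, lbl_offset, lbl_length, count, table=None):
    """Read `count` fixed-length strings starting at lbl_offset."""
    labels = {}
    for i in range(count):
        start = lbl_offset + i * lbl_length
        raw = rom[start:start + lbl_length]
        if table is None:
            text = raw.decode("ascii", errors="replace")  # plain ASCII
        else:
            # Unmapped bytes are dropped here (an assumption).
            text = "".join(table.get(b, "") for b in raw)
        labels[i] = text
    return labels

def label(labels, value):
    """Return the string for `value`, or None if there's no entry."""
    return labels.get(value)

names = {0x00: "Bob", 0x01: "John", 0x02: "Rick", 0x03: "Steve"}
print(label(names, 0x02))                        # Rick
print(load_labels(b"KnightMonk  ", 0, 6, 2))      # {0: 'Knight', 1: 'Monk  '}
```

Both label styles end up as the same value-to-string dictionary, which is why the rest of the script doesn't need to care where a label came from.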



TABLES

Syntax:	"tables"
	"{"
		{ name "{" { value ":" character } "}" }
	"}"

A table is a mapping between a byte and a character or series of
characters. This is frequently necessary when working with ROMs as
the text is rarely stored directly as ASCII.

When parsing a string with a table, every byte in the string is looked up in
the table. If an entry for the byte exists, the corresponding string is
output.

	Ex:	tables
		{
			numbers
			{
				00:"0"  01:"1"  02:"2"  03:"3"  04:"4"
				05:"5"  06:"6"  07:"7"  08:"8"  09:"9"
			}
		}

		This creates a table named "numbers." When 0x00 is
		read, "0" is output; when 0x01 is read, "1" is output;
		and so forth up to 0x09, which outputs "9".

Tables can be used in a variety of places. Most string-loading routines
permit the optional use of a table (by default, every byte maps to ASCII.)
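A table lookup is essentially a per-byte dictionary. The Python sketch below shows the idea, including multi-character entries. What happens to a byte with no entry isn't specified above, so this sketch outputs "?" for it; that choice and the `decode` name are assumptions.

```python
def decode(data, table=None, unknown="?"):
    """Translate each byte through `table`; plain ASCII if no table is given."""
    if table is None:
        return data.decode("ascii", errors="replace")
    # Each entry may map to several characters; unmapped bytes become `unknown`.
    return "".join(table.get(b, unknown) for b in data)

numbers = {i: str(i) for i in range(10)}   # 00:"0" ... 09:"9"
print(decode(bytes([0x01, 0x09, 0x05]), numbers))  # 195
print(decode(bytes([0x0A]), numbers))              # ?
```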



COMPILE

Syntax:	"compile"
	"{"
		"root" "{" ... "}" { mode_name "{" ... "}" }
	"}"

Compile is called when a script written by the user is being compiled by
Injector. It is only available in script mode.

The compile script is different from the other script formats. It's a
collection of scripts, one of which must be named "root". When compiling
starts, a system variable (&mode) is set to root. The root script is run
from start to finish. If there is no data left to parse, compilation stops.
Otherwise, the script named in &mode is called and the cycle starts over
from root.

	Ex:	compile
		{
			root
			{
				# Read the next identifier.
				$command: get_token();

				# Handle the commands.
				if $command = "Strength" then &mode: strength ;
				if $command = "Agility"  then &mode: agility  ;
				if $command = "Vitality" then &mode: vitality ;
				if $command = "MagicPow" then &mode: magic_pow;

				# If we're still in root, there's a problem.
				if &mode = root then error("Unrecognized token: " $command);
			}

			strength  { write_byte(0x00); write_byte(get_value()); &mode: root; }
			agility   { write_byte(0x01); write_byte(get_value()); &mode: root; }
			vitality  { write_byte(0x02); write_byte(get_value()); &mode: root; }
			magic_pow { write_byte(0x03); write_byte(get_value()); &mode: root; }
		}

This script reads a token and compares it to the four instructions it
supports: "Strength", "Agility", "Vitality", and "MagicPow". If it matches
one, it changes &mode to that instruction's mode. If none of the tests
passed and &mode is still root, the user is informed they have used an
unknown command.

Once root is done, if no error occurred and &mode is set to something
else, that mode is called. In this case, all four modes read a value
from the user's script, write the instruction and the value, then set
&mode back to root so the next token is handled by root.

An example script to be parsed by this would be as follows...

	Strength 2
	Agility  3
	Vitality 2
	MagicPow 4

Note that this script is a very simple one. See some of the ones
included with Injector for better examples.
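To trace the root/mode cycle, here's a Python sketch that mimics the example compile script. It's a model, not Injector's interpreter: get_token() is assumed to split on whitespace, and "no data left" is checked before each pass through root. The `Compiler` class is hypothetical.

```python
class Compiler:
    """Hypothetical model of a compile block: a root script plus named modes."""

    TOKENS_TO_MODES = {"Strength": "strength", "Agility": "agility",
                       "Vitality": "vitality", "MagicPow": "magic_pow"}
    OPCODES = {"strength": 0x00, "agility": 0x01,
               "vitality": 0x02, "magic_pow": 0x03}

    def __init__(self, source):
        self.tokens = source.split()  # assumed: whitespace-separated tokens
        self.pos = 0
        self.mode = "root"            # plays the role of &mode
        self.out = bytearray()

    def get_token(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def root(self):
        # Read the next identifier and pick a mode, or report an error.
        command = self.get_token()
        self.mode = self.TOKENS_TO_MODES.get(command, "root")
        if self.mode == "root":
            raise ValueError("Unrecognized token: " + command)

    def run_mode(self):
        # All four modes do the same: write opcode and value, back to root.
        self.out.append(self.OPCODES[self.mode])
        self.out.append(int(self.get_token()))  # get_value()
        self.mode = "root"

    def compile(self):
        while self.pos < len(self.tokens):  # stop once no data is left
            self.root()
            self.run_mode()
        return bytes(self.out)


script = """Strength 2
Agility  3
Vitality 2
MagicPow 4"""
print(Compiler(script).compile().hex(" "))  # 00 02 01 03 02 02 03 04
```

Each line of the user's script becomes two bytes: the instruction number followed by its value.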



DECOMPILE
DUMP

Syntax:	["decompile" | "dump"]
	"{"
		script
	"}"

The decompile and dump scripts are used to extract information from the
source file and render it in a format easier for a human to work with. The
script is run once for each entry being decompiled or dumped: it handles a
single entry, not the entire set at once.

Decompile is the exact opposite of compile. Only available in script
mode, it produces (or at least, should produce) a script which can be
recompiled immediately without any changes. This script is produced
directly from the source file.

Dump is a more flexible version of decompile. It is meant for display
purposes only and can be used with both scripts and data. Typically,
this would be used to export data in a readable format to a file. This
data should come from the format variables.

	Ex:	dump
		{
			$name: label(job_names &current_entry);

			output(&comment_char " ID " &current_entry ": " $name "\n");
			output("Strength " *strength  "\n");
			output("Agility  " *agility   "\n");
			output("Vitality " *vitality  "\n");
			output("MagicPow " *magic_pow "\n");
		}

The above would simply output the entry's name (looked up in job_names) and all four stats.
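Here's the same dump written as a Python sketch, run once per entry. It assumes the four stats are the entry's first four bytes and that &current_entry prints in decimal; `dump_entry` is a hypothetical name.

```python
def dump_entry(entry, current_entry, job_names, comment_char=";"):
    """Render one entry the way the dump example does (a sketch)."""
    strength, agility, vitality, magic_pow = entry[:4]   # the format variables
    name = job_names.get(current_entry, "")              # label(job_names &current_entry)
    return (f"{comment_char} ID {current_entry}: {name}\n"
            f"Strength {strength}\n"
            f"Agility  {agility}\n"
            f"Vitality {vitality}\n"
            f"MagicPow {magic_pow}\n")

print(dump_entry(bytes([2, 3, 2, 4]), 1, {1: "Knight"}), end="")
# ; ID 1: Knight
# Strength 2
# Agility  3
# Vitality 2
# MagicPow 4
```

Because the ID line starts with the comment character, the rest of the output has the same shape as the input the compile example accepts.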



FUNCTIONS

Syntax:	"functions"
	"{"
		{ function_name "{" ... "}" }
	"}"

Like compile, the functions section is slightly different from the standard
script format: it is a collection of named scripts rather than a single one.

Any function defined in functions can be called using the call() function.
They are best used in definition files, where multiple scripts can call and
reuse them.

	Ex:	functions
		{
			# Adds %param1 and %param2 and stores the result in %sum.
			sum
			{
				%sum: %param1 + %param2;
			}

			# Averages %param1 and %param2 (using sum) and returns it.
			average
			{
				call(sum);
				return(%sum / 2);
			}
		}

These two simple functions both use %param1 and %param2 as their
parameters. They can be called from any script.
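The example only works if %-variables are shared between caller and callee, since sum leaves its result in %sum for average to read. The Python sketch below models that assumption with one shared dictionary. It also assumes "/" is integer division; `percent_vars`, `FUNCTIONS` and `call` are hypothetical stand-ins.

```python
# One shared pool of %-variables, visible to every function (assumption).
percent_vars = {}

def fn_sum():
    percent_vars["sum"] = percent_vars["param1"] + percent_vars["param2"]

def fn_average():
    call("sum")
    return percent_vars["sum"] // 2   # integer division assumed for "/"

FUNCTIONS = {"sum": fn_sum, "average": fn_average}

def call(name):
    """Stand-in for call(): run a named function, pass back its return value."""
    return FUNCTIONS[name]()

percent_vars["param1"], percent_vars["param2"] = 4, 9
print(call("average"))  # 6
```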



READ

Syntax:	"read"
	"{"
		script
	"}"

A read script (only available in data mode) is used to pull the data from the
source file. Normally, you would use it to fill in the format variables.

	Ex:	read
		{
			*strength : read_byte(0);
			*agility  : read_byte(1);
			*vitality : read_byte(2);
			*magic_pow: read_byte(3);
		}

Here, we read four bytes (at offsets 0, 1, 2, and 3) and store each in the
appropriate format variable.
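A Python sketch of that read script, assuming (the manual doesn't say) that read_byte(n) is relative to the start of the current entry. `Entry` and `read_script` are hypothetical names.

```python
class Entry:
    """Hypothetical view of one entry; read_byte(n) is relative to its start."""

    def __init__(self, rom, offset):
        self.rom, self.offset = rom, offset

    def read_byte(self, n):
        return self.rom[self.offset + n]   # indexing bytes yields an int

def read_script(entry):
    # Mirrors the example: one format variable per byte.
    return {
        "strength":  entry.read_byte(0),
        "agility":   entry.read_byte(1),
        "vitality":  entry.read_byte(2),
        "magic_pow": entry.read_byte(3),
    }

rom = bytes([0xFF, 0xFF, 2, 3, 2, 4])
print(read_script(Entry(rom, 2)))
# {'strength': 2, 'agility': 3, 'vitality': 2, 'magic_pow': 4}
```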



WRITE

Syntax:	"write"
	"{"
		script
	"}"

The write script, only available in data mode, writes back the format
variables into the source file.

	Ex:	write
		{
			# Just write everything.
			write_all();
		}

Most scripts will simply do the above, which outputs every format variable
sequentially back into the source file. However, if data needs to be
packed using some special format, this is where it would be re-packed.
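Something like write_all() could be a single sequential pack. The Python sketch below writes the earlier health/max_health/experience format back in declaration order, assuming little-endian words and no padding. `FORMAT` and `write_all` here are hypothetical, not Injector's implementation.

```python
import struct

# Format variables in declaration order: (name, struct code).
# "<" (little-endian, no padding) is an assumption.
FORMAT = [("health", "B"), ("max_health", "B"), ("experience", "H")]

def write_all(rom, offset, values, fmt=FORMAT):
    """Pack every format variable back into `rom`, first to last."""
    codes = "<" + "".join(code for _, code in fmt)
    struct.pack_into(codes, rom, offset, *(values[name] for name, _ in fmt))

rom = bytearray(6)
write_all(rom, 1, {"health": 50, "max_health": 99, "experience": 10000})
print(rom.hex(" "))  # 00 32 63 10 27 00
```

A specially packed format would replace the single pack_into() call with whatever re-packing it needs.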

Again, this is just the definition file: the file that tells Injector what format things are in. When you load it into Injector, it builds the interface you need and lets you write scripts or assembly, or edit numbers and other values, as if it were a dedicated editor.

These files can be distributed to anyone who has Injector. For instance,
you could just download a script file and use it to edit events without
having to know how events are stored.

Edit: Har har! A new screenshot!

Edit 2 - A release
Nothing parser-related yet. I just finished writing the load/save routines, so they may still be a bit buggy.

Injector.zip

Edit 3
<farnsworth>Good news, everyone!</farnsworth> The loader is about 20-30% complete! After that, I just have to code the interpreter and bam! 100% done! (Yep, everything! 2-3 weeks tops and this bad boy's out on the streets.)

RuneLancer · Aug 5, 2008

This is the last update on this project I'll be making here, as this is no longer Cave Story-related and fairly off-topic. Expect to hear again from it eventually though.

(No, no new releases yet. It's still a good week from being ready; the current release is display-only and doesn't write anything back, so don't get too excited. As I've said, I'll eventually post about this in a more appropriate location when I'm ready and willing to release it. It won't be ready for another week or two, and I'm not very much in the mood to distribute it here at the moment either way.)