CSE2 - The Cave Story decompilation project

Sep 13, 2019 at 10:20 PM
Senior Member
CSE Discord Admin
"Fly, Fly, Fly!"
Join Date: Jan 13, 2016
Location:
Posts: 132
Nope. If you make a debug build, and run it through gdb, you might be able to track down the cause.
 
Sep 13, 2019 at 10:56 PM
Junior Member
"It's dangerous to go alone!"
Join Date: Aug 5, 2019
Location: Hell
Posts: 41
Pronouns: he/him
Sep 14, 2019 at 1:08 AM
Junior Member
"It's dangerous to go alone!"
Join Date: Aug 5, 2019
Location: Hell
Posts: 41
Pronouns: he/him
Because I don't have a debug build, here's what happened when I ran the current build through gdb:

Code:
(gdb) run
Starting program: /home/lazil/Cave Story/CSE2/Vanilla CSE2 Enhanced/game/CSE2 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffec849700 (LWP 29867)]
[New Thread 0x7ffff7fa4700 (LWP 29869)]
[New Thread 0x7fffc3fff700 (LWP 29871)]
*** buffer overflow detected ***: /home/lazil/Cave Story/CSE2/Vanilla CSE2 Enhanced/game/CSE2 terminated

Thread 1 "CSE2" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Any thoughts?
 
Sep 14, 2019 at 3:11 PM
Still at it
"All your forum are belong to us!"
Join Date: Sep 22, 2012
Location: Hell
Posts: 612
Pronouns: She/It
I have a few very dumb questions about CSE2:
In this folder, create another folder called 'build', then switch to the command-line (Visual Studio users should open the Developer Command Prompt) and cd into it. After that, generate the files for your build system with
This makes no sense to me, never touched a developer command prompt in my life, and I don't know what 'cd into it' means
An automated build.bat and dependencies downloader would be awesome.

Is there a project file? I cannot find one anywhere and am really used to having those available when working in VS (Terraria modding has spoiled me a little)
 
Sep 14, 2019 at 3:22 PM
Senior Member
CSE Discord Admin
"Fly, Fly, Fly!"
Join Date: Jan 13, 2016
Location:
Posts: 132
Google's your friend. The Developer Command Prompt can be found in Windows's start menu, in the Visual Studio folder. 'cd' is a command for changing directory - by default, I think the command prompt opens in the root of the C drive, so you use cd to switch to the build folder that you made earlier.

Will-do.

Edit: I have no idea how to make a debug build.
The enhanced building directions list no options for making a debug build, and I have no idea where else to look.

You can create a debug build by changing '-DCMAKE_BUILD_TYPE=Release' to '-DCMAKE_BUILD_TYPE=Debug'.
 
Sep 14, 2019 at 3:26 PM
Senior Member
"Wahoo! Upgrade!"
Join Date: Dec 22, 2018
Location: Sand Zone Residence
Posts: 55
Pronouns: he/him
I don't know what 'cd into it' means
cd = change directory
basically you can write "cd", a space, and then an absolute or a relative directory path and you'll switch your current working directory (cwd) to that directory

examples (assuming you're on Windows):
Code:
rem if your current directory is somewhere random:
cd C:\path\to\CSE2\build
rem if you're in a directory that has a child directory called "build" (the "dir" command lists the files and directories of your cwd):
cd build
 
Sep 14, 2019 at 3:41 PM
Still at it
"All your forum are belong to us!"
Join Date: Sep 22, 2012
Location: Hell
Posts: 612
Pronouns: She/It
cd = change directory
basically you can write "cd", a space, and then an absolute or a relative directory path and you'll switch your current working directory (cwd) to that directory
Seems a little pointless since you can just use the 'open command window here' feature in windows by holding shift whilst you right click (might also be ctrl, I forgot, I have a registry change that makes it appear by default)
 
Sep 14, 2019 at 3:47 PM
Senior Member
CSE Discord Admin
"Fly, Fly, Fly!"
Join Date: Jan 13, 2016
Location:
Posts: 132
If I remember right, the Developer Command Prompt is different from the regular command prompt, extending the PATH and defining some environment variables. They're needed for CMake to detect the compiler properly.
 
Sep 14, 2019 at 11:13 PM
Junior Member
"It's dangerous to go alone!"
Join Date: Aug 5, 2019
Location: Hell
Posts: 41
Pronouns: he/him
You can create a debug build by changing '-DCMAKE_BUILD_TYPE=Release' to '-DCMAKE_BUILD_TYPE=Debug'.

I assume that the compiling command should be changed to 'cmake --build . --config Debug' too?

edit: compiled with 'cmake --build . --config Debug'.
game now works?

odd.

Seems a little pointless since you can just use the 'open command window here' feature in windows by holding shift whilst you right click (might also be ctrl, I forgot, I have a registry change that makes it appear by default)
I'm pretty unsure, but I think it was grandfathered in from MS-DOS.
 
Sep 15, 2019 at 4:00 AM
Senior Member
CSE Discord Admin
"Fly, Fly, Fly!"
Join Date: Jan 13, 2016
Location:
Posts: 132
Heisenbug... typical. You could replace 'Debug' with 'RelWithDebInfo' too. That makes a release build, but with enough debug info for gdb to be useful.
 
Oct 17, 2019 at 12:07 AM
Senior Member
CSE Discord Admin
"Fly, Fly, Fly!"
Join Date: Jan 13, 2016
Location:
Posts: 132
CSE2 v2.0

Geez, has there really not been a release since May?

Well, here's v2.0. Why v2.0 and not v1.3? Because we've finally reached the second milestone for the project: ASM-accuracy!

The first milestone was making the game completable - playable from beginning to end. Once that was reached back in February, we released v1.0.

But this first release was far from ideal: a lot of the code was visibly the output of a decompiler, and not at all something a normal programmer would write. Take this for example:
Code:
//Get 4 digit number from TSC data
int GetTextScriptNo(int a)
{
    return gTS.data[a + 3] - 48
        + 10 * gTS.data[a + 2] - 480
        + 100 * gTS.data[a + 1] - 4800
        + 1000 * gTS.data[a] - 48000;
}
Do you have any idea what this thing's trying to do? Well, here's the code from v2.0:
Code:
//Get 4 digit number from TSC data
int GetTextScriptNo(int a)
{
    int b = 0;
    b += (gTS.data[a++] - '0') * 1000;
    b += (gTS.data[a++] - '0') * 100;
    b += (gTS.data[a++] - '0') * 10;
    b += gTS.data[a] - '0';
    return b;
}
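A quick way to convince yourself that the two versions agree is to run both over the same input. The sketch below is a stand-alone check, not CSE2 code: `data` is a hypothetical stand-in for `gTS.data`, and the `_v1`/`_v2` function names are made up for the comparison.

```cpp
#include <cassert>

// Hypothetical stand-in for gTS.data: two 4-digit numbers separated by ':'
const char data[] = "0123:0456";

// v1.0 style: decompiler output, with the '0' (48) offsets folded into constants
int GetTextScriptNo_v1(int a)
{
    return data[a + 3] - 48
        + 10 * data[a + 2] - 480
        + 100 * data[a + 1] - 4800
        + 1000 * data[a] - 48000;
}

// v2.0 style: turn each ASCII digit into its value, then scale by place value
int GetTextScriptNo_v2(int a)
{
    int b = 0;
    b += (data[a++] - '0') * 1000;
    b += (data[a++] - '0') * 100;
    b += (data[a++] - '0') * 10;
    b += data[a] - '0';
    return b;
}
```

Both functions read the four characters starting at index `a`, so calling either with `0` or `5` on this buffer should give the same number.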
But, even worse, v1.0 wasn't even trying to be accurate: during the initial decompilation process, me and Cucky ported each part of the engine to SDL2. While this was nice for portability, it kind of goes against the point of a decompilation.

With v2.0, we've killed two birds with one stone! The original DirectDraw/DirectInput/DirectSound/WinAPI code has been restored, and the code in general has been cleaned up to better match the original source code.

Decompiling the DirectX code is straightforward enough, but some of you might be wondering how exactly we know what the original source code looked like, and what 'ASM accurate' is supposed to mean. Well, it all started with figuring out what compiler Pixel used to create the original EXE...

Many EXE files come with what's known as a "Rich Header", which details various properties of the compilation process, such as how many C or C++ object files were linked, but the part we're interested in is where it says what compiler was used. From this, we learned that it was MSVC .NET 2003.
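For the curious, here is a rough sketch of how a Rich Header could be decoded. It rests on assumptions about an undocumented format as it is commonly described by reverse engineers: the block starts with "DanS" and three padding values, all XOR-masked with a key, followed by masked (tool ID, count) pairs, and ends with a plain "Rich" marker followed by the key. The struct, field, and function names are all made up for this example, and it works on an in-memory buffer rather than a real EXE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded entry (hypothetical field names)
struct RichEntry
{
    std::uint16_t product_id; // which tool (compiler, linker, ...) this entry is for
    std::uint16_t build;      // that tool's build number
    std::uint32_t count;      // how many times the tool was used
};

// Read a little-endian 32-bit value from the buffer
std::uint32_t ReadU32(const std::vector<std::uint8_t> &buf, std::size_t pos)
{
    return (std::uint32_t)buf[pos]
        | ((std::uint32_t)buf[pos + 1] << 8)
        | ((std::uint32_t)buf[pos + 2] << 16)
        | ((std::uint32_t)buf[pos + 3] << 24);
}

// Append a value as four little-endian bytes (handy for building sample input)
void AppendU32(std::vector<std::uint8_t> &buf, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        buf.push_back((std::uint8_t)(value >> shift));
}

std::vector<RichEntry> ParseRichHeader(const std::vector<std::uint8_t> &buf)
{
    std::vector<RichEntry> entries;
    const std::uint32_t kRich = 0x68636952; // "Rich" read as little-endian
    const std::uint32_t kDanS = 0x536E6144; // "DanS" read as little-endian

    // Find the unmasked "Rich" marker; the XOR key follows it
    std::size_t rich = 0;
    bool found = false;
    for (std::size_t i = 0; i + 8 <= buf.size(); i += 4)
    {
        if (ReadU32(buf, i) == kRich)
        {
            rich = i;
            found = true;
            break;
        }
    }
    if (!found)
        return entries;
    const std::uint32_t key = ReadU32(buf, rich + 4);

    // Walk backwards until the masked "DanS" marker is found
    std::size_t start = rich;
    found = false;
    while (start >= 4)
    {
        start -= 4;
        if ((ReadU32(buf, start) ^ key) == kDanS)
        {
            found = true;
            break;
        }
    }
    if (!found)
        return entries;

    // Skip "DanS" and the three padding values, then decode the pairs
    for (std::size_t pos = start + 16; pos + 8 <= rich; pos += 8)
    {
        const std::uint32_t comp_id = ReadU32(buf, pos) ^ key;
        const std::uint32_t count = ReadU32(buf, pos + 4) ^ key;
        entries.push_back({ (std::uint16_t)(comp_id >> 16), (std::uint16_t)(comp_id & 0xFFFF), count });
    }
    return entries;
}
```

Mapping a decoded tool ID and build number to a named compiler release needs a lookup table, which this sketch leaves out.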

By using the same compiler as Pixel, we could compile CSE2, and then check if the generated assembly code matches that of the original EXE. If any of the code doesn't match, then we can tweak the C/C++ code until it does match (making it "ASM-accurate").

So that's what I've been up to for the past few months: going over each and every function in the game, and tweaking them to produce the same assembly. Work on this actually started before CSE2 v1.1, but the lack of the original DirectX/WinAPI code caused unfixable code-inaccuracies throughout the entire game, which made a lot of it infeasible at the time.

Some of you might be thinking 'wait, but replacing SDL2 with DirectX/WinAPI means the game can't be compiled for Linux or Mac', and you're right. That's why the SDL2 port was split off to its own subproject: the portable branch.

With this release, CSE2 has been further split into three branches: accurate, portable, and enhanced. The accurate branch focuses on being accurate to the original EXE, down to all the bugs, limitations, and platform-dependencies; the portable branch aims to port the game to other platforms while otherwise keeping the experience as close to the original as possible; and the enhanced branch is the same as it's always been - providing a modified version of the engine with extra features.

Ironically, the accurate branch has become the easiest form of CSE2 for Windows users to compile, since it doesn't depend on any middleware, and it provides a Visual Studio 2017 solution file instead of a CMake file. Fun side-note: there was only one incompatibility stopping this 2004-era code from compiling in modern Visual Studio - DirectInput5 support was dropped in 2008.

Once the migration back to DirectX/WinAPI was complete, I figured I'd re-port the game to SDL2 from scratch. Along the way, I made various improvements to the replacement video, audio, and input backends. The audio and video backends in particular should be substantially faster.

Speaking of the video backend, the portable and enhanced branches now sport multiple video backends: along with a much faster renderer based on SDL2's hardware-accelerated Texture API, there's now a software renderer, a renderer that uses SDL2's software-rendered Surface API (portable branch only), and an OpenGL 3.2 renderer.

On top of all this, CSE2's received a few minor improvements as well: the enhanced branch now allows the user to resize the window, the DoConfig clone has had its broken button mappings fixed (they've always been broken), and Organya should no longer miss beats in the portable/enhanced branches, now that it's been synchronised with the audio callback routine.

Here are the GitHub releases:
CSE2 - Accurate
CSE2 - Portable
CSE2 - Enhanced

I guess this raises the question of what the next milestone is. Well, there are a few things I'd like to see done: for one, producing the exact same EXE would be nice, but right now global variable arrangement and stack frame sorting issues are stopping that. It would also be good to document the game's code more thoroughly - Gabe's been doing a good job of that lately. Then there's the possibility of making the enhanced branch more modder-friendly - maybe add some common community-made TSC commands or hackinator patches. I guess we'll just have to wait and see.
 
Oct 17, 2019 at 1:58 AM
me when bro says be holly and jolly for $20
"Life begins and ends with Nu."
Join Date: Jun 27, 2013
Location:
Posts: 2854
Age: 30
Pronouns: She/Her
maybe add some common community-made TSC commands or hackinator patches.
I would love to see <MS4, <MIM, <PHY, <BUY, and <VAR in CSE2. They're the most go-to custom commands a mod could ever use, and could definitely help benefit CSE2 modding.
 
Nov 5, 2019 at 2:26 AM
me when bro says be holly and jolly for $20
"Life begins and ends with Nu."
Join Date: Jun 27, 2013
Location:
Posts: 2854
Age: 30
Pronouns: She/Her
I'm quite new to this - what do <MS4, <MIM, <PHY, <BUY, and <VAR do?
<MS4 - MeSsage 4
Invisible textbox, but at the bottom of the screen. Cuz why didn't Pixel do this???

<MIM - MImiga Mask (<MIMXXXX)
You can have multiple player skins upon command rather than having two dependent on an Equip flag.

<PHY - PHYsics (<PHYXXXX:YYYY)
A player physics toggler through TSC. The Hackinator guide in Booster's Lab best defines how to modify them.

The <BUY pack
A hack that not only creates a new enemy drop for currency, but it also comes with two new TSC commands, <BUY (<BUYXXXX:YYYY) and <SEL (<SELXXXX). This thread best explains how to use it.

The <VAR pack
An absolutely necessary TSC pack that allows for more complex flags and statistics through five commands:

<VAR - VARiable (<VARXXXX:YYYY)
Sets a number to a variable.

<VAO - VAriable Operation (<VAOXXXX$YYYY, where $ can be replaced with +, -, *, or /; YYYY cannot be 0 when dividing)
Performs a basic mathematical operation on a variable.

<VAJ - VAriable Jump (<VAJWWWW:XXXX:YYYY:ZZZZ)
Compares a variable with a number. If the comparison is true, the TSC script jumps to another event.

<VAZ - VAriable Zero (<VAZXXXX:YYYY)
Zeroes out a set number of variables, starting from the variable you give it.

<RND - RaNDom (<RNDXXXX:YYYY:ZZZZ)
Sets a range of numbers between a minimum value and a maximum value for a variable to use for a random number generator. Can work well for TSC trickery and pseudo-RPG elements.

For more information on <VAR, the hackinator guide in Booster's Lab also provides a few more pointers.

I hope the descriptions for these commands are good enough to understand. They're the most commonly used custom TSC commands on vanilla, and they would make good candidates to be included in CSE2. There is no <IMG command, moving on.
 
Nov 5, 2019 at 3:11 AM
Senior Member
"Huzzah!"
Join Date: Jul 6, 2019
Location: United States
Posts: 214
I think that would be pretty cool, I would love to have that for a mod of mine
 
Nov 5, 2019 at 4:33 AM
Junior Member
"It's dangerous to go alone!"
Join Date: Aug 5, 2019
Location: Hell
Posts: 41
Pronouns: he/him
Those sound pretty neat. I don't know C or its variants (I mainly code JavaScript, but because JavaScript is like 13 different languages under a trenchcoat I can mostly understand the syntax), but I've been looking through the game files and I'm pretty sure that to add custom TSC commands you would edit the huge IF chain in TextScr.cpp

I have no idea how you would implement <VAR - perhaps with a JSON object, with each var being a key?

i.e. you make a variable named "mimigasCannibalized" with <VAR\[SPECIALESCAPECODE]mimigasCannibalized\[SPECIALESCAPECODE - I'm thinking $?]["Jack", "Sue", "King"]\[SPECIALESCAPECODE]
(you could probably make named variables, I think)
inside the game's code (I'm writing this with Javascript syntax because that's what I know) it would go and do
varObject["mimigasCannibalized"] = ["Jack", "Sue", "King"];

i don't know - it seems simple enough as long as C++ has native JSON support - otherwise, you may need a library or something
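C++ has no built-in JSON type, but the standard library's `std::map` covers the "object with keys" idea. A minimal sketch, assuming made-up `VarObject`, `SetVar`, and `GetVar` names (nothing like this exists in CSE2):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical store: variable name -> list of strings, mirroring
// varObject["mimigasCannibalized"] = ["Jack", "Sue", "King"] in JavaScript
using VarObject = std::map<std::string, std::vector<std::string>>;

// Create the variable, or overwrite it if it already exists
void SetVar(VarObject &vars, const std::string &name, const std::vector<std::string> &value)
{
    vars[name] = value;
}

// Look a variable up; returns nullptr if it was never set
const std::vector<std::string> *GetVar(const VarObject &vars, const std::string &name)
{
    const auto it = vars.find(name);
    return it == vars.end() ? nullptr : &it->second;
}
```

Saving such variables to the profile file would still need some serialisation format, JSON or otherwise, which is where a library might come in.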

in the game's code, the IS_COMMAND function is defined as
IS_COMMAND(c1, c2, c3) gTS.data[gTS.p_read + 1] == c1 && gTS.data[gTS.p_read + 2] == c2 && gTS.data[gTS.p_read + 3] == c3
(pretty-printed and in JavaScript form:

Code:
function IS_COMMAND(c1, c2, c3) {
  if ((gTS.data[gTS.p_read + 1] == c1) && (gTS.data[gTS.p_read + 2] == c2) && (gTS.data[gTS.p_read + 3] == c3)) {
    return true;
  } else {
    return false;
  }
}

)

maybe modify it (for longer command names) to this?
Code:
function IS_COMMAND(...params) {
  // every character of the command name must match, starting right after the '<'
  for (let i = 0; i < params.length; i++) {
    if (gTS.data[gTS.p_read + 1 + i] !== params[i]) {
      return false;
    }
  }
  return true;
}
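Since the game itself is C++, the longer-command-name idea might look something like the sketch below. The `TextScript` struct is a hypothetical, trimmed-down stand-in for the real `gTS`, and `IsCommand` is a made-up helper, not something from TextScr.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, trimmed-down stand-in for CSE2's text-script state
struct TextScript
{
    const char *data;   // script text
    std::size_t size;   // number of bytes in data
    std::size_t p_read; // read position, pointing at the '<' of a command
};

// True if the characters right after the '<' at p_read spell out `name`.
// Works for names of any length, and the bounds check stops a command at
// the very end of the script from reading past the buffer.
bool IsCommand(const TextScript &ts, const char *name)
{
    const std::size_t length = std::strlen(name);
    if (ts.p_read + 1 + length > ts.size)
        return false;
    return std::memcmp(ts.data + ts.p_read + 1, name, length) == 0;
}
```

One catch with variable-length names: a short name can be a prefix of a longer one, so the longer command would have to be checked first in the IF chain.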

I have literally 0 clue how I would do <MS4 -
for reference, here's the game's code for <MS3 (""illuminating"" comments by me):
Code:
ClearTextLine(); // obvious
gTS.flags &= ~0x10; // clear flag 0x10
gTS.flags |= 0x23; // set flags 0x01, 0x02 and 0x20
if (gTS.flags & 0x40) // if a flag is on
	gTS.flags |= 0x10; // set a DIFFERENT flag (I have no idea what the |= or &= operators do - bit-shifting?)
gTS.p_read += 4; // skip forward past the command
bExit = TRUE; // exit the while loop
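On the `|=` and `&=` operators: they are bitwise OR-assign and AND-assign, not bit-shifts. `x |= mask` switches on every bit that is set in `mask`, and `x &= ~mask` switches those bits off. Here is the same flag logic as the <MS3 code, pulled out into a stand-alone function with a made-up name (what each bit means to the game is not something this sketch claims to know):

```cpp
#include <cassert>

// Hypothetical helper mirroring the <MS3 flag handling on a plain integer
unsigned int ApplyMS3Flags(unsigned int flags)
{
    flags &= ~0x10u;    // AND with the inverted mask: clears bit 0x10, keeps the rest
    flags |= 0x23u;     // OR with a mask: sets bits 0x01, 0x02 and 0x20
    if (flags & 0x40u)  // plain AND: non-zero only if bit 0x40 is set
        flags |= 0x10u; // set bit 0x10 again
    return flags;
}
```

Starting from `0x10`, for example, the first line clears that bit and the second sets `0x23`, so the result is `0x23`; starting from `0x40`, the `if` fires and the result is `0x73`.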

thoughts?
 
Nov 5, 2019 at 5:45 AM
me when bro says be holly and jolly for $20
"Life begins and ends with Nu."
Join Date: Jun 27, 2013
Location:
Posts: 2854
Age: 30
Pronouns: She/Her
You can't give names to variables in <VAR. Only 008-123.
 
Nov 5, 2019 at 5:50 AM
me when bro says be holly and jolly for $20
"Life begins and ends with Nu."
Join Date: Jun 27, 2013
Location:
Posts: 2854
Age: 30
Pronouns: She/Her
Doesn't it also use flag memory?
That can probably just have its own memory in CSE2, but you'd still have to use numbers, sadly :(
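As a rough idea of what "its own memory" could look like, here is a sketch of a numbered variable store. Everything in it is an assumption for illustration: the function names are invented, the 8 to 123 limits are just borrowed from the "008-123" range mentioned above, and the behaviour follows the earlier command descriptions rather than the actual hack.

```cpp
#include <cassert>

// Hypothetical limits, borrowed from the "008-123" range
const int VAR_MIN = 8;
const int VAR_MAX = 123;

// Dedicated storage, separate from the game's flag memory
static int gVariables[VAR_MAX + 1];

bool IsValidVar(int no)
{
    return no >= VAR_MIN && no <= VAR_MAX;
}

// <VAR: set a variable to a number
bool SetVariable(int no, int value)
{
    if (!IsValidVar(no))
        return false;
    gVariables[no] = value;
    return true;
}

// <VAO: apply +, -, * or / to a variable; dividing by zero is rejected
bool OperateVariable(int no, char op, int value)
{
    if (!IsValidVar(no))
        return false;
    switch (op)
    {
        case '+': gVariables[no] += value; return true;
        case '-': gVariables[no] -= value; return true;
        case '*': gVariables[no] *= value; return true;
        case '/':
            if (value == 0)
                return false;
            gVariables[no] /= value;
            return true;
    }
    return false; // unknown operator
}

// <VAZ: zero out `count` variables, starting at `no`
void ZeroVariables(int no, int count)
{
    for (int i = no; i < no + count; ++i)
        if (IsValidVar(i))
            gVariables[i] = 0;
}

// Read a variable back; out-of-range numbers read as 0
int GetVariable(int no)
{
    return IsValidVar(no) ? gVariables[no] : 0;
}
```

The array would also need to be written to and read from the save file, which is left out here.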
 