(de-)tokenizer implementation

Developer
Jan 3, 2011 at 8:32 PM
Edited Jan 3, 2011 at 8:37 PM

Where and how should I implement my tokenizer? You can download it here.

Why can't I change this topic to private?

Developer
Jan 8, 2011 at 1:16 PM

Hey Lord, I need to be honest, right? Yeah, I'm pretty sure I do. Your detokenizer is very confusing. I downloaded and installed it, but I couldn't really use it because I didn't understand how it works.

I don't think we need a de-tokenizer in tiDE, at least not one like yours. We need to build one directly into the "Open" file action. That way the user can open .8xp files and have them de-tokenized immediately, whether they're Basic, Axe or Assembly. Assemblex does this, but only for Assembly programs. Also, Assemblex is written in Python.

I'll convert it to C# as soon as I can and then give the code to Sir, so he can tell me whether he likes it, since it is quite big.

Coordinator
Jan 8, 2011 at 4:12 PM

The disassembler should be pretty simple.  We need a method, Disassemble(byte[] Code), that returns a string.  We need to keep an XML file that has all of the opcodes and their hex equivalents; the disassembler should then use LINQ to look up the matching hex code and disassemble it.  Here's some sample XML:

<opcode identifier="C9" value="ret" ixiyprefix="false" length="1" signed="false" />

The identifier is the first byte of the opcode, except in the case of IX/IY opcodes.  The disassembler should recognize the IX/IY prefix and look for opcodes that have ixiyprefix set to true.  If the instruction is longer than one byte, it should grab the following bytes and parse them as well.  Certain instructions, such as jr, accept signed inputs, and the signed attribute should mark them as such.
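
The lookup described above can be sketched roughly as follows. This is only an illustration in Python (tiDE itself is C#, and a dictionary stands in for the LINQ search). Everything beyond the one sample opcode is an assumption: the `<opcodes>` root element, the three extra entries, the `{0}` operand and `{r}` index-register placeholders, and the reading of `length` as the byte count excluding the IX/IY prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical table in the attribute format shown above. Only the "ret"
# entry comes from the thread; the rest, and the {0}/{r} placeholders,
# are assumptions made for this sketch.
OPCODES_XML = """
<opcodes>
  <opcode identifier="C9" value="ret" ixiyprefix="false" length="1" signed="false" />
  <opcode identifier="3E" value="ld a, {0}" ixiyprefix="false" length="2" signed="false" />
  <opcode identifier="18" value="jr {0}" ixiyprefix="false" length="2" signed="true" />
  <opcode identifier="E5" value="push {r}" ixiyprefix="true" length="1" signed="false" />
</opcodes>
"""

INDEX_PREFIXES = {0xDD: "ix", 0xFD: "iy"}  # Z80 prefix bytes for IX and IY


def load_table(xml_text):
    """Index opcodes by (identifier byte, ixiyprefix flag)."""
    table = {}
    for node in ET.fromstring(xml_text).iter("opcode"):
        key = (int(node.get("identifier"), 16), node.get("ixiyprefix") == "true")
        table[key] = (node.get("value"), int(node.get("length")),
                      node.get("signed") == "true")
    return table


def disassemble(code, table):
    """Return one line of text per instruction found in code."""
    lines = []
    pos = 0
    while pos < len(code):
        start = pos
        register = INDEX_PREFIXES.get(code[pos])
        if register is not None:
            pos += 1  # step over the IX/IY prefix byte
        entry = table.get((code[pos], register is not None)) if pos < len(code) else None
        if entry is None or pos + entry[1] > len(code):
            # Unknown or truncated instruction: emit one data byte and resync.
            lines.append(".db ${:02X}".format(code[start]))
            pos = start + 1
            continue
        mnemonic, length, signed = entry
        operand = code[pos + 1:pos + length]
        text = mnemonic.replace("{r}", register or "")
        if operand:
            value = int.from_bytes(operand, "little", signed=signed)
            shown = str(value) if signed else "${:0{w}X}".format(value, w=2 * len(operand))
            text = text.replace("{0}", shown)
        lines.append(text)
        pos += length
    return "\n".join(lines)


TABLE = load_table(OPCODES_XML)
print(disassemble(bytes([0x3E, 0x05, 0x18, 0xFE, 0xDD, 0xE5, 0xC9]), TABLE))
```

Keying the table on both the identifier and the prefix flag means a prefixed and an unprefixed opcode with the same identifier byte do not collide.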

 

As for the (de)tokenizer, I have gotten Merth's permission to use Tokens2.xml, which I have added to the project in TFS.  Here's some example XML:

<Token byte="$01" string=">DMS">
        <Alt string="►DMS" />
</Token>

When tokenizing (ASCII → TI-Basic), we should accept both the string and the alt string.  When detokenizing, we should use the alt string if available (some tokens don't have one).  Some two-byte tokens look like this:

    <Token byte="$5C">
        <Token byte="$00" string="[A]" />
        <Token byte="$01" string="[B]" />
    </Token>

We should just combine the values and parse them into the dictionary (so we'd have two entries, $5C00/[A] and $5C01/[B]).
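
A rough sketch of that dictionary approach, in Python for brevity (the project itself is C#). The `<Tokens>` root element and all function names are assumptions for illustration, and the real Tokens2.xml may be laid out differently. Nested tokens are flattened into combined byte keys, detokenizing prefers the alt string, and tokenizing accepts either spelling.

```python
import xml.etree.ElementTree as ET

# The two samples from the post, wrapped in an assumed <Tokens> root.
TOKENS_XML = """
<Tokens>
  <Token byte="$01" string=">DMS">
    <Alt string="►DMS" />
  </Token>
  <Token byte="$5C">
    <Token byte="$00" string="[A]" />
    <Token byte="$01" string="[B]" />
  </Token>
</Tokens>
"""


def load_tokens(xml_text):
    """Return (bytes -> display string, string -> bytes) dictionaries."""
    detok, tok = {}, {}

    def walk(node, prefix):
        for child in node.findall("Token"):
            code = prefix + bytes([int(child.get("byte").lstrip("$"), 16)])
            string = child.get("string")
            alts = [alt.get("string") for alt in child.findall("Alt")]
            if string is not None:
                detok[code] = alts[0] if alts else string  # prefer the alt string
                for name in [string] + alts:
                    tok[name] = code                       # accept both spellings
            walk(child, code)  # nested tokens get combined keys such as 5C 00

    walk(ET.fromstring(xml_text), b"")
    return detok, tok


def detokenize(data, detok):
    """Turn token bytes into text, trying two-byte tokens before one-byte ones."""
    out, pos = [], 0
    while pos < len(data):
        for size in (2, 1):
            chunk = bytes(data[pos:pos + size])
            if len(chunk) == size and chunk in detok:
                out.append(detok[chunk])
                pos += size
                break
        else:
            raise ValueError("unknown token at offset {}".format(pos))
    return "".join(out)


def tokenize(text, tok):
    """Turn text into token bytes using the longest matching string at each position."""
    longest = max(len(name) for name in tok)
    out, pos = bytearray(), 0
    while pos < len(text):
        for size in range(min(longest, len(text) - pos), 0, -1):
            code = tok.get(text[pos:pos + size])
            if code is not None:
                out += code
                pos += size
                break
        else:
            raise ValueError("no token matches text at position {}".format(pos))
    return bytes(out)


DETOK, TOK = load_tokens(TOKENS_XML)
print(detokenize(tokenize(">DMS[B]", TOK), DETOK))
```

Longest-match tokenizing matters once the table contains tokens whose strings are prefixes of other tokens' strings.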

Developer
Jan 9, 2011 at 5:15 PM
Edited Jan 9, 2011 at 5:15 PM
SirCmpwn wrote:

As for the (de)tokenizer, I have gotten Merth's permission to use Tokens2.xml, which I have added to the project in TFS.

Oh, we're using Merth's XML files? They are very organized :)


Developer
Jan 10, 2011 at 8:23 AM

I saw that, and I also used that data. I converted it to SQL queries and loaded it into the database. I then converted that data to XML, which is included in the download as tokens02.xml. For tokens that can't be typed in, I created the file alts01.xml, which defines alternative ways of typing some tokens. An alternative is replaced when you press Shift-Space while the cursor is right after it; for example, > becomes ►. Currently only 4 or 5 alts are defined.

Please take a look at it, and I'll make a new version. Please don't just throw away these couple of hours of work :O

Coordinator
Jan 10, 2011 at 6:50 PM

If you post where it is, or part of the file, we can look at it, but I don't know where to find it.

Developer
Jan 10, 2011 at 8:39 PM

The link is in my first post in this topic: here you can find the (de-)tokenizer.

Developer
Jan 11, 2011 at 9:43 PM

How should I implement it now? Just the way I have it, or the way you suggested, with the setup of the Basic project?

Coordinator
Jan 11, 2011 at 9:49 PM

Let me get the basic framework fleshed out and you can add the details.

Developer
Jan 11, 2011 at 9:51 PM

OK, so when can I start coding?

Coordinator
Jan 11, 2011 at 9:53 PM

Can you work on some of the other tasks for a bit, until I have a chance to get that taken care of?  The assembler needs some love, and the emulator could always use working on.  None of the emulator instructions properly emulate flags, which needs to be done.
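
For anyone new to emulators, here is a minimal sketch of what emulating flags means for a single instruction, the Z80's 8-bit ADD. It is written in Python for brevity and is not taken from the tiDE emulator; the function name is hypothetical, and the undocumented flag bits 3 and 5 are deliberately left at zero.

```python
# Bit positions of the documented flags in the Z80 F register.
FLAG_S, FLAG_Z, FLAG_H, FLAG_PV, FLAG_N, FLAG_C = 0x80, 0x40, 0x10, 0x04, 0x02, 0x01


def add8(a, operand):
    """Return (result, flags) for an 8-bit ADD of operand to the accumulator a."""
    total = a + operand
    result = total & 0xFF
    flags = 0  # N stays clear because this is an addition
    if result & 0x80:
        flags |= FLAG_S   # sign: bit 7 of the result
    if result == 0:
        flags |= FLAG_Z   # zero
    if (a & 0x0F) + (operand & 0x0F) > 0x0F:
        flags |= FLAG_H   # half carry: carry out of bit 3
    if ~(a ^ operand) & (a ^ result) & 0x80:
        flags |= FLAG_PV  # overflow: same-sign inputs, different-sign result
    if total > 0xFF:
        flags |= FLAG_C   # carry out of bit 7
    return result, flags


result, flags = add8(0x7F, 0x01)
print("result=${:02X} flags=${:02X}".format(result, flags))
```

Each arithmetic and logic instruction needs its own variant of this, since the rules for H, P/V and N differ between instructions.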

Developer
Jan 11, 2011 at 9:58 PM

I'll try something, but I've never worked on emulators or anything like that before. I'll give the assembler some love ;)

Coordinator
Jan 11, 2011 at 9:59 PM

Make sure you read AssemblyProcess.txt, or something like that.