Early Copy Protection


    Early Copy Protection on the Apple II

For many years, a war has been waged between those who publish software
and those who don't want to pay for it (or want to profit by selling
illegal copies).  In recent years the war has expanded to video tapes,
DVDs, and audio CDs.  Nobody believes that a copy-proof format is
possible, but many publishers believe that they can increase their sales
by discouraging illegal copies.

The copy protection techniques employed on Apple II 5.25" diskettes are
the stuff of legend.  The drive was completely under software control,
requiring timing-critical loops to read and write data, so there were
few limitations on what the copy protectors could do.  A scheme called
Spiradisc, mentioned in Steven Levy's book _Hackers_, wrote tracks in a
spiral pattern.  The disk was essentially impossible to copy directly,
though that couldn't stop someone from cracking it <computist.htm>.

Before the Apple II had floppy drives, however, it had an audio cassette
interface for storing programs and data.  This was a very primitive
system, requiring you to hook up a cassette recorder to your computer
and fiddle with the volume knob until things started working.  To read
data from tape, you specified a range of memory to fill, and hit the
"play" button on your tape recorder.  If all went well, the computer
cheerfully beeped at you and off you went.  Loading BASIC programs was
even easier, because the start location was pre-determined, and the
length was stored on the tape.  All you had to do was type "LOAD".

I recently found myself extracting software from cassette tapes
purchased on eBay.  At the start of the project, I thought to myself,
"it's awkward to get at the data, but at least there's no copy
protection."  As it turns out, I was wrong.


      Possibilities

You read a set of bytes from tape.  You can write them out to a new tape
the same way.  How can you stop somebody from copying your software?

Before we ponder that, consider something even simpler.  Why not just
hook up two tape recorders and copy the tape directly?  This approach is
foiled by the same issue that kept the music industry cozy and warm for
many years: after a couple of generations, the quality degrades to the
point where the data is no longer readable.  The only way to avoid this
problem is to write a new copy of the cassette data from the computer,
creating a new analog copy from a digital master.

So how do we prevent the user from just reading and writing bytes?  One
possible approach is to use a two-stage loader.  To load binary data
from an Apple II cassette you need to know its length in bytes.  If we
write a short program whose purpose is to load the main part of the
application, we never have to tell the user the length of the main
part.  This seems like a small victory, and it is; it's pretty easy to
come up with a way to load the data without knowing the length (e.g. try
to read more data than is on the tape, see where it stops looking
reasonable, reduce the length and iterate until the checksum is valid). 
As with all copy protection, however, the goal isn't to defeat a skilled
and determined attacker, but rather discourage the casual user. 
Besides, if we have our own loader, there are other tricks we can play.

We're going to take a detailed look at a trio of Apple II games that
sought to confound the pirates.  The programs are arranged in order of
increasing complexity of protection.  Before we can do that, however, we
need to understand a few things about how an Apple II works.


      Apple II Innards

The original Apple II used a 6502 processor running at 1MHz.  Not a real
speed demon by today's standards, but pretty good at the time.  It could
access 64KB of memory, of which up to 48K was RAM.  The upper 16K was
ROM, memory-mapped I/O locations, and firmware for installed peripheral
cards.  (Expansion cards, such as the Apple Language Card, provided
additional RAM through bank-switching.)  Some of the locations in RAM
had specific purposes, defined by the currently-running version of
BASIC, or by the system monitor.

The 6502 has three 8-bit registers, the accumulator (A) and two index
registers (X and Y).  The instruction set doesn't even pretend to be
orthogonal.  Because there are so few registers, and no way to hold a
16-bit address in them, the 6502 includes some "zero page" address
modes.  These allow indirect access to 16-bit addresses, and most
zero-page operations are faster than their more general counterparts.

The Apple II convention is to prefix hexadecimal numbers with '$', so
instead of "0x2000" or "2000h" we write "$2000".  With that in mind,
here are some 6502 instruction examples:

    * |LDA #$7A| - load the hexadecimal value 7A into the A register.
    * |STX $36| - store the value held in the X register into zero-page
      address $0036.
    * |LDA $2000,X| - load the A register with the value held at address
      $2000 + X.  That is, if X is $15, it would load the value at $2015
      into A.
    * |STA ($36),Y| - get the little-endian 16-bit address from location
      $0036 and $0037, add Y to it, and store the value of the
      accumulator there.  If 36/37 hold $2000, and Y is $10, then the A
      register will be stored in address $2010.
    * |JSR $FDED| - jump to a subroutine at address $FDED.  When the
      code there finishes, it issues an |RTS| instruction, and control
      returns to the instruction following the |JSR|.
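
The two indexed modes in the list above can be modeled in a few lines of
Python.  This is only a sketch: |mem|, |absolute_x|, and |indirect_y| are
hypothetical names, and the model covers address arithmetic only, not
timing or flags.

```python
# Toy model of two 6502 addressing modes; `mem` stands in for the 64K address space.
mem = bytearray(0x10000)

def absolute_x(base, x):
    """Effective address for `LDA $2000,X`: the base address plus the X register."""
    return (base + x) & 0xFFFF

def indirect_y(zp, y):
    """Effective address for `STA ($36),Y`: little-endian zero-page pointer plus Y."""
    pointer = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return (pointer + y) & 0xFFFF

# $36/$37 hold $2000, low byte first.
mem[0x36], mem[0x37] = 0x00, 0x20

print(hex(absolute_x(0x2000, 0x15)))   # 0x2015
print(hex(indirect_y(0x36, 0x10)))     # 0x2010
```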

The "monitor" was part of the "F8 ROM", which occupied the last 2K of
the address space ($F800 to $FFFF).  It provided a way to enter,
disassemble, and execute code.  On early Apple II models, hitting the
reset key would leave you in the monitor.  Later models with the
"autostart ROM" would leave you in BASIC, but you could access the
monitor with "|CALL -151|".

The memory layout was usually discussed in terms of "pages".  Each
"page" was 256 bytes long.  The 6502 doesn't have a page-oriented
architecture, but this provided a convenient way to talk about sections
of memory.  The first few pages looked like this:

    * Page 0: a/k/a "zero page".  The monitor, BASIC, DOS 3.3, and
      ProDOS all staked out territory here, so applications needed to
      avoid touching certain locations.
    * Page 1: the CPU stack lives here, starting at $01FF and moving
      downward.
    * Page 2: keyboard input buffer, by convention.  Anything placed
      here gets partially overwritten as soon as the user regains
      control of the keyboard.
    * Page 3: mostly free space, but later versions of the Apple II
      firmware put vectors here for software breakpoints and the reset key.
    * Pages 4-7: text page 1.  The text on the screen is stored here. 
      Sort of a text frame buffer.

After that comes a second text page, some open space, and then two
"hi-res" graphics frame buffers.

One trick employed by cassette publishers, which wasn't copy protection
so much as an attempt to do something cool, was to start loading the
program at address $0200 (page two).  The act of loading the tape would
place commands in the keyboard input buffer so that, when loading
completed, the program would start automatically, just as if the user
had typed them.  The program typically started at $0800, which meant the
tape would also load data onto the text page, allowing a "please wait"
start-up banner.

Cassette tapes have a 10-second 770Hz lead-in, followed by cycles at
1KHz or 2KHz, representing '1's and '0's, respectively.  For a program
of moderate size this means an average speed of about 1200bps, or about
1000 times slower than a 1x CD-ROM drive.  The 6502 code for the
cassette read/write routines, including subroutines shared with other
code, only requires about 180 bytes of space in ROM.
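
The arithmetic behind those figures can be checked with a rough Python
sketch.  It assumes one full cycle per bit, an even mix of '0' and '1'
bits, and the usual 150 KB/s figure for a 1x CD-ROM drive; the sync bit
and checksum byte are ignored, and the program sizes are hypothetical.

```python
# Back-of-the-envelope cassette throughput, using the figures quoted above.
ONE_BIT_SECONDS = 1 / 1000    # a '1' is one full cycle at 1 kHz
ZERO_BIT_SECONDS = 1 / 2000   # a '0' is one full cycle at 2 kHz
LEAD_IN_SECONDS = 10

def load_seconds(num_bytes):
    """Estimated load time, assuming equal numbers of 0 and 1 bits."""
    average_bit = (ONE_BIT_SECONDS + ZERO_BIT_SECONDS) / 2
    return LEAD_IN_SECONDS + num_bytes * 8 * average_bit

def effective_bps(num_bytes):
    """Bits delivered per second of tape time, lead-in included."""
    return num_bytes * 8 / load_seconds(num_bytes)

CDROM_1X_BPS = 150 * 1024 * 8   # assumption: 1x CD-ROM quoted as 150 KB/s

for size_k in (8, 16):          # hypothetical program sizes
    size = size_k * 1024
    print(size_k, "K:", round(load_seconds(size)), "seconds,",
          round(effective_bps(size)), "bps")
print("1x CD-ROM vs 1200 bps:", round(CDROM_1X_BPS / 1200), "times faster")
```

Under these assumptions an 8K program comes out at roughly 1100bps and a
16K program at roughly 1200bps, since the fixed lead-in matters less as
the program grows.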

With this in mind, let's explore the first of the three programs.


      Personal Software - Microchess 2.0

a2-microchess.jpg (61947 bytes) <images/a2-microchess.jpg>

Microchess was a very early chess program, released in 1978.  It used
the Apple II's high-resolution graphics screen (280x192, six colors) to
display the chess board.  The complete game fit in 7.5K of RAM, making
it easy to load from tape.  Running it in a 16K machine alongside an 8K
graphics frame buffer was a bit of a squeeze, but they managed it.

The game's instructions provide the following system monitor command to
load the game:

    |2000.2200R 2000G|

This means, "load memory locations $2000 through $2200 (inclusive) with
data from the tape, then start executing the code at $2000."  The manual
further says to leave the tape running until the graphical chess board
appears.  The code at $2000, then, is our stage 1 loader.  It starts off
pretty simply:

    2000-   20 84 FE    JSR   $FE84    F8ROM:SETNORM
    2003-   20 2F FB    JSR   $FB2F    F8ROM:INIT
    2006-   20 93 FE    JSR   $FE93    F8ROM:SETVID
    2009-   D8          CLD
    200A-   20 58 FC    JSR   $FC58    F8ROM:HOME
    200D-   A2 FF       LDX   #$FF
    [...]

The code above calls some F8 ROM routines to perform basic system
initializations, then the next part (removed for brevity -- it's long
and not very interesting) prints a "game starts in two minutes" message
with a short delay, and puts address $0200 into zero page memory
location $02-03.  Then things start to get interesting:

    204E-   A9 02       LDA   #$02
    2050-   85 3D       STA   $3D
    2052-   A9 20       LDA   #$20
    2054-   85 3F       STA   $3F
    2056-   A9 00       LDA   #$00
    2058-   85 3C       STA   $3C
    205A-   85 3E       STA   $3E
    205C-   EA          NOP
    205D-   EA          NOP
    205E-   EA          NOP
    205F-   EA          NOP
    2060-   EA          NOP
    2061-   EA          NOP
    2062-   EA          NOP
    2063-   EA          NOP
    2064-   20 58 21    JSR   $2158
    2067-   20 4B 21    JSR   $214B
    206A-   20 3A FF    JSR   $FF3A    F8ROM:BELL
    206D-   20 FD FE    JSR   $FEFD    F8ROM:READ

The values stuffed into $3C-3D and $3E-3F define the start and end of a
range used in a system monitor command.  While a command like "read from
tape" is being executed, the address at $3C is incremented until it
becomes equal to the address in $3E.  The first seven lines of the code
above are therefore equivalent to typing "200.2000".  The "NOP"s are "no
operation" instructions, meaning they do nothing but eat two cycles
apiece.  (Most likely there was some other code in there before the
software shipped.)

The bottom part of the code calls $2158, which erases RAM from $4000 up,
including hi-res page 2, and then calls $214B, which turns on display of
hi-res page 2.  This leaves the user staring at a blank screen, or (on
systems with only 16K of RAM) a semi-random pattern.  It emits a "beep"
via the BELL routine and then calls the monitor cassette read function. 
Showing a blank graphics page is nice because the tape overwrites text
page 1 with executable code, which isn't much fun to look at.  Continuing:

    2070-   AD 80 04    LDA   $0480
    2073-   C9 C5       CMP   #$C5
    2075-   D0 03       BNE   $207A
    2077-   4C 59 FF    JMP   $FF59    F8ROM:OLDRST

The above code checks the return value from the tape read function.  The
tape read function, unfortunately, doesn't actually return anything --
it just prints "ERR" to the screen if something goes wrong.  So, the
code checks to see if the letter 'E' appears at a certain location on
the text page.  If so, it gives up and jumps into the monitor.

At this point, the code does something slightly odd:

    207A-   20 4B 21    JSR   $214B
    207D-   A2 00       LDX   #$00
    207F-   A1 02       LDA   ($02,X)
    2081-   49 A5       EOR   #$A5
    2083-   81 02       STA   ($02,X)
    2085-   E6 02       INC   $02
    2087-   D0 02       BNE   $208B
    2089-   E6 03       INC   $03
    208B-   A5 03       LDA   $03
    208D-   C9 20       CMP   #$20
    208F-   D0 EC       BNE   $207D
    2091-   4C 00 06    JMP   $0600

It calls $214B a second time, which is unnecessary, since we're already
looking at hi-res page 2.  It then loads a byte from the address held in
$02-03 (which was initialized to $0200 earlier), performs an
exclusive-OR with the constant value $A5, and puts it back.  This is
repeated for every byte from $0200 to $1FFF, after which the game is
executed with a jump to location $0600.  This is a fairly common trick,
used to disguise sections of code or data.  If you exclusive-OR a byte
with a non-zero value, you get a new value.  If you exclusive-OR it with
the same value a second time, you get the original byte back.
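
The round trip is easy to demonstrate.  The following Python sketch uses
a hypothetical |xor_bytes| helper and three arbitrary sample bytes (they
are not taken from the encoded second stage):

```python
KEY = 0xA5  # the constant from the EOR #$A5 instruction at $2081

def xor_bytes(data, key=KEY):
    """XOR every byte with `key`; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

# Arbitrary sample bytes, chosen only for illustration.
plain = bytes([0x20, 0x84, 0xFE])
on_tape = xor_bytes(plain)

print(on_tape.hex())              # 85215b
print(xor_bytes(on_tape).hex())   # 2084fe
```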

The code stored on the tape is exclusive-ORed so that, if you try to
load the second stage directly, you'll end up with what appears to be
unreadable junk.  If you want to copy the tape, you have to figure out
how it's encoded, or you have to copy both stages.

Copying this to a new tape or adapting it for use on a disk-based system
is straightforward.  The easiest way to make a copy is to simply copy
both stages to a new tape, without modifying either.  For a disk-based
system, decode part 2, and add a simple memory-move function.


      Softape - Module 6

a2-module6.jpg (90785 bytes) <images/a2-module6.jpg>

A company called Softape published a large number of programs for the
Apple II on cassette.  In 1978, they published "Module 6", part of a
series of games.  This particular one was an Integer BASIC
implementation of the card game "Blackjack".

The game didn't use a two-stage loader, but it did have slightly
peculiar instructions for loading from tape:

    |30.3FFFR|

At first glance, this seems like it must be shorthand notation for "read
from $3000 to $3FFF", or perhaps it's a typographical error.  In fact,
the program really does start loading on the zero page, and continues
through the system stack, input buffer, text page, and so on.  Note
there is no "xxxxG" command here, which means the software uses some
other means to start itself running.

Why is this copy protection?  It should be easy to simply load the
program at a different address (e.g. |1030.4FFFR|) and save it to a new
tape.  The problem with this approach is that most Apple II systems
being sold in 1978 had at most 16K of RAM.  Anything more than that was
a luxury.  The tape was designed to completely fill RAM on most
systems.  Just as CD-ROMs and audio CDs went from "pretty secure" to
"wide open" as hard drive capacity and Internet bandwidth increased,
this scheme became worthless once larger configurations became common.

If you can fit a hi-res chess program in 7.5K, though, why does it take
nearly 16K for a text Blackjack game?  It doesn't.  Much of the data on
the tape -- more than half, as it turns out -- is either temporary
"splash screen" stuff or is filler code from other programs, inserted to
prevent us from paring the code down to the core.  The real program is
in the last part of the tape.

So, where do we start?  We know that, once the tape read function
finishes, it will return to the system monitor command line.  However,
the tape has overwritten the system stack, so the return address is no
longer there.  Because of the way the tape read function works, this
doesn't actually interfere with loading data from tape, but where does
it go when it's done?

Looking at a hex dump of the stack area, we find:

    00000100: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
    00000110: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
    00000120: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
    [...]
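
What a return-from-subroutine does with a stack like that can be modeled
in Python.  This is a sketch only: |rts_target| is a hypothetical helper,
and it assumes the entire stack page holds $03, which the truncated dump
suggests but does not show.

```python
def rts_target(stack_page, sp):
    """Model 6502 RTS: pull the low byte, then the high byte, and add one.

    `stack_page` is the 256 bytes at $0100-$01FF; `sp` is the 8-bit stack pointer.
    """
    sp = (sp + 1) & 0xFF
    low = stack_page[sp]
    sp = (sp + 1) & 0xFF
    high = stack_page[sp]
    return (((high << 8) | low) + 1) & 0xFFFF

stack_page = bytes([0x03] * 256)   # assumption: the whole page is filled with $03
targets = {rts_target(stack_page, sp) for sp in range(256)}
print([hex(t) for t in targets])   # ['0x304']
```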

This means the 16-bit value pulled off of the stack will be $0303, no
matter what our stack pointer happened to be.  This translates to a
return address of $0304, because the 6502 RTS instruction adds one to
the value pulled off of the stack.  Since we also loaded that chunk of
memory from the tape, we can find it in the hex dump.  The results are
somewhat baffling:

    0304-   FF          ???
    0305-   FF          ???
    0306-   FF          ???
    0307-   FF          ???

There's no code there.  Eventually, after falling through lots of
nonsense, we hit a software break instruction (|BRK|).  This seems like
an accident.  When we hit a software break, we go through the monitor's
handler at $FA4C, which decides it's a software interrupt and jumps
through a vector at $3F0.  The data at $3F0 is $FFFF, but if we start
executing there we wrap around to $0000 -- which was not loaded from the
tape -- and most likely hit a BRK before long, which leaves us in an
infinite loop.

What did we miss?  Well, a copy protection scheme that assumes a 16K
machine might also make other assumptions.  In this case, it's assuming
the old version of the monitor ROM, which did not go through a vector at
$3F0.  Instead, it just dumps us into the monitor at $FF65.  So we're
out of the infinite loop, but we've stopped moving.  Now what?

The key here is a popular F8 ROM function called "COUT".  If a program
wanted to put text on the screen, it could either stuff the values there
directly, or it could use a firmware function called "COUT" at $FDED.  A
program would load a character into the accumulator, then "|JSR $FDED|"
to print it.  The firmware function would perform an indirect jump
through the little-endian 16-bit address at location $36-37, which by
default would jump to $FDF0 ("COUT1"), which would use some other
zero-page values to determine the proper location to output the
character.  Every time you call COUT1, the horizontal position advances
by one, until you hit the right edge and it wraps around to a new line.

The tape started loading at $0030.  The first 16 bytes look like this:

    00000030: ff 00 ff aa 13 28 40 08 00 08 00 40 3c 00 00 40  ...*.(@....@<..@

The COUT vector at $36 has been set to $0840, so when the BRK handler
outputs a character, control transfers there.  It's also worth pointing
out that the value at $3C-3D is $003C, which is important because that
holds the address where data is being loaded from tape.  Setting it to
$003C means the data loads from $30 to $3FFF without skipping around.
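
Reading those values out of the dump is just little-endian decoding.  A
small Python sketch, with |word_at| as a hypothetical helper:

```python
# First 16 bytes of the tape image, which land at $0030-$003F.
dump = bytes.fromhex("ff 00 ff aa 13 28 40 08 00 08 00 40 3c 00 00 40")
BASE = 0x30

def word_at(addr):
    """Read a little-endian 16-bit value from the dump at zero-page address `addr`."""
    i = addr - BASE
    return dump[i] | (dump[i + 1] << 8)

print("output vector $36-37:", hex(word_at(0x36)))   # 0x840
print("input vector  $38-39:", hex(word_at(0x38)))   # 0x800
print("load address  $3C-3D:", hex(word_at(0x3C)))   # 0x3c
```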

Here's what happens at $0840:

    0840-   A9 00       LDA   #$00
    0842-   85 36       STA   $36
    0844-   A9 09       LDA   #$09
    0846-   85 37       STA   $37
    0848-   00          BRK

Same thing all over again.  The COUT vector is changed to point at
$0900, and we do another software break.  When we try to output a
character, we end up here:

    0900-   60          RTS

As mentioned earlier, |RTS| is "return from subroutine".  The upshot of
the new state of things is that, whenever somebody tries to output a
character through COUT, we just return without doing anything.  The
monitor output has been suppressed.  This seems to leave us in an
awkward place though, because once again we're left without an active
thread of execution.

Something must happen, though, and when we return from our various
shenanigans we find ourselves back in the warm embrace of the system
monitor, which still wants to output some information about our last
software break, and then give us a command line to type stuff in on.  It
tries to input a character by calling the monitor RDKEY1 function. 
RDKEY1 works much the same way that COUT does, doing an indirect jump
through a zero-page vector, in this case $38-39.

Checking back to the hex dump of $0030, we see that address $38 holds
$0800, which is the next stage in the process:

    0800-   A9 8E       LDA   #$8E
    0802-   85 CA       STA   $CA
    0804-   A9 22       LDA   #$22
    0806-   85 CB       STA   $CB
    0808-   86 02       STX   $02
    080A-   A9 30       LDA   #$30
    080C-   85 00       STA   $00
    080E-   A9 08       LDA   #$08
    0810-   85 01       STA   $01
    0812-   A2 00       LDX   #$00
    0814-   A9 EA       LDA   #$EA
    0816-   8D 0C 08    STA   $080C
    0819-   8D 0D 08    STA   $080D
    081C-   8D 10 08    STA   $0810
    081F-   8D 11 08    STA   $0811
    0822-   A1 00       LDA   ($00,X)
    0824-   E6 00       INC   $00
    0826-   D0 02       BNE   $082A
    0828-   E6 01       INC   $01
    082A-   A6 02       LDX   $02
    082C-   60          RTS

It starts off nicely enough.  Putting $228E into $CA-CB tells Integer
BASIC where to find the start of the program.  It then saves the X
register in location $02, puts $0830 into address $00-01, and then does
some self-modifying code that NOPs out the instructions at $080C and
$0810.  This prevents it from re-initializing $00-01 on subsequent
calls.  It then reads a byte from the indirect address at $00-01,
increments $00-01, restores X, and returns with the byte we loaded in
the accumulator.

This is a rather complicated way of making the system think that the
user is typing the string of characters at $830.  These, as it turns
out, are:

    00000830: 9b c0 83 8d d2 d5 ce 8d ff ff ff ff ff ff ff ff  .@..RUN.........
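
Apple II keyboard input has the high bit set, so the dump is easier to
read once that bit is stripped.  A short Python sketch (|describe| and
|NAMES| are hypothetical names):

```python
# The eight meaningful bytes at $0830; the rest of the dump is $FF filler.
fake_keys = bytes.fromhex("9b c0 83 8d d2 d5 ce 8d")

NAMES = {0x1B: "<Esc>", 0x03: "<Ctrl-C>", 0x0D: "<Return>"}

def describe(byte):
    """Strip the high bit and name the key the byte represents."""
    code = byte & 0x7F
    return NAMES.get(code, chr(code))

decoded = " ".join(describe(b) for b in fake_keys)
print(decoded)
```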

Translated, the bytes at $0830 read "<Esc> @ <Ctrl-C> <Return> R U N
<Return>".  Escape-@ clears the screen, Ctrl-C starts Integer BASIC, and
"RUN" starts the
BASIC program running.  The first thing the BASIC program does is delete
part of itself:

        0 DIM D(52),R$(41),Q$(9),S$(3),C$(3),F$(4),A$(10),DEBT(4),CASH(4),U(4),H(144),INSR(4): GOTO 32000
        0 REM   ************************
        1 REM   *      SOFTAPE *       * 
        2 REM   *  SOFTWARE EXCHANGE   *
        3 REM   *.===.===.===.===.===.===.===.=*
        4 REM   *     MODULE  #6       *
        5 REM   *                      *
        6 REM   *COPYRIGHT 1978-SOFTAPE*
        7 REM   *----------------------*
        8 REM   *  DUPLICATION OF THIS *
        9 REM   *PROGRAM OR ANY PORTION*
       10 REM   * THEREOF  CONSTITUTES *
       11 REM   *   INFRINGEMENT OF    *
       12 REM   *      COPYRIGHT       *
       13 REM   *.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
       14 REM 
       15 POKE 204,63: POKE 205,10
       30 TEXT : CALL -936: VTAB 4: TAB 14: PRINT "S O F T A P E": TAB 4: PRINT "S O F T W A R E  E X C H A N G E": PRINT 
    [...]
    32000 DEL 0: IN# 0: PR# 0: GOTO 15

Yes, there are two different versions of line zero.  The code in line
32000 deletes the first one, uses "|IN#0:PR#0|" to reset the vectors at
$36-37 and $38-39 to defaults, and then jumps to line 15 to start the
program running.  Without the "DIM" statements in the first line 0, the
program will not execute if saved and reloaded, though somebody looking
at the program after it has executed once might have trouble deciding why.

(Incidentally, the '.'s in lines 3 and 13 are actually Ctrl-Gs,
invisible characters that make the speaker "beep".  The formatting of
the REM statements is actually consistent.)

In this case the copy protection was not only ineffectual, it was also a
nuisance because it prevented the program from running on later
machines.  The implementation appears slightly flawed as well: there's
an explicit BRK instruction at address $0303, suggesting that the
programmer expected execution to continue there with the stack full of
3s, but the 6502 jumps to address+1 when an RTS instruction is executed.

Transferring this to disk is easy, assuming a 48K machine: load the code
at a higher address, substitute a memory move in place of the cassette
load function, replace the BRK vector at $3F0 with something more
reasonable, and launch it.  Better yet, stop right before the program
starts running, and just save it as a BASIC program.

On a 16K cassette-only system, though, you'd need to write a custom tape
load routine to capture the image in two pieces.  There's no other way
to get at the last half of the program.  For a brief period, this
technique was reasonably effective.


      Hayden - Sargon II

a2-sargon-a.jpg (27111 bytes) <images/a2-sargon-a.jpg> a2-sargon-b.jpg
(46648 bytes) <images/a2-sargon-b.jpg>

The Sargon chess program was one of the earliest developed for
microcomputers.  It was written in Z-80 assembly language by Dan and
Kathe Spracklen, and won the all-microcomputer tournament at the 1978
West Coast Computer Faire.  An Apple II version was developed by Gary
Shannon, who published some early Apple II titles through Softape (e.g.
Othello and Jupiter Express), and also happened to be Kathe Spracklen's
brother.  Subsequent releases were Sargon II (Hayden 1979), Sargon III
(Hayden 1983), Sargon IV (Spinnaker 1989), and Sargon V (Activision
1991).  A book detailing the way Sargon works, /Sargon: A Computer Chess
Program/, can sometimes be found in used book stores (and occasionally
online <http://www.madscientistroom.org/chm/Sargon.html>).

The Apple II version of Sargon II featured text and hi-res display and
required 24K of RAM.  The copy protection used the best features found
in the previous two examples, and took them a step further.

The instructions for loading the game are quite simple:

    |30.3FFR|

This is the same low-memory start as "Module 6", but this time it's just
a small stage 1 loader.  There's no "xxxxG" command, so this is another
self-starting program.  Looking at the first 16 bytes, we see:

    00000030: 01 00 ff aa 05 00 3e 02 1b fd 00 00 3c 02 20 89  ...*..>..}..<. .

As in "Module 6", the text output vector has been altered, though the
input vector is set to the default ($FD1B).  Looking a little closer, we
see a new trick.  The value at $3C-3D, which holds the address used by
the cassette read/write functions, is $023C instead of $003C.  This
means that the data from the cassette will fill locations $0030-003D,
skip forward to $023E, and continue until $03FF.  If you try to be
clever and load the stage one program with "|1030.13FFR|", you will
fail, because there isn't that much data on the tape.
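
The skip can be reproduced with a toy model of the monitor's read loop.
The Python below is a sketch, not the ROM code: |simulate_load| is a
hypothetical function that stores each byte at the address held in
$3C-3D and then advances that address, so a byte loaded on top of
$3C-3D redirects the load.

```python
def simulate_load(tape_bytes, start, end):
    """Replay a store-then-advance read loop against a toy 64K memory.

    The current address (A1) lives at $3C-3D and the end address (A2) at
    $3E-3F.  Returns the list of addresses that received a byte.
    """
    mem = bytearray(0x10000)
    mem[0x3C], mem[0x3D] = start & 0xFF, start >> 8
    mem[0x3E], mem[0x3F] = end & 0xFF, end >> 8
    written = []
    for value in tape_bytes:
        a1 = mem[0x3C] | (mem[0x3D] << 8)
        mem[a1] = value                      # like STA ($3C,X) with X = 0
        written.append(a1)
        a1 = mem[0x3C] | (mem[0x3D] << 8)    # re-read: the store may have changed A1
        a2 = mem[0x3E] | (mem[0x3F] << 8)
        done = a1 >= a2                      # compare first, then increment
        a1 = (a1 + 1) & 0xFFFF
        mem[0x3C], mem[0x3D] = a1 & 0xFF, a1 >> 8
        if done:
            break
    return written

# The first 16 bytes on the Sargon II tape, loaded with 30.3FFR.
sargon = bytes.fromhex("01 00 ff aa 05 00 3e 02 1b fd 00 00 3c 02 20 89")
addresses = simulate_load(sargon, 0x0030, 0x03FF)
print([hex(a) for a in addresses])
```

In this model the first 14 bytes land at $0030-$003D and the next two at
$023E and $023F.  By the same reasoning the tape only needs to hold
about 464 data bytes to reach $03FF, rather than the 976 that a
straight load of $0030-$03FF would expect.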

Skipping forward like this means the stack and the start of the input
buffer are left unmodified.  When the monitor finishes loading the data,
it will output a beep by writing a Ctrl-G through COUT, which sends us
through the vector at $36 to address $023E.  Here we find this:

    023E-   20 89 FE    JSR   $FE89    F8ROM:SETKBD
    0241-   20 93 FE    JSR   $FE93    F8ROM:SETVID
    0244-   20 60 03    JSR   $0360
    0247-   A9 03       LDA   #$03
    0249-   48          PHA
    024A-   A9 0E       LDA   #$0E
    024C-   48          PHA
    024D-   A9 1C       LDA   #$1C
    024F-   8D F2 03    STA   $03F2
    0252-   A9 17       LDA   #$17
    0254-   8D F3 03    STA   $03F3
    0257-   49 A5       EOR   #$A5
    0259-   8D F4 03    STA   $03F4
    025C-   20 39 FB    JSR   $FB39    F8ROM:SETTXT
    025F-   20 58 FC    JSR   $FC58    F8ROM:HOME
    0262-   A9 77       LDA   #$77
    0264-   8D FF 5F    STA   $5FFF
    0267-   AD FF 5F    LDA   $5FFF
    026A-   C9 77       CMP   #$77
    026C-   D0 01       BNE   $026F
    026E-   60          RTS

It calls F8 ROM routines to reset the vectors at $36-37 and $38-39, does
some stuff at $0360 (discussed next), pushes $030E onto the stack with
|PHA| instructions, points the autostart ROM reset vector at $03F2-03F3
to $171C (with the matching check byte, $17 EOR $A5, at $03F4), and
clears the screen by calling HOME.  The code at $0262 writes a byte to
$5FFF and reads it back to see if the machine has at least
24K of RAM, and jumps to $026F if it doesn't.  The RTS at $026E jumps to
the address we just pushed on, plus one ($030F); we'll come back to this
later.

The code at $0360 writes a copyright message to the text screen, then
jumps to $02B3.  The code there handles the stage two loading:

    02B3-   A9 00       LDA   #$00
    02B5-   85 3C       STA   $3C
    02B7-   A9 08       LDA   #$08
    02B9-   85 3D       STA   $3D
    02BB-   A9 FF       LDA   #$FF
    02BD-   85 3E       STA   $3E
    02BF-   A9 2F       LDA   #$2F
    02C1-   85 3F       STA   $3F
    02C3-   A2 00       LDX   #$00
    02C5-   20 FA FC    JSR   $FCFA
    02C8-   A9 16       LDA   #$16
    02CA-   20 C9 FC    JSR   $FCC9    F8ROM:HEADR
    02CD-   85 1F       STA   $1F
    02CF-   20 FA FC    JSR   $FCFA
    02D2-   A0 24       LDY   #$24
    02D4-   20 FD FC    JSR   $FCFD
    02D7-   B0 F9       BCS   $02D2
    02D9-   20 FD FC    JSR   $FCFD
    02DC-   A0 3B       LDY   #$3B
    02DE-   20 EC FC    JSR   $FCEC
    02E1-   81 3C       STA   ($3C,X)
    02E3-   45 1F       EOR   $1F
    02E5-   85 1F       STA   $1F
    02E7-   20 BA FC    JSR   $FCBA    F8ROM:NXTA1
    02EA-   A0 35       LDY   #$35
    02EC-   90 F0       BCC   $02DE
    02EE-   20 EC FC    JSR   $FCEC
    02F1-   49 A5       EOR   #$A5
    02F3-   C5 1F       CMP   $1F
    02F5-   F0 03       BEQ   $02FA
    02F7-   4C 2D FF    JMP   $FF2D    F8ROM:PRERR
    02FA-   60          RTS

As you can see (if you have been following along carefully), this sets
up the monitor start and end address to be |800.2FFF|.  It then provides
its own, slightly modified, tape read routine.  The code is identical to
what the system monitor does, with one exception: before comparing the
checksum for correctness, it exclusive-ORs the value with $A5.

This means that the checksum stored on the cassette is deliberately
wrong.  If you try to load stage two with "|800.2FFFR|", the monitor
will report a failure, even though the data was read correctly.  (The
author could have taken this a step further and employed different
timings, or perhaps reversed the meaning of '0' and '1', but for
whatever reason they stuck with the standard Apple II format.)  You
could look at the data at $0800 to see if it looks okay, but as we're
about to see that won't work.
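
The effect of the altered checksum can be sketched in Python.  The
function names and the sample payload are hypothetical, and the seed
value of $FF for the running checksum is an assumption about the
monitor; the outcome does not depend on it.

```python
from functools import reduce

SEED = 0xFF   # assumed initial value of the running checksum

def running_checksum(data):
    """XOR of every data byte, starting from SEED."""
    return reduce(lambda acc, b: acc ^ b, data, SEED)

def monitor_accepts(data, tape_checksum):
    """Stock check: the byte on tape must equal the running XOR."""
    return tape_checksum == running_checksum(data)

def sargon_accepts(data, tape_checksum):
    """Sargon II check: the tape byte is XORed with $A5 before comparing."""
    return (tape_checksum ^ 0xA5) == running_checksum(data)

payload = bytes([0x12, 0x34, 0x56])            # hypothetical stage-two data
tape_byte = running_checksum(payload) ^ 0xA5    # deliberately "wrong" checksum

print(monitor_accepts(payload, tape_byte))   # False
print(sargon_accepts(payload, tape_byte))    # True
```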

Assuming the data loaded correctly, we RTS our way back to the first bit
of code, which RTSs us to one past the address pushed on the stack
($030F).  Here we hit the obfuscation layers:

    030F-   A9 00       LDA   #$00
    0311-   85 00       STA   $00
    0313-   A9 08       LDA   #$08
    0315-   85 01       STA   $01
    0317-   A0 00       LDY   #$00
    0319-   B1 00       LDA   ($00),Y
    031B-   49 AD       EOR   #$AD
    031D-   91 00       STA   ($00),Y
    031F-   C8          INY
    0320-   D0 F7       BNE   $0319
    0322-   E6 01       INC   $01
    0324-   A5 01       LDA   $01
    0326-   C9 30       CMP   #$30
    0328-   90 EF       BCC   $0319

This uses the technique we saw earlier in Microchess, where the code is
exclusive-ORed with a value, so that the data read from tape looks like
gibberish until this function decodes it.  In this case, the value is
$AD, and everything from $0800 through $2FFF (the entire second stage)
is altered.

This next part doesn't make sense at first:

    032A-   18          CLC
    032B-   A0 13       LDY   #$13
    032D-   B9 9F 02    LDA   $029F,Y
    0330-   79 FB 02    ADC   $02FB,Y
    0333-   99 00 01    STA   $0100,Y
    0336-   88          DEY
    0337-   10 F4       BPL   $032D
    0339-   20 00 01    JSR   $0100

It's reading some values from one location, adding them to values from
another location, storing them at $0100-$0113, and then executing them. 
Instead of simply concealing the code with exclusive-ORs, it's actually
assembling a subroutine from two different places, and dropping it onto
the stack page.  The code, once assembled, looks like this:

    0100-   A9 33       LDA   #$33
    0102-   8D 10 03    STA   $0310
    0105-   A9 9A       LDA   #$9A
    0107-   85 3C       STA   $3C
    0109-   A9 03       LDA   #$03
    010B-   8D 7A 40    STA   $407A
    010E-   A9 27       LDA   #$27
    0110-   8D FE 5F    STA   $5FFE
    0113-   60          RTS
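
The add-two-tables loop at $032D can be imitated in Python.  This is a
sketch under stated assumptions: the real contents of the two source
tables are not shown in this article, so |table_a| is invented and
|split| is a hypothetical helper that derives a matching second table
from the assembled routine.  Note that the |CLC| runs only once, so a
carry out of one addition feeds into the next.

```python
def combine(table_a, table_b):
    """Add the tables byte by byte, from the last index down to the first.

    The carry starts clear and propagates from one addition to the next,
    as ADC would after a single CLC.
    """
    out = bytearray(len(table_a))
    carry = 0
    for i in reversed(range(len(table_a))):
        total = table_a[i] + table_b[i] + carry
        out[i] = total & 0xFF
        carry = total >> 8
    return bytes(out)

def split(code, table_a):
    """Build a second table so that combine(table_a, result) == code."""
    table_b = bytearray(len(code))
    carry = 0
    for i in reversed(range(len(code))):
        table_b[i] = (code[i] - table_a[i] - carry) & 0xFF
        carry = (table_a[i] + table_b[i] + carry) >> 8
    return bytes(table_b)

# The 20 bytes of the assembled routine at $0100-$0113.
routine = bytes.fromhex(
    "a9 33 8d 10 03 a9 9a 85 3c a9 03 8d 7a 40 a9 27 8d fe 5f 60")
# Invented first table; NOT the real bytes from the tape.
table_a = bytes((i * 37 + 11) & 0xFF for i in range(len(routine)))
table_b = split(routine, table_a)

print(combine(table_a, table_b) == routine)   # True
```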

The assembled routine subtly sabotages the exclusive-OR routine (the
store to $0310 changes the operand of the |LDA #$00| at $030F) and the
cassette start address at $3C, and leaves a couple of values in
seemingly random locations in memory.  The first two are attempts to
throw a red herring at us.  It's
not clear what the last two do, but it's a good bet that the chess
application won't work correctly without them.  When this little gem
returns, we're back here:

    033C-   A0 13       LDY   #$13
    033E-   A9 FF       LDA   #$FF
    0340-   99 00 01    STA   $0100,Y
    0343-   88          DEY
    0344-   10 FA       BPL   $0340
    0346-   A0 00       LDY   #$00
    0348-   98          TYA
    0349-   99 00 02    STA   $0200,Y
    034C-   C8          INY
    034D-   D0 FA       BNE   $0349
    034F-   A9 97       LDA   #$97
    0351-   8D 1C 03    STA   $031C
    0354-   20 3A FF    JSR   $FF3A    F8ROM:BELL
    0357-   6C F2 03    JMP   ($03F2)

This erases the code at $0100 and $0200, and stores $97 at $031C. 
Looking at the code above, we see that $031C holds the value used to
exclusive-OR the data from tape.  Rather than erasing the code, the
authors replaced the value with a slightly different one, in an attempt
to lead the unwary on a chase down the wrong path.

After all that, the code emits the traditional post-cassette-load
speaker beep (which some enterprising individuals with EPROM burners
might have trapped to cause a modified software break), and jumps
through the reset vector at $03F2.  Earlier this was set to $171C, which
is the stage two entry point.

Transferring this to disk requires loading the stages at a higher
address, disabling the custom tape load function, and starting the
first-stage loader with some code that memory-moves the code into place
and sets up the output vector at $36-37.  Making a tape copy is hard
because there's no easy way to create the modified checksum.  You can
write a decoded copy of the main program, but you also need to set
things up the way the code at $0100 does.


      Closing Notes

The goals and approaches of copy protection on audio cassettes in 1978
aren't much different from those used on CD-ROMs in 2004.  Any
program can have its copy protection removed.  The skill and effort
employed in protecting a program determines how much knowledge and
determination is required to strip away the protection.  The goal
remains deterrence of illegal copying by making the material difficult
for a casual user to duplicate faithfully.  Because legal users must be
allowed access to the material, approaches to copy protection rely on
obfuscation and minor format alterations.

Some technological advances -- strong encryption built into televisions
and headphones -- may change the rules in the future.  It's clear from
this examination, though, that the face of copy protection hasn't really
changed in over 25 years.

It's unfortunate that so few kids these days have the opportunity to
solve problems of a similar nature.  Some genuinely clever people worked
on copy protection for the Apple II, and I learned a great deal by
disassembling code while I was growing up.  It motivated me to acquire a
greater understanding of system-level programming than I would have
developed otherwise.  The desire to understand how things work, so
fundamental to a larval-stage engineer, is tremendously stimulated by
"forbidden" challenges.

The data was recovered from original cassette tapes purchased on eBay. 
The audio was captured as WAV files on a PC, and converted to Apple II
files on a disk image with a program I wrote called CiderPress.  The
lengths of the programs on tape are determined automatically, which
greatly simplified the process of extracting them.  The annotated
disassemblies, BASIC listings, and hex dumps above were also generated
with CiderPress.  The screen shots were captured while running the KEGS
Apple II emulator.

------------------------------------------------------------------------

The above is Copyright (C) 2004 by Andy McFadden
<http://www.fadden.com/>.  All Rights Reserved.