Welcome to a new experiment. I am going to try and explain why programmers tend to be concerned (one way or another) about copyright law, to the level where we compare it to slavery or tyranny. This is not going to be easy, since I intend this to be readable by people who are not programmers, and who never programmed. I am going to start from the very beginning, and it’s going to take a while.

Before we even start, I would like to point out an excellent description for why we usually expect explanations to be simpler then they really are. Please do read this. I’ll wait, really. If you think you do not need to read it, that’s actually evidence that you do — you expect my explanation to be simpler than it is, and that you do not need that bit of background knowledge…

Welcome back! One important concept introduced in the article linked above is “Word of Power”. (If you didn’t read it, now is the time to fix that issue…) I will try to introduce the new concepts using Words of Power, saying the word, and then linking it to the power behind it.

The first Word of Power will be “Universal Turing Machine”. You may remember a previous post of mine about Alan Turing, one of the greatest giants on whose shoulders we have the privilege to be standing on. But I want to start by talking about another great giant, John McCarthy. McCarthy wrote a paper with the somewhat unassuming name, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I. A more appropriate title would have been “paving the way forward for programming for the next 50 years”, you see. What McCarthy did in the paper was to lay down a simple explanation of a system for defining computation. Nowadays, a lot of computation is done, appropriately, on electronic computers. Before we had those, however, “computer” was a job title of a person who applied computation rules to symbols on paper to get results — much like the rules we learned in primary school for doing long multiplication.

In both sense of the word “computer”, the goal is the same: apply rules to manipulate symbols. McCarthy defined a specific set of rules to manipulate symbols that he called “Lisp”. Lisp used S-expressions (short for “symbolic expressions”) to define computations. Read this sentence over again, because it might be the most important sentence in here: even though Lisp was a set of rules to manipulate S-expressions, those S-expressions define *other computations*. If you only learned to manipulate S-expressions, someone could write an S-expression that did, say, long multiplication. Then, if you manipulated the S-expression for long multiplication followed by S-expressions representing numbers, you would *do long multiplication*. Now comes the fun part — you could even *write the rules for manipulating S-expressions using S-expressions*. You might think that for this to work, the rules for manipulating S-expressions would be really complicated. This is not true — you can see the manipulation rules here. Sure, it’s a bit long, but doesn’t look like more than you needed to learn to do arithmetic, right?

Well, what use would rules for manipulating S-expressions when already know how to manipulate S-expressions? Maybe very little, if not for a few other thing. I hope you have seen a demonstration of the game of life, once upon a time. If you haven’t, I recommend that you read the wikipedia article. With 4 simple rules, much much simpler than McCarthy’s S-expression rules, much is possible. How much? Well, the proof is long and difficult, but mathematicians have found a Game of Life configuration that will *manipulate S-expressions*. So if you only knew to manipulate squares on graph paper according to the rules of the Game of Life, seemingly easier than manipulating S-expressions, we could just give you a (pretty hefty) Game of Life that would cause you to manipulate S-expressions. Wait, but manipulating S-expressions lets you do arithmetic, right? So you could do arithmetic too!

Wait, what about arithmetic? Well it turns out, if you just know how to add 1 to numbers, and do a bunch of other trivial things (like substitute numbers for other numbers), we can give you a instructions that will let you manipulate the Game of Life. Or other instructions, that will manipulate S-expressions. It turns out that if you can do interesting enough computations, it doesn’t matter a whole bunch what you know to do — you can find an initial input (S-expression, Game of Life configuration, or arithmetic instructions) that will let you do anything. Anything? Well, not quite!

Another interesting result (called “Rice’s Theorem”) is that you cannot write an S-expression (or any of the other things) to take any S-expression (or any of the other things) and say anything interesting about what it does to any input. Notice that I didn’t say “respectively”, and didn’t mean to — you can’t make an S-expression that will take any Game of Life configuration, and will say anything interesting about what it does. You can write S-expressions that will take some S-expressions (or …) and say something interesting about them — but not *any* S-expression (or …).

What has all this to do with our friend, Alan Turing? Well, Turing invented the original thing that stands for “…”, and he called it a Turing machine. A Turing machine has an internal state, and reads from and writes to a tape. The interesting thing about a Turing machine? You can write a Turing machine that will manipulate S-expressions. Or play the Game of Life. Or perform arithmetic. Any of those Turing machines can do anything S-expressions do! All of those are called “Universal Turing Machine”, because any computation possible by any other Turing machine, or S-expression, or Game of Life, or arithmetic, can also be done by them. You only need One Turing machine, or one S-expression, or one Game of Life configuration, or one set of arithmetic rules and you can do anything anyone else can do, if maybe somewhat slowly. Fortunately, or unfortunately, all those computations also have the same basic flaw — you can never write one of those to say anything interesting about any computation that comes their way (this will be important later!)

Let us recap:

- Computations that manipulate symbols can be done either electronically or by a human, equally well, if not equally fast.
- You can define a “universal computation”, that if you learn to do, anyone can get you to do any other computation by writing the correct input.
- It is impossible for you to know anything interesting about any possible computational input.

Those three concepts (“a lot of interesting manipulations are equivalent”, “universal manipulations are possible” and “nothing interesting to say”) are the core of what computers are. A computer chip does very specific manipulations ([1]). However, programmers, by writing carefully thought out inputs, can get it to perform any computation ([2]). Lastly, it is impossible to write a computer chip that will be able to say anything interesting about any specific program ([3]). Of course, crafting these programs is difficult — this is why programmers use “programming languages”, which are *even more* “universal computation instructions”. A language is called “Turing complete” when you can use it to build a Turing machine (which would also have allowed you to build an S-expression manipulator, or Game of Life player, etc.). In short, all we have said so far applies equally well to most programming languages. There are niche non-Turing-complete languages, used where being able to say something interesting about any possible program is important — but here’s the rub — it’s actually *hard* to invent something which is not Turing complete. As we saw above, even very simple things will be able to build Turing machines.

Computers, languages and programmers are all becoming better at making computations that were “possible” a decade ago be “fast” now. There is an enormous economic pressure on that — after all, the more you can do “fast”, the more you can do “more and bigger”, and people like “more and bigger”.

Now, we come to a very simple computation — copying. A “copier” can be defined in multiple ways, but let’s suppose, for the sake of argument, that we just want to replicate the input twice (“have two copies”). It is easy to write an S-expression to do it. Therefore, any Turing complete environment can do it. But remember — it is impossible to say anything “interesting” about any possible S-expression, and therefore, it is impossible to write a computation that outputs the correct answer for “this S-expression is not a copier”. Wait, what? Yes, that’s right. If an S-expression has access to some input, it is *impossible* to know for sure that it will not copy it.

The next Word of Power I wish to introduce is “bits”. A “bit” (short for binary digit) is a place-holder for something that can be either 0 or 1. Anything that can represent at least two states (say, the light switch in your kitchen) is a bit. Now, let’s say that you have a set of symbols — say musical notes. I hope you have seen “Sounds of Music”, and remember the best song ever — “Do, a deer” (etc.). Our symbols are “Do, Re, Mi, Fa, Sol, La, Ti”. As the song makes the case, we can write any piece of music with just those (yes, I know about sharps, flats and octaves…bear with me). Now, let’s assign “bit patterns” to each note:

- Do — 000
- Re — 001
- Mi — 010
- Fa — 011
- Sol — 100
- La — 101
- Ti — 110

Now, if we want to write the notes for the first line of “Twinkle, Twinkle” (Do Do Sol Sol Ti Ti Sol) we can instead write 000000100100110110100 in bits. If we have those bits, we can just take them in threes, and convert them back to the musical notes. We can do the same with any set of symbols — say, the alphabet. This is actually how, more or less, computers store text — they can convert the alphabet into bits, and save those bits. Then they can manipulate those bits using carefully crafted programs to, say, replace the word “Foo” with “Bar”. Or, say, copy them. If we have the bits to a fan fiction of Mickey Mouse having sex with Pluto, we can copy those too. If instead, we have the bits for an S-expression that will generate the bits for a fan fiction of Mickey Mouse having sex with Pluto, we could copy those too. But wait, this is funny — there is no way to know for certain that certain bits are *not* an S-expression that will produce the bits of Mickey Mouse having sex with Pluto.

A fan fiction of Mickey Mouse having sex with Pluto is a dangerous thing. It is, as the law and case law currently stands, a violation of copyright law. In general, it is not protected under the Fair Use doctrine, and this means that writing this fan fiction is illegal. Copying this fan fiction is illegal. What’s more, Disney has an incentive (or at least, believes it has an incentive) to prevent copying this fan fiction around. When incentive (or perceived incentive) and legal powers combine, the result is expected — Disney would dearly love to have an automatic way to prevent computers from copying this fan fiction. Or, say, from copying the bits that represent the video of “Cars 2”.

Remember what I said above — it’s impossible to have a way to know for certain which bits are actually an S-expression that will create the bits for the “Cars 2” video. Although Disney has the incentive (this is a matter of economics and psychology, ultimately the results of the forces of evolution) and the legal powers (this is a matter of social convention), the math, hard and unyielding, doesn’t care. Math doesn’t care about evolution. Math doesn’t care about society. Math is math, and the math says that you can’t build a computer that will only copy things if they’re not S-expressions that produce the Cars 2 video, no matter how much you want to.

Join me next episode, when I explain the basics of cryptography, and how they pertain to the issue of copyright law.

Hmm. Is that the halting problem, or Rice’s theorem? 🙂

Well, I guess strictly speaking it’s Rice’s theorem, but Rice’s follows so directly from the Halting problem I usually treat them as the same 🙂 I will change it…

“Oh, and you’d better not drop any hints that you think you’re working a dozen inferential steps away from what the audience knows, or that you think you have special background knowledge not available to them. The audience doesn’t know anything about an evolutionary-psychological argument for a cognitive bias to underestimate inferential distances leading to traffic jams in communication. They’ll just think you’re condescending.”

This manages to be meta-condescending 🙂

That article is a bit annoying because its a great explanation to a phenomena that may or may not exists at all. I didn’t see any proof that there is a general tendency to expect explanations to be simpler. The tendency may be limited to male geeks, or to western culture, or to busy people. I hear “don’t bother explaining, I won’t understand anyway” pretty often, which shows healthy awareness that what I do is complicated and boring.

Maybe the general tendency is just not to want long explanations, which leads to either expecting short explanations or avoiding explanations completely?

I definitely had had the phenomena of “oh, well, I got the gist of it” or “just tell me the answer — can X be done or not?” (usually for Xs where you need to go back two steps and explain that X is not well defined, and you might not want some Xs and so on). To my eternal shame, I even once did this to a report — I was like “So either software X can do Y or it can’t — I need to know”, only for him to remind me that people expect simpler explanations. In short, my anecdotal evidence points that, yes it does, and I am worried that people will assume that explaining what a universal turing machine has little to do with copyright law 🙂