[A lot of the ideas in this edition awe a lot to Cory Doctorow. The responsibility for any mistakes or omissions are still mine.]
Recapping the previous episode:
- It’s easy to define a system that is as powerful as any other computational system (Turing complete)
- It’s impossible to say what any computation does (Rice’s theorem)
- You can use bits (0s and 1s) to encode any sort of data — music, text or anything else.
In this episode, I will have a lot more to say about bits. Before I delve into bits, though, I want to talk about computers. While abstract systems that do computations abound, most modern computers are fairly similar. The theoretical computational system they are built to resemble are so-called “Von Neuman Machines“. A modern computer has lots of peripherals, but strip away the peripherals and you are left with:
- A “microchip” which implements instructions like “Add Register1 and Register2 and put the Results in Register3”
- Memory — mapping of “addresses” (index numbers) to “values”
- Instructions on the microchip of the sort “Treat register 1 as an address, and fetch the value there into register 2”
The exact instruction set that the microchip implements depends on the type, but ultimately, as we saw before, it does not matter too much — all computational systems are equivalent. What is important is “Moore’s Law”, of which many variants exist but ultimately says that:
- Microchips can perform more calculations per second every year.
- The amount of memory available for a given price keeps growing.
An important computer peripheral, which almost all computers have a variant of, is a storage device. That ranges from a magnetic hard drive to a micro-SD card. All that is important, for our purposes, about those is:
- Accessing them is slower than memory
- They are bigger (have more addresses) than memory for the same price [a lot more — frequently 10x or 30x]
- They grow bigger for the same price every year
Another important computer peripheral is the network card. Networks connect computers to each other.
Now, let’s remember something we said earlier — there is no way to reliably detect any sequence of 0s and 1s that encodes a computation that we do not like — say, a Turing machine outputting a fanfiction about Micky Mouse having sex with Pluto. Therefore, if a network card allows you to send two distinct messages [if it only allows you to send one message, it’s kind of a sucky network card], you can send some “illegal” Turing machine encoding. Moreover, you will be able to store this illegal Turing machine faster and more cheaply every year.
So as an inescapable conclusion of (1) The math (b) Moore’s law we see that the law cannot be enforced, and it costs less and less every year to evade this law. The laws of the universe (such as how easy it is to use Quantum Mechanics to implement computation on the stuff that is abundant on every ocean beach, or Rice’s theorem) care nothing for Walt Disney or a starving artist — they are simple there, immutable and unforgiving, and humans must learn to deal with them.
One way to deal with them would be to “follow the money”: enforce the law only when someone is breaking it for corporate-level gain. This is not the way the legal system has gone through. Instead, they turned to God — “Crypto. Really good, standards-defined crypto”.
So to understand what transpired, it is important to understand the basics of cryptography. From the beginning of time (as humans count time, I guess), people have communicated with each other — “Look, Ug, I found an antelope and killed it. Help me eat it?” Not long afterwards, eavesdropping begun — “Hey, everyone, Ung has an antelope.” The next level was to use codes — “Look, Ug, I found a You-know-what and you-know-what-ted it. Help me you-know-what it?”, and so the battle begun.
Codes, like in the example above, have to balance two issues. The person to whom they are intended must be able to decipher them (Ung better hope Ug will get the right message). The person to whom they are not intended for must not be able to decipher them (or once again, the whole village will know about the antilope). Fast-forward to Greek times, Caeasar had the eponymous cipher, based on shifting every letter in the alphabet by a certain amount and the Nazis had Enigma. Word to the wise: use neither of those, as they both have been “broken”. “Broken” is a term cryptographers use to say “the eavesdropper can read the messages which are not intended for them”. Cryptographers spend a lot of time trying to figure out how to read those…
But back to the basics: In computer-based cryptography, we have a secret, S. S can be considered, like everything else a computer handles, as a certain sequence of bits. For cryptography to work, two parties have to pre-agree on S. (Side note: I will not be covering public-key cryptography here.) Then, we must be able to compute the “encryption”: a computation that takes M, a message, and calculates a function that depends on M and S. Then we must be able to compute “decryption”: a function that takes the encrypted message, E, and S, and returns to us M. Next, it should be impossible (or at least, very hard), to compute M from E without S.
How do we know that something is “very hard”? Well, if we have a guess for S, we can check that the decryption gives us a plausible message. How hard is it to guess S? Returning to the quotation at the beginning of the section, “standards-based crypto” is usually at least 128 bits of S (also known as the “key size”). That means that there are 2 to the power of 128 options. It is largely agreed that the smallest time-frame relevant for computation is the time it crosses a photon (light particle) to cross a hydrogen atom. A hydrogen atom is about 10**-11 meters long. The speed of light is about 10**8 meters per second, which means it takes more than 10**-20 seconds, which is more than 2**-70 seconds. Thus, if we have 2**128 options, we need more than 2**58 seconds. 2**30 seconds is more than a year, so it will be more than 2**28 years, which is about 32 million years. I used various approximations above, but the conclusion still stands — and more and more modern crypto uses 256-bit sized keys, which is not quite the heat death of the universe, but the point still stands: you cannot guess and hope to win. However, it turns out to be an open problem whether any question where you can easily verify guesses (the formal name for that is NP) is “hard” (the formal name for “easy” problems is P). By open, I mean that many computer scientists have tried tackling this problem over the last 50 years, with no success (and little progress).
This means, in particular, we cannot know that any cryptographic computation is really “good”. However, what we can do, and the second part of why “really good” is followed by “standards-based”, is to ask really smart people to try and solve a cryptographic problem. If they can’t, after trying really hard, we assume that the problem cannot be solved, and we used it as our encryption mechanism.
In modern times, the names “Ug” and “Ung”, perfectly good though they may be, have fallen into disuse. Modern cryptographers usually talk about “Alice” and “Bob” wanting to transmit messages, and “Eve” wanting to listen to them surreptitiously.
So here is an example of a cryptographic system: if you buy a DVD in the store, it is encrypted with a special key. When you buy a DVD player in the store, is contains the key. Your DVD player is a computer, with special software that decrypts the DVD, with a special key, and then plays it. So Alice, which is the Hollywood studio, encrypted the movie contents. Then Bob, the DVD, decrypts it. Bob, to play the DVD, must have the key. You bought the DVD, so there is nothing stopping you from opening up the DVD player, and taking the key, is there?
Well, there are two things. Once is simple: Hollywood, before giving the DVD maker the key, make sure that the DVD player is hard to “tamper” with. The DVD maker must put the key in a special chip, glued to the microchip, and that self-destructs if people tamper with it. As you can imagine, creative people have found ways to defeat that self-destruction. And so, Hollywood convinced the US government to pass a law called “Digital Millenium Copyright Act”. The DMCA says that
- It is illegal to tamper with the DVD that you bought and get the key out.
- It is illegal to tell someone the key that you dug out of the DVD.
- It is illegal to tell someone how to tamper with the DVD to get the key out.
- It is illegal to tell someone where to find instructions on how to tamper with the DVD, or how to get the key.
You might have noticed that 2-4 are restrictions on speech. There are numbers that are so illegal, that not only are you not allowed to write them on a piece of paper and give them to your friend, but if someone spray-painted them on a building, you are not allowed to tell anyone where that building is. Whether you’re allowed to tell them where they can find a map with the building starred in it is, I believe, still up for debate.
The incredulity continues as you find songs on You Tube and Flag images that encode illegal numbers. This means that certain songs and flags are now illegal. In fact, it very well might be that the Wikipedia page on “Illegal numbers” is already illegal, since it contains data that can allow recovery of these numbers. I wish I were kidding, but I am not.
I, personally, am not a copyright extremist. I am not committed to abolish copyright. However, I think that understanding the math and physics of computation are important, because otherwise we end up making songs illegal. When Hollywood claims that copyright infringement might cause jobs lost, and that this necessitates stronger copyright law, we should first ask “Will this make singing songs illegal?”
This is why programmers are concerned about copyright law — because we understand all of the above. I hope, after reading this, you are also concerned!