Just Enough Cryptography

"The man is insane who writes a secret in any other way than one which will conceal it from the vulgar and make it intelligible only with difficulty even to scientific men and earnest students."
Roger Bacon, Epistle on the Nullity of Magic, eighth chapter
"This Arte of Cypheringe, hath for Relative, an Art of Discypheringe; by supposition unprofitable; but, as things are, of great use. For suppose that Cyphars were well managed, there bee Multitudes of them which exclude the Discypherer. But the rawness and unskillfulnesse of Secretaries, and Clerks, in the Courts of Princes, is such that many times the greatest matters are committed to futile and weake Cyphers."
Roger Bacon, De Augmentis Scientaiarum, 1623, London
When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.
Many people on the Internet, in an extremely simple rot13 substitution cipher.

Table of Contents of this page:


Terms

Plaintext This is the original message. It could be simple text, or a JPEG image, or a zipped archive file, or whatever you want to protect. Sometimes this is called "cleartext".
Ciphertext This appears to be highly random, as the actual information contained in the plaintext has been hidden. If you really like math, or information theory, you could say that the ciphertext has high entropy. Sometimes this is called "crypttext".
E Encryption algorithm, a function applied to the plaintext to produce the ciphertext.
Ke Encryption key, used to modify how E works on this one specific run. Keys are binary strings, or if you want to think of them another way, the big numbers that those binary strings could represent.
D Decryption algorithm, a function applied to the ciphertext to produce the plaintext.
Kd Decryption key, used to modify how D works on this one specific run.
Computationally difficult Possible in theory, but a practical attacker will not have the time (or the motivation, or other needed resources) to carry out an exhaustive attack, nor will they be so lucky as to stumble upon the answer unexpectedly early.

Encryption and Decryption

               +---+                      +---+
 plaintext --->| E |--->  ciphertext  --->| D |---> plaintext
               +---+                      +---+
                 ^                          ^
                 |                          |
                 Ke                         Kd

                     ciphertext = E(p, Ke)        plaintext = D(ciphertext, Kd)
                                                            = D( E(plaintext, Ke), Kd)

Symmetric vs Asymmetric Cryptography

For many algorithms, E and D are at least slightly different. But for some algorithms, E = D. However, note that this distinction is not what we use to classify an algorithm as symmetric or asymmetric.

For symmetric cryptographic algorithms, Ke = Kd. You are free to pick the single key as any string of bits of the appropriate length. However, beware: some keys might be obvious guesses, and some algorithms have "weak keys", certain types of key patterns that allow enough information to leak into the ciphertext to make it easier to guess the key. These need to be hard to guess, and so they should look very random.

For asymmetric cryptographic algorithms, when you encrypt with some Ke you must decrypt with a completely different Kd You must use the correct math trick to generate a key pair: Ke,Kd.

If the asymmetric algorithm is to be useful, it must be extremely difficult to derive one key from the other. So use the correct math trick with a trusted asymmetric cryptographic algorithm.


Asymmetric-Key Cryptography

We can quit worrying about the encryption key versus the decryption key, and just think of a pair of keys, K1 and K2. Arbitrarily pick one and encrypt with it. Then the only thing that can be used for decryption is the other key in the pair.

                 +---+                       +---+
plaintext  -+--->| E |---> ciphertext1 --+-->| D |--->  plaintext      = D(ciphertext1, K2)
            |    +---+                   |   +---+                     = D(E(plaintext, K1), K2)
            |      ^                     |     ^
            |      |                     |     |
            |      K1                    |     K2
            |                            |
            |                            |   +---+    Meaningless Output
            |                            +-->| D |--->  ????
            |                                +---+
            |                                  ^
            |                                  |
            |                               any key other than K2
            |
            |
            |    +---+                       +---+
            +--->| E |---> ciphertext2 --+-->| D |--->  plaintext      = D(ciphertext2, K1)
                 +---+                   |   +---+                     = D(E(plaintext, K2), K1)
                   ^                     |     ^
                   |                     |     |
                   K2                    |     K1
                                         |
                                         |   +---+    Meaningless Output
       	                                 +-->| D |--->  ????
                                             +---+
                                               ^
                                               |
                                            any key other than K1

What "Strong" Means, and Inventing New Cryptosystems

A cryptographic algorithm E is strong if:

Note that each of those implies the others. For strong asymmetric cryptography, another feature is:

What if someone says that they just invented a new and very secure cipher algorithm?

That means that they must be either a mathematical genius or a fool. Which is the far more likely explanation?
(Hint: fools are much more common than mathematical geniuses)

What if someone says that a cipher algorithm must be kept secret for security?

We have known that "security through obscurity" is ineffectual for well over a century. Auguste Kerckhoffs (1835-1903) mathematically proved that the security of a cryptosystem must not depend on keeping its algorithm secret. See his article "La cryptographie militaire", in Journal des sciences militaries, vol IX, pp 5-38, Jan 1883. I pity the fool who ignores well-known 1.25-century-old knowledge.


Using Asymmetric-Key Cryptography

Let's say you have a sender and a receiver, and each has a pair of keys. For each person, they call one the "public key" and they tell the world, and they call the other the "private key" and they keep it a secret. So, each of them knows three keys — their pair, and the public key of the other.

Method 1 — Sender's private key only — Since the sender's public key is the only thing producing a coherent message, the receiver can trust that the sender sent it. Since slight changes in input make big changes in output, this also must have been the same message originally sent. But anyone can read it.

Method 2 — Receiver's public key only — Since the receivers's public key is the only thing producing a coherent message, the receiver can trust that no one else could have read it. But anyone could have sent it.

Method 3 — Sender's private key, and receiver's public key — The receiver can trust that no one else could read it, that the sender must have sent it, and that it wasn't altered in transit.


Using Asymmetric-Key Cryptography, Reworded

I want to: Send a message in such a way that I could prove:
"I was the sender, and this is precisely the message that I sent."
I must: Digitally sign the message.
I will need: My private key.
 
I want to: Send a message which only the intended receiver can read.
I must: Encrypt the message.
I will need: The receiver's public key.
 
I want to: Read a message which was sent so that only I can read it.
I must: Decrypt the message.
I will need: My private key.
 
I want to: Verify the identity of the sender and the integrity of the received message.
I must: Verify the digital signature.
I will need: The sender's public key.

Getting the Ciphers

For an enormous list, see "Applied Cryptography" by Bruce Schneier, in the "Reference Books" section. The book contains source code for many serious cryptographic algorithms.

For something more mathematically rigorous, see the CRC Handbook of Applied Cryptography by A. Menezes, P. van Oorschot and S. Vanstone — a free PDF version can be downloaded for personal use: http://www.cacr.math.uwaterloo.ca/hac/

There's a great cryptography archive at http://www.shmoo.com/crypto/, although you may not really need this if you have installed OpenSSL.


Why is Triple-DES only 112 bits, when DES is 56 bits?

Oh, it's even worse than that....

Encrypting twice in a row, using the same algorithm and two keys, does produce ciphertext:

plaintext --> encrypt --> encrypt --> ciphertext
                 ^           ^
                 |           |
                 K1          K2

However, if encryption algorithm E() is what is called a cryptographic group, then there exists some third key K3 where a single encryption with K3 alone produces the same ciphertext. Consider this from the attacker's point of view — all you need to decrypt is K3, so double encryption as above uses twice the work and provides no more security than one encryption step!

Even if E() is not a cryptographic group (and DES is not), there is something called a meet in the middle attack that makes the no more secure against a brute-force attack than just encrypting it once.

OK, so chaining multiple encryption steps is not useful...

Triple-DES works by encrypting with one key, then decrypting with a second key, and then encrypting with a third key:

plaintext --> encrypt --> decrypt --> encrypt --> ciphertext
                 ^           ^           ^
                 |           |           |
                 K1          K2          K3

Yes, that should be 168 bits. However, due to that pesky meet in the middle attack mentioned above, the complexity (meaning the computational work required) for an attacker is no greater than it would be for encrypting with one key, decrypting with the second, and then encrypting again with the first key:

plaintext --> encrypt --> decrypt --> encrypt --> ciphertext
                 ^           ^           ^
                 |           |           |
                 K1          K2          K1

Since you get no better security against determined cryptanalysis, the effective strength is no better than that of a 112-bit symmetric key no matter if you use (K1,K2,K3) or just (K1,K2,K1) as above. [see ref 1 below]

Now, in 1990 a known-plaintext attack on two-key 3DES was found, and US NIST rates two-key (K1,K2,K1) 3DES at 80 bits equivalent strength, and three-key (K1,K2,K3) 3DES at 112 bits equivalent strength. [see ref 2 below, and note the vast amounts of plaintext-ciphertext data required!]

Also see the well-written Wikipedia article on Triple DES, which points to further papers.

References:
[1] R. C. Merkle and M. Hellman, "On the Security of Multiple Encryption", Communications of the ACM, vol 24, no 7, 1981, pp 465-467. This is summarized as "Triple encryption, with three independent keys, is as secure as one might naively expect double encryption to be" in Bruce Schneier, Applied Cryptography, John Wiley, 1996, ISBN 0-471-12845-7, sections 11.3 and 15.1-15.3.
[2] "A known-plaintext attack on two-key triple encryption", van Oorschot, et al., EuroCrypt '90, and NIST Special Publication 800-57, "Recommendation for Key Management", August 2005, p 63, http://csrc.nist.gov/publications/nistpubs/800-57/SP800-57-Part1.pdf and http://csrc.nist.gov/publications/nistpubs/800-57/SP800-57-Part2.pdf


OK, So How Long Are The Keys?

The algorithms work differently, so you cannot simply compare key lengths to decide which is more secure against a brute-force search.

Very generally speaking, to get the same strength as an N-key symmetric cipher, an elliptic curve algorithm or a cryptographic hash needs to have a key 2N times as long, and an asymmetric cipher needs to have a key roughly 10N times as long or longer. Sorry, that's just the way the math goes. This is why asymmetric encryption is nice for key management, but symmetric algorithms are then used for bulk data encryption. [See section 1.53 of Handbook of Applied Cryptography, Menezes, van Oorschot, and Vanstone, 1997, available as a free PDF download]

One analysis of this is found in Applied Cryptography, Bruce Schneier, ISBN 0-471-12845-7. Thanks to Naftali Fasten for a really nice summary of parts of its chapter 7. A table based on that is:

Key Length in Bits for Approximately Equal Resistance to Brute-Force Attacks, per Schneier
Symmetric Encryption 56 64 80 112 128 256 448 1024
Cryptographic Hash 112 128 160 224 256 512 896 2048
Asymmetric Encryption 384 494 768 1792 2304 10753 39031 234205

And now, from NIST (no doubt with the assistance of the NSA), as listed at: http://www.nsa.gov/ia/industry/crypto_elliptic_curve.cfm

Key Length in Bits for Approximately Equal Resistance to Brute-Force Attacks, per NIST/NSA
Symmetric Encryption 80 112 128 192 256
Elliptic Curve encryption 160 224 256 384 521
RSA (asymmetric encryption) 1024 2048 3072 7680 15380

And now, some specific examples of cipher algorithms:

Symmetric Block Ciphers
Algorithm Key Length Notes
DES 56 No longer considered secure enough against brute-force attacks, because 256 = 72,057,594,037,927,936 is not a big enough search space given today's hardware. See http://www.eff.org/descracker/ for dedicated DES-cracking hardware designs.
FEAL 64
LOKI 64 LOKI89 is insecure, LOKI91 is secure
CAST 64 Patented, proprietary algorithm, requires license
RC2 64 Patented, proprietary algorithm, requires license
Skipjack 80
3DES 112
IDEA 128 Patented, requires license
Khafre 128 Patented, requires license
REDOC 160 Patented, requires license
AES 256 Advanced Encryption Standard, replaces DES. Algorithm was originally known as "Rijndahl"
GOST 256 Gosudarstvennyi Standard Soyuza SSR (Russian for "Government Standard of the Union of Soviet Socialist Republics", which dates it)
Blowfish 448 See: http://www.counterpane.com/blowfish.html
Khufu 512 Patented, requires license
RC4, RC5 1024 Patented, requires license
CA-1.1 1088 Patented, free for non-commercial use. Requires parallel hardware for practicality on large data sets.
Asymmetric Block Ciphers
Algorithm Key Length Notes
DSA 1024 U.S. government standard for a component of digital signatures
El Gamal 1024-4096 Used in GNU Privacy Guard. Key size is variable, 1024-4096 used in GnuPG.
RSA typically 1024-4096 At least 2048 bits are recommended today. Your computer should be fast enough to comfortably handle that key size or larger, while your attacker's computer might be getting close to being able to discover 1024-bit keys

Hash Functions

Put very simply, a hash is a type of checksum. Cryptographically useful hashing functions have some specific characteristics.

The output of a hash function H() is always of the same length, regardless of the length of the input string. Stated informally, a cryptographically strong hash is one for which a very slight change in the input causes a drastic change throughout the output, so you can't search for input strings producing a given hash value.

Here is how to calculate the SHA-1 hash of the opening sentence of the CRC Handbook of Applied Cryptography:

$ echo -n "Cryptography has a long and fascinating history." | openssl sha1
d1557b56f7274faf16d38fdf9241035ab71da799 

We can replace the initial C with K and change only one bit out of the 48-character string because of the ASCII encodings:
C = 0x43 = 01000011
K = 0x4b = 01001011
That's a change of just one bit out of 384 bits in the input. 383 of the 384 bits, or 99.739583%, are identical. But the resulting hash is very different:

$ echo -n "Kryptography has a long and fascinating history." | openssl sha1
7215da622638fc3abdab4ffc91c6d2cf4c5599fe 

Compare those two nearly identical inputs:
Cryptography has a long and fascinating history.
Kryptography has a long and fascinating history.
And now compare the two very different hash results, in hexadecimal (base 16):
d1557b56f7274faf16d38fdf9241035ab71da799
7215da622638fc3abdab4ffc91c6d2cf4c5599fe
and (at least partially) in binary:
1101000101010101011110110101011011110111001001110100...
1110010000101011101101001100010001001100011100011111...

Note that H-1(), or the inverse of a hash function, does not exist! There is no way to start with the hash of an input, and find an input for which you are certain that it was the input:

A collision is a pair of different inputs that produce the same hash output. That is,
input1 != input2     but
H(input1) = H(input2)
A random collision would be any two inputs with equal hashes. A chosen-target collision is more difficult to find, that is when input1 or H(input1) is already specified, and the attacker needs to find input2.

Another description of a good hash function is in "Deploying a New Hash Algorithm", by Steven M Bellovin and Eric K Rescorla, which follows the description of section 9.2 of the CRC Handbook of Applied Cryptography, Menezes, van Oorschot, and Vanstone, available as a free PDF download:

Computers get faster and memory gets cheaper. So, a 2128 search space is no longer considered to be adequately large, and so the MD5 algorithm is not strong enough against brute-force attacks with sophisticated modern hardware. What's worse, researchers have published results for collisions (and see Advanced topic below for even more on this):

Use at least SHA-1 instead. An algorithm considered at least as secure as MD5 (they're effectively of a family of algorithms), and a 160-bit hash and therefore a much larger search space:
2160 = 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 versus
2128 = 340,282,366,920,938,463,463,374,607,431,768,211,456

Yes, collisions are awfully unlikely to happen randomly, and still very difficult to find on purpose, but it is possible. Here are two 128-byte data files with identical MD5 hashes! But their SHA-1 hashes differ. Notice how similar the two files are, just six regularly-spaced bits differ:


 $ openssl md5 crypto-file*.dat
 MD5(FILE1.DAT)= a4c0d35c95a63a805915367dcfe6b751
 MD5(FILE2.DAT)= a4c0d35c95a63a805915367dcfe6b751
 $ openssl sha1 crypto-file*.dat
 SHA1(FILE1.DAT)= 2783c4ff4a3f20d25f2598a8b052b890c37dcac4
 SHA1(FILE2.DAT)= 3c35410823ef00b12d020981c1cf8564c0f89bcc
 $ hexdump -C crypto-file1.dat
 00000000  d1 31 dd 02 c5 e6 ee c4  69 3d 9a 06 98 af f9 5c  |gibberish      
 00000010  2f ca b5 87 12 46 7e ab  40 04 58 3e b8 fb 7f 89  |deleted...
 00000020  55 ad 34 06 09 f4 b3 02  83 e4 88 83 25 71 41 5a  |
 00000030  08 51 25 e8 f7 cd c9 9f  d9 1d bd f2 80 37 3c 5b  |
 00000040  96 0b 1d d1 dc 41 7b 9c  e4 d8 97 f4 5a 65 55 d5  |
 00000050  35 73 9a c7 f0 eb fd 0c  30 29 f1 66 d1 09 b1 8f  |
 00000060  75 27 7f 79 30 d5 5c eb  22 e8 ad ba 79 cc 15 5c  |
 00000070  ed 74 cb dd 5f c5 d3 6d  b1 9b 0a d8 35 cc a7 e3  |
 00000080
 $ hexdump -C crypto-file2.dat
 00000000  d1 31 dd 02 c5 e6 ee c4  69 3d 9a 06 98 af f9 5c  |gibberish
 00000010  2f ca b5 07 12 46 7e ab  40 04 58 3e b8 fb 7f 89  |deleted...
 00000020  55 ad 34 06 09 f4 b3 02  83 e4 88 83 25 f1 41 5a  |
 00000030  08 51 25 e8 f7 cd c9 9f  d9 1d bd 72 80 37 3c 5b  |
 00000040  96 0b 1d d1 dc 41 7b 9c  e4 d8 97 f4 5a 65 55 d5  |
 00000050  35 73 9a 47 f0 eb fd 0c  30 29 f1 66 d1 09 b1 8f  |
 00000060  75 27 7f 79 30 d5 5c eb  22 e8 ad ba 79 4c 15 5c  |
 00000070  ed 74 cb dd 5f c5 d3 6d  b1 9b 0a 58 35 cc a7 e3  |
 00000080

If you want to play with these:

Note that MD5 is a chained hash function, and this means that if you start with the above two 128-byte files, and append identical sequences to each, the two resulting data files will also have identical MD5 hashes. So, an infinite number of collisions could be based on these.

Advanced topic — in the summer of 2004 some mathematicians presented a paper showing collision results for MD5 and others. See "Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD" by Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu, http://eprint.iacr.org/2004/199/

Someone else then published a paper on weaknesses in SHA-0, which was a precursor to SHA-1. Actually, SHA-0 was the winning submission for the U.S. government's Secure Hash Algorithm, but the NSA specified a few tweaks for reasons they (of course) did not explain. For this intermediate work, see: "Near-Collisions of SHA-0" by Eli Biham and Rafi Chen, http://eprint.iacr.org/2004/146/

At the Crypto 2005 conference the following summer, Xiaoyun Wang, Yiqun Yin, and Hongbo Yu (the first and third authors of the preceeding summer's paper) published attacks against full SHA-1. Then, at an informal meeting at that conference, Xiaoyun Wang announced new results: the time complexity was even less than the papers (so far) had announced.

And, Lucks and Daum have generated Postscript files that exploit the attack — the files have the same hash, although they contain contradictory messages!
http://th.informatik.uni-mannheim.de/people/lucks/HashCollisions/

What does all that mean? The difficulty of finding random or chosen-target collisions is not as hard as we thought it was for anything in the family of hash functions including MD5 and SHA-1.

Now, how broken is SHA-1, really? The work factor for finding a collision of an ideal 160-bit hash would be like a brute-force search of 280 but they have found a way to do it in only 263 steps, where:
280 = 1,208,925,819,614,629,174,706,176
263 = 9,223,372,036,854,775,808
So we conclude:

No need to panic, but new systems should be designed around SHA-256 or SHA-512. Trusted implementations of SHA-256 and SHA-512 are available as part of the OpenSSL package, both command-line tools and shared libraries. And cryptologists should be (and are) working on a replacement for the whole SHA family.

In February 2007 NIST announced a competition to replace the current FIPS-180-2, which now specifies SHA-1, and SHA-224, SHA-384, and SHA-512. FIPS-180-2 was issued in 2002, so the SHA family did not last as long as might have been expected. The new hash algorithms will be selected in a process like that used for AES, the Advanced Encryption Standard, defined by government standard FIPS-140-2. The plan is for submission by late 2008, final selection in late 2012. See http://www.nist.gov/hash-function/.

Also see The Hashing Function Lounge, at http://paginas.terra.com.br/informatica/paulobarreto/hflounge.html

hookah 1 hookah 2 Neither of these is what this section is about. Really.
That's just flavored tobacco, in the Silk Road Cafe in Göreme, Turkey.
Cryptographic Hash Functions
Algorithm Key Length Notes
MD1 128 Computationally inefficient...
MD2 128 Cryptographically flawed...
MD3 128 Computationally inefficient...
MD4 128 Cryptographically flawed...
MD5 128
SHA-1 160 "Secure Hash Algorithm", but use SHA-256 or SHA-512
RMD160 160
SHA-256 256
Snefru 256
GOST 256 Gosudarstvennyi Standard Soyuza SSR
EKS-Blowfish 448 Based on the Blowfish cipher
SHA-512 512

Digital Signatures

If you send a message and its hash, all you prove is that you know how to calculate a hash!

Instead, send the message, and the result of encrypting the hash with your private key (that result is called a digital signature). The digital signature is:
E( H(message), Kprivate )

Anyone can calculate the hash of the received message, and compare that to decrypting the digital signature with your public key. If the two are the same, you are the sender, and the message wasn't altered.

Digitally signing a message:
                                            +---------+
                                          { | Message |
           +----------------------------> { |  body   |
           |                              { |         |
           |   +---+   +---+                +---------+
  Message -+-->| H |-->| E |--> Digital --> |Signature|
               +---+   +---+   signature    +---------+
                         ^
                         |
                         kpriv

Verifying a digitally signed message:

    +---------+      +---+
    | Message | }--->| H |------------+
    |  body   | }    +---+            |
    |         | }                     |
    +---------+      +---+            v
    |Signature| }--->| D |----> Are they the same?
    +---------+      +---+    If so, the message is valid
                       ^      and the sender is legitimate!
                       |
                      kpub

For a less technical explanation of how this can be used with e-mail messages, see my page referenced in my .signature file, explaining it to people puzzled by their clueless mail tool's report of "unknown attachment".


Hashed Message Authentication Code (HMAC)

The sender and receiver share a secret key. HMAC is formed as hash(message+key). Sender transmits message and HMAC. Receiver performs same hash(received+key) to verify message integrity and sender authentication.

Generating HMAC:

    +----------+
    |          | }
    | Message  | }    +---+
    |          | }--->| H |---> HMAC
    +----------+ }    +---+
    |Secret Key| }
    +----------+

Transmit only the message and HMAC to the receiver.

       Sender                  Receiver
    +----------+
    |          |
    | Message  | =================>
    |          |
    +----------+

    +----------+
    |   HMAC   | =================>
    +----------+

Verifying HMAC:
    +----------+
    | Received | }
    | Message  | }    +---+
    |          | }--->| H |---> hash
    +----------+ }    +---+      ^
    |Secret Key| }               |
    +----------+                 |
                                 |
    +----------+                 |
    | Received |  <--------------Compare these.
    |   HMAC   |                 If equal, then this was a valid
    +----------+                 message from the authentic sender.

Note that a digital signature requires a hash of the entire message followed by an encryption of the hash. If you are only sending one large message, the work required for the hash overwhelms that for the encryption, and there is no real computational advantage to an HMAC.

An HMAC requires only a hash. If you are sending a large number of small messages, an HMAC has a computational advantage.

So, digitally sign an electronic mail message, but use HMAC within IPsec to verify all IP datagrams.


X.509v3 Digital Certificates

None of this makes much sense if you cannot be absolutely certain who you are talking to!

A digital certificate is a message that says, "The public key of so-and-so is such-and-such", with that message digitally signed by an entity called a Certificate Authority (CA). Attributes of a CA:

For instance, Verisign, Thawt, and others are CAs whose public keys have been coded into your web browsers by the software developers (and so you must absolutely trust your software providers, too).

When you download a secure page via HTTPS:

  1. The server sends your browser a digital certificate signed by some CA you trust.
  2. Your browser verifies the digital certificate and thus is certain that it really has the public key for some claimed identity.
  3. However, anyone could send you the well-known digital certificate, we must verify that the server really is who it claims....
  4. Your browser generates a random number and encrypts it with that public key. It then sends the result to the server, asking the server to decrypt it with the corresponding private key and send it back. [OK, I have simplified this somewhat for the sake of this explanation, go read the SSL/TLS specifications for the whole story if you care.]

X.509v3 is simply the format of digital certificate that everyone uses.

Also be aware that a digital certificate is proof of identity, but not proof of intent or good will! Anyone could get a digital certificate for the bogus-and-dishonest-bank.com domain, set up a web server, and get you to tell it your credit card numbers over an SSL/TLS connection.


Diffie-Hellman Key Negotiation

So how can you securely negotiate a shared secret key for symmetric cryptography, when you don't have a secure channel? Diffie-Hellman key negotiation saves the day.

We have two players, Ted and Alice. They agree on a base, b, and a modulus, m.

Ted and Alice each pick a secret number, St and Sa, respectively.

Each then calculates a public value from their secret value, by Public = bSecret modulo m as follows:

Ted says:    Pt = ( bSt ) modulo m
Alice says:  Pa = ( bSa ) modulo m
		

Now, because modulus of discrete logarithms is a one-way function — easy to calculate in one direction, but impractical or impossible to calculate in reverse — they can safely publish the base, modulus, and their public values and be confident that even highly motivated mathematicians could not discover their secret values. This includes Ted and Alice themselves — they cannot discover each other's secret values.

However, they can calculate a shared secret using the base, the modulus, the other party's public value, and their secret value. See above regarding X.509v3 digital certificates for issues of making certain that they really have the public key of the other player, and not some "man in the middle" pretending to be the other end.

Ted calculates the shared secret this way:

Kted,alice = [ PaSt ] modulo m
= [ ( [ bSa ] modulo m)St ] modulo m
= [ ( bSa )St ] modulo m
= [ b(SaSt) ] modulo m

Alice calculates the shared secret this way:

Kalice,ted = [ PtSa ] modulo m
= [ ( [ bSt ] modulo m)Sa ] modulo m
= [ ( bSt )Sa ] modulo m
= [ b(StSa) ] modulo m

Note that each player must use the first form of the equation from each pair, as that form uses their secret and the other player's public value. The remaining lines are there for us to work through the equivalence. Note that any calculation of the shared secret requires some secret information, and thus an outsider cannot calculate it.

Since bSaSt = bStSa, the two keys are equal. Since either of the two players must use their secret key in the calculation, it is a shared secret.

Security relies on:

For a trivial example, not secure because of the small base and modulus:


Tools for Cryptanalysis


Cryptographic Nerdcore Rap

Really. It exists. Nerdcore rap, or geeksta rap. See the rhymes of MC Plus+ and others.


Building Your Own One-Time Pad System

OK, so all algorithmic cryptography is tactical — it's only secure until the Bad Guy has enough time to apply their resources to find your key (and see Applied Cryptography for analysis of just how long that is on practical, and some impractical, systems).

However, there is a completely different form of cryptography that cannot be broken — use a one-time pad.

The cipher is based on the Exclusive-OR, or XOR, operation. You have two streams of bits, the input and the key. Use them one bit at a time, the nth input bit with the nth key bit. So, the key stream must be as long as the input stream. In words:
— If the key bit is 0, the output bit is equal to the input bit.
— If the key bit is 1, the output bit is the opposite of the input bit.
Or, in different words:
— If the input and key are equal, the output is 0
— If the input and key are different, the output is 1
Or, as a table:

Key
0 1
Input   0     0     1  
1 1 0

Or, as an example:

Input 1101 0100 1011
Key 0010 1101 1101
Output 1111 1001 0110

A one-time pad is a sequence of random bits as long as the message to be encrypted. It's the key bitstream. The sender XOR's the message with the one-time pad and sends the result. Then — and this is where even major governments mess up — the sender must destroy the one-time pad and never ever use it again. The receiver must have an identical copy of the one-time pad. They XOR the incoming message with that.


             +-----+                                      +-----+
message ---> | XOR | ---> transmitted over the clear ---> | XOR | --> message
             +-----+                                      +-----+
	        ^                                            ^
	        |                                            |
	       OTP                                          OTP
             copy #1                                      copy #2

To get you started, here is an additive stream, a dangerously weak way to generate a pseudorandom sequence (looks random but it isn't) from a relatively short seed or key:

  1. Make the seed a multi-digit base-10 sequence. For example, my old telephone number:
    7657430769
    In this example we are using a 10-digit seed.
  2. Generate the next digit by adding the first two digits modulo 10:
    (7 + 6) mod 10 = 13 mod 10 = 3
    76574307693
  3. Now shift one position to the right:
    (6 + 5) mod 10 = 11 mod 10 = 1
    765743076931
  4. And just keep going:
    (5 + 7) mod 10 = 12 mod 10 = 2
    7657430769312
    (7 + 4) mod 10 = 11 mod 10 = 1
    76574307693121
    (4 + 3) mod 10 = 7 mod 10 = 7
    765743076931217
    (3 + 0) mod 10 = 3 mod 10 = 3
    7657430769312173
  5. When you have enough digits, convert them into bits somehow. Maybe:
  6. Some seeds may be better or worse than others, but all of them will eventually loop. And see the table below, adding to the seed may generate a shorter unique sequence. Most 3-digit seeds produce 168-digit sequences, although a few produce 4-, 24-, and 28-digit sequences. Send me an e-mail if you want a C program that lets you experiment with these.
Seed Unique sequence length
76 12
765 168
7657 1,560
76574 2,343
765743 196,812
7657430 661,416
76574307 15,624
765743076 8,894,028
7657430769 1,736,327,236

And of course, if an attacker suspects that an additive stream might be used, it is pretty easy to figure out if that is the case, and in the process, discover the precise sequence and thus key stream!

So how do you make a one-time pad? Rely on physics, any attempts to generate random sequences with computers will fail.

One method would be digitize the output of a radio receiver tuned to a radio frequency spectrum completely uncontaminated with the correlated signals generated by man-made sources and even some natural sources. For a simple version, place a circuit that looks like the below in a very well-shielded case:

                   High gain
                   Amplifier
                         |\
                         | \    +-----------+         high-order
     zener  +------------|  >---|  A-to-D   |=======> bits discarded
     diode  | resistor   | /    | converter |-----+
  +----|<|--+--VVVVV--+  |/     +-----------+     | low-order bit only
  |                   |          +----------------+---------+
-----                 |          |                          |
 ---                  |          |      +----------+        |
----- battery         |          |in1   |in2       | out    | in
 ---                  |        +------------+  +-----------------+
  |                   |        | Comparator |  | Sample-and-hold |
  +-------------------+        +------------+  +-----------------+
                                     |out               |clock
				     +------------------+
	           		     |
                                     +----------------------> CLOCK
                        	     |
                                     |reset
                                +---------+
  fast, and not +-------+       |         |----> discard the
    necessarily |       |out    |         |----> high-order
         stable |  /\/  |------>|  fast   |----> bits
     oscillator |       |    in/| counter | ...
                +-------+  clock|         |-----------------> OUTPUT
                              	+---------+

What you're measuring here is the time since the output of the noise generator changed slightly. When the comparator output goes high, indicating an ADC state change, you accept the low-order output of the counter as a valid OTP bit. Once that is done, reset the counter to zero, and take the current ADC low-order bit value as the new internal state. The counter must run several orders of magnitude faster than the bandwidth of the high-gain amplifier and the ADC. All power circuits must be well-regulated and battery powered (no 50/60 Hz bias to our output!). You'll need lots of internal shielding, don't let the noise generator (diode and resistor) pick up any noise from the counter clock. Put the battery, diode, resistor, and high-gain amplifier in one shielded case, and put that case and the ADC in another shielded case.

Collect statistics on the output — your output may not exhibit adequate entropy. In other words, it may not be adequately random. You will have to be patient and accept low output bit rates. Run the counter faster, put a low-pass filter after the amplifier, or increase the amplifier gain. Start building your one-time pad early, and make sure to compress your message before encrypting it.

As a dangerous alternative that you must never ever do, disassemble a smoke detector. It will contain a small amount of radioactive Americium (look in the well shielded case with all the warnings against doing this). Radioactive decay is a random process — use a particle detector to time decay events, and base your one-time pad on those times. Something like the low-order bit of the time in microseconds since the previous decay event would serve nicely.

As a somewhat less dangerous alternative to the above, which you should never do yourself (although perhaps you could have your attorney do it on your behalf), substitute Coleman lantern mantles for the smoke detector Americium. Yes, lantern mantles are among the most highly radioactive material mere citizens can purchase on the open market. They have a lot of thorium in them for some reason. There is also some Mexican pottery out there with bright orange glaze containing uranium oxide. The "problem" with these sources is that they are less radioactive and thus the output of your one-time pad generator must be slower.


A better .sig slogan

 
$ echo 'only outlaws will have privacy.' | \
                openssl enc -aes-256-cbc -k 'rot13' | \
                hexdump -C
00000000  53 61 6c 74 65 64 5f 5f  71 2b 6f 83 ab 01 17 6c  |Salted__q+o.«..l|
00000010  ea 33 cc 00 65 71 bc d6  bd 8e 08 af 74 3b ab b4  |ê3Ì.eq¼Ö½..¯t;«´|
00000020  7f 75 5b 1d 44 87 bf 2f  6e 06 43 59 fc 22 d3 2e  |.u[.D.¿/n.CYü"Ó.|
00000030  81 a1 0a 88 6a fe 03 25  31 d1 c6 94 8f 28 3b e1  |.¡..jþ.%1ÑÆ..(;á|
00000040 

When cryptography is outlawed,
5361 6c74 6564 5f5f 712b 6f83 ab01 176c
ea33 cc00 6571 bcd6 bd8e 08af 743b abb4
7f75 5b1d 4487 bf2f 6e06 4359 fc22 d32e
81a1 0a88 6afe 0325 31d1 c694 8f28 3be1


Back to the Security Index


Home Page Site Map Public Key E-Mail
Use /bin/vi! Hosted on OpenBSD
Hosted on Apache Valid XHTML 1.1! Valid CSS!
© Bob Cromwell Jul 2008. Created with /bin/vi, hosted on OpenBSD with Apache.    Root password available here