Rotors of M-209 cipher machine.

How Passwords Work in Linux, macOS, and Windows

How passwords work, how they're stored, how strong (or weak) various operating systems are, and several ways to attempt to break passwords

What is a password? And how can it create more security problems than it solves?

You prove your identity by typing a secret password. The system then tests what you type to see if you know the secret or not. But there's a huge problem: If the system has a list of everyone's password, then someone could steal a copy of that list and masquerade as anyone on the system. To make this secure, we need something that sounds strange at first:

The system must not have any record of what your password actually is, but at the same time it must be able to tell whether or not you know it.

Passwords in more detail

When a user wants access to a computer system, they first claim some identity. The system then requires them to authenticate by typing a secret password that should be known only to the legitimate user of that identity. If they get it right, the operating system starts a login session for them, setting up an environment and associating some credentials of user identity and possibly multiple group identities (the UID and GID, respectively, in Unix and macOS, or the SID in Windows).

It has to be something that the user can generate with the keyboard, obviously, and since we're probably going to need to support remote authentication, where the user needs to login across the network while sitting at some random desktop, we may be constrained to using the character set easily generated by any random keyboard.

Certain characters may be illegal — the <Enter> key signals that the user is submitting the password to the authentication program. Others may be not work in certain situations — the <Tab> key tells a graphical login program "Jump to the next box".

The authentication program will impose length constraints. Versions of Solaris up through 9 silently ignore everything after the first 8 characters. Many U.S. government organizations have a policy that passwords must be at least 12 characters long. Solaris can be configured to insist on at least 12 characters when setting a new password, and it will require the user to type the same 12 or more character string twice when setting the new password, but only the first 8 must be typed at subsequent logins.

Of course, the user's memory and typing skills impose other length constraints! If you require users to frequently choose new, long, complex passwords, many of them will write them down and then leave those Post-It notes stuck to their monitor or the bottom of their keyboard. Since the user can't see what they're typing, and in many systems they don't even get feedback as to how many characters they have typed, long passwords or pass phrases can be very difficult to enter correctly.

Why is it important to choose a custom password?

Almost every device we use these days has some of our personal information stored on it. Keeping in mind that there are always hackers looking for ways to gain access to that information, it is very important to secure any new device with a new password. In other words, change the defaults because lists of default passwords are commonly available. If this is not done as soon as the device is installed, then it is left open to attacks by even the most novice of hackers who can gain access it by scanning the Internet for specific device signatures and trying known default usernames and passwords. This is a very common attack against Internet-connected appliances.

How is the password stored?

User Password
joe mypassword
jane letmein
.
.
.
.
.
.
This won't work!

You cannot simply store the passwords on the system as a list of user names and their current passwords. Even if that list were protected by careful settings of file ownership and permissions, that information is going to get exposed at some point.

You cannot store the passwords as an encrypted list. Encryption can be reversed by an attacker who sees that ciphertext list.

We need a one-way function, something that is easy to calculate in one direction but so difficult as to be impractical to calculate in the opposite direction. Testing the user's input will use the easy direction, while reverse-engineering their password would require solving the impractically difficult version. We need a hash function.

A hash is a cryptographic function that calculates a distinctive "fingerprint" value for a piece of input data. See the hash function page in my "Just Enough Crypto" series for details about hash functions. A hash is easy to calculate, but information is lost in the process and it is impossible to reverse a hash function.[1] Do not store the passwords, store their hashes.

Let's say the user sets a new password of bigsecret. We can easily calculate a hash value for that input. Let's use the SHA2-256 hash function:

$ echo 'bigsecret' | openssl sha256
d7e036aa730b157260e7805114b3aa322a30d53533972b0501fb01a5e3f6973c 

[1] Yes, it really is impossible to reverse a hash operation. There is no such thing as "the inverse of a hash function".

However, each hash function generates output of a fixed length. There is a large but finite number of possible hash values. Meanwhile, there are an infinite number of possible inputs. For most hash functions you could always add one more byte and generate a new arbitrarily large input. Infinity divided by any number, even an enormous one, is still infinity.

Some hash functions place a limit on their input size, for example, SHA-1 can handle inputs no larger than 264 bytes. But its output is only 160 bits long, or 20 bytes. So, on average, there are
(264)/20 = 922,337,203,685,477,580.8
different inputs that could generate any one SHA-1 output hash.

Two different inputs that generate the same hash output are called a collision. Hash functions with bounded input size have enormous numbers of them, hashes accepting arbitrarily large inputs have infinitely many for any given output. But collisions are difficult to find.

They are not as difficult to find as we once thought, at least for the MD1-MD5-SHA1 family of hash functions, as shown by some sharp mathematicians in 2004-2008. See the hash function page in my "Just Enough Crypto" series for details.

The system could store that hash value in the record defining that user. The next time that user tries to login, the system asks for the password. The user types something, then the system calculates the hash of whatever they typed and compares the result to the stored hash. If they are identical, the user must have typed the correct password (or, as suggested by the footnote here, they somehow managed to type a collision string, but that's awfully unlikely based on what we know about hash functions).

This means something that people often overlook: The operating system has no record of what your password is. It only knows what should happen when it does some mathematical processing on the string that is your password.

Since a hash function cannot be reversed, the operating system (or its administrator) cannot figure out what any user's password is.

Or at least the password can't be calculated directly from the stored information. The big problem is that if one person thought up a password that they can remember, another person can probably guess what it is. To be formal and complete we need to realize that a mathematician would argue that is doesn't have to be their actual password, it could be any one of the arbitrarily many (or infinitely many) collisions. But let's wait until we get to the section about breaking passwords to get into those details. First, though....

Where is the password information stored?

Other than early macOS and Windows, the password information is stored in a fairly simple file. ASCII text, one line per user definition, with colons separating the fields in most cases.

Operating system Location
Older UNIX, typically before about 1990 /etc/passwd
Most newer UNIX, e.g., Linux, Solaris, AIX, IRIX /etc/shadow
Moved here from passwd in System V Release 3.2 in 1988, and in BSD4.3 in 1990.
BSD UNIX /etc/master.passwd
Note that the database files /etc/pwd.db and /etc/spwd.db are generated from passwd and shadow, respectively, by the command pwd_mkdb, and updates are pushed into the databases by chpass, passwd, and vipw.
HP-UX UNIX /tcb/files/auth/<initial>/<username>
That is, the hash of the password for user jane is stored in the file
/tcb/files/auth/j/jane
macOS 10.2 and earlier Directly in the NetInfo database. (Ref here)
macOS 10.3 and later Each user has their own file:
/var/db/shadow/hash/uid
or possibly:
/private/var/db/dslocal/nodes/Default/users/username.plist
(Ref here)
Windows In the Security Accounts Manager (SAM) file:
c:\windows\sys*\config\sam

Can we enforce password quality?

Yes! But there are several components on Linux and other UNIX-family operating systems. It gets confusing. Something like:

The file /etc/pam.d/passwd has a line including the PAM rule stack stored in /etc/pam.d/system-auth. Several of the files /etc/pam.d/* include that as a system-wide standard. That way you only have one file to maintain, and the rules are applied consistently, no matter which program handles the password change. The file system-auth is standard on Red Hat, CentOS, and derivatives, but of course it could be anything.

The system-wide standard file likely calls pam_pwquality.so. That module is configured by /etc/security/pwquality.conf.

Older systems may use pam_cracklib.so.

You can specify the use of pam_pwquality.so in the files in /etc/pam.d/*. It will be easiest to manage if you do this in a common file included in several service-specific PAM files. The common file might be /etc/pam.d/system-auth on Red Hat and derivatives, but it could be any file.

Once you have specified the use of pam_pwquality.so, you need to specify how it works. That will be in /etc/security/pwquality.conf and /etc/security/pwquality.d/* (in older distributions it was /etc/pwquality.conf).

You can also specify password requirements in /etc/login.defs and /etc/libuser.conf.

This needs some salt!

A long time ago, by the late 1970s, developers of Unix had realized two things. First, you cannot prohibit a user from selecting a password already in use by some other user. Telling a user, "Sorry, someone is already using that password, try again", tells them how to authenticate as some other user. They don't know which user, but they just have to try that password once for each other user on the system and they will get in!

So, in the interest of security, you have to allow users to happen to have the same password. But you have to leave them unaware of the coincidence. It's not that two users share a password, but instead that they happen to have chosen the same password but they do not realize that.

That led to the second thing the Unix developers realized: If users can see the list of password hashes (and they could do so easily at the time[2]), then an observant and devious user who happened to have chosen one of those coincidental passwords could realize and take advantage of the situation.

Morton's Salt, Sodium Chloride for the home user.

Morton's Sodium Chloride.

[2] At that time, the password hashes were stored in the file /etc/passwd, which must be world-readable so that user programs like ls can decode the numeric user ID of a file owner into its name.

Modern Unix designs (since around 1990) put the password hashes into the file /etc/shadow, which can only be read by the system and the administrative account root. That leaves the passwd file defining almost everything about the user's account except the password!

That doesn't mean that devious users can't see the hashes, but it does force them to work harder. Maybe sniffing network traffic for LDAP messages providing user hashes for distributed authentication.

For the history of the Unix password system design, including the addition of salt, see Password Security: A Case History, (1978-04-03), Morris, Robert; Thompson, Ken; Murray Hill, NJ, USA: Bell Laboratories.

So a salt was added to the process. A salt is a fixed-length random string. A new random salt is generated when a user sets a new password. The system then creates a string made up of the random salt and the user's typed password, and it calculates the hash of that combination:
hash("salt, password")
The system stores both the salt and the resulting hash value for that user.

At subsequent logins or other authentication events, the system asks the user to type their password. It then retrieves their stored salt, combines that with what they typed, calculates the hash of that combination, and compares that to the stored hash.

An early version of this used a 12-bit salt value. Since 212 = 4096, that meant that there was still 1 chance in 4096 that two users happening to select the same password would happen to get identical salts. But the designers very reasonably assumed that the joint probability of two users selecting identical passwords, and getting the same salt, and happening to notice identical stored salts and hashes, and taking advantage of that was low enough not to worry.

Salts used by various operating systems
Operating system Salt bits Number of possible salt values
Windows
macOS before OX X 10.4 (ref re macOS)
0 1
That is, the one empty salt
Older Unix (SunOS, Solaris 2.6 through 2.8, HP-UX, etc) 12 4,096
macOS 10.4 and later 32 4,294,967,296
Unix (Solaris 9 and later, older Linux) 48 281,474,976,710,656
Linux with older Glibc crypt() 96 79,228,162,514,264,337,593,543,950,336
BSD with Blowfish crypt and 128-bit salts,
ported to Solaris 10 and Linux
128 340,282,366,920,938,463,463,374,607,431,768,211,456

How secure was this, at least originally?

That depends on two things. First, users must not use ridiculously obvious passwords, like password, letmein, 12345, and other popular ones. That sabotages any hope we might have had for security.

Second, it must be impractical for an attacker to mount some sort of brute-force attack, calculating the hashes of the possible password strings and looking for matches in a captured list of user password hashes (and salts).

Answering the question of "How secure?" requires knowing the hash functions used.

Traditional Unix

The first 8 characters of the user's password was used as a 56-bit DES key (ignoring the upper bit of each byte, meaning all ASCII). The salt was used to modify some of the internal modules of the DES algorithm. An 8-byte block of null characters (that is, 64 bits of zeros) was encrypted with that modified cipher and the resulting ciphertext output then fed back in as input again. This continued for several rounds, typically 25 but optionally configurable in some situations to more.

The thought was that brute-force searches were not practical given the speed of available computers. At the time this was the main method, from probably before 1980 to about 1990, that assumption was fairly reasonable.

The salt and hash were stored as a 13-character string encoding the 12 bits of salt and 64 bits of hash (56 bits plus 8 parity bits). It was split, using the first 3 characters for the salt and the remaining 10 for the hash, or something similar. In really old (pre-shadow) Unix, this string was the second colon-delimited field in /etc/passwd, in modern systems it's in /etc/shadow (or /etc/master.passwd and /etc/spwd.db on BSD).

From the attacker's viewpoint, attacking this system means a search space of 256 possible passwords (that is, possible DES keys), each of which requires at least 25 DES operations. There are this many possible password strings to consider:
256 = 72,057,594,037,927,936

Modern Unix and "BSD-style" hashing

So-called "BSD-style" hashing is what you have with most BSD implementations, Linux, and Solaris 10 and later. Solaris 9 does not do this by default, sticking instead with the traditional (and much older and weaker) Unix method, but see my OS-specific page for how to reconfigure Solaris to use this stronger method.

This method calculates the hash of the password plus the salt. The salt and hash can now be generated and stored in different formats, so the authentication program must be told how to interpret what it finds there. That second colon-delimited field is now broken into subfields with "$" characters. The first subfield specifies the hashing algorithm:

1 MD5
2a Blowfish hashing used in BSD. Not in mainline glibc but added in many Linux distributions and Solaris 10. This may be indicated by either 2x or 2y.
5 SHA-2-256
6 SHA-2-512

In most of the schemes, the salt and hash will follow as the second and third subfields, respectively. The salt and hash are encoded as printable characters, 6 bits per character, with values 0 through 63 encoded as "./0-9A-Za-z".

The Blowfish hashing scheme uses the second subfield to indicate the logarithm base two of the number of rounds and concatenates the 128-bit salt and the hash in the third subfield.

user1:$1$XKJKd8o8$h/Z+/Ca0MJFWJupSg677c3:...

  MD5 hashing, 48 bits of salt

user2:$2a$08$GUFTFC1QZstGFfD.odt2GERl9TMX21aqKtjFe/tkcFyQz3i19g/Di:...

  Blowfish hashing, 28 = 256 rounds, 128 bits of salt

user3:$6$/jG4/ev7gBQ.Mp34$mbVeopTitAjcJR.j9IoDWso7662HeNQFO7SVX1wjKEqKo0GKXEpLYfzNIFHC31sD7pwjEW.JFRnpuHVu.NqAH1:...

  SHA-2-512 hashing, 96 bits of salt

And yes, you could use a mix of hashing functions for users on one system. Let's say you change to a new method through PAM, /etc/libuser.conf, or another mechanism. As users change their passwords they will individually migrate to the new scheme. For more, see the on-line manual pages for crypt(3) from Linux and FreeBSD.

Windows

Windows has traditionally supported the LANMAN hash, a rather insecure method for authenticating over a poor networking protocol invented by IBM and used by Microsoft despite the availability at the time of far better alternatives. To create a LANMAN hash:

Windows also uses something Microsoft calls the NT hash. This is simply the MD4 hash of the user password. It's MD4 instead of MD5, due to history, although MD4 is a little less secure than MD5 (see RSA's explanation for details).

If the LANMAN hash is still enabled, consider that there are 94 printable ASCII characters, and 256 possible 8-bit patterns if the user is willing to use the numeric keypad. However, 26 possibilities must be removed because of the mapping from lower case to upper case. The attacker then must search the space of possible 7-byte strings made up of those sets, and this will break both 7-character password chunks in parallel. Once the LANMAN password has been found, the NT password is identical with the exception that some or all of its letters may be lower case. The search space is therefore 697 if the user uses the full printable ASCII character set, or 2307 if the numeric keypad is used to enter three-digit codes for each password character:

 697 =      7,446,353,252,589
2307 = 34,048,254,470,000,000

If the LANMAN hash has been disabled, and see Microsoft's support page for how to do this, then the search space is that of the 128-bit MD4 hash:

2128 = 340,282,366,920,938,463,463,374,607,431,768,211,456

macOS

This is like the "BSD-style" hashing, except that it uses SHA-1 instead of MD5. That means a 160-bit output:

2160 = 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976

Note that macOS allows you to configure a system for Windows file sharing in such a way that it also stores the weak LANMAN hashing! However, you do receive and must dismiss warnings about the insecurity of configuring a system that way.

Linux with SHA-2-512 hashing

Recent Linux distributions have gone to SHA-2-512 hashing of the passwords. This is coupled with salts of up to 96 bits. This requires Glibc libraries version 2.7 or later, check your manual page for crypt(3) for support. SHA-256 is also available, but with the same libraries providing SHA-2-512 I doubt that anyone would go out of their way to specifically use SHA-2-256.

user3:$6$/jG4/ev7gBQ.Mp34$mbVeopTitAjcJR.j9IoDWso7662HeNQFO7SVX1wjKEqKo0GKXEpLYfzNIFHC31sD7pwjEW.JFRnpuHVu.NqAH1:...

   6 = "Use SHA-2-512 hashing"
salt = /jG4/ev7gBQ.Mp34 (16 characters of base 64, 96 bits)
hash = mbVeopTitAjcJR.j9IoDWso7662HeNQFO7SVX1wjKEqKo0GKXEpLYfzNIFHC31sD7pwjEW.JFRnpuHVu.NqAH1

Unix with Blowfish hashing

This applies multiple rounds of the Blowfish cipher to generate a hash function, operating on the combination of the salt and the password. A fixed string is used as the cleartext input, the concatenated 128-bit salt and password used as a key, and multiple rounds of encryption are applied (that is, the output ciphertext is fed back in a input). The salt and final output are stored in the file. This is the default method with BSD and is possible (but not always chosen) with Linux and with Solaris 10 and later. See the FreeBSD crypt(3) manual page for more details.

This uses the same subfield marking as shown with "BSD-style" above, with a first subfield of "2a" (or possibly "2x" or "2y") indicating that this method is used for this entry. Yes, you could use a variety of password hashing schemes for your users. This will happen when you change the default method for the system and users then individually update passwords.

joeuser:$2a$08$4x8jQoFkgWYFONvxmn/uY.KhbBT8Uytj71qDyyO/Wzcm/oZklnVg.:...

       2a = "Use BSD-style hashing"
   rounds = 08, so 28 = 256
salt+hash = 4x8jQoFkgWYFONvxmn/uY.KhbBT8Uytj71qDyyO/Wzcm/oZklnVg.

To attack arbitrary passwords and pass phrases, an attacker would have to deal with a search space of 2448, or approximately 7.26×10134. Notice the scroll bar!

2448 = 726,838,724,295,606,890,549,323,807,888,004,534,353,641,360,687,318,060,281,490,199,180,639,288,113,397,923,326,191,050,713,763,565,560,762,521,606,266,177,933,534,601,628,614,656

These numbers are getting enormous, but how much of that search space really matters? That leads to....

How much of the search space is really covered by user passwords?

Just a tiny fraction, even for what are considered strict password complexity policies!

Consider passwords composed of all 94 printable ASCII characters on a standard keyboard: mixed-case letters, numerals, and punctuation marks. There are 9412 possible 12-character passwords based on that character set, and that is a very large number! However, compare that number to the number of possible MD5 128-bit hash values:

Possible 12-char passwords:
9412  =                    475,920,314,814,253,376,475,136

Possible 128-bit hash values:
2128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 

[3] There's another possible issue here — do the possible outputs of the hash functions completely cover all the possibilities? I don't know if that is true for any of the hash functions being discussed here. See my page on searching the hash output space for some speculation on this.

That fairly long and complex password, hard to remember and very hard to type accurately with no visual feedback, only takes advantage of a tiny fraction of the potential search space. [3] The MD5 output space is over 7×1014 times larger! How long would an all-printable-ASCII password have to be to take full advantage of the MD5 output space? 19 or 20 characters!

Possible 19-char passwords:
9419  =    30,862,366,077,815,087,592,879,016,454,695,419,904

Possible 128-bit hash values:
 2128 =   340,282,366,920,938,463,463,374,607,431,768,211,456

Possible 20-char passwords:
9420  = 2,901,062,411,314,618,233,730,627,546,741,369,470,976 

A 20-character password — that is, a random looking password and not a highly redundant pass phrase, and see the next section for the large difference in meaning between the two — would be very difficult to remember and type accurately with no visual feedback.

Taking full advantage of the SHA-1 160-bit hash space is even worse, requiring 24 or 25 characters randomly chosen from the full ASCII keyboard:

Possible 24-char passwords:
9424  =    226,500,146,052,898,041,878,222,437,726,567,560,344,026,218,496

SHA-1 160-bit hash values:
 2160 =  1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976

Possible 25-char passwords:
9425  = 21,291,013,728,972,415,936,552,909,146,297,350,672,338,464,538,624 

[4] Bruce Schneier has a good analysis of password choice patterns and the corresponding attack patterns here. People tend to design a strong password from a seven- to nine-character root plus a common appendage (1 to 4 digits, single symbols, digit plus symbol, two-symbol combinations). The root may have internal substitutions ($ for s, @ for a, and so on) plus upper/lower case switches. More thought (and guessing difficulty) goes into the root than the appendage. Add biographical data and the local dialect (project names, facility names, internal billing codes and so on) for a much higher attack success rate.

"Strong Passwords Revisited" by Lohkee has some very good analysis. You can easily get worried about theoretical attacks, so don't lose sight of reality. Also, make sure you carefully consider the precise form of the attack. My next page on this topic gets into this. What has the attacker observed, what are they trying to accomplish, and how are they doing that? Your concerns vary widely depending on the answers.

Consider human nature. Even if you forced users to have a long password with at least two each of the categories of upper case letter, lower case letter, numeral, and punctuation mark, you know that most users will use more letters than numerals, and more numerals than punctuation marks.[4]

There is a common and very useful trick for converting a memorable sentence into a complex string. For example, turning this sentence:
  It's time for this mess to be fixed, at once!
into this 12-character string:
  It4tm2bf,@1!
However, the result is going to be biased toward starting with an upper case letter and ending with a punctuation mark. The result is pretty random, but not entirely so. And that brings us to measures of randomness....

Passwords, Entropy, and Information

Claude E Shannon, the father of Information Theory

Claude Shannon, the father of Information Theory.

Entropy is a measure of either information content or randomness. Those two concepts may seem unrelated, but think about it this way — greater randomness leads to greater difficulty in predicting what comes next. If you already had a very good idea of what was coming next, then the next piece of a message would only confirm our suspicions. But if it appears to be highly random, and every piece of the message comes as a surprise, then it can carry more information.

The entropy, or randomness, or information content, of a password or pass phrase consisting of L symbols drawn from a set of N possibilities is Llog2N bits per symbol. So:

Symbol set N Entropy
Digits: {0-9} 10 3.32 bits/symbol
Single-case letters: {a-z} or {A-Z} 26 4.70 bits/symbol
Single-case letters & digits: {a-z, 0-9} or {A-Z, 0-9} 36 5.17 bits/symbol
Mixed-case letters: {a-z, A-Z} 52 5.70 bits/symbol
Mixed-case letters & digits: {a-z, A-Z} 62 5.95 bits/symbol
All standard ASCII keyboard characters: 94 6.55 bits/symbol

However, the above measures are for truly random sequences of symbols. Human languages are very non-random! Estimates of entropy per symbol vary widely from one language to another, and even for a given language the estimate depends upon the corpus (or collection of written works) used for the analysis. Work by Claude Shannon and others suggests that typical English prose contains only 1.1 to 1.2 bits of information per letter, so even less than that if you count the space characters between words. A pass phrase of about 133 to 145 letters, or maybe 25 or so words, could contain as much entropy as the output of the SHA-1 hash function. But I don't see how you could accurately type that without seeing what you're typing!

Meanwhile, the author of NIST Special Publication 800-63 Appendix A, dictating password complexity requirements, has now admitted that he really wrote that guidance without knowing much about information security or how passwords work. See the articles in Wall Street Journal and Gizmodo.

So far it would appear that we might have far more mathematical protection than we can use. But as you'll see on the next page, passwords are easier to break than you might expect!

Part Two: How to break passwords


Back to the main Security Page