Let's limit this analysis to guessing SSH passwords. SSH service is available for routers, and it can be added to Windows systems. But the majority of hosts running SSH service are some variety of Unix. So the hacker should conclude that the system is probably Unix and therefore:
The attack will probably use two components: a program, probably written in C, to automate the connection and interaction with targeted SSH servers, and a data file listing the login/password pairs to use. The program also needs to be told which targets to attack, either as a list of hostnames and/or IP addresses or as a range or CIDR block of IP addresses. That target set could be specified in the same data file, in a separate data file, or on the command line, depending on the attack code design. See my detailed analysis of an intrusion for an example of actual attack code. Let's say that the following is the plan for a simple attack:
| | Target host | | |
| | target1 | target2 | target3 |
| Login/password guess | root/password | root/password | root/password |
| | root/admin | root/admin | root/admin |
| | root/letmein | root/letmein | root/letmein |
| | operator/password | operator/password | operator/password |
| | apache/password | apache/password | apache/password |
Let's also assume that the three target hosts are within the same organization, and they are collecting their syslog messages in one place for analysis. Remember that the syslog message from the SSH daemon records success or failure for the event and the login used, but it does not (nor should it) record the password guess. The plan above would be logged as three guesses for the root password on each target host, followed by one guess each for operator and apache, but the logs would not show what those guesses were.
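To make concrete what the aggregated logs do and do not contain, here is a minimal sketch of pulling the useful fields out of one sshd message. It assumes the common OpenSSH "Failed password for ... from ... port ..." layout behind a traditional syslog prefix. The exact wording varies with the sshd version and logging configuration, so treat the pattern as an assumption to adjust against your own logs. The names `SSHD_RE` and `parse_sshd_line` and the sample line are made up for illustration.

```python
import re

# Assumed message layout; real sshd output varies by version and configuration.
SSHD_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<target>\S+)\s+sshd\[\d+\]:\s+"
    r"(?P<result>Failed|Accepted) password for (?:invalid user )?(?P<login>\S+) "
    r"from (?P<attacker>\S+) port (?P<port>\d+)"
)

def parse_sshd_line(line):
    """Return a dict of fields for a password event, or None for other lines."""
    match = SSHD_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["port"] = int(event["port"])
    return event

# Hypothetical log line using a documentation-range IP address.
sample = ("Feb  3 04:12:01 target1 sshd[2211]: "
          "Failed password for root from 192.0.2.10 port 51234 ssh2")
event = parse_sshd_line(sample)
print(event)
```

Note that the result carries the login, the outcome, the attacking address and the client port, but nothing about the password that was tried.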
The attack will leave a trail in the logs in one of four forms, depending on the attack code design. This is the first way we can start to distinguish between and recognize attacks. I can name these patterns based on how the attack progresses through the target table:
The attacker could throttle the rate of the attack in an attempt to avoid attention. You see this once in a while (see the later examples), but usually the attack runs as quickly as possible. As observed above, the attack is probably launched from compromised systems, so detection is unfortunate for the attacker but not critical. The SSH daemon's intentional rate throttling and its interaction with security mechanisms like the PAM library will be the limiting factor on speed, so expect to see no more than one guess per 1 to 4 seconds per thread of attack execution.
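One rough way to check whether an attack is running flat out or being throttled is to measure the gaps between consecutive guesses for each pair of attacker and target. The sketch below assumes the events have already been parsed into tuples of (time in seconds, attacker, target, login). That tuple layout, the function name `guess_intervals` and the sample data are all illustrative assumptions. A multi-threaded attack would interleave several streams, so its gaps would look shorter than the per-thread rate.

```python
from collections import defaultdict
from statistics import median

def guess_intervals(events):
    """Map (attacker, target) -> list of seconds between consecutive guesses."""
    times = defaultdict(list)
    for when, attacker, target, _login in events:
        times[(attacker, target)].append(when)
    intervals = {}
    for pair, stamps in times.items():
        stamps.sort()
        intervals[pair] = [b - a for a, b in zip(stamps, stamps[1:])]
    return intervals

# Made-up events: fast guessing against target1, slow guessing against target2.
events = [
    (100, "192.0.2.10", "target1", "root"),
    (102, "192.0.2.10", "target1", "root"),
    (105, "192.0.2.10", "target1", "root"),
    (101, "192.0.2.10", "target2", "root"),
    (161, "192.0.2.10", "target2", "operator"),
]
intervals = guess_intervals(events)
typical = {pair: median(gaps) for pair, gaps in intervals.items() if gaps}
print(typical)
```

A typical gap of a few seconds is consistent with an unthrottled attack limited by the daemon. A gap of a minute or more suggests deliberate throttling.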
If we extract attack sequences from our aggregated logs, one sequence per pair of attacker host and target host, we can categorize the attack design:
Further information captured in the syslog data includes the client TCP port number. There might be some information contained in that sequence, some way of categorizing the attacking host operating system and its patch level (for example, are the client port numbers randomized or sequential?), but the meaning would be largely obscured by other simultaneous client activity occurring on the attacking host but not observable.
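As a crude illustration of the "randomized or sequential?" question, the sketch below computes what fraction of consecutive client ports rise by a small step. The function name, the `max_step` threshold of 16 and the sample port lists are arbitrary assumptions, not measured values. As noted above, other activity on the attacking host would blur the result, so this is at best a weak hint.

```python
def fraction_small_increments(ports, max_step=16):
    """Fraction of consecutive port pairs that rise by 1..max_step."""
    if len(ports) < 2:
        return None  # not enough observations to say anything
    steps = [b - a for a, b in zip(ports, ports[1:])]
    small = sum(1 for step in steps if 0 < step <= max_step)
    return small / len(steps)

# Made-up client port sequences from two hypothetical attacking hosts.
sequential_looking = [40112, 40113, 40115, 40116, 40120]
scattered_looking = [51234, 33810, 60021, 45977, 38402]

print(fraction_small_increments(sequential_looking))
print(fraction_small_increments(scattered_looking))
```

A value near 1 suggests sequentially allocated ports with little competing traffic. A value near 0 suggests randomized allocation, or a host too busy for the pattern to survive.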
Finally, once in a while you will see a very simple attack. One attacking host makes just one guess for the root account on each target. What would be the point? There are two possible explanations:
One thing that does not matter is the order of the guesses. The automated attack will go through its entire list of target hosts and login/password guesses, saving any information about successful guesses for later use. The precise sequence does not matter to the designer or the user of the attack code.
As for the set of target hosts and their ordering, you do not really know how many hosts are in the set because it may contain many others not at your organization. What's more, what you can observe is often very difficult to explain. The set of targets may be in what appears to be random order, neither numerical by IP address nor alphabetical by host name. If you have several hosts configured identically except for strictly sequential IP addresses and host names differing by only a single character (e.g., research1, research2, research3, etc.), you will find to your surprise that a given attack hits only a randomly selected subset, in random order. Even if you could somehow see the complete list of targets, it seems unlikely that it would be of any analytical help.
The list of accounts, however, can be observed from a single target and provides a way to recognize similar attacks. The person using the attack code could simply use a list included with the program itself, they could create their own list, or combine lists from several sources. Distinctive patterns observed include:
On the next page we see that it may be easy for a human to generalize a few similar attacks as "Single-threaded horizontal scan with about a hundred guesses for root, then five or six guesses each for common Unix system accounts, then two guesses each for an alphabetical list of common Spanish names starting with alberto."
However, we will also see that the attacks will probably be similar, not identical, and this makes the automatic clustering and classification more difficult.
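To sketch both halves of this, the example below first collapses a login sequence into runs, which is roughly what a human does when describing an attack as "N guesses for root, then a few each for system accounts". It then scores two attacks by the Jaccard similarity of their login sets, which tolerates small differences between lists. Jaccard similarity is just one simple measure chosen for illustration. It is not necessarily the method used elsewhere in this analysis, and the function names and sample sequences are made up.

```python
from itertools import groupby

def run_lengths(logins):
    """Collapse a login sequence into (login, consecutive guess count) pairs."""
    return [(login, len(list(run))) for login, run in groupby(logins)]

def jaccard(logins_a, logins_b):
    """Similarity of two attacks' login sets: 1.0 is identical, 0.0 is disjoint."""
    set_a, set_b = set(logins_a), set(logins_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Two made-up attacks seen on one target: similar, but not identical.
attack_a = ["root"] * 3 + ["operator"] * 2 + ["alberto", "alberto"]
attack_b = ["root"] * 4 + ["operator"] * 2 + ["alberto", "alicia"]

print(run_lengths(attack_a))
print(jaccard(attack_a, attack_b))
```

The set comparison ignores ordering and guess counts, so two attacks built from the same list but cut off at different points still score as close. A stricter comparison would also have to look at the run-length summaries.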
Smart hackers attacking high-value targets will do some research and thinking when they design their attack. Usually, however, the attack sequences do not make much sense. A site in the United States is attacked by a host in Brazil attempting to guess passwords for thousands of logins based on German names, none of which exist on the target host. The attacking host is just a handy tool for a hacker who is randomly or erratically selecting targets. Once in a while they may happen to select a target for which their attack is relevant, and in a few of those cases, they may be successful.
| © Bob Cromwell Feb 2012. Created with /bin/vi and ImageMagick, hosted on OpenBSD with Apache. Root password available here, privacy policy here. |