Traditional Signature Based Detection
Traditional anti-virus detection techniques look for patterns of code that are unique to a malicious executable. The pattern, used to create the detection signature, is typically drawn from several sections within the program. For example, a signature might match three 50-byte areas of code at specific offsets, or locations, within a file. Because it combines specific pieces of code with specific offsets, a particular signature is unique to one executable.
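The offset-based matching described above can be sketched as follows. This is a minimal illustration, not any vendor's actual engine: the Fragment type, the matches_signature function and the sample bytes are all hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    offset: int      # byte position within the file
    pattern: bytes   # exact bytes expected at that position


def matches_signature(data: bytes, signature: list[Fragment]) -> bool:
    """Return True only if every fragment is found at its exact offset."""
    return all(
        data[f.offset:f.offset + len(f.pattern)] == f.pattern
        for f in signature
    )


# Hypothetical sample: 300 filler bytes with three 50-byte regions planted.
sample = bytearray(300)
sample[10:60] = b"A" * 50
sample[100:150] = b"B" * 50
sample[220:270] = b"C" * 50

signature = [
    Fragment(10, b"A" * 50),
    Fragment(100, b"B" * 50),
    Fragment(220, b"C" * 50),
]

exact = matches_signature(bytes(sample), signature)

# Changing a single byte inside one region is enough to evade the signature.
variant = bytearray(sample)
variant[120] = ord("X")
evaded = not matches_signature(bytes(variant), signature)
```

The last few lines show why such a signature is tied to one file: a variant that differs by a single byte in any matched region no longer matches.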
The challenge when creating such signatures is to ensure that they are unique to the specific file and won't incorrectly detect a 'clean' file as one containing malicious code. A security analyst has to ensure that the areas of code chosen for detection are not part of common libraries, as that would mean that any program making use of such code could be labelled as malicious when it is not. The consequence is that a signature detects only code that is unique to the malicious file, and therefore offers no proactive detection.
Traditional Run-time Host Intrusion Protection (HIPS)
Host Intrusion Protection (HIPS) covers a wide variety of techniques and technologies; this article explores the most common. In its simplest form, a HIPS product monitors applications as they are running and looks for unusual or malicious behavior. The challenge for the system is to distinguish between good and bad applications, and this is becoming increasingly difficult with time: the simpler the malware, the harder it is to detect with HIPS.
Consider this example: when a user installs a new product, whether downloaded from the Internet or installed directly from a CD, the installation application will do several things:
- Uncompress the main program files of the application
- Copy those files to various directories on the computer
- Add some files to the main operating system folders so that they are available to other applications
- Make some changes to the registry so that the application runs when the machine starts
A simple Trojan needs to do exactly the same things: it may uncompress in memory, create registry keys, copy files, and so on. As a result, run-time HIPS solutions inevitably flag the behavior of some clean applications as potentially malicious. Often, some form of user interaction is required to allow legitimate applications to carry out the tasks listed above. As an additional complication, the monitoring is done while an application is running, so any modifications it has already made may have an adverse effect on the operating system, and stopping its execution may cause issues as well. Of course, this type of run-time analysis can only take place at the desktop or endpoint; it cannot be used to block threats at the e-mail or Web gateway.
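The overlap between installer and Trojan behavior can be made concrete with a deliberately naive run-time rule. This is only a sketch: the action names, the threshold and the hips_alert function are assumptions made for illustration and do not describe any real HIPS product.

```python
# Hypothetical action names a run-time monitor might record.
SUSPICIOUS_ACTIONS = {
    "uncompress",
    "copy_files",
    "write_system_folder",
    "add_run_key",
}


def hips_alert(observed: set[str], threshold: int = 3) -> bool:
    """Raise an alert when enough suspicious actions have been observed."""
    return len(observed & SUSPICIOUS_ACTIONS) >= threshold


# A legitimate installer performs every step from the list above.
installer = {"uncompress", "copy_files", "write_system_folder", "add_run_key"}

# A simple Trojan performs almost the same steps.
trojan = {"uncompress", "copy_files", "add_run_key"}

installer_flagged = hips_alert(installer)
trojan_flagged = hips_alert(trojan)
```

Both programs trigger the alert, so the rule alone cannot tell them apart. That is why run-time products so often fall back on asking the user.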
Enter Behavioral Genotype
Behavioral genotype technology takes a different approach. It uses pre-execution scanning to determine what the application's functionality is and what behavior it is likely to exhibit, all without allowing the program to run. In addition to the likely run-time behavior, static characteristics can be determined to reinforce the identification of malicious behavior. For example, resource information such as the publisher, stored as strings embedded in the application, can be used to judge the validity of the program.
Each individual characteristic is effectively a gene: in biological terms, genes are the building blocks that individual species are made of; in technology terms, they are the building blocks of executable programs.
Leading behavioral genotyping methods thoroughly scan files and extract hundreds of genes for analysis. It is the combination of those genes that enables the identification of new malware. By extracting genes from all the existing malware in a collection, you can identify the characteristics, and the combinations of genes, that appear in malware.
It is also important to look at the genes that are seen in known good files, that is, executables that are known not to be malicious. By selecting the combinations that are found in malware but never in clean files, the risk of false positives (incorrectly identifying a file as malicious when it is not) can be minimized.
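Selecting combinations that occur in malware but never in clean files can be sketched as a set difference. The gene names, the two tiny collections and the fixed combination size of two are assumptions for illustration; a real system would work with hundreds of genes and far larger collections.

```python
from itertools import combinations


def gene_combinations(genes: frozenset[str], size: int) -> set[frozenset[str]]:
    """All gene combinations of the given size present in one file."""
    return {frozenset(c) for c in combinations(sorted(genes), size)}


def malware_only_combinations(
    malware: list[frozenset[str]],
    clean: list[frozenset[str]],
    size: int = 2,
) -> set[frozenset[str]]:
    """Combinations seen in at least one malware file and in no clean file."""
    seen_in_malware = set().union(*(gene_combinations(g, size) for g in malware))
    seen_in_clean = set().union(*(gene_combinations(g, size) for g in clean))
    return seen_in_malware - seen_in_clean


# Hypothetical collections of extracted genes.
malware = [
    frozenset({"packed", "accesses_internet", "adds_registry_entry"}),
    frozenset({"packed", "copies_files", "accesses_internet"}),
]
clean = [
    frozenset({"accesses_internet", "adds_registry_entry", "copies_files"}),
    frozenset({"copies_files", "has_publisher_info"}),
]

candidates = malware_only_combinations(malware, clean)
```

In this toy data every surviving pair contains the hypothetical "packed" gene, while a pair such as accessing the Internet plus adding a registry entry is discarded because a clean file exhibits it too.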
For example, malware often uses packers to compress its contents and hide them from anti-virus applications. Packers are compression tools that reduce the size of the executable, but they have the added attraction to malware authors of making the executable file easily modifiable. This makes traditional signature-based detection very difficult unless, of course, the anti-virus product can decode the packing algorithm. As soon as vendors add functionality to 'unpack' and detect the packer and its contents, the malware authors move on and use a new packing algorithm.
However, the way in which an application is packed can itself be a strong indication that its content is malicious. In fact, SophosLabs determined that 21% of all malware in its collection is packed, but only 1 in 100,000 clean files is packed.
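Taking the two figures above at face value, a short calculation shows why packing is such an informative gene. The prior used below (1 file in 1,000 being malicious) is purely an assumption for illustration and is not a SophosLabs figure.

```python
from fractions import Fraction

# Figures quoted in the text.
p_packed_given_malware = Fraction(21, 100)
p_packed_given_clean = Fraction(1, 100_000)

# How much more likely a packed file is to come from the malware collection.
likelihood_ratio = p_packed_given_malware / p_packed_given_clean  # 21000


def posterior_malicious(prior: Fraction) -> Fraction:
    """Bayes' rule: probability that a packed file is malicious."""
    numerator = p_packed_given_malware * prior
    denominator = numerator + p_packed_given_clean * (1 - prior)
    return numerator / denominator


# Assumed prior, for illustration only.
assumed_prior = Fraction(1, 1000)
posterior = posterior_malicious(assumed_prior)  # 7000/7333, roughly 0.95
```

Under that assumed prior, observing this one gene moves the probability that a file is malicious from 0.1% to about 95%. Since most malware is not packed, the gene is most useful in combination with others.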
Other genes include:
- Searching for publisher information
- Programming language used
- Accessing the internet
- Copying files
- Adding registry entries
So a simple example of a combination of genes may be as follows: if an application is packed, is written in Visual Basic, accesses the Internet and contains references to banking websites, there is a strong likelihood of it being a banking Trojan.
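The banking Trojan example can be expressed as a subset test over the genes extracted from a file. This is a minimal sketch: the gene names and the matches_genotype function are hypothetical, and a production identity would weigh many more genes.

```python
# Hypothetical gene names for the combination described above.
BANKING_TROJAN_GENOTYPE = frozenset({
    "packed",
    "written_in_visual_basic",
    "accesses_internet",
    "references_banking_sites",
})


def matches_genotype(file_genes: set[str], genotype: frozenset[str]) -> bool:
    """True when the file exhibits every gene in the genotype."""
    return genotype <= file_genes


# A file showing all four genes, plus one unrelated gene.
suspect = {
    "packed",
    "written_in_visual_basic",
    "accesses_internet",
    "references_banking_sites",
    "copies_files",
}

# An unpacked application that legitimately talks to a bank.
banking_client = {
    "written_in_visual_basic",
    "accesses_internet",
    "references_banking_sites",
}

suspect_detected = matches_genotype(suspect, BANKING_TROJAN_GENOTYPE)
client_detected = matches_genotype(banking_client, BANKING_TROJAN_GENOTYPE)
```

Because the test needs the genes rather than the exact bytes, any variant that keeps the same combination is still matched, whereas the unpacked client is not.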
The advantage of this technique is that as the malware authors adapt their techniques, security research arms can adapt with them. If the author decides to implement a new technique, it is often easily identified as a new gene and used in conjunction with existing genes to effectively detect many new variants of a malware campaign. All of this analysis can be done without executing the code, and therefore can be carried out at the e-mail and Web gateway as well as at the desktop.
A good example of this is the series of Storm worm outbreaks that started in October 2006 and continued into February 2007 (see figure below). There were many variants, including the Dorf and Dref worms, but a single behavioral genotype identity detected nearly 5000 unique variants. Traditional signature-based techniques would have required reactive detection of each variant, which would have taken a lot of manpower and been much less effective at stopping the first waves of the threat.