Devon Greene
Senior Security Researcher

MALWARE DELIVERY SECRETS: RTF OBFUSCATION

July 26, 2016 by Devon Greene

Whether you are a blue team guardian or a red team ninja, you need to understand the tricks of the trade when it comes to weaponized RTF documents. In this blog post we will walk through RTF obfuscation techniques that help malware bypass current-day security controls. By understanding these techniques, you can improve your odds of invading a digital stronghold or defending it.

Typically, RTF documents in phishing campaigns are one of the following:

  • A different Microsoft Office document type with its extension changed to RTF
  • A true RTF document containing a common exploit found on exploit-db or another site
  • A true RTF document containing embedded content via Package Object
  • A true RTF document containing instructions to assist an attacker
  • Something that will grant your company free publicity for 30 days or longer

RTF Obfuscation Techniques

If you review the RTF specification, you will see that it offers many opportunities for masking and delivering payloads. The screenshot below is an RTF document containing an embedded package object. Those not familiar with the package object are encouraged to read Sean Wilson’s article on this topic as a delivery vector for malware. We will use this as our base example as we learn about RTF obfuscation. One thing to note in our example is that the \objclass keyword is not technically necessary; MS Office generates it by default. However, for demonstration purposes we will obfuscate it, because it expects #PCDATA as its destination (explained later).

1

The package object is commonly used in malicious document delivery and seen in the wild. Researchers have spoken out in the past about how it is weaponized in addition to other security concerns. (See Dropping Files Into Temp Folder Raises Security Concerns and Rich Text Malware)

File Extension Tampering

This technique is effective at bypassing filters that check for specific file extensions, which a user can define in their Windows OS. As mentioned earlier, it’s common to receive malicious documents that appear to be true RTF documents but are actually .doc files with an RTF extension. The opposite case is also true, where an RTF document has its file extension changed to something else. Based on our testing, the following matrix shows the compatible file extensions for each version of Word:

2

To give everyone a perspective on how often this is used, here’s the breakdown of over a thousand samples we’ve seen in the wild:

3

To be transparent, this data is simply a snapshot in time. It represents a small collection of samples that were reported since the beginning of the year until mid-June. A few things to take away from this are:

  • No DOCHTML, DOT, DOTHTML, WBK, or WIZ extensions were observed
  • .doc is possibly used more often because most people recognize it more readily than .rtf
  • Attackers/Researchers are trying other extension types to determine exploitability
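
For defenders, one way to catch the extension tampering described above is to compare a file's extension with its leading bytes. The following is a minimal sketch, not a complete detector: the names sniff and extension_mismatch and the EXPECTED mapping are hypothetical, and the signature table covers only a few container types.

```python
from pathlib import Path

# Leading bytes for a few container formats relevant to this section.
MAGIC = {
    "rtf": b"{\\rt",                               # "{\rt" is a prefix of "{\rtf"
    "ole2": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",   # legacy .doc compound file
    "zip": b"PK\x03\x04",                          # OOXML (.docx/.docm) container
}

# Hypothetical mapping from extension to the container we expect to find.
EXPECTED = {".rtf": "rtf", ".doc": "ole2", ".docx": "zip", ".docm": "zip"}


def sniff(data: bytes) -> str:
    """Return the container type suggested by the leading bytes."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when a known extension disagrees with the sniffed content."""
    expected = EXPECTED.get(Path(filename).suffix.lower())
    return expected is not None and sniff(data) != expected
```

A mismatch is not proof of malice, since Word opens several of these combinations without complaint, but it is a cheap signal to log.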

MIXeD CaSE

This technique is still used in attempts to bypass filtering technologies, and it applies to RTF documents as well; the key is learning where it can be used. You can use mixed case wherever #PCDATA (ASCII) is expected. It is important to understand that control words in most cases should not be tampered with, but the corresponding data typically can be. So in our example, the following adjustment is still successful:

4
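
From the detection side, mixed case is usually handled by matching the data case-insensitively while leaving the control word case-sensitive. A minimal sketch, assuming the class name "Package" from our example; the function name has_package_class is hypothetical:

```python
import re

# The control word is matched exactly; only the class name ignores case.
CLASS_NAME = re.compile(rb"\\objclass\s+(?i:package)")


def has_package_class(rtf: bytes) -> bool:
    """True if \\objclass is followed by 'Package' in any letter case."""
    return CLASS_NAME.search(rtf) is not None
```

This handles case tricks only; it will miss the other techniques covered in this post.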

Hex \’45scaping

Thankfully, security prevention technologies are getting better at detecting mixed case variations of certain payloads. If we want to obfuscate this further, we can leverage hex escaping. In RTF documents, hex escaped characters are represented as \’## where the hash marks correspond to a character’s hex value. This technique can only be applied to RTF control words expecting #PCDATA. Continuing with our example, we will escape just the first A with \’41. As you can see, the whole word does not need to be hex escaped, which adds complexity and makes detection more difficult.

5
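
A detection rule can undo hex escaping by decoding each escape before matching. The sketch below is a hypothetical helper (decode_hex_escapes is not from any library) and assumes each escaped byte maps directly to the character with the same code point, which holds for ASCII:

```python
import re

# A backslash, an apostrophe, then exactly two hex digits.
HEX_ESCAPE = re.compile(r"\\'([0-9A-Fa-f]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace each hex escape with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

For example, decode_hex_escapes applied to P\'41ckage yields PAckage, which a case-insensitive comparison then matches against "package".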

Unicode \u0045scaping

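In addition to hex escaped characters, Unicode escapes are supported as well. The format to escape Unicode in RTF documents is a prepended \u followed by four hexadecimal characters representing the Unicode value. (Note that the RTF specification defines the \uN parameter as a signed 16-bit decimal number, so strictly conforming parsers may interpret such a value differently.) Here we continue building on our example by Unicode escaping the second A with \u0061: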

7

RTF Version Tampering

Paul Rascagneres published a great article analyzing a malicious RTF document that contained the header value \rtvpn as its RTF version. This caused a few analysis tools to fail to parse the document correctly. Although the specification asks for \rtfN to define the RTF document and version, it appears that only \rt is required in Microsoft Word. Visually, you can see that even the syntax highlighting is thrown off, as our \rtf tag has turned from teal to magenta! It should be noted that \rtf is required for WordPad and possibly other RTF readers. Applying this technique to our example leaves us with:

8
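
An analysis tool can avoid being fooled by this by treating anything that starts with {\rt as RTF and flagging headers that are not the standard {\rtfN form. A minimal sketch; the function name classify_header and its three labels are hypothetical:

```python
import re

STRICT_HEADER = re.compile(rb"\{\\rtf\d")  # what the specification asks for
LOOSE_HEADER = re.compile(rb"\{\\rt")      # what Word appears to accept


def classify_header(data: bytes) -> str:
    """Label the first bytes as 'standard', 'tampered', or 'not-rtf'."""
    if STRICT_HEADER.match(data):
        return "standard"
    if LOOSE_HEADER.match(data):
        return "tampered"
    return "not-rtf"
```

A "tampered" result means the file should still be parsed as RTF, and the odd header is itself worth alerting on.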

Wh it e Spa  ce Eva         sions

One of the tricky concepts to grasp with RTF documents is how white space is processed. The RTF specification states that, “CRLFs should be ignored by RTF readers except that they can act as control word delimiters.” In addition, most whitespace appears to be ignored in hex streams. Here we use carriage returns (\r), newlines (\n), tabs (\t), and spaces (\s) to break up the hex stream following the \objdata control word.

8.5

Note that newline characters even seem to work within \objclass parameters.
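
To recover the payload from a hex stream broken up this way, drop all whitespace before decoding. A minimal sketch, assuming the stream contains only hex digits and whitespace; decode_hex_stream is a hypothetical helper name:

```python
def decode_hex_stream(stream: str) -> bytes:
    """Remove all whitespace, then decode the remaining hex digits."""
    compact = "".join(stream.split())  # split() drops \r, \n, \t and spaces
    return bytes.fromhex(compact)
```

Because the whitespace is removed before the digits are paired, this also handles a break that falls in the middle of a byte.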

Fictitious Control Words

Once rule writers catch on to your tactics of mixed case, hex escaping, whitespace, and Unicode escaping, then what? One of the last tricks in the bag is fictitious control words. The RTF specification states that, “If an unknown control word is preceded by '{\*', then it starts an ignorable destination group.” Long story short, this allows RTF writers to create their own control words while maintaining portability with other RTF readers. We can take advantage of this feature by inserting {\*\Meow} to further obfuscate our payload:

9
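
A parser can neutralize this trick by removing ignorable groups whose control word it does not recognize. The sketch below is an assumption-heavy illustration: the allow-list KNOWN_DESTINATIONS is a hypothetical two-entry set (a real one would come from the specification), and the function name is made up. Note that \objdata and \objclass are themselves commonly written as {\* destinations, so stripping every {\* group blindly would discard the payload.

```python
import re

# Hypothetical allow-list; a real one would list every known destination.
KNOWN_DESTINATIONS = frozenset({"objclass", "objdata"})

# "{", "\*", then a backslash and the control word's letters.
IGNORABLE = re.compile(r"\{\\\*\\([A-Za-z]+)")


def strip_unknown_destinations(rtf: str, known=KNOWN_DESTINATIONS) -> str:
    """Drop every {\\*\\word ...} group whose word is not in `known`."""
    out = []
    i = 0
    while i < len(rtf):
        m = IGNORABLE.match(rtf, i)
        if m and m.group(1) not in known:
            depth = 0
            j = i
            while j < len(rtf):
                ch = rtf[j]
                if ch == "\\":
                    j += 2          # skip the escaped or control character
                    continue
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:  # found the brace closing this group
                        break
                j += 1
            i = j + 1               # an unterminated group drops the rest
            continue
        out.append(rtf[i])
        i += 1
    return "".join(out)
```

With this in place, a class name split as Pa{\*\Meow}ckage collapses back to Package before any matching is done.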

Bin Substitution

Mixed Case, Hex Escaping, and Unicode Escaping are good techniques to use for RTF control words expecting #PCDATA (ASCII) as a corresponding value. Whitespace and Fictitious Control Words are great for any data type. However, which evasion techniques are effective with just #SDATA (hex) data? The answer is Bin Substitution!

The \bin control word defines binary data embedded directly in the document. The syntax is \binN, where N is the number of bytes following the control word separator to treat as binary data. Applying it to our example, we take 5061636b616765 (Package hex encoded) and replace it with \bin4 Pack616765, so the first four bytes are supplied as raw binary and the remainder stays hex encoded.

10

It should be noted that attempting this in WordPad will not work; this appears to only be supported in MS Word.
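
To see why \bin4 Pack616765 still yields Package, here is a minimal sketch of a decoder that mixes hex digits with \binN runs. The function name decode_objdata is hypothetical, the input is assumed to have been read as Latin-1 so each character is one byte, and the sketch does not try to model how Word treats a dangling half-byte before a \bin control word.

```python
import re

# "\bin", the byte count N, then an optional single-space delimiter.
BIN = re.compile(r"\\bin(\d+) ?")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_objdata(stream: str) -> bytes:
    """Decode a hex stream that may contain \\binN raw-byte runs."""
    out = bytearray()
    nibbles = []
    i = 0
    while i < len(stream):
        m = BIN.match(stream, i)
        if m:
            count = int(m.group(1))
            start = m.end()
            # The next `count` characters are taken literally, not as hex.
            out += stream[start:start + count].encode("latin-1")
            i = start + count
            continue
        ch = stream[i]
        if ch in HEX_DIGITS:
            nibbles.append(ch)
            if len(nibbles) == 2:
                out.append(int("".join(nibbles), 16))
                nibbles.clear()
        i += 1                      # whitespace and other characters are skipped
    return bytes(out)
```

Both decode_objdata(r"\bin4 Pack616765") and decode_objdata("5061636b616765") return the same seven bytes, which is exactly what makes the substitution hard to match with a fixed hex signature.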

Nesting

Nesting involves applying extra sets of curly braces around valid RTF blocks or in random places. We will use more fictitious control words and nesting to further obfuscate the payload. An example of this can be seen below:

11
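
One simple signal a defender might compute for heavily nested documents is the maximum group depth. This is only a heuristic sketch: the function name max_group_depth is hypothetical, any alerting threshold would be an assumption to tune against real documents, and braces inside raw \bin data are not handled.

```python
def max_group_depth(rtf: str) -> int:
    """Return the deepest level of unescaped curly-brace nesting."""
    depth = deepest = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2                  # skip escapes such as \{ and \}
            continue
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)
        i += 1
    return deepest
```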

In The Wild (Examples and Exercises)

I always appreciate a good blog post that challenges me and helps me explore the vastness of cyber security. Now that you have a better understanding of these obfuscation techniques, it’s time to analyze some real-world examples. Before we do, I want to thank www.VirusShare.com for the additional examples provided. Here is a list of hashes that should be easily accessible and that demonstrate the techniques listed in this blog post. As a fun exercise, your goal should be to identify which technique is being used. A challenge to regex wizards: can you easily create a rule that does not generate a false positive?

Good luck and Punch On!

Leverage Subscription Service to Stay Ahead of Attacks

The Ixia BreakingPoint Application and Threat Intelligence (ATI) Subscription provides bi-weekly updates of the latest application protocols and attacks for use with Ixia platforms.