Let’s Go Phishing – an Analysis of Real-Life Phishkits
Phishing is a well-known and prevalent threat. By now, most people online are well aware (from unpleasant personal experience, corporate training programs, or news coverage) of how much damage a single successful phishing attempt can do. The effects range from stolen money all the way to complete identity theft or a high-profile network compromise. We see more and more illustrations of phish hooks over monitors and masked hackers holding phishing rods; these simple illustrations can leave us wondering how these seemingly magical attacks actually take place.
Phishkits are a simple and effective way of deploying phishing pages at scale. Their authors provide the tooling needed for phishing, as well as a degree of protection against those trying to defend people and companies from phishing attacks. Moreover, because the code is open and can be modified, attackers can quickly adapt to changes in legitimate login pages, add more evasion options, and take other measures to ensure their success.
In this blog post, I will walk you through how phishkits are leveraged to perform phishing attacks, and how our adversaries attempt to shield themselves from the security researchers trying to identify and block phishing pages before they can cause damage.
The Phishing Chain
Phishing is generally done using web pages made to lull the target into a false sense of security, as I explained in a prior blog. To set up one of these attacks, an attacker needs somewhere to host the page, a way to lure users to it, and the fake login page itself.
Domains and web servers are sometimes bought online using fake or stolen credentials and means of payment. Other times, attackers rely on existing, hacked websites that they reuse to host their malicious infrastructure. Getting users to access these fake login pages is generally done using spam e-mail with alarming titles such as “security alerts”, “important documents pending”, or “unpaid invoices.” The focus of this blog post, however, is how attackers generate the fake login pages, attempt to evade researchers, and enjoy the fruits of their work.
Phishkits are pre-made and editable templates that allow an attacker to do several things:
- Deploy fake login pages
- Choose between multiple different templates (depending on target)
- Protect against the prying eyes of security researchers
- Gather and send stolen credentials to the attacker
Phishing is a game of trust. The attacker must gain trust so the target will enter their credentials. This is achieved by creating a sense of urgency to get the user to lower their guard and attention against the phishing attempt, as well as creating a sense of safety by displaying content that appears familiar and secure.
With that in mind, showing an Outlook user a Gmail login page might raise a red flag in the target’s mind. So the phishing actor pays attention to who is accessing the page and displays the most convincing login page for that person. Here is an example from one of the samples:
This code checks the URL used to access the phishing page for an “email” parameter. Most likely, the phishing URL will look something along the lines of “hxxp://phishing_domain?email=<victim_email>”. The email address provided is checked against the Hotmail, Outlook, and Office365 domains. If any of these is identified, the target receives a Microsoft-looking phishing page; if none of them matches, a generic page is served instead.
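To make the selection logic concrete, here is a minimal Python sketch of the same idea. It is not the kit's code: the function name, the template labels, and the exact domain strings are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical domain list, based on the providers named in the text.
MICROSOFT_DOMAINS = {"hotmail.com", "outlook.com", "office365.com"}


def choose_template(url: str) -> str:
    """Return the template label a kit like this would likely pick for a URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each parameter to a list of values; take the first one.
    email = query.get("email", [""])[0].lower()
    _, separator, domain = email.rpartition("@")
    if separator and domain in MICROSOFT_DOMAINS:
        return "microsoft"
    # No e-mail parameter, a malformed address, or an unknown provider.
    return "generic"
```

Calling `choose_template("http://phish.example/?email=bob@outlook.com")` selects the Microsoft-looking page, while a Gmail address or a missing parameter falls through to the generic one.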
A second sample, b8dea8d46652270b2b74a7a69844045886433a5b800bc7476422d1a8ea3f0b45, contains different login pages per service:
The decision is left to the user accessing the web page, via a set of links. Sample source code and screenshot:
One more example comes from sample f467363a53c938b3ef28a59d6ae30bee2ac8dcbf6ccc46335633a711269c9062. This phishkit asks the user for their e-mail address before they can download their documents:
Once an e-mail address has been entered, further code will determine which fake login page the user will be redirected to by determining a provider and which web pages are to be served.
Notice the deliberately incorrect spelling – this is just one of the anti-security tricks the attacker resorts to in order not to trigger warnings from different types of security devices.
Gathering Stolen Data
Before we dig into the specifics of how security researchers and devices are thwarted from identifying these pages, I’ll mention the quick and dirty way for attackers to gather data – email. This makes sense since storing all the data in a database of sorts might lead to data loss – the hosting provider supplying the database might be warned and could possibly delete the machine. With e-mail, it’s much easier for attackers to have a temporary repository that they can quickly access via the simplest means. One example, from sample 5de1fa8bddc321a18fcbb39da85f1645f8cf7f1c1c992d92d06934761c454636:
After the victim has entered their credentials, the attacker quietly gathers them, adds some geolocation information, and e-mails them to a fixed Outlook address, which has been redacted in the above image. The attacker probably runs automation tools that access this mailbox and pull the latest data into a database of their own, one that can’t be taken down as easily by law enforcement or the security industry.
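Because the drop address is fixed in the kit's source, an analyst can often recover it with a simple scan. The following Python sketch shows one way to do that; the regular expression is a deliberately loose approximation of an e-mail address, and the PHP fragment in `SAMPLE` is a made-up stand-in, not code from the sample discussed here.

```python
import re

# Loose e-mail pattern; good enough for triage, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_drop_addresses(source: str) -> set:
    """Return every e-mail address that appears literally in the kit source."""
    return {match.lower() for match in EMAIL_RE.findall(source)}


# Hypothetical fragment of a kit's mailer script (assumed to be PHP).
SAMPLE = '''
$to = "collector@example.com";
$subject = "New login from " . $ip;
mail($to, $subject, $message);
'''

print(find_drop_addresses(SAMPLE))
```

Addresses recovered this way can be reported to the mail provider, which is one reason some kit authors obfuscate or encode the recipient.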
You might notice “FUDPAGES [.] RU” somewhere in there. Using Google to find out more information about the domain helps us understand that this isn’t some innocent code that’s been adapted by a malicious party, but rather someone deliberately creating tools for malicious purposes.
In fact, this individual service has been pinpointed by other researchers before.
Evading Security Researchers
With phishing being such a large threat to individuals and organizations, considerable effort goes into stopping these attempts. When security researchers identify a phishing page, it loses effectiveness: it can be blocked, flagged as dangerous, and/or taken down. Therefore, attackers must take steps to reduce their footprint and thwart automatic crawlers, security products, and security researchers.
We’ve seen one such example in sample f467363a53c938b3ef28a59d6ae30bee2ac8dcbf6ccc46335633a711269c9062 above, where spelling tricks such as replacing “I”s with “1”s and “o”s with “0”s were used. If reading that was a burden, you can imagine it’s even harder to spot these differences in a browser window. An automated system relying on fixed string checking for “login”, “Yahoo”, or other terms could also be misdirected by this trick.
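The effect on fixed string checks is easy to demonstrate. The Python sketch below compares a naive keyword match with one that first undoes the two substitutions mentioned above; the substitution table and function names are illustrative assumptions, and a real detector would need a much larger look-alike table.

```python
# Hypothetical table covering only the two swaps described in the text.
LOOKALIKES = str.maketrans({"1": "i", "0": "o"})


def normalize(text: str) -> str:
    """Lower-case the text and map look-alike digits back to letters."""
    return text.lower().translate(LOOKALIKES)


def naive_match(text: str, keyword: str) -> bool:
    """Fixed string check, as a simple scanner might do it."""
    return keyword.lower() in text.lower()


def normalized_match(text: str, keyword: str) -> bool:
    """Same check, performed after undoing the digit substitutions."""
    return keyword.lower() in normalize(text)


page_title = "Yah00 L0g1n"
print(naive_match(page_title, "login"))
print(normalized_match(page_title, "login"))
```

The trade-off is that normalization also rewrites legitimate digits, so a match found this way is a hint worth investigating rather than proof of phishing.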
Another means of reducing attention is discouraging automated crawling bots from accessing and indexing these pages. After all, if we could simply search for “Comcast login” on Google and get all the phishing pages, our job would be much easier.
This is achieved by using the “Robots Exclusion Standard”, a de facto standard created in 1994 to steer automated bots away from parts of websites for various reasons, such as avoiding server overload or making sure the robot wouldn’t break something. The standard specifies a format for a “robots.txt” file hosted on the web server. The file lists what is allowed and disallowed on the website, and it is up to the robot to respect those rules.
Well-behaved bots, such as most search engine crawlers, respect these rules. This means the attacker can simply provide a “robots.txt” file to keep us from finding phishing pages through a web search:
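To see how a compliant crawler reacts, the sketch below feeds a blanket "disallow everything" file to Python's standard `urllib.robotparser` module. The file content, the user-agent name, and the URL are assumptions chosen for illustration; the file in the actual sample may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every bot to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-abiding crawler would fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(BLOCK_ALL, "Googlebot", "http://phish.example/login.php"))
```

Nothing enforces these rules on the server side, so a researcher's crawler can simply ignore the file. The file only keeps the page out of the indexes of crawlers that choose to comply.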
Sample a9507c954dd3349e43830af368a22eb76377f3b5bfe11954072938e871361bc6 comes with a “blocker” script that does a set of “security” checks to protect against researchers before supplying the phishing page content. The sample is a Bank of America phishing kit.
The code checks the domain name associated with the remote IP address and returns a 404 (Not Found) error if it contains certain keywords; the keyword list names crawlers, security companies, and other entities.
The next step checks the IP address against a list of banned IP address ranges, which most likely belong to entities similar to the ones above.
The last step is a check on the User-Agent string of the request, a string that is generally used to identify the browser type, for cross-browser compatibility reasons, but can also be used to identify automated bots and tools.
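The three checks can be summarized in a few lines of Python. This is a sketch of the logic as described, not the kit's code: every keyword and network below is a placeholder, the hostname is passed in rather than looked up, and returning 404 for the IP and User-Agent checks is an assumption (the text only states it for the hostname check).

```python
import ipaddress

# All three lists are hypothetical placeholders, not values from the sample.
BLOCKED_HOST_KEYWORDS = ("crawler", "security-vendor")
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
BLOCKED_AGENT_KEYWORDS = ("bot", "curl")


def response_status(remote_ip: str, remote_host: str, user_agent: str) -> int:
    """Return 404 if any of the three checks matches, otherwise 200."""
    # Step 1: keywords in the hostname associated with the visitor's IP.
    host = remote_host.lower()
    if any(keyword in host for keyword in BLOCKED_HOST_KEYWORDS):
        return 404
    # Step 2: visitor IP inside one of the banned ranges.
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return 404
    # Step 3: keywords in the User-Agent header.
    agent = user_agent.lower()
    if any(keyword in agent for keyword in BLOCKED_AGENT_KEYWORDS):
        return 404
    return 200
```

For defenders, the practical consequence is that a scanner running from a well-known network, or with a default tool User-Agent, may see a 404 where a victim would see the phishing page.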
So far we’ve seen this sample using the robots.txt file and the web page source code to filter unwanted accessors out. The authors, however, also include a .htaccess file, a special type of Apache Web Server file generally used for web server configuration. In this case, using a couple of checks on the web site access request, the actor denies access to anyone who might give away the phishing page.
The first set of checks concerns the Referer HTTP header. This header tells a web server which website the incoming user was redirected from, which is useful for generating statistics: say, knowing that most of your visitors come from Facebook or an advertising site. In this case, however, if the field contains one of the strings “google.com”, “paypal.com” or “firefox.com”, the result is a 403 Forbidden page, as specified by the RewriteRule [F] flag on line 7 of the snippet:
Another check targets the User-Agent header described a bit earlier. If the User-Agent field contains certain strings, the accessing client gets redirected to a different website on line 12.
Next up comes a long list of different User-Agent strings as well as another Referer header check for Google Safe Browsing, the service providing navigation security in Google Chrome and other browsers. All of these receive a 403 Forbidden message and can’t see the phishing page content.
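When triaging a kit, it can help to pull these conditions out of the .htaccess file automatically. The Python sketch below extracts the Referer and User-Agent patterns from `RewriteCond` lines. The `SAMPLE` text is a hypothetical reconstruction based on the description above, not the file from the sample, and the parser assumes unquoted patterns without spaces.

```python
import re

# Matches: RewriteCond %{HTTP_REFERER} pattern [flags]
COND_RE = re.compile(
    r"^\s*RewriteCond\s+%\{(HTTP_REFERER|HTTP_USER_AGENT)\}\s+(\S+)",
    re.IGNORECASE,
)


def blocked_patterns(htaccess: str) -> dict:
    """Group RewriteCond patterns by the request header they test."""
    result = {"HTTP_REFERER": [], "HTTP_USER_AGENT": []}
    for line in htaccess.splitlines():
        match = COND_RE.match(line)
        if match:
            result[match.group(1).upper()].append(match.group(2))
    return result


# Hypothetical .htaccess content, written to mirror the checks in the text.
SAMPLE = r"""
RewriteEngine On
RewriteCond %{HTTP_REFERER} google\.com [NC,OR]
RewriteCond %{HTTP_REFERER} paypal\.com [NC,OR]
RewriteCond %{HTTP_REFERER} firefox\.com [NC]
RewriteRule .* - [F]
RewriteCond %{HTTP_USER_AGENT} ^ExampleScanner [NC]
RewriteRule .* http://other-site.example/ [R,L]
"""

print(blocked_patterns(SAMPLE))
```

The extracted lists show at a glance which referrers and clients the kit author wanted to keep away, which in turn suggests how a scanner should present itself to see the real page.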
Another sample, b8dea8d46652270b2b74a7a69844045886433a5b800bc7476422d1a8ea3f0b45, does similar checks but relies solely on PHP code.
There are multiple PHP scripts that do several things. First, the remote IP address, date, and user-agent string are logged to a file.
The second file, “netcraft_check.php”, does a simple check for a specific user-agent, likely used by an anti-phishing provider. If the user-agent matches the string, the phishing page won’t be displayed.
The “blacklist_lookup.php” script checks if the visitor’s IP address matches some fixed IPs and ranges in a given file. If so, the phishing page is not displayed and the researcher has likely been thwarted. Example IP addresses and ranges include large hosting providers (such as Amazon, Google Apps and others), security companies (Kaspersky, Bitdefender, OpenDNS and others), as well as smaller providers from different countries, possibly because of their use in security research work or because the attacker simply isn’t interested in that specific area of the world.
Here at the Application and Threat Intelligence (ATI) research center, we spend our time investigating and understanding these threats so that our customers don’t have to. However, a basic idea of how phishing works behind the scenes can only help an organization improve its security posture. The phishing pages we identify are part of our ATI subscription service, powering Ixia’s ThreatARMOR as one of our customers’ many lines of defense against phishing and other attacks.