Tracking the Hide and Seek Botnet
Hide and Seek (HNS) is a malicious worm which mainly infects Linux based IoT devices and routers. The malware spreads via bruteforcing SSH/Telnet credentials, as well as some old CVEs. What makes HNS unique is there’s no command and control server; instead, it receives updates using a custom peer-to-peer network created out of infected devices.
Each HNS infected device runs a UDP server on a port which is either provided upon infection, or randomized. Newly infected devices are given a list of IP and port combinations which correspond to other HNS infected devices (known as peers). Infected devices maintains a list of other peers which has a limited size based on available RAM (usually around 512).
Until the peer list is full, a device will send peer request messages to peers in its’ peer list. Upon receiving a peer request, the recipient will do two things: Firstly, it will add the sender’s IP and port to its own peer list; then it will reply with a peer from its’ own peer list. Using this technique, infected devices will discover other infected devices and maintain connectivity with them.
In order to receive updates, each bot periodically asks peers in its’ peer lists for their config version number. When a bot finds a peer with a version number higher than its own, it downloads the new config from said peer. To perform an update, the botnet operator only needs to upload a new config to a single infected device; the device will then distribute the updated config to all its peers, which in turn will do the same.
Normally, tracking a P2P botnet would be pretty simple. New peers can be found by sending peer requests to known ones; thus, recursively sending peer requests to all peers should eventually result in discovery of the entire botnet. Unfortunately, HNS has some counter measures.
Above is the code responsible for handling incoming peer requests. If the recipient’s peer list is full (which it almost always is), the sender’s IP is used as a list index [line 7]. Essentially, we will always get the same response when issuing multiple peer requests to the same target from the same IP.
We could query a peer to find a new peer, then query that new peer to find another, and so on; however, if any peer returns a peer we already have, or a dead one, the process will grind to a halt.
In testing, I only got about 4 new peers from each starting peer before a a non-responsive one was returned. There is some give though.
Peer List Churn
Here is an snippet from the code responsible for updating the peer list. If a request from a peer not already in the peer list is received, but there is no free space in the list, the above code is hit.
The code checks if the lower 7 bits of the timestamp are zero (which occurs every 128 seconds). If so, a random peer in the list is overwritten with the new peer, and a mutex is set.
Due to the fact there are more peers than peer list entries, the code causes the peer list to overwrite 1 peer almost every 128 seconds. Assuming the peer list is limited to 512 peers, every ~18+ hours all entries should have been replaced or moved.
So, with the peer list constantly churning, we can slowly build up a bigger list. Firstly we recursively send peer requests until we get a dead peer (usually after 4 iterations). Now we wait a few hours, then recursively query all the previously found peers again. This should result in our peer list multiplying by ~4 every ~18 hours.
A New Challenger Arises
After kindly providing me with a sample to reverse, one researcher said he wondered how long it would take for my crawler to be discovered by theirs. I bet them around 5 minutes.
Probably about 5 mins. UDP allows for some neat tricks to speed up propagation.
— MalwareTech (@MalwareTechBlog) December 15, 2018
Yeah, maybe that’s why the peer reporting was throttled in that family. Ping me when you fire it up :-), I’ll set up a stopwatch.
— Adolf Středa (@StredaAdolf) December 15, 2018
An article about their crawler stated that it took a few days to amass 1,000 peers. In order to get noticed in 5 minutes, I’d need to discover enough peers, and get myself into enough peer lists that one would relay my IP to them.
Rapid Peer Discovery
Getting more than 1 peer response from the same peer, within 18 hours, was the first task. If you remember earlier, I said a single peer replies to the same IP with the same response. But what about multiple IPs?
The sender IP is being used as the index into the peer list, and every peer list entry is unique. If we send a request from 2 different IP addresses, we will get back 2 different peers. There is no set order for the peer list, so each peer we query could return a different 2 peers.
My server allows for up to 3 IP addresses, so I can get 3 different responses from each peer. With 3 IPs we’d be able to to query at a maximum rate of 3n; but dead peers, and the increasing chance of receiving a peer we already have, slows the rate dramatically.
For a test, I gave the crawler a single peer IP and told it to query all known peer every second. This was the result.
The rate of peer discovery grows exponentially until around 500 peers, then tapers off rapidly. This is likely due to many peer lists being limited to around 512 entries. Higher up-time peers propagate farther, so most of the smaller peer lists are composed of the same high up-time peers.
In order to aid peer discovery, we need to force ourselves into some peer lists. By existing in multiple peer lists, we will get discovered by newer peers, causing them to connect to us.
If a peer’s peer list isn’t full, every peer a request is receives from will be added. All we need to do is port forward the port we’re sending requests from and we’ll start seeing incoming requests.
If the target’s peer list is full, then we have to deal with the code which ensures only 1 peer can be overwritten every 128 seconds.
The way the code work, it’s a simple race for first to send a request when the low 7 bits of the timestamp are 0. We’re already sending requests at a much higher rate than normal bots; but we can still tip the odds further in our favor. By timing our requests to be delivered right when the low 7 bits of the timestamp are 0, we can nearly always be first past the post.
As a result of high request volume, some clever timing, and a bit of luck, we can rapidly propagate our IPs throughout the botnet.
The Final Test
For this test I let the crawler run for exactly 5 minutes, then disabled all outgoing requests. As a result, only peers whose peer lists contained my crawler IP would continue to communicate with us.
This is a graph of unique IPs in the 5 minutes after disabling outgoing requests.
Total is the total number of unique IP addresses; responsive means unique IP addresses which sent a request within a rolling 60 second window.
As you can see, we made it to around 450 peer lists in 5 minutes (about 35% of all online bots). The total count continues to climb while other online peers propagate our IP farther. The responsive count drops suddenly because the crawler IP is eventually removed from peer lists due to inactivity.
I think this is more than enough for other crawlers to notice me, but we could do better.
Possible Improvements? Everything was done from a $5 cloud VPS, but a bare-metal server would be much more ideal. Greater bandwidth, higher throughput, lower latency, and more IPs, would allow us to propagate to the entire botnet in less than 5 minutes.
There are also much more aggressive crawling techniques that could be utilized; however, some of the infected devices are very feeble and might end up crashing under the stress.
In my years of tracking P2P botnets, one fact has always remains true: it is impossible to stop a determined security researcher from mapping out the entire botnet. Over the past few decades there has been many P2P botnet which attempt to stop crawlers, but all have failed. Simply put, any limitations placed on crawling also limit bots ability to discover new peers. Botnet operators must decide between making peer discovery difficult and having a stable botnet.
Hide and Seek’s attempt at preventing crawling leads to cases where bots essentially wonder off and get lost (which i guess is one way to win at Hide and Seek).
Many IoT devices run on ram disks, meaning infection cannot survive a system reboot. Lack of persistence causes peers die at a rapid rate, but because peer discovery is limited bots can end up with a peer list full of dead peers. In fact, according to a Virus Bulletin presentation, a previous iteration of the botnet simply collapsed under its own weight.
As of today, HNS has approximately 6,000 infected devices with around 1300 online at any one time.
For more in depth stats, check out my botnet tracker.
Special thanks to Adolf Středa and VessOnSecurity for providing samples and peer lists to analyze.