The Kelihos Botnet
A while ago I started writing a series of articles documenting the Kelihos Peer-to-Peer infrastructure, but had to pull them due to an ongoing operation. As most of you have probably seen, the botnet operator was arrested a few days ago and the FBI has begun sinkholing the botnet (which will most likely make all my research null & void, as well as kill my Kelihos Tracker 🙁 ).
Botnet Topology
Kelihos is known as a hybrid Peer-to-Peer botnet, which means it uses elements from both centralized botnets (C&C servers) and decentralized botnets (peer-to-peer communication). The reason for this is that pure P2P botnets are very resilient but harder to control, as the operator relies on bots to distribute messages among themselves; it also means bots can read messages intended for other bots, because for P2P to work a message must be broadcast to the entire network rather than to individual bots. What Kelihos does is use the P2P network to propagate a list of “job servers”, which are essentially C2s: bots use the P2P network to find the job servers, then communicate with a job server to receive commands much like they would in a centralized botnet (if the job servers get shut down, the botmaster can simply push out a new list via the P2P network).
The Kelihos botnet is made up of 4 distinct parts:
- Bots – These are just your average boring old infected computers
- Supernodes – Bots which are capable of accepting incoming connections (i.e. not behind NAT or a corporate proxy).
- Job Servers – Nginx proxy servers which relay requests to the real C&C.
- C&C server – I don’t think anyone who doesn’t know what one is has made it this far.
When a computer is infected with Kelihos, it will try to connect to each IP in a list of supernode IPs hardcoded inside the bot binary. Once a connection to a supernode is established, the bot sends it the list of supernodes the bot knows about, and the supernode responds with the list it knows about; the bot then merges the new list with its own. This is called a peer exchange, and it is done frequently to ensure each bot has an up-to-date list of supernode IPs.
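The merge step of the peer exchange can be sketched roughly like this. Note this is a simplification, not the actual implementation: the real wire format is binary, so I'm representing peers as plain (ip, last_seen) tuples, and the names (`merge_peer_lists`, `MAX_PEERS`) and the cap value are my own inventions:

```python
# Hypothetical sketch of a Kelihos-style peer list merge. Peers are
# (ip, last_seen) pairs; the 500-entry cap is an assumption, not a
# number taken from the malware.
MAX_PEERS = 500

def merge_peer_lists(own, received, max_peers=MAX_PEERS):
    """Merge a received peer list into our own, keeping the most
    recently seen entry per IP and capping the total list size."""
    latest = dict(own)  # ip -> last_seen timestamp
    for ip, last_seen in received:
        if ip not in latest or last_seen > latest[ip]:
            latest[ip] = last_seen
    # keep the freshest peers first, drop the oldest past the cap
    merged = sorted(latest.items(), key=lambda p: p[1], reverse=True)
    return merged[:max_peers]

# Example: the exchange refreshes a stale entry and adds a new peer.
own = [("198.51.100.7", 100), ("203.0.113.9", 90)]
received = [("203.0.113.9", 120), ("192.0.2.1", 110)]
print(merge_peer_lists(own, received))
# → [('203.0.113.9', 120), ('192.0.2.1', 110), ('198.51.100.7', 100)]
```

Doing this merge on every exchange is what keeps each bot's view of the supernode list fresh without any central coordination.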
To receive commands, a bot picks an online supernode and sends it a job request over HTTP; the supernode then picks one of the 6 job servers, forwards the request to it, and relays any response back to the bot. Because every supernode implements this crude HTTP proxy that bots use to contact the job servers, blocking communication between bots and job servers would require finding and blocking ~600 supernode IPs instead of just 6 job servers.
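The relay logic on the supernode side amounts to something like the sketch below. The names (`relay_job_request`, `JOB_SERVERS`) and the random selection are my assumptions; the real supernode does this over raw sockets, so here the actual network send is injected as a callable to keep the routing logic visible:

```python
import random

# Hypothetical sketch of a supernode relaying a bot's job request.
# JOB_SERVERS holds documentation IPs, not real ones; the live botnet
# used 6 job servers.
JOB_SERVERS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

def relay_job_request(http_request, forward, job_servers=JOB_SERVERS):
    """Pick a job server and relay the bot's HTTP request to it,
    returning whatever response comes back. The bot only ever sees
    the supernode's address, never the job server's."""
    server = random.choice(job_servers)
    return forward(server, http_request)
```

This one level of indirection is the whole trick: every supernode is a disposable proxy, so the handful of job servers stay hidden behind hundreds of residential IPs.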
Reverse Engineering
This would usually be the part where I document how I reverse engineered the malware, but not today.
The original version of Kelihos used a protocol containing descriptive field names as well as comprehensive debug strings, but these were slowly removed with each new version. By the time I picked up Kelihos it had already removed the debug strings and moved to a pure binary protocol, which meant stepping around in the debugger for hours trying to figure out what each field was based on how it was used. To make matters worse, Kelihos relies on multiple 3rd-party C++ libraries including boost, is heavily object-oriented, and weighs in at about 2 MB.

(There are generally 2 types of malware developers: those who learn to code by writing malware and those who learn to malware by writing code. The former tend to write very low-level C, and if a big project requires C++ they will steer away from libraries and write something akin to C with classes, which is very fast to reverse. On the other hand you have professional programmers who are familiar with C++ to its full extent; when tasked with writing malware they produce heavily object-oriented C++ which is less malware and more alternative bloatware, like Kelihos.)

Funnily enough, I actually got the job I'm currently in after the CEO stumbled across my Kelihos tracker and contracted me to implement it for them, because none of their reverse engineers wanted to go near Kelihos (if they are anywhere near as blunt as I am, I can imagine the response to being asked to reverse it would be something along the lines of “I’d rather pour hot sauce into my eyes”).
Aside from that, a Kelihos reversing article would go on for days, and the malware isn't actually that interesting: there's no code injection, no real attempt to hide, and no obfuscation (except for using boost to turn the code into a clusterfuck of OOP and sadness), so I'm going to skip the reversing part of the article.
If you are interested in the specifics of the protocol, most of it has already been posted by Kyle Yang here, so I’m not going to rehash it.
Tracking
For supernode tracking I implemented the peer exchange protocol, which let me write code that connects to a supernode and asks it for a list of other supernodes it knows about; the code then connects to each supernode in the returned list and asks the same question, until we have a list of all supernodes on the botnet. Once the code has verified that a supernode returned a valid response, it can continue monitoring the supernode's uptime by sending periodic TCP SYN probes to port 80.
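The crawl is just a breadth-first walk of the P2P graph. A rough sketch, with the actual peer-exchange round trip abstracted into an injected `exchange_peers(ip)` callable (the function names here are mine, not anything from the malware):

```python
from collections import deque

# Sketch of the supernode crawl: ask every known supernode for its
# peer list until no new IPs turn up.
def crawl_supernodes(seed_ips, exchange_peers):
    """Breadth-first traversal of the supernode network starting
    from a hardcoded seed list."""
    known = set(seed_ips)
    queue = deque(seed_ips)
    while queue:
        ip = queue.popleft()
        try:
            peers = exchange_peers(ip)  # one peer-exchange round trip
        except OSError:
            continue  # supernode offline — skip it, retry next crawl
        for peer in peers:
            if peer not in known:
                known.add(peer)
                queue.append(peer)
    return known
```

For the subsequent uptime checks, a plain TCP connect attempt (e.g. `socket.socket().connect_ex((ip, 80))`) is enough to stand in for the "SYN request" — a completed or refused handshake tells you whether the host is still up.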
Only about 10% of the bots are capable of acting as supernodes while the other 90% use them to communicate, so to see the full extent of the botnet we needed to become a supernode ourselves. This was probably the most time-consuming part, as it involved implementing both sides of the peer exchange protocol (the bot side and the supernode side), then writing a scalable TCP server to handle the high volume of clients (the Kelihos developer uses a Windows feature called IOCP for this, but we run Linux, so epoll was the best option). Bots communicate over port 80 using 2 different protocols: Kelihos' custom key/peer exchange protocol, and HTTP (which is basically just the former wrapped in an HTTP header). HTTP requests are simply proxied to the job servers, while key/peer exchange requests must be handled by the supernode itself.
Initially I had written a Linux daemon implementing the full supernode protocol, indistinguishable from real supernodes, because I figured that after 3 successful takedowns the botnet operator would be a little paranoid; but someone in my IRC called Til- (still no idea who you are, but thanks) suggested that the HTTP requests can just be dropped without our node being kicked from the botnet, so we started ignoring them (this is great because it means we're not passing commands to the bots, which is technically illegal in some countries).
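Under those simplifications — answer peer exchanges, drop HTTP — the fake supernode's event loop can be sketched with Python's `selectors` module, which uses epoll on Linux under the hood. Everything here is a stand-in: the canned `PEER_REPLY` blob, the "first token is GET/POST" check, and the one-shot connection handling all replace the real binary framing, which is omitted:

```python
import selectors
import socket

# Toy sketch of a fake-supernode event loop. PEER_REPLY stands in for
# a real peer-exchange response; the actual Kelihos framing is binary.
PEER_REPLY = b"\x01fake-peer-list"

def make_server(host="127.0.0.1", port=0):
    """Create a non-blocking listener registered with a selector
    (DefaultSelector picks epoll on Linux)."""
    sel = selectors.DefaultSelector()
    listener = socket.socket()
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    return listener, sel

def serve_once(listener, sel):
    """Process one batch of ready sockets: accept new bots, answer
    peer-exchange messages, silently drop HTTP job requests."""
    for key, _events in sel.select(timeout=1):
        sock = key.fileobj
        if sock is listener:
            conn, _addr = sock.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if not data or data.split(b" ", 1)[0] in (b"GET", b"POST"):
                # HTTP job request (or hangup): drop it — the bot stays
                # on the botnet but gets no commands through us
                sel.unregister(sock)
                sock.close()
            else:
                # custom protocol: answer the peer exchange
                sock.sendall(PEER_REPLY)
                sel.unregister(sock)
                sock.close()
```

A production version would loop `serve_once` forever and keep connections alive across messages, but the branch structure — peer exchange answered, HTTP dropped on the floor — is the point.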
To become a supernode, I rewrote my supernode crawler to slip one of our IPs into the peer list it sends to supernodes during the peer exchange, which resulted in supernodes eventually adding that IP to the lists they give to other bots. As our IP spread around the botnet, bots started connecting to us for peer exchanges, which allowed us to insert IPs directly into bots' peer lists (we could now use this to get enough of our fake supernodes onto the botnet that every bot would connect to us at least once per day, but not so many that it would lead the botmaster to believe someone was attempting a takedown and go into full paranoia mode).
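The injection step itself is tiny. A hypothetical sketch — `inject_sinkhole`, `SINKHOLE_IPS`, and the one-fake-IP-per-reply ratio are all my own assumptions (the ratio is exactly the knob that trades propagation speed against tipping off the botmaster):

```python
import random

# Hypothetical sketch of slipping a sinkhole IP into a peer-exchange
# reply. The IPs are documentation addresses, not real sinkholes.
SINKHOLE_IPS = ["203.0.113.50", "203.0.113.51"]

def inject_sinkhole(real_peers, sinkholes=SINKHOLE_IPS, max_peers=500):
    """Build a peer-exchange reply containing one of our IPs plus the
    usual real peers, so the recipient keeps functioning normally and
    passes our IP along in its own future exchanges."""
    reply = [random.choice(sinkholes)]
    reply += [p for p in real_peers if p not in sinkholes]
    return reply[:max_peers]
```

Because every bot that picks up the fake entry repeats it in its own peer exchanges, a single injected IP snowballs through the network on its own.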
As for the stats we gathered: until the FBI sinkholing operation is complete, you will still be able to see live data here.