Generative Adversarial Networks for Maldev
Atlan Team
Introduction:
Many readers may not be familiar with machine learning and the various types of ML models that can be developed. This post focuses on GANs (Generative Adversarial Networks), presenting a short introduction and some suggestions for an approach that we ourselves are following.
A GAN is defined as:
A generative adversarial network is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in a game. Given a training set, this technique learns to generate new data with the same statistics as the training set.
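For reference, the "game" in this definition is usually written as the minimax objective from the original 2014 GAN paper. The symbols follow that standard formulation and are not specific to this post: G is the generator, D is the discriminator, p_data is the distribution of the training set, and p_z is a noise prior that the generator samples from.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator is trained to push V up by scoring real samples high and generated samples low, while the generator is trained to push V down by producing samples the discriminator scores as real.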
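While this definition is useful, sometimes a graphic is worth a thousand words. Essentially, you are pitting one machine learning model against another: a generator that produces candidate samples and a discriminator that tries to distinguish them from real data.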
Not much work has been done in the private sector on harnessing and weaponising GANs for malware development. The Chinese Academy of Sciences did release a technical paper in 2017, with accompanying code here, and at Hunnic Cyber we developed our own in-house GAN for malware development in 2020. A series of events then led to that firm shuttering, so we could not release the tool publicly.
This post, in combination with my upcoming post on attacking Microsoft ATP, presents a research idea for ambitious hackers and ML engineers: developing a Generative Adversarial Network along the lines of the technical signposts below.
Technical Signposts
- You will need to develop your own custom lab to interface with Microsoft ATP. I have outlined in my blog here how you can get going for free with a lab that you can set up yourself.
- AMSI has been our adversary in malware development until now, and it has since been baked into the Common Language Runtime and other Windows DLLs. For developing your GAN, however, AMSI is now your friend: it acts as the discriminator, providing the feedback used to further train your GAN and make it more effective. Every malware sample that you generate can be queried against the AMSI API, giving you a quick feedback mechanism for fine-tuning your GAN. Developing your own limited .NET compiler will allow you to generate samples with a limited set of recurring code (for example, the shellcode and the shellcode execution elements) and a vast set of template non-malware code that makes up much of the additional code within your application. Tooling has progressed since 2020, so you may find that the code completion offered by Microsoft can assist you in your adventures!
- The Chinese Academy of Sciences' approach was to take existing malware samples. As I have outlined, however, it appears that Microsoft has made some logical assumptions that are easier to attack. For the training set, we therefore seek to train our model to develop malware based on the characteristics of NON-MALWARE samples, compiled with ample junk code, real method and function names, and so on. Dynamic evasion is much more complex, but our later posts will follow up on this and on how we approached that area.
I will leave a few links below that could help you along your journey:
USEFUL LINKS
What’s new about TimeGAN?
Unlike other GAN architectures for sequential data, the proposed framework can generalize its training to handle a mixed-data setting, where both static data (attributes) and sequential data (features) are generated at the same time. It also offers:
- Less sensitivity to hyperparameter changes.
- A more stable training process compared to other architectures.