Got Malware? — Meet Us

Posted On Sat, 22 Jul 2017 01:06:27 +0000. Filed in ramblings. By Vaibhav Bhandari.


Editor’s Note: This article describes the project “GotMalware?”, which explores malware fingerprinting and visualization techniques. It has been developed with help from MalwareBytes & Lib13 Inc as part of the Cyber Defenders 2017 Program.

The Problem

Malware is malicious software that infects computer systems and mobile devices with the intent to obtain secured private information or to delete or modify important information. In our project, we want to identify the fingerprints of different malware by comparing the files on a computer before and after infection. Then, we want to learn how to visualize the effect malware has on a computer system. Eventually, we would like to apply these desktop malware analysis techniques to mobile phones, especially Android devices.

What are we trying to do?

A lot of software is already available that can detect malware and prevent computer systems from getting infected. Our objective in this project is different: we want to observe malware behavior and the footprints it may leave behind by comparing files and associated signals from a regular test bed environment with those from an infected test bed environment. The test bed will give us a platform to understand malware detection better and to develop tools for it. We will test three different types of malware to compare the fingerprints they leave behind.


To accomplish this goal, we first surveyed the current research that exists regarding malware detection as well as the use of machine learning in malware detection.

Here are some of the papers we read:

Our Approach

We are planning to take the following steps:

  1. Learning about required tools: Our internship includes a Java course, but because Python has much better libraries for data analysis and visualization, we decided to learn and use it for our project.

  2. Creating a malware analysis test bed: We are writing a Python program that will index the files (make an organized list of all the files along with their sizes) on multiple virtual machines (software that emulates a mini computer inside of your main computer). Then, it will compare the directories and generate a report that tells the user the modifications in the files caused by the malware.

  3. Infect the virtual machines with different types of viruses and compare the files between the infected machines and a clean machine.

  4. Extract meaningful features from our samples. These features will be the basis of our study. Features are the measurable properties that describe something; for example, the features of a house include its number of rooms, its area, and its price.

  5. Visualize data. Malware is a threat to anyone who uses a computer, but many people have only a vague idea of what it is and what its effects can be. We aim to write something that will help people clearly visualize the effect of malware on their computers.

  6. Use machine learning on the prepared dataset.
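The indexing part of step 2 can be sketched roughly as follows. This is a minimal illustration, not our final program: the function name `index_directory` is made up for this example, and recording a SHA-256 hash next to each file size is an assumption we add here so that changes which keep the size the same can still be noticed.

```python
import hashlib
import os


def index_directory(root):
    """Walk `root` and return {relative_path: (size_in_bytes, sha256_hex)}.

    Hypothetical helper for illustration; the hash is an addition beyond
    the "path and size" list described in step 2.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            try:
                size = os.path.getsize(full_path)
                digest = hashlib.sha256()
                with open(full_path, "rb") as handle:
                    # Read in chunks so large files do not fill memory.
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                # Locked or unreadable files are skipped instead of
                # aborting the whole scan.
                continue
            index[rel_path] = (size, digest.hexdigest())
    return index
```

Keying the index by path relative to `root` means that indexes taken from two different virtual machines (or from two snapshots of the same one) can be compared directly, even if the scanned folder is mounted at different locations.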

Why is it beneficial?

Malware is a serious, constantly changing threat. A program that identifies malware and helps people see the effect it has on their systems will show them its practical consequences and help them make more informed decisions in the future.

Can this be done in a better way?

A bonus part of our project (if time permits) is to use machine learning techniques to identify malware. Because malware is constantly changing to avoid the latest detection techniques, machine learning can be crucial in identifying forms of malware that are not currently known, but are similar to already known strains.
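To make the idea of "similar to already known strains" concrete, here is a small sketch of one very simple approach: nearest-neighbour matching using Jaccard similarity between sets of fingerprint features (for example, the paths a sample touched). This is a stand-in we chose for illustration, not the technique we have settled on. The function names, the `threshold` value, and the strain labels in the usage note are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def closest_known_strain(sample, known, threshold=0.5):
    """Return (label, score) for the most similar known fingerprint.

    `known` maps a strain label to its set of features. If the best
    score is below `threshold` (an arbitrary illustrative cut-off),
    the label is None, meaning "not similar to anything we know".
    """
    best_label, best_score = None, 0.0
    for label, fingerprint in known.items():
        score = jaccard(sample, fingerprint)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score
```

With made-up data such as `known = {"strain_a": {"x", "y", "z"}, "strain_b": {"p", "q"}}`, calling `closest_known_strain({"x", "y", "w"}, known)` matches `strain_a`, because the sample shares two of the four distinct features with it. A real classifier would need many more samples and richer features than this.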

What have we done until now?

Our team has worked with Oracle VM VirtualBox to set up Windows 10 virtual machines. We have also studied the Python programming language by taking the Introductory and Intermediate Python for Data Science courses on DataCamp.

This week, we began writing our code. So far, we have written two programs: one that indexes the files on two virtual machines, and another that compares these directories to determine which files have been changed by the virus.
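The comparison step could look something like the sketch below. It assumes each machine's index is a dictionary mapping a file path to a tuple describing that file (such as its size and a hash); that layout and the names `compare_indexes` and `format_report` are assumptions made for this example, not our actual code.

```python
def compare_indexes(clean, infected):
    """Compare two {path: file_info} indexes.

    Returns a dict with sorted lists of paths that were added to,
    removed from, or modified on the infected machine.
    """
    clean_paths = set(clean)
    infected_paths = set(infected)
    shared = clean_paths & infected_paths
    return {
        "added": sorted(infected_paths - clean_paths),
        "removed": sorted(clean_paths - infected_paths),
        # A file counts as modified when its recorded info differs.
        "modified": sorted(p for p in shared if clean[p] != infected[p]),
    }


def format_report(changes):
    """Render the result of compare_indexes as plain text."""
    lines = []
    for kind in ("added", "removed", "modified"):
        lines.append(f"{kind}: {len(changes[kind])}")
        lines.extend(f"  {path}" for path in changes[kind])
    return "\n".join(lines)
```

Printing `format_report(compare_indexes(clean_index, infected_index))` gives a short list of every file that differs between the two machines, grouped by the kind of change. Note that a clean Windows machine also changes files on its own (logs, caches, updates), so not every difference in such a report is caused by the malware.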

We have also experimented with other file comparison programs, mainly ‘ExamDiff Pro’, to get an idea of how a file comparison program works and the footprints it might find. Specifically, we used Metasploit to make a malicious PDF and compared it with a benign PDF in ExamDiff. This will help us learn the behaviours of malware so we have an idea of what results to expect when we run our own program.

Our next step is to find three viruses and infect the virtual machines with them.

Code Review — Please?

Following is some of the code we plan to use. Please review and advise:
