Data
Title: Matthew Gaber: Peekaboo
DOI: 10.25958/85p1-4w32
Publisher: Edith Cowan University
Creators: Helge Janicke, Matthew Gaber, Mohiuddin Ahmed
Date: 2024
Type: dataset
Language: English
Rights: http://creativecommons.org/licenses/by-nc/4.0/
Subjects: dynamic binary instrumentation, DBI, malware analysis, feature extraction, evasive malware, sophisticated malware, Computer Engineering
Access the data: https://drive.google.com/drive/folders/110rGGbOGglQzN_Jt8rL2UpnKU7kZtvxE?usp=drive_link

Licence & Rights:

Non-Commercial Licence (http://creativecommons.org/licenses/by-nc/4.0/)

Access:

Open

Contact Information

Matthew Gaber

Full description

Cyber-attacks continue to evolve, increasing in frequency and sophistication, and Artificial Intelligence (AI) is becoming essential for detecting modern malware. However, the accuracy of AI-based malware detection depends on the quality of the features it is trained on. Static and dynamic analysis of malware is limited by the widespread use of obfuscation and anti-analysis techniques: if malware detects an analysis environment, it hides its malicious behavior. Dynamic Binary Instrumentation (DBI), by contrast, allows deep and precise control of the malware sample, which facilitates the extraction of authentic features from sophisticated and evasive malware. We developed Peekaboo, a DBI tool that defeats anti-analysis techniques and extracts authentic behavior from live malware samples. We collected 18,527 malware samples across ransomware, spyware, trojans, botnets, worms, Advanced Persistent Threats (APT) and post-exploitation tools. Every sample includes type, family, and variant information, for example Ransomware-WannaCry-SHA256. We also collected 1,973 benign software samples for analysis.

This dataset contains the results for each sample. Each sample was run for up to 15 minutes to observe not only the anti-analysis techniques it uses but also its complete behavior. For each malware sample, the network traffic, every executed opcode, and every evasive technique used are captured.
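The description gives Ransomware-WannaCry-SHA256 as an example of the type, family, and variant information attached to each sample. A minimal sketch of splitting such a label into its three parts follows. It assumes the label is hyphen-separated in exactly that order and that any further hyphens belong to the variant; the names `SampleLabel` and `parse_label` are hypothetical and are not part of the Peekaboo scripts.

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    """Hypothetical container for the three parts of a sample label."""
    malware_type: str
    family: str
    variant: str


def parse_label(label: str) -> SampleLabel:
    """Split a 'Type-Family-Variant' label into its three fields.

    Assumption: the first two hyphens separate type, family, and variant.
    Anything after the second hyphen is kept whole as the variant.
    """
    parts = label.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'Type-Family-Variant', got {label!r}")
    return SampleLabel(*parts)


if __name__ == "__main__":
    print(parse_label("Ransomware-WannaCry-SHA256"))
```

If family names in the dataset themselves contain hyphens, this split would misassign fields, so the labels should be checked against the actual folder and file names before relying on it.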

Notes

There are three main folders in the linked repository.

  1. The Peekaboo Data folder contains zip files of the timestamped raw JSON files extracted by Peekaboo for each sample, organised by malware family. There is also a CSV file for each family, generated with analysis.py.
  2. The Peekaboo Network Traffic folder contains zip files of the .pcap files extracted by Peekaboo for every sample, organised by family.
  3. The Python Scripts folder contains the Python scripts detailed below.
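
As a rough illustration of working with the Peekaboo Data folder, the sketch below reads every JSON file from one family zip archive without unpacking it to disk. It uses only the Python standard library. The function name `iter_json_records` and the archive name in the usage comment are hypothetical, and nothing is assumed about the JSON schema, which this record does not describe.

```python
import json
import zipfile
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_json_records(archive: Path) -> Iterator[Tuple[str, Any]]:
    """Yield (member name, parsed JSON) for each .json file in a zip archive.

    Members are visited in sorted name order, which puts timestamped file
    names in chronological order only if the timestamps sort lexically
    (an assumption about the naming scheme).
    """
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            if not name.lower().endswith(".json"):
                continue  # skip directories and non-JSON members
            with zf.open(name) as fh:
                yield name, json.load(fh)


# Usage (hypothetical archive name):
# for name, record in iter_json_records(Path("WannaCry.zip")):
#     print(name, type(record))
```

Reading members directly from the archive avoids extracting output derived from live malware onto the analysis machine's file system, and keeps memory use to one JSON file at a time.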

This dataset is part of a larger collection
