BODMAS Malware Dataset


Update (10/09/2023) - Since Limin has graduated, please email his labmate Zhi Chen (zhic4@illinois.edu) and CC Dr. Gang Wang (gangw@illinois.edu) for all future requests.

Update (12/15/2021) - Malware category information is available at Google Drive

Update (08/29/2021) - Source code is available at: GitHub

BODMAS is short for Blue Hexagon Open Dataset for Malware AnalysiS. We collaborate with Blue Hexagon to release a dataset containing timestamped malware samples and well-curated family information for research purposes. The BODMAS dataset contains 57,293 malware samples and 77,142 benign samples collected from August 2019 to September 2020, with carefully curated family information (581 families).

We extract the feature vectors using the LIEF project (version 0.9.0), the same as the Ember dataset (details can be found here). Each sample is represented as a 2,381-dimensional feature vector, along with its label (benign or malicious) and, if it is malicious, its malware family. We also release the original binaries, for malware samples only.

Further details can be found in our paper “BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware” [PDF], Deep Learning and Security Workshop 2021 (co-located with IEEE Security and Privacy 2021).

If you end up building on this dataset as part of a project or publication, please include a reference to our paper:

@inproceedings{bodmas,
  title = {BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware},
  author = {Yang, Limin and Ciptadi, Arridhana and Laziuk, Ihar and Ahmadzadeh, Ali and Wang, Gang},
  booktitle = {4th Deep Learning and Security Workshop},
  year = {2021}
}

Download

  1. The feature vectors and metadata are open to everyone. Download the data here: Google Drive
    • feature vectors (~250 MB): bodmas.npz
    • metadata (~12 MB): bodmas_metadata.csv
    • Both files are sorted by timestamp in ascending order, and the rows are aligned (i.e., each feature vector corresponds to one row in the metadata file).
  2. We cannot release the original files of the benign software due to copyright considerations, but we do host the original binaries of the malware samples.

    To avoid misuse, please read and agree to the following conditions before emailing us.

    • Please email Zhi Chen (zhic4@illinois.edu) and CC Gang (gangw@illinois.edu). Also, please include your Gmail address in the body so that we can add you to the Google Drive folder where the dataset is stored.
    • Do not share the data with anyone else (except your co-authors on the project). We are happy to share it with other researchers upon request.
    • Explain in a few sentences what you plan to do with these binaries. It does not need to be a precise plan.
    • If you are in academia, contact us using your institution email and provide a webpage registered at the university domain that contains your name and affiliation.
    • If you are in a research (industrial) lab, send us an email from your company’s email account and introduce yourself and your company. Please attach a justification letter (in PDF format) on official letterhead. The letter needs to state clearly why this dataset is being requested.

    Please note that emails that do not follow these conditions may be ignored. We keep a public list of the organizations that have accessed these samples at the bottom of this page.

Get Started
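
The download notes above say that the feature vectors and the metadata are row-aligned and sorted by timestamp in ascending order. A minimal sketch of loading both files and making a time-ordered train/test split might look like the following. It assumes the .npz archive stores its arrays under the keys "X" and "y" (inspect `np.load("bodmas.npz").files` to confirm), and the helper names `load_bodmas` and `temporal_split` are hypothetical, not part of the released code.

```python
import csv

import numpy as np


def load_bodmas(npz_path, metadata_path):
    """Load feature vectors, labels, and metadata rows.

    Assumption: the archive keys are "X" (features) and "y" (labels).
    """
    with np.load(npz_path) as archive:
        X = archive["X"]
        y = archive["y"]
    with open(metadata_path, newline="") as handle:
        metadata = list(csv.DictReader(handle))
    # Each feature vector should correspond to exactly one metadata row.
    if not (len(X) == len(y) == len(metadata)):
        raise ValueError("feature vectors, labels, and metadata are misaligned")
    return X, y, metadata


def temporal_split(X, y, train_fraction=0.8):
    """Split by position: earlier rows for training, later rows for testing.

    This relies on the rows already being sorted by timestamp.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

For example, `X, y, metadata = load_bodmas("bodmas.npz", "bodmas_metadata.csv")` followed by `train, test = temporal_split(X, y)` keeps every test sample later in time than every training sample, which avoids leaking future samples into training.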

Organizations That Requested Our Dataset

  1. Simon Fraser University, Canada
  2. Oracle Labs
  3. Columbia University
  4. Telkom University, Indonesia
  5. University of Alberta, Canada
  6. Orange Inc., France
  7. Beijing Institute of Technology
  8. College of Engineering Pune, India
  9. University of Salerno, Italy
  10. Shanghai Jiao Tong University
  11. Southeast University
  12. Beijing University of Posts and Telecommunications
  13. Guizhou Normal University
  14. Korea University
  15. Guilin University of Electronic Technology
  16. New York University
  17. University of Chinese Academy of Sciences
  18. University of the West of England (UWE) Bristol
  19. University College Dublin, Ireland
  20. Women Engineering College, Ajmer, India
  21. Beijing University of Technology
  22. Air University Islamabad, Pakistan
  23. Eastern Connecticut State University
  24. Yonsei University, South Korea
  25. Arizona State University
  26. Bandung Institute of Technology, Indonesia
  27. University of Southampton, United Kingdom
  28. Xidian University
  29. University of Balamand, Lebanon
  30. The University of Chicago
  31. Xinjiang University
  32. University of Turin, Italy
  33. Punjab University College of Information Technology, Pakistan
  34. Guangzhou University
  35. Middle East Technical University, Turkey
  36. Microsoft
  37. Sana'a University, Yemen
  38. HarfangLab, France
  39. Purdue University Northwest
  40. PSG College of Technology, India
  41. University of Windsor, Canada
  42. Georgia Tech
  43. De Montfort University, United Kingdom
  44. Ghent University, Belgium
  45. Iowa State University
  46. Macquarie University, Australia
  47. Hongik University, South Korea
  48. UiTM Shah Alam, Malaysia
  49. Hanoi University of Science and Technology, Vietnam
  50. Ain Shams University, Egypt
  51. Open University of Catalonia, Spain
  52. Amrita Vishwa Vidyapeetham, India
  53. National University of Science and Technology, Zimbabwe
  54. Nagoya University, Japan
  55. Institute of Information Security, Japan
  56. Heriot-Watt University, United Kingdom
  57. Edinburgh Napier University, United Kingdom
  58. Istanbul University-Cerrahpaşa, Turkey
  59. Zhejiang University
  60. Hanyang University, South Korea
  61. Army Engineering University of PLA
  62. Purdue University
  63. University of Molise, Italy
  64. SharpAI LLC
  65. Silesian University of Technology, Poland
  66. Florida State University
  67. University of Bath, United Kingdom
  68. National University of Computer and Emerging Sciences, Pakistan
  69. Chungnam National University, South Korea
  70. PeeploTech
  71. Damietta University, Egypt
  72. Queen's University Belfast, United Kingdom
  73. Vilnius Tech, Lithuania
  74. Indian Institute of Technology Roorkee, India
  75. Beijing University of Civil Engineering and Architecture
  76. University of Quebec in Outaouais, Canada
  77. National Institute of Technology Raipur, India
  78. University of Colorado Colorado Springs
  79. University of Technology and Applied Sciences, Oman
  80. University of Portsmouth, United Kingdom
  81. Brno University of Technology, Czechia
  82. Royal Holloway, University of London, United Kingdom
  83. The University of Alabama in Huntsville
  84. University of Portsmouth, United Kingdom
  85. Wuhan University
  86. Guizhou University
  87. Amrita Vishwa Vidyapeetham, India
  88. Birkbeck, University of London, United Kingdom
  89. GoldenEye Inc
  90. Huazhong University of Science and Technology
  91. Sam Houston State University
  92. Hoseo University, South Korea
  93. East China University of Science and Technology
  94. Xiamen University Malaysia
  95. Pamantasan ng Lungsod ng Maynila, Philippines
  96. Sichuan University
  97. Nanjing University of Information Science and Technology
  98. University of Information Technology, Ho Chi Minh City, Vietnam
  99. Seoul National University of Science and Technology, South Korea
  100. University of Science and Technology of China
  101. Tsukuba University, Japan
  102. University of Toronto, Canada
  103. Charles Darwin University, Australia
  104. Zoho Corporation, India
  105. University of Cape Town, South Africa
  106. Sivas University of Science and Technology, Turkey
  107. University of Bari Aldo Moro, Italy
  108. UET Lahore University of Engineering and Technology
  109. Bandung Institute of Technology, Indonesia
  110. Sungshin Women's University, South Korea
  111. Budapest University of Technology and Economics, Hungary
  112. University of Bari (islab-uniba), Italy
  113. Dongguk University, South Korea
  114. People's Public Security University, China
  115. Fujian Normal University, China
  116. Qassim University, Saudi Arabia
  117. Sichuan University, China
  118. Zhejiang Normal University, China
  119. University of Minnesota
  120. Amrita Vishwa Vidyapeetham, India
  121. Indian Institute of Technology Jammu, India
  122. Babes-Bolyai University of Cluj-Napoca, Romania
  123. Texas A&M University
  124. Ho Chi Minh City University of Technology, Vietnam
  125. AnxinSec, China
  126. Czech Technical University in Prague, Czechia
  127. Koç University, Turkey
  128. Telkom University, Indonesia
  129. ShanghaiTech University, China
  130. University of Electronic Science and Technology of China, China
  131. VNU-HCM University of Information Technology, Vietnam
  132. Johns Hopkins University
  133. Umm Al-Qura University, Kingdom of Saudi Arabia
  134. Federal University of Parana, Brazil
  135. University of Sannio in Benevento, Italy
  136. German University in Cairo, Egypt
  137. BRAC University, Bangladesh
  138. University of Piraeus, Greece
  139. ECIT, Queen's University Belfast, Northern Ireland
  140. Nanjing University of Posts and Telecommunications, China
  141. National University of Defense Technology, China
  142. Numidia Institute of Technology, Algeria
  143. George Washington University

Contributors

Limin Yang, Ph.D. from UIUC.

Arridhana Ciptadi, Blue Hexagon Inc.

Ihar Laziuk, Blue Hexagon Inc.

Ali Ahmadzadeh, Blue Hexagon Inc.

Gang Wang, Associate Professor at UIUC