Abstract
Phishing has grown significantly in the past few years and is predicted to increase further. Its changing dynamics make it difficult both to build a robust phishing detection system and to select features that keep representing phishing as attacks evolve. In this study, we propose PhishZip, a novel phishing detection approach that uses a compression algorithm to classify websites, and we demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the best-performing HTML-based features from past studies, achieving a true positive rate of 80.04%. We also propose the compression ratio as a novel machine learning feature, which significantly improves machine-learning-based phishing detection over previous studies. With compression ratios as additional features, the true positive rate improves by 30.3 percentage points (from 51.47% to 81.77%) and the accuracy by 11.84 percentage points (from 71.20% to 83.04%).
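The general idea of compression-based classification with per-class word dictionaries can be illustrated with a minimal sketch. It assumes DEFLATE via Python's `zlib` with a preset dictionary (`zdict`), uses a simple add-one-smoothed frequency ratio as a stand-in for the word occurrence likelihood analysis, and defines compression ratio as compressed size divided by original size. `tokenize`, `build_dictionary`, `compression_ratio`, and `classify` are hypothetical helpers for illustration, not PhishZip's actual implementation.

```python
import re
import zlib
from collections import Counter


def tokenize(html):
    # Hypothetical tokenizer: lowercase alphanumeric runs from raw HTML.
    return re.findall(r"[a-z0-9]+", html.lower())


def build_dictionary(target_docs, other_docs, top_k=50):
    """Hypothetical helper: keep the top_k words whose smoothed relative
    frequency in target_docs most exceeds that in other_docs."""
    target = Counter(w for d in target_docs for w in tokenize(d))
    other = Counter(w for d in other_docs for w in tokenize(d))
    t_total = sum(target.values()) + 1
    o_total = sum(other.values()) + 1

    def likelihood_ratio(word):
        # Add-one smoothing avoids division by zero for words unseen in other_docs.
        return ((target[word] + 1) / t_total) / ((other[word] + 1) / o_total)

    ranked = sorted(target, key=lambda w: (-likelihood_ratio(w), w))[:top_k]
    # zlib recommends placing the most commonly used strings at the END of a
    # preset dictionary, so the highest-ranked word goes last.
    return " ".join(reversed(ranked)).encode("utf-8")


def compression_ratio(html, zdict):
    # Assumed definition: compressed size / original size (lower = better fit).
    data = html.encode("utf-8")
    comp = zlib.compressobj(level=9, zdict=zdict)
    compressed = comp.compress(data) + comp.flush()
    return len(compressed) / max(len(data), 1)


def classify(html, phish_dict, legit_dict):
    # The dictionary that compresses the page better decides the label.
    if compression_ratio(html, phish_dict) < compression_ratio(html, legit_dict):
        return "phishing"
    return "legitimate"


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    phish_pages = ["verify password account suspended login",
                   "urgent verify account login password"]
    legit_pages = ["garden recipe weather holiday",
                   "museum photos garden weather recipe"]
    phish_dict = build_dictionary(phish_pages, legit_pages)
    legit_dict = build_dictionary(legit_pages, phish_pages)
    page = "<html><body>urgent verify account suspended login password</body></html>"
    print(classify(page, phish_dict, legit_dict))
```

The two per-dictionary values returned by `compression_ratio` could also be appended to an existing feature vector, in the spirit of using compression ratio as an additional machine learning feature; this sketch does not reproduce how PhishZip itself combines them.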
Original language | English |
---|---|
Title of host publication | 2020 IEEE Conference on Communications and Network Security, CNS 2020 |
Editors | Chunxiao Jiang |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Number of pages | 9 |
ISBN (Electronic) | 9781728147604, 9781728147598 |
ISBN (Print) | 9781728147611 |
Publication status | Published - 2020 |
Externally published | Yes |
Event | IEEE Conference on Communications and Network Security 2020 - Online, France. Duration: 29 Jun 2020 → 1 Jul 2020. https://ieeexplore.ieee.org/xpl/conhome/9153729/proceeding (Proceedings); https://cns2020.ieee-cns.org/index.html#:~:text=Following%20the%20advice%20and%20guidelines,presented%20at%20the%20virtual%20conference. (Website) |
Conference
Conference | IEEE Conference on Communications and Network Security 2020 |
---|---|
Abbreviated title | CNS 2020 |
Country/Territory | France |
Period | 29/06/20 → 1/07/20 |
Keywords
- classification
- compression
- phishing detection
- web page