Abstract
Because of the rapid growth of the internet, most users have shifted from traditional shopping, banking, and similar activities to their online counterparts. This shift has opened the way for many cybercrimes, including phishing. Attackers try to extract sensitive personal details such as user IDs, passwords, and debit or credit card information by disguising themselves as reliable websites. Identifying whether the Uniform Resource Locator (URL) of a website is legitimate or phishing is a difficult task because phishing exploits the user's vulnerabilities. Although many products are available for detecting phishing websites, they rely only on heuristic approaches and blacklists and therefore cannot prevent phishing effectively. This paper proposes a system that detects phishing websites in real time. It applies five classification algorithms to two different feature sets, based on natural language processing and word vectors, to identify which combination performs best. After comparing the accuracy of naive Bayes, logistic regression, support vector machine, decision tree, and random forest classifiers across these feature sets, the random forest algorithm with features based on natural language processing was found to perform best, with an accuracy of 97.99%.
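The abstract does not list the individual features, so the following is only a minimal sketch of what lexical, NLP-style URL features could look like. The feature names, the `SUSPICIOUS_WORDS` list, and the `url_features` helper are all illustrative assumptions, not the features used in the paper.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical keyword list; the paper's actual vocabulary is not given in the abstract.
SUSPICIOUS_WORDS = {"login", "verify", "secure", "account", "update", "signin", "bank"}


def tokenize_url(url: str) -> List[str]:
    """Split a URL into lowercase alphanumeric tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def url_features(url: str) -> Dict[str, int]:
    """Compute a few illustrative lexical features for a single URL."""
    host = urlparse(url).hostname or ""
    tokens = tokenize_url(url)
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_tokens": len(tokens),
        "num_suspicious_words": sum(tok in SUSPICIOUS_WORDS for tok in tokens),
        "has_at_symbol": int("@" in url),
        # 1 if the host looks like a dotted IPv4 address rather than a domain name
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
    }


features = url_features("http://192.168.0.1/secure-login/verify.php")
print(features)
```

Feature dictionaries of this kind could then be turned into numeric vectors and passed to any of the five classifiers named in the abstract; which features and hyperparameters the authors actually used is not stated here.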
Original language | English |
---|---|
Pages | 336-340 |
Number of pages | 5 |
Publication status | Published - 12 Mar 2020 |
Event | 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) - Palladam, India. Duration: 12 Dec 2019 → 14 Dec 2019 |
Conference
Conference | 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) |
---|---|
Country/Territory | India |
City | Palladam |
Period | 12/12/19 → 14/12/19 |
Keywords
- Phishing
- Natural language processing
- Machine learning
- Word vectors
- Classification
- Cyber security