Predict categories based on domain names and their content
Project description
This package predicts the category of a domain from the domain name and its content. The model was trained on the Shallalist dataset together with the scraped homepages of the domains listed in that dataset.
Install
We strongly recommend installing piedomains inside a Python virtual environment (see the venv documentation).
pip install piedomains
General API
domain.pred_shalla_cat takes a list of domains and predicts a category for each one.
Examples
from piedomains import domain

domains = [
    "forbes.com",
    "xvideos.com",
    "last.fm",
    "facebook.com",
    "bellesa.co",
    "marketwatch.com"
]
result = domain.pred_shalla_cat(domains)
print(result)
Output:

              name text_pred_label  text_label_prob img_pred_label  \
0       forbes.com            news         0.575000     recreation
1      xvideos.com            porn         0.897716           porn
2          last.fm           music         0.229545       shopping
3     facebook.com      recreation         0.200815           porn
4       bellesa.co            porn         0.962932       shopping
5  marketwatch.com         finance         0.790576     recreation

   img_label_prob  used_domain_content  used_domain_screenshot  \
0        0.911997                 True                    True
1        0.755726                 True                    True
2        0.416521                 True                    True
3        0.274597                 True                    True
4        0.374870                 True                    True
5        0.366329                 True                    True

                                   text_domain_probs  \
0  {'adv': 0.010590500641848523, 'aggressive': 0....
1  {'adv': 0.002181818181818182, 'aggressive': 9....
2  {'adv': 0.002181818181818182, 'aggressive': 0....
3  {'adv': 0.006381039197812215, 'aggressive': 0....
4  {'adv': 0.00021545223423966907, 'aggressive': ...
5  {'adv': 0.0007271669575334497, 'aggressive': 9...

                                    img_domain_probs
0  {'adv': 9.541013423586264e-05, 'aggressive': 1...
1  {'adv': 0.00041423083166591823, 'aggressive': ...
2  {'adv': 0.008832501247525215, 'aggressive': 0....
3  {'adv': 0.027437569573521614, 'aggressive': 0....
4  {'adv': 0.0008953566430136561, 'aggressive': 3...
5  {'adv': 0.007870808243751526, 'aggressive': 0....
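The output carries two predictions per domain, one from the text model and one from the image model, and they often disagree. A minimal sketch of one way to reconcile them is to keep whichever label has the higher probability. The helper name combine_predictions and the columns final_label, final_prob, and models_agree are assumptions made for illustration and are not part of piedomains. The frame is rebuilt by hand from the values printed above so the snippet runs without network access.

```python
import pandas as pd

# Values copied from the example output above; only the columns needed here.
result = pd.DataFrame({
    "name": ["forbes.com", "xvideos.com", "last.fm",
             "facebook.com", "bellesa.co", "marketwatch.com"],
    "text_pred_label": ["news", "porn", "music", "recreation", "porn", "finance"],
    "text_label_prob": [0.575000, 0.897716, 0.229545, 0.200815, 0.962932, 0.790576],
    "img_pred_label": ["recreation", "porn", "shopping", "porn", "shopping", "recreation"],
    "img_label_prob": [0.911997, 0.755726, 0.416521, 0.274597, 0.374870, 0.366329],
})


def combine_predictions(df):
    """Hypothetical helper: keep the label from the more confident model."""
    out = df.copy()
    use_text = out["text_label_prob"] >= out["img_label_prob"]
    # Series.where keeps the text label where use_text is True, else the image label.
    out["final_label"] = out["text_pred_label"].where(use_text, out["img_pred_label"])
    out["final_prob"] = out[["text_label_prob", "img_label_prob"]].max(axis=1)
    # Flag rows where both models independently chose the same category.
    out["models_agree"] = out["text_pred_label"] == out["img_pred_label"]
    return out


combined = combine_predictions(result)
print(combined[["name", "final_label", "final_prob", "models_agree"]])
```

Picking the higher probability is only one possible rule. The two models may not be calibrated against each other, so rows where models_agree is False and final_prob is low are probably worth reviewing by hand.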
Functions
We expose one function, which takes a list of domains and predicts a category for each.
domain.pred_shalla_cat(input)
What it does:
Predicts the category of each domain based on the domain name and its content.
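The example above passes bare domain names such as "forbes.com". If your input contains full URLs or mixed-case hostnames, a sketch like the following could reduce them to that form first. It assumes the function expects input shaped like the example. normalize_domains is a hypothetical helper, not part of piedomains, and dropping a leading "www." is a choice made here for illustration.

```python
from urllib.parse import urlsplit


def normalize_domains(items):
    """Hypothetical helper: reduce URLs to lowercase bare hostnames, without duplicates."""
    seen = set()
    cleaned = []
    for item in items:
        text = item.strip().lower()
        if not text:
            continue
        # urlsplit only recognises the host part when "//" precedes it.
        if "//" not in text:
            text = "//" + text
        host = urlsplit(text).hostname
        if not host:
            continue
        # Assumption: treat "www.example.com" and "example.com" as the same domain.
        if host.startswith("www."):
            host = host[4:]
        if host not in seen:
            seen.add(host)
            cleaned.append(host)
    return cleaned


domains = normalize_domains(
    ["https://www.Forbes.com/business", "last.fm", " forbes.com ", ""]
)
print(domains)
```

The cleaned list can then be passed to domain.pred_shalla_cat in the same way as the list in the example.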
Output
Returns a pandas DataFrame with the predicted labels and their probabilities.
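In the example output, the text_domain_probs and img_domain_probs columns hold one dictionary per row that maps category names to probabilities. The sketch below shows one way to read the top categories from such a column and to expand it into one column per category. The rows are placeholders with made-up probabilities, not real piedomains output, and top_categories is a hypothetical helper.

```python
import pandas as pd

# Placeholder rows shaped like the documented output; the numbers are invented.
result = pd.DataFrame({
    "name": ["example.com", "example.org"],
    "text_domain_probs": [
        {"adv": 0.1, "aggressive": 0.2, "news": 0.7},
        {"adv": 0.6, "aggressive": 0.1, "news": 0.3},
    ],
})


def top_categories(probs, k=2):
    """Hypothetical helper: the k most probable (category, probability) pairs."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


# A second-best guess is often useful when the top probability is low.
result["text_top2"] = result["text_domain_probs"].apply(top_categories)

# One column per category, indexed by domain name, for filtering or plotting.
wide = pd.DataFrame(result["text_domain_probs"].tolist(), index=result["name"])
print(wide)
```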
Contributor Code of Conduct
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.
License
The package is released under the MIT License.