Predict categories based on domain names and their content
Project description
The model was trained on the Shallalist dataset, along with homepages scraped from the domains listed in it. This package predicts the category of a domain based on the domain name and its content.
Install
We strongly recommend installing piedomains inside a Python virtual environment (see the venv documentation).
pip install piedomains
General API
domain.pred_shalla_cat takes an array of domains and predicts a category for each one.
Examples
from piedomains import domain

domains = [
    "yahoo.com",
    "forbes.com",
    "xvideos.com",
    "last.fm",
    "facebook.com",
    "bellesa.co",
    "marketwatch.com"
]
result = domain.pred_shalla_cat(domains)
print(result)
Output:
   name             pred_label  label_prob  used_domain_content  all_domain_probs
0  yahoo.com        recreation  0.229020    True                 {'adv': 0.03176470588235294, 'aggressive': 0.0...
1  forbes.com       news        0.575000    True                 {'adv': 0.010590500641848523, 'aggressive': 0....
2  xvideos.com      porn        0.348249    False                {'adv': 0.004716507777220271, 'aggressive': 0....
3  last.fm          music       0.229545    True                 {'adv': 0.002181818181818182, 'aggressive': 0....
4  facebook.com     recreation  0.200815    True                 {'adv': 0.006381039197812215, 'aggressive': 0....
5  bellesa.co       porn        0.957209    True                 {'adv': 0.00033715441672285906, 'aggressive': ...
6  marketwatch.com  finance     0.627273    True                 {'adv': 0.001249639527059502, 'aggressive': 9....
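The all_domain_probs column holds, for each row, a dictionary mapping every category to its probability. As a minimal sketch, assuming that dictionary shape, the helper below ranks the categories for a single row. The name top_categories is hypothetical and not part of piedomains, and the probabilities in the sample are illustrative values, not real model output.

```python
def top_categories(probs, k=3):
    """Return the k (category, probability) pairs with the highest probability."""
    # Sort by probability, highest first, then keep the first k entries.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Illustrative values only (assumption): a real row has one entry per category.
sample_probs = {"adv": 0.01, "news": 0.58, "finance": 0.25, "recreation": 0.16}

print(top_categories(sample_probs, k=2))
# → [('news', 0.58), ('finance', 0.25)]
```

With a real result, the same helper could be applied to one cell, for example result["all_domain_probs"][0], assuming the cell is a plain dictionary as the printed output suggests.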
Functions
We expose one function, which takes an array of domains and predicts a category for each.
domain.pred_shalla_cat(input)
What it does:
Predicts the category of each domain based on the domain name and its content.
Output:
Returns a pandas DataFrame with the predicted label and the probabilities.
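Because each row carries a label_prob, a common follow-up is to keep only the predictions above some confidence level. The sketch below assumes the DataFrame has first been converted to a list of row dictionaries with pandas' result.to_dict("records"). The helper name confident_predictions and the 0.5 threshold are illustrative choices, not part of piedomains. The sample rows are copied from the example output above.

```python
def confident_predictions(records, threshold=0.5):
    """Return (name, pred_label) pairs whose label_prob meets the threshold."""
    return [
        (row["name"], row["pred_label"])
        for row in records
        if row["label_prob"] >= threshold
    ]


# Rows as produced by result.to_dict("records"), trimmed to the columns used here.
records = [
    {"name": "forbes.com", "pred_label": "news", "label_prob": 0.575000},
    {"name": "last.fm", "pred_label": "music", "label_prob": 0.229545},
    {"name": "marketwatch.com", "pred_label": "finance", "label_prob": 0.627273},
]

print(confident_predictions(records))
# → [('forbes.com', 'news'), ('marketwatch.com', 'finance')]
```

The threshold is a judgment call: a lower value keeps more domains but admits weaker predictions such as last.fm at roughly 0.23.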
Contributor Code of Conduct
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.
License
The package is released under the MIT License.