Predict categories based on domain names and their content

Project description

This package uses the Shallalist dataset to train its models: we scraped the homepages of the domains listed in that dataset. The package predicts a domain's category based on the domain name and its content.

Install

We strongly recommend installing piedomains inside a Python virtual environment (see the venv documentation).

pip install piedomains

General API

  1. domain.pred_shalla_cat takes a list of domains and predicts a category for each.

Examples

from piedomains import domain
domains = [
    "forbes.com",
    "xvideos.com",
    "last.fm",
    "facebook.com",
    "bellesa.co",
    "marketwatch.com"
]
result = domain.pred_shalla_cat(domains)
print(result)

Output:

                name text_pred_label  text_label_prob img_pred_label  \
0       forbes.com            news         0.575000     recreation
1      xvideos.com            porn         0.897716           porn
2          last.fm           music         0.229545       shopping
3     facebook.com      recreation         0.200815           porn
4       bellesa.co            porn         0.962932       shopping
5  marketwatch.com         finance         0.790576     recreation

  img_label_prob  used_domain_content  used_domain_screenshot  \
0        0.911997                 True                    True
1        0.755726                 True                    True
2        0.416521                 True                    True
3        0.274597                 True                    True
4        0.374870                 True                    True
5        0.366329                 True                    True

                                  text_domain_probs  \
0  {'adv': 0.010590500641848523, 'aggressive': 0....
1  {'adv': 0.002181818181818182, 'aggressive': 9....
2  {'adv': 0.002181818181818182, 'aggressive': 0....
3  {'adv': 0.006381039197812215, 'aggressive': 0....
4  {'adv': 0.00021545223423966907, 'aggressive': ...
5  {'adv': 0.0007271669575334497, 'aggressive': 9...

                                    img_domain_probs
0  {'adv': 9.541013423586264e-05, 'aggressive': 1...
1  {'adv': 0.00041423083166591823, 'aggressive': ...
2  {'adv': 0.008832501247525215, 'aggressive': 0....
3  {'adv': 0.027437569573521614, 'aggressive': 0....
4  {'adv': 0.0008953566430136561, 'aggressive': 3...
5  {'adv': 0.007870808243751526, 'aggressive': 0....
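The output has separate text-based and image-based predictions, and these sometimes disagree (for example, forbes.com is "news" by text but "recreation" by image). The sketch below shows one way to post-process the returned DataFrame. It is a minimal sketch that assumes the column names shown in the output above. To stay self-contained it builds a stand-in DataFrame from three rows of that sample output instead of calling domain.pred_shalla_cat. The "higher probability wins" rule is an illustrative heuristic, not something piedomains itself does.

```python
import pandas as pd

# Stand-in for the DataFrame returned by domain.pred_shalla_cat, reusing
# the column names and three rows from the sample output (assumption).
result = pd.DataFrame(
    {
        "name": ["forbes.com", "xvideos.com", "marketwatch.com"],
        "text_pred_label": ["news", "porn", "finance"],
        "text_label_prob": [0.575000, 0.897716, 0.790576],
        "img_pred_label": ["recreation", "porn", "recreation"],
        "img_label_prob": [0.911997, 0.755726, 0.366329],
    }
)

# Flag domains where the text-based and image-based models disagree.
result["models_agree"] = result["text_pred_label"] == result["img_pred_label"]

# Illustrative heuristic (not part of piedomains): keep whichever
# model's label has the higher probability.
use_text = result["text_label_prob"] >= result["img_label_prob"]
result["combined_label"] = result["text_pred_label"].where(
    use_text, result["img_pred_label"]
)

print(result[["name", "models_agree", "combined_label"]])
```

With these rows, forbes.com takes the image label ("recreation"), while xvideos.com and marketwatch.com keep their text labels.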

Functions

We expose one function, which takes a list of domains and predicts a category for each.

  • domain.pred_shalla_cat(input)

    • What it does:

      • predicts a category for each domain based on the domain name and its content

    • Output

      • Returns a pandas DataFrame with the predicted labels and their probabilities
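The text_domain_probs and img_domain_probs columns in the sample output appear to hold one dict per domain, mapping each category to its probability. The sketch below pulls the top-k categories out of such a column. It is a minimal sketch under that assumption. The helper top_k_categories and the domain example.com are hypothetical, and the probabilities are toy values for illustration, not real piedomains output.

```python
import pandas as pd


def top_k_categories(probs: dict, k: int = 3) -> list:
    """Return the k (category, probability) pairs with the highest probability."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy probabilities for illustration only; these are NOT real piedomains output.
result = pd.DataFrame(
    {
        "name": ["example.com"],
        "text_domain_probs": [
            {"adv": 0.01, "aggressive": 0.02, "news": 0.60, "finance": 0.25, "porn": 0.12}
        ],
    }
)

# Apply the helper to each row's dict; each cell becomes a list of pairs.
result["text_top3"] = result["text_domain_probs"].apply(top_k_categories)
print(result.loc[0, "text_top3"])
# → [('news', 0.6), ('finance', 0.25), ('porn', 0.12)]
```

The same helper could be applied to img_domain_probs to compare the runner-up categories from the two models.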

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.

Download files

Source Distribution

piedomains-0.0.8.tar.gz (2.9 MB)

Built Distribution

piedomains-0.0.8-py2.py3-none-any.whl (3.0 MB), for Python 2 and Python 3