nyawc

A web crawler that gathers more than you can imagine.

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
Programming Language
Topic
- Security

Project description

Not Your Average Web Crawler

N.Y.A.W.C is a Python library that enables you to test your payload against all requests of a certain domain. It crawls all requests (e.g. GET, POST or PUT) in the specified scope and keeps track of the request and response data. During the crawling process, callbacks let you insert your payload at specific places and test whether it worked.

Installation
Crawling flow
Documentation
Minimal implementation
Testing
Issues
License

Installation

First make sure you’re on Python 2.7/3.3 or higher. Then run the command below to install N.Y.A.W.C.

$ pip install --upgrade nyawc

Crawling flow

1. Define your start point (a request) and the crawling scope, then start the crawler.
2. The crawler keeps starting the first request in the queue until the maximum number of threads is reached.
3. The crawler adds all requests found in each response to the end of the queue (except duplicates).
4. The crawler goes back to step 2 and starts new requests, again until the maximum number of threads is reached.

Please note that if the queue is empty and all crawler threads are finished, the crawler will stop.
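
The flow above can be sketched without N.Y.A.W.C itself. The snippet below is only an illustration of the queue-and-threads idea, not the library's implementation: `crawl`, `fetch_links` and the in-memory `site` mapping are hypothetical names invented for this sketch, and `fetch_links` stands in for "perform the request and extract new requests from the response".

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def crawl(start_url, fetch_links, max_threads=4):
    """Crawl from start_url; fetch_links(url) returns the URLs found in its response."""
    queue = deque([start_url])   # requests waiting to be started
    seen = {start_url}           # everything ever queued, used to skip duplicates
    finished = []                # requests that completed, in completion order
    in_progress = {}             # maps a running future to its URL

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while queue or in_progress:
            # Step 2: start queued requests until max_threads are running.
            while queue and len(in_progress) < max_threads:
                url = queue.popleft()
                in_progress[pool.submit(fetch_links, url)] = url

            # Block until at least one running request has finished.
            done, _ = wait(list(in_progress), return_when=FIRST_COMPLETED)
            for future in done:
                finished.append(in_progress.pop(future))
                # Step 3: append newly found requests to the end of the queue.
                for link in future.result():
                    if link not in seen:
                        seen.add(link)
                        queue.append(link)
            # Step 4: loop back and fill the free thread slots again.
    # The loop ends once the queue is empty and no request is running.
    return finished


# Hypothetical in-memory "site": each URL maps to the links in its response.
site = {"/": ["/a", "/b"], "/a": ["/", "/c"], "/b": ["/c"], "/c": []}
crawled = crawl("/", lambda url: site[url], max_threads=2)
```

Here `crawled` contains each of the four URLs exactly once, because duplicates are filtered before they reach the queue.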

Documentation

Please refer to the documentation or the API for all the information about N.Y.A.W.C.

Minimal implementation

You can use the callbacks in example_minimal.py to run your own exploit against the requests. If you want an example of automated exploit scanning, please take a look at ACSTIS (it uses N.Y.A.W.C to scan for AngularJS client-side template injection vulnerabilities).

The code below is a minimal implementation of N.Y.A.W.C. Alternatively, you can use the kitchen sink, which contains all the functionality of N.Y.A.W.C.

$ python example_minimal.py
$ python -u example_minimal.py > output.log

# example_minimal.py

from nyawc.Options import Options
from nyawc.QueueItem import QueueItem
from nyawc.Crawler import Crawler
from nyawc.CrawlerActions import CrawlerActions
from nyawc.http.Request import Request

def cb_crawler_before_start():
    print("Crawler started.")

def cb_crawler_after_finish(queue):
    print("Crawler finished.")
    print("Found " + str(len(queue.get_all(QueueItem.STATUS_FINISHED))) + " requests.")

def cb_request_before_start(queue, queue_item):
    print("Starting: {}".format(queue_item.request.url))
    return CrawlerActions.DO_CONTINUE_CRAWLING

def cb_request_after_finish(queue, queue_item, new_queue_items):
    print("Finished: {}".format(queue_item.request.url))
    return CrawlerActions.DO_CONTINUE_CRAWLING

options = Options()

options.callbacks.crawler_before_start = cb_crawler_before_start # Called before the crawler starts crawling. Default is a null route.
options.callbacks.crawler_after_finish = cb_crawler_after_finish # Called after the crawler finished crawling. Default is a null route.
options.callbacks.request_before_start = cb_request_before_start # Called before the crawler starts a new request. Default is a null route.
options.callbacks.request_after_finish = cb_request_after_finish # Called after the crawler finishes a request. Default is a null route.

crawler = Crawler(options)
crawler.start_with(Request("https://finnwea.com/"))
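
The request callbacks are where a payload would normally be inserted. As a rough sketch of that step, the helper below builds one URL variant per query parameter with the payload substituted in. `inject_payload` is a hypothetical function written for this example and is not part of N.Y.A.W.C; it uses only the Python 3 standard library, and how the variants are sent and checked is left open.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def inject_payload(url, payload):
    """Return one URL per query parameter, with that parameter's value replaced by payload."""
    parts = urlsplit(url)
    # keep_blank_values so parameters like "?q=" are not silently dropped.
    params = parse_qsl(parts.query, keep_blank_values=True)

    variants = []
    for index, (name, _value) in enumerate(params):
        mutated = list(params)
        mutated[index] = (name, payload)  # replace exactly one parameter per variant
        variants.append(urlunsplit(parts._replace(query=urlencode(mutated))))
    return variants


# Hypothetical URL and payload, for illustration only.
variants = inject_payload("https://example.com/search?q=a&page=2", "PAYLOAD")
```

Inside a callback such as cb_request_after_finish, a helper like this could be applied to queue_item.request.url to produce the URLs to test. A URL without a query string yields an empty list.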

Testing

The unit tests are run automatically by Travis CI on every push to the master branch. To run them manually, use the command below.

$ python -m unittest discover

Issues

Issues and feature requests can be reported via the GitHub issue tracker. Before submitting a new one, please make sure it has not already been reported by someone else.

License

Not Your Average Web Crawler (N.Y.A.W.C) is open-source software licensed under the MIT license.

Release history

- 1.8.2 (this version): Feb 1, 2019
- 1.8.1: Mar 10, 2018
- 1.8.0: Nov 3, 2017
- 1.7.11: Oct 31, 2017
- 1.7.10: Sep 24, 2017
- 1.7.9: Sep 20, 2017
- 1.7.8: Sep 16, 2017
- 1.7.6: Aug 20, 2017
- 1.7.5: Jul 29, 2017
- 1.7.4: Jul 21, 2017
- 1.7.3: Jul 21, 2017
- 1.7.2: Jul 20, 2017
- 1.7.1: Jul 20, 2017
- 1.7.0: Jul 14, 2017
- 1.6.5: Jul 10, 2017
- 1.6.4: Jul 10, 2017
- 1.6.3: Jun 28, 2017
- 1.6.0: Jun 28, 2017
- 1.5.2: Jun 26, 2017
- 1.5.1: Jun 4, 2017
- 1.5.0: Jun 4, 2017
- 1.4.7: May 11, 2017
- 1.4.6: May 11, 2017
- 1.4.5: May 2, 2017
- 1.4.4: May 1, 2017
- 1.4.3: Apr 30, 2017
- 1.4.2: Apr 28, 2017
- 1.4.1: Apr 26, 2017
- 1.4.0: Apr 26, 2017
- 1.3.0: Apr 5, 2017
- 1.2.3: Apr 2, 2017
- 1.2.2: Mar 26, 2017
- 1.2.1: Mar 25, 2017
- 1.2.0: Mar 12, 2017
- 1.1.0: Mar 9, 2017
- 1.0.1: Mar 6, 2017

Download files

Source Distribution

nyawc-1.8.2.tar.gz (28.3 kB)

Uploaded Feb 1, 2019 Source

Hashes for nyawc-1.8.2.tar.gz
Algorithm	Hash digest
SHA256	`f40accb78bd2108312753e82e11beb356413dda9d7dd00ebb79f645418583e13`
MD5	`5de68b0879a6be2e53f367ee2684dc3c`
BLAKE2b-256	`301dc3450607bb047b2566f280628904ce456d57f45e2e1733f028755a9d4a18`
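
A downloaded archive can be checked against the SHA256 digest listed above. The function below is a small sketch using only the standard library; `sha256_matches` is a name chosen for this example, and the path of the downloaded file is an assumption about where you saved it.

```python
import hashlib


def sha256_matches(path, expected_hex):
    """Return True if the file at path has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, sha256_matches("nyawc-1.8.2.tar.gz", "f40accb78bd2108312753e82e11beb356413dda9d7dd00ebb79f645418583e13") should return True for an unmodified copy of the source distribution in the current directory.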