twistml

TWItter STock market Machine Learning package

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 2.7
Topic
- Scientific/Engineering :: Information Analysis

Project description

TwistML

TwistML is a package that makes it easier to work with raw Twitter data for machine learning tasks such as predicting changes in the stock market.

TwistML implements a pipeline that covers filtering of the Twitter data, preprocessing, feature extraction into several feature representations (bag of words, sentiments, Doc2Vec), regression and classification using algorithms from the sklearn package, and model selection and evaluation.
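To make the first pipeline stages concrete, here is a minimal standard-library sketch of filtering, preprocessing, and bag-of-words extraction. It is not TwistML's own code or API: the function names, the tokenizer, and the tweet dictionaries with a "text" key are all assumptions made for illustration, and the later stages (sentiments, Doc2Vec, sklearn models) are left out.

```python
import re
from collections import Counter

# Hypothetical tokenizer: letters, digits and a few Twitter-specific symbols.
TOKEN_RE = re.compile(r"[a-z0-9$#@']+")


def filter_tweets(tweets, keywords):
    """Keep tweets whose text mentions at least one keyword (case-insensitive)."""
    keywords = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t["text"].lower() for k in keywords)]


def preprocess(text):
    """Lowercase the text, drop URLs, and split it into simple tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return TOKEN_RE.findall(text)


def bag_of_words(token_lists):
    """Build a sorted vocabulary and one count vector per document."""
    vocabulary = sorted(set(tok for tokens in token_lists for tok in tokens))
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for tokens in token_lists:
        row = [0] * len(vocabulary)
        for tok, count in Counter(tokens).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocabulary, vectors


# Made-up example tweets, only for illustration.
tweets = [
    {"text": "Buying more $AAPL today, great earnings http://example.com/x"},
    {"text": "Lunch was great"},
    {"text": "$AAPL falling, selling $AAPL now"},
]
kept = filter_tweets(tweets, ["$aapl"])
vocabulary, vectors = bag_of_words([preprocess(t["text"]) for t in kept])
```

The resulting count vectors are the kind of input a regression or classification model would then be trained on.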

The API documentation is available at TwistML’s PyPI page. More usage-focused documentation is coming soon. Until then, you can get the full package from BitBucket (also linked from the PyPI page) and check out the experiments folder for some usage examples.

TwistML was developed as part of my master’s thesis and I hope to keep improving it afterwards.

Installation

You can use pip to install TwistML like so:

$ pip install twistml

Please make sure you have numpy, scipy and gensim installed as well. I have opted out of adding them to install_requires, as this caused problems in my own tests on Windows machines. (For numpy the problem is described here.) As a result, pip will not install these packages automatically.
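Because these dependencies are not installed automatically, it can help to check for them before importing TwistML. The following is a small sketch using only the standard library; the helper name missing_packages is made up for this example and is not part of TwistML.

```python
import importlib


def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# The three packages the installation notes ask you to install yourself.
not_installed = missing_packages(["numpy", "scipy", "gensim"])
if not_installed:
    print("Please install first: " + ", ".join(not_installed))
```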

Known Issues & Planned Improvements

Implement a DateRange class and replace all occurrences of fromdate, todate, dateformat.
Implement find_files() without date ranges at all. It should be possible to simply process all files within a directory (also recursively).
TwistML currently assumes raw Twitter data to be available as one JSON file per day. Make sure the Internet Archive’s file scheme is supported as well.
Add support for hourly time resolution instead of daily only.
The evaluation subpackage can only deal with binary classification. Possibly explore adding multiclass.
The way logging is currently set up is weird and should be reworked.
gensim’s LabeledSentence is deprecated; use TaggedDocument instead.
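The first two planned items could look roughly like the sketch below. Neither piece exists in TwistML in this form: the DateRange interface, the default date format, and the suffix parameter of find_files() are assumptions chosen for illustration.

```python
import datetime
import os


class DateRange(object):
    """Hypothetical inclusive range of days, replacing fromdate/todate/dateformat."""

    def __init__(self, fromdate, todate, dateformat="%Y-%m-%d"):
        self.start = datetime.datetime.strptime(fromdate, dateformat).date()
        self.end = datetime.datetime.strptime(todate, dateformat).date()
        if self.end < self.start:
            raise ValueError("todate must not be earlier than fromdate")

    def __iter__(self):
        # Yield every day from start to end, both included.
        day = self.start
        while day <= self.end:
            yield day
            day += datetime.timedelta(days=1)

    def __contains__(self, day):
        return self.start <= day <= self.end


def find_files(directory, suffix=".json"):
    """Return all files below `directory` (recursively) ending in `suffix`."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffix):
                found.append(os.path.join(root, name))
    return sorted(found)
```

A single object like this would let functions accept one argument instead of three, and the date-free find_files() simply walks the directory tree.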

Changes

Version 0.9

Changed status to Beta
Added API documentation generated via sphinx and numpydoc
Doc2VecTransformer now supports iterative training (see: http://rare-technologies.com/doc2vec-tutorial/)
Regression evaluation can now treat predictions as binary classifications and evaluate AUC and F1
Changed some command line scripts to have more intuitive usage
Various small fixes
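As an illustration of the regression evaluation item above, the sketch below turns real-valued predictions into binary up/down labels and computes F1 and AUC in plain Python. The threshold of 0 and all function names are assumptions; TwistML's evaluation subpackage may binarize and score differently.

```python
def binarize(values, threshold=0.0):
    """Map each value to 1 if it is above the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up price changes and regression outputs.
true_changes = [0.5, -0.2, 0.1, -0.4]
predicted_changes = [0.3, 0.1, -0.2, -0.6]

y_true = binarize(true_changes)
f1 = f1_score(y_true, binarize(predicted_changes))
area = auc(y_true, predicted_changes)
```

The raw predictions serve as ranking scores for AUC, while F1 needs the thresholded labels.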

Version 0.2.4

ATTENTION: Some of these may break existing code!!

Renamed combine_tweets.py to combine.py
Added support for stacking of features
Classification targets are now 0 / 1 instead of -1 / 1
Added toydata module to create some toy data for testing
Added F1-Score to classification evaluation
Added additional window functions: window_stack and window_element_avg
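Code that still holds targets in the old -1 / 1 encoding needs to convert them to 0 / 1. A minimal sketch follows; the helper name to_zero_one is made up here and is not a TwistML function.

```python
def to_zero_one(targets):
    """Convert targets from the old -1 / 1 encoding to 0 / 1."""
    converted = []
    for t in targets:
        if t not in (-1, 1):
            raise ValueError("expected -1 or 1, got %r" % (t,))
        converted.append((t + 1) // 2)
    return converted


old_targets = [-1, 1, 1, -1]
new_targets = to_zero_one(old_targets)
```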

Version 0.2.3

Improved long_description generation
Fixed CHANGES.rst

Version 0.2.2

Added sentiment features based on TextBlob sentiments

Version 0.2.1

Added functionality for complex category subsets to tml-generate-features
Also improved documentation for tml-generate-features (on cmd line as well as docstring)
Improved test coverage

Version 0.2.0

Changed Development Status to Alpha
Removed Sentence2Vec as that functionality is included in current gensim versions’ Doc2Vec class
Added Changelog

Release history

Version	Released
0.9 (this version)	Feb 21, 2016
0.2.4	Feb 4, 2016
0.2.3	Jan 20, 2016
0.2.2	Jan 20, 2016
0.2.1	Jan 19, 2016
0.2.0	Jan 18, 2016
0.1.25	Jan 16, 2016
0.1.24	Jan 14, 2016
0.1.23	Jan 13, 2016
0.1.22	Jan 11, 2016
0.1.21	Jan 11, 2016
0.1.20	Jan 11, 2016
0.1.19	Jan 11, 2016
0.1.18	Jan 6, 2016
0.1.17	Jan 6, 2016
0.1.16	Jan 6, 2016
0.1.15	Jan 6, 2016
0.1.14	Jan 6, 2016
0.1.13	Jan 6, 2016
0.1.12	Jan 5, 2016
0.1.11	Jan 5, 2016
0.1.10	Jan 5, 2016
0.1.9	Jan 5, 2016
0.1.8	Jan 4, 2016
0.1.7	Jan 4, 2016
0.1.6	Dec 22, 2015
0.1.5	Dec 21, 2015
0.1.4	Dec 21, 2015
0.1.3	Dec 21, 2015
0.1.2	Dec 18, 2015
0.1.1	Dec 17, 2015
0.1	Dec 14, 2015

Download files

Source Distribution

twistml-0.9.zip (30.7 MB), source distribution, uploaded Feb 21, 2016

Hashes for twistml-0.9.zip
Algorithm	Hash digest
SHA256	`5f1413435be5a26f9f8d46e868f1d0408d9e2b27064efffb2700287193609d44`
MD5	`c2f06fbe6f51b66f68af99a97d1d72b1`
BLAKE2b-256	`24cf93b72c40dfa67eff3bfacfbd05a57d0853e2778f9dc965372943c569619f`
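
The digests above can be used to check a downloaded archive. Below is a minimal sketch using Python's hashlib; the helper names and the local path twistml-0.9.zip are assumptions, and the expected value is the SHA256 digest from the table.

```python
import hashlib

# SHA256 digest of twistml-0.9.zip as listed in the table above.
EXPECTED_SHA256 = "5f1413435be5a26f9f8d46e868f1d0408d9e2b27064efffb2700287193609d44"


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle) == expected.lower()


# Usage, assuming the archive was saved in the current directory:
# verify_download("twistml-0.9.zip")
```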