
Learning Point Processes Using Deep Granger Nets


See <> for theoretical background


ZeD@UChicago <>


Implementation of the Deep Granger net inference algorithm, described in, for learning spatio-temporal stochastic processes (point processes). cynet learns a network of generative local models, without assuming any specific model structure.

sudo apt-get install python3-tk


from cynet import cynet
from cynet.cynet import uNetworkModels as models
from viscynet import viscynet as vcn
cynet module includes:
  • cynet

  • viscynet

  • bokeh_pipe

cynet library classes:

  • spatioTemporal

  • uNetworkModels

  • simulateModels

  • xgModels

Description of Pipeline:

You may find two examples of this pipeline in your environment’s bin folder after installing the cynet package.

Step 1:

Use the spatioTemporal class and its utility functions to fit your data onto a timeseries grid. The end outputs are triplets: files that contain the rows (coordinates), the columns (dates), and the timeseries. The splitTS function writes out each row of the timeseries as a separate file. Generally, we use it to create timeseries that extend beyond the length of the data in the triplets. We use the triplets to generate predictive models, and then use the split files, which hold the longer timeseries, to evaluate those models.
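
As a rough illustration of what Step 1 produces, the sketch below bins raw events into a grid and returns the three pieces of a triplet: tile coordinates, dates, and one count series per tile. It is not cynet's implementation. The function name grid_timeseries and all of its parameters are hypothetical, and the real spatioTemporal class works on pandas DataFrames with many more options.

```python
from collections import Counter
from datetime import date, timedelta


def grid_timeseries(events, lat0, lon0, eps, n_lat, n_lon, start, n_days):
    """Bin (lat, lon, date) events into one daily count series per grid tile."""
    counts = Counter()
    for lat, lon, day in events:
        i = int((lat - lat0) // eps)   # tile index along the first coordinate
        j = int((lon - lon0) // eps)   # tile index along the second coordinate
        t = (day - start).days         # day offset from the start date
        if 0 <= i < n_lat and 0 <= j < n_lon and 0 <= t < n_days:
            counts[(i, j, t)] += 1

    coords, series = [], []
    for i in range(n_lat):
        for j in range(n_lon):
            # Tile bounds as (coord1_start, coord1_stop, coord2_start, coord2_stop).
            coords.append((lat0 + i * eps, lat0 + (i + 1) * eps,
                           lon0 + j * eps, lon0 + (j + 1) * eps))
            series.append([counts[(i, j, t)] for t in range(n_days)])
    dates = [start + timedelta(days=t) for t in range(n_days)]
    return coords, dates, series


# Made-up events on a 2 x 1 grid of unit tiles over three days.
events = [(0.5, 0.5, date(2020, 1, 1)),
          (0.7, 0.2, date(2020, 1, 1)),
          (1.5, 0.5, date(2020, 1, 3))]
coords, dates, series = grid_timeseries(events, lat0=0.0, lon0=0.0, eps=1.0,
                                        n_lat=2, n_lon=1,
                                        start=date(2020, 1, 1), n_days=3)
```

Each entry of coords lines up with the row of series at the same position, and dates labels the columns.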

Step 2:

Run xGenESeSS on the triplets to generate predictive models. The xgModels class can assist in this step. If running on a cluster, set RUN_LOCAL to False; calling run will then write the shell commands for xGenESeSS to a text file. Otherwise, xgModels runs xGenESeSS locally using the binary installed with the package. The end results are predictive models. Note that example 1 starts at this point, so sample models are provided.

Step 3:

To evaluate the models afterwards, use the run_pipeline utility function. This calls uNetworkModels and simulateModels in parallel to evaluate each model. simulateModels calls the cynet and flexroc binaries. Outputs will be auc, tpr, and fpr statistics.

See example 2 for an example of the entire pipeline.

class spatioTemporal

Utilities for spatial-temporal analysis. Used to fit timeseries onto a grid and output data into xGenESeSS friendly format.

  • log_store (Pickle): Pickle storage of class data & dataframes

  • log_file (string): path to CSV of legacy dataframe

  • ts_store (string): path to CSV containing most recent ts export

  • DATE (string):

  • EVENT (string): column label for category filter

  • coord1 (string): first coordinate level type; is column name

  • coord2 (string): second coordinate level type; is column name

  • coord3 (string): third coordinate level type; (z coordinate)

  • end_date: upper bound of date range

  • freq (string): timeseries increments; e.g. D for day

  • columns (list): list of column names to use; requires at least 2 coordinates and event type

  • types (list of strings): event type list of filters

  • value_limits (tuple): boundaries (magnitude of event above threshold)

  • grid (dictionary or list of lists): coordinate dictionary with respective ranges and EPS value OR custom list of lists of custom grid tiles as [coord1_start, coord1_stop, coord2_start, coord2_stop]

  • grid_type (string): parameter to determine if grid should be built up from a coordinate start/stop range (‘auto’) or be built from custom tile coordinates (‘custom’)

  • threshold (float): significance threshold


__init__(self, log_store='log.p', log_file=None, ts_store=None, DATE='Date',
        year=None, month=None, day=None, EVENT='Primary Type', coord1='Latitude',
        coord2='Longitude', coord3=None, init_date=None, end_date=None, freq=None,
        columns=None, types=None, value_limits=None, grid=None, threshold=None)

fit(self, grid=None, INIT=None, END=None, THRESHOLD=None, csvPREF='TS',
    auto_adjust_time=False,incr=6,max_incr=24, poly_tile=False):

    Fit dataproc with specified grid parameters and
    create timeseries for
    date boundaries specified by INIT, THRESHOLD,
    and END or input list of custom coordinate boundaries which do NOT have
    to match the arguments first input to the dataproc

    Inputs -
        grid (dictionary or list of lists): coordinate dictionary with
            respective ranges and EPS value OR custom list of lists
            of custom grid tiles as [coord1_start, coord1_stop,
            coord2_start, coord2_stop]
        INIT: starting timeseries date
        END: ending timeseries date
        THRESHOLD (float): significance threshold
        auto_adjust_time (boolean): if True, within increments specified
        (6H default), determine optimal temporal frequency for timeseries data
        incr (int): frequency increment
        max_incr (int): user-specified maximum increment
        poly_tile(boolean): whether or not tiles define polygons

    Outputs -
        (No output) grid pd.Dataframe written out as CSV file to path specified

getTS(self, _types=None, tile=None, freq=None):
    Given location tile boundaries and type category filter, creates the
    corresponding timeseries as a pandas DataFrame
    (Note: can reassign type filter, does not have to be the same one
    as the one initialized to the dataproc)

        _types (list of strings): list of category filters
        tile (list of floats): location boundaries for tile
        freq (string): intervals of time between timeseries columns
        poly_tile (boolean): whether or not input for tiles defines
            a polygon filter

        pd.Dataframe of timeseries data to corresponding grid tile
        pd.DF index is stringified LAT/LON boundaries
        with the type filter  included

    Picks random tile from options fed into timeseries method which maps to a
    non-empty subset within the larger dataset

    Inputs -
        LAT (float or list of floats): singular coordinate float or list of
                                       coordinate start floats
        LON (float or list of floats): singular coordinate float or list of
                                       coordinate start floats
        EPS (float): coordinate increment EPS
        _types (list): event type filter; accepted event type list
        tiles (list of lists): list of tiles to build
            Ex:(list of [lat1 lat2 lon1 lon2]) or tuples (i.e. [(x1,y1),(x2,y2)])
            defining polygons
        poly_tile (boolean): whether input for tile specifies a polygon

    Outputs -
        tile dataframe (pd.DataFrame)

    Returns the optimal frequency for timeseries based on highest non-zero
    to zero timeseries event count

    Input -
        df (pd.DataFrame): filtered subset of dataset corresponding to
        random tile from get_rand_tile
        incr (int): frequency increment
        max_incr (int): user-specified maximum increment

    Output -
        (string) to pass to pd.date_range(freq=) argument
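
To make the frequency selection above concrete, here is a minimal sketch that tries bin widths from incr to max_incr hours and keeps the one with the highest ratio of non-empty to empty bins. This is one reading of the description, not the library's code. The function name optimal_frequency and the inputs event_hours and total_hours are assumptions made for the example.

```python
def optimal_frequency(event_hours, total_hours, incr=6, max_incr=24):
    """Pick the bin width (hours) with the highest non-empty to empty bin ratio."""
    best_width, best_ratio = incr, -1.0
    for width in range(incr, max_incr + 1, incr):
        n_bins = -(-total_hours // width)   # ceiling division
        occupied = {int(h // width) for h in event_hours if 0 <= h < total_hours}
        nonzero = len(occupied)
        zero = n_bins - nonzero
        # With no empty bins the ratio is treated as infinite.
        ratio = float("inf") if zero == 0 else nonzero / zero
        if ratio > best_ratio:              # ties keep the smaller width
            best_width, best_ratio = width, ratio
    return f"{best_width}H"


# Events at hours 1, 13, 25 and 37 of a 48 hour window.
freq = optimal_frequency([1, 13, 25, 37], total_hours=48)
```

The returned string follows the "6H" style used in this document, so it can be handed to a freq argument.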

    Returns the tile coordinates of the working grid as a list of lists

    Input -
        (No inputs)
    Output -
        TILE (list of lists): the grid tiles

pull(self, domain='', dataset_id='crimes', token=None,
    store=True, out_fname='pull_df.p', pull_all=False):
    Pulls new entries from datasource

    Input -
        domain (string): Socrata database domain hosting data
        dataset_id (string): dataset ID to pull
        token (string): Socrata token for increased pull capacity;
            Note: Requires Socrata account
        store (boolean): whether or not to write out new dataset
        pull_all (boolean): pull complete dataset
        instead of just updating

    Output -
        None (writes out files if store is True and modifies inplace)

timeseries(self, LAT=None, LON=None, EPS=None,_types=None,CSVfile='TS.csv',
    THRESHOLD=None,tiles=None,incr=6,max_incr=24, poly_tile=False):

    Creates DataFrame of location tiles and their
    respective timeseries from input datasource with
    significance threshold THRESHOLD
    latitude, longitude coordinate boundaries given by LAT, LON and EPS
    or the custom boundaries given by tiles
    calls on getTS for individual tile then concats them together

    Input -
        LAT (float or list of floats): singular coordinate float or list of
                                       coordinate start floats
        LON (float or list of floats): singular coordinate float or list of
                                       coordinate start floats
        EPS (float): coordinate increment EPS
        _types (list): event type filter; accepted event type list
        CSVfile (string): path to output file
        tiles (list of lists): list of tiles to build
            (list of [lat1 lat2 lon1 lon2])
        auto_adjust_time (boolean): if True, within increments specified
        (6H default), determine optimal temporal frequency for timeseries data
        incr (int): frequency increment
        max_incr (int): user-specified maximum increment
        poly_tile (boolean): whether or not tiles define polygons

    Output -
        (No output) grid pd.Dataframe written out as CSV file to path specified
Utility functions for spatioTemporal:
splitTS(TSfile, csvNAME='TS1', dirname='./', prefix='@', BEG=None, END=None,
    Utilities for spatio temporal analysis

    Writes out each row of the pd.DataFrame as a separate CSVfile
    For XgenESeSS binary

    Inputs -
        TSfile (pd.DataFrame): DataFrame to write out
        csvNAME (string): output filename
        dirname (string): directory for output file
        prefix (string): prefix for files
        VARNAME (string): string to append to file names
        BEG (datetime): start date
        END (datetime): end date

    Outputs -
        (No output)
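
The row-per-file idea described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper name split_rows, the dictionary input, and the naming scheme of prefix plus row label are hypothetical, and the files that splitTS actually writes may be named and formatted differently.

```python
import os
import tempfile


def split_rows(table, dirname, prefix="@", suffix=""):
    """Write every row of table (label -> list of counts) to its own file."""
    os.makedirs(dirname, exist_ok=True)
    written = []
    for label, values in table.items():
        # Hypothetical naming scheme: <prefix><row label><suffix>.
        path = os.path.join(dirname, f"{prefix}{label}{suffix}")
        with open(path, "w") as handle:
            # One space-separated line per file.
            handle.write(" ".join(str(v) for v in values) + "\n")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    files = split_rows({"tile0": [0, 2, 1], "tile1": [1, 0, 0]}, tmp)
    names = [os.path.basename(p) for p in files]
    with open(files[0]) as handle:
        first_row = handle.read().strip()
```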

    Utility function

    Converts list into string separated by dashes
    or empty string if input list is not list or is empty

        List (list): input list to be converted
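
A minimal sketch of the conversion just described, with a hypothetical function name (the original name is not shown here):

```python
def stringify(values):
    """Join list items with dashes; return '' for a non-list or an empty list."""
    if not isinstance(values, list) or not values:
        return ""
    return "-".join(str(v) for v in values)


label = stringify(["BURGLARY", "THEFT"])
```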


to_json(pydict, outFile):
    Writes dictionary json to file

    Input -
        pydict (dict): dictionary to store
        outFile (string): name of outfile to write json to

    Output -
        (No output but writes out files)

     Utilities for spatio temporal analysis

     Reads in output TS logfile into pd.DF and outputs necessary
     CSV files in XgenESeSS-friendly format

     Input -
         TSfile (string or list of strings): filename of input TS to read
             or list of filenames to read in and concatenate into one TS
         csvNAME (string)
         BEG (string): start datetime
         END (string): end datetime

     Output -
         dfts (pandas.DataFrame)
class uNetworkModels:

Utilities for storing and manipulating XPFSA models inferred by XGenESeSS


jsonFile (string): path to json file containing models

Methods defined here:

__init__(self, jsonFILE):

    Utilities for storing and manipulating XPFSA models
    inferred by XGenESeSS

    append models to internal dictionary

    Utilities for storing and manipulating XPFSA models
    inferred by XGenESeSS

    Calculates the distance between all models and stores
    them under the
    distance key of each model;

    No I/O

    reverse=False, store=None,
    Utilities for storing and manipulating XPFSA models
    inferred by XGenESeSS

    Selects the N top models as ranked by var specified value
    (in reverse order if reverse is True)

    Inputs -
        var (string): model parameter to rank by
        n (int): number of models to return
        reverse (boolean): return in ascending order (True)
            or descending (False) order
        store (string): name of file to store selection json
        high (float): higher cutoff
        equal (float): choose models with selection values
            equal to the given value
        low (float): lower cutoff
        inplace (bool): update models if true
    Output -
        (dictionary): top n models as ranked by var
                     in ascending/descending order
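
The selection logic described above could look roughly like the following on a plain dictionary of models. This is a sketch, not the uNetworkModels code: the function name select_models is hypothetical, the cutoffs are assumed to be inclusive, and the store and inplace options are left out.

```python
def select_models(models, var="gamma", n=None, reverse=False,
                  high=None, low=None, equal=None):
    """Return the top-n models ranked by var, after optional cutoffs."""
    kept = {}
    for key, model in models.items():
        value = model[var]
        if equal is not None and value != equal:
            continue
        if high is not None and value > high:   # assumed inclusive upper cutoff
            continue
        if low is not None and value < low:     # assumed inclusive lower cutoff
            continue
        kept[key] = model
    # Descending by default; ascending when reverse is True, as documented.
    ranked = sorted(kept, key=lambda k: kept[k][var], reverse=not reverse)
    if n is not None:
        ranked = ranked[:n]
    return {k: kept[k] for k in ranked}


# Made-up models keyed by name, each with a gamma value.
models = {"a": {"gamma": 0.1}, "b": {"gamma": 0.5}, "c": {"gamma": 0.3}}
top_two = select_models(models, var="gamma", n=2)
```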

    Utilities for storing and manipulating XPFSA models
    inferred by XGenESeSS

    Extracts the varname for src and tgt of
    each model and stores under src_var and tgt_var
    keys of each model;

    No I/O

    Utilities for storing and manipulating XPFSA models
    inferred by XGenESeSS

    Writes out updated models json to file

    Input -
        outFile (string): name of outfile to write json to

    Output -
        (No output but writes out files)

    Generate dataframe representation of models

    Input -
        scatter (string) : prefix of filename to plot 3X3 regression
        matrix between delay, distance and coefficient of causality
    Output -
        Dataframe with columns
        ['latsrc','lonsrc','lattgt', 'lontgtt','gamma','delay','distance']
class simulateModel

Utilities for generating statistical analysis after processing models

  • MODEL_PATH(string)- The path to the model being processed.

  • DATA_PATH(string)- Path to the split file.

  • RUNLEN(integer)- Length of the run.

  • READLEN(integer)- Length of split data to read from beginning

  • CYNET_PATH - path to cynet binary.

  • FLEXROC_PATH - path to flexroc binary.

run(self, LOG_PATH=None,

This function is intended to replace the shell script. This
function will use the subprocess library to call cynet on a model to process
it and then run flexroc on it to obtain statistics: auc, tpr, fpr.
    LOG_PATH(string)- Logfile from cynet run
    PARTITION(string)- Partition to use on split data
    FLEXWIDTH(int)-  Parameter to specify flex in flexroc
    FLEX_TAIL_LEN(int)- tail length of input file to consider [0: all]
    POSITIVE_CLASS_COLUMN(int)- positive class column
    EVENTCOL(int)- event column
    tpr_threshold(float)- tpr threshold
    fpr_threshold(float)- fpr threshold
auc, tpr, and fpr statistics from flexroc.
Utility functions for simulateModel:
def parallel_process(arguments):
    This function takes a model and produces statistics on it. The output is
    saved to a result file with the suffix defined by RESSUFIX. We note that
    arguments needs to be a list of various arguments (detailed below) due to
    the nature of joblib. We expect this function to be called by a parallel
    processing library such as joblib.
        arguments(list) - a list of arguments necessary for the function:
            arguments[0]-FILE(str): path to the model being processed.
            arguments[1]-model_nums(int): Number of models to use in prediction
            arguments[2]-Horizon(int): prediction horizon.
            arguments[3]-DATA_PATH: path to split file.
                Ex: './split/1995-01-01_1999-12-31'
            arguments[4]-RUNLEN(int): the runlength
            arguments[5]-VARNAME(list)-Variable names to consider.
            arguments[6]-RESSUFIX- suffix to add to the end of results.
            arguments[7]-CYNET_PATH- path to cynet binary.
            arguments[8]-FLEXROC_PATH- path to flexroc binary.
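
Because each job is described by a single positional list, it can help to see how such lists might be assembled. The sketch below builds one list per model file and model count, in the order documented above. The helper build_arguments is hypothetical, and pairing every file with every entry of model_nums is an assumption about how the jobs are enumerated.

```python
import glob
import os
import tempfile


def build_arguments(glob_path, model_nums, horizon, data_path, runlen,
                    varname, ressufix, cynet_path, flexroc_path):
    """One argument list per (model file, model count), in the documented order."""
    jobs = []
    for model_file in sorted(glob.glob(glob_path)):
        for num in model_nums:
            jobs.append([model_file, num, horizon, data_path, runlen,
                         varname, ressufix, cynet_path, flexroc_path])
    return jobs


# Demo on two empty placeholder files; the paths and names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a_model.json", "b_model.json"):
        open(os.path.join(tmp, name), "w").close()
    jobs = build_arguments(os.path.join(tmp, "*model.json"), [10, 20], 7,
                           "./split/1995-01-01_1999-12-31", 2291, ["VAR"],
                           ".res", "cynet", "flexroc")
```

Each element of jobs could then be passed as the single arguments parameter by a parallel map such as joblib's Parallel and delayed.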

def run_pipeline(glob_path,model_nums,horizon, DATA_PATH, RUNLEN, VARNAME,
                RES_PATH, RESSUFIX = '.res', cores = 4):

    This function is intended to take the output models from midway, process
    them, and produce graphs. This will call the parallel_process function
    in parallel using joblib. Eventually stores the result as 'res_all.csv'.
    Cynet and flexroc are binaries written in C++.
        glob_path(str)-The glob string to be used to find all models.
            Ex: 'models/*model.json'
        model_nums(list of ints)- The model numbers to use. Ex: [10,15,20,25]
        horizon(int)- prediction horizons to test in unit of temporal
            quantization (using cynet binary)
        DATA_PATH(str)-Path to the split files.
            Ex: './split/1995-01-01_1999-12-31'
        RUNLEN(int)-Length of run. Ex: 2291.
        VARNAME(list of str)- List of variables to consider.
        RES_PATH(str)- glob string for glob to locate all result files.
        RESSUFIX(str)- suffix to add to the end of results. Ex: '.res'
        cores(int)-cores to use for parallel processing.

        Outputs: Produces graphs of statistics.

def get_var(res_csv, coords,varname='auc',VARNAMES=None):

    This function outputs graphs of the results produced by run_pipeline. The
    graphs concern auc, fpr, and tpr statistics.
        res_csv(str)- path to 'res_all.csv' file produced by run_pipeline.
        coords(list of str)- the coords to consider.
        varname(str)-the variable name to consider. Ex: 'auc'.
        VARNAMES(list of str)- list of the variable names from the dataset
            to consider.
            Ex: VARNAMES=['Personnel','Infrastructure','Casualties']
class xgModels

Utility for running xGenESeSS to generate models. Used to produce shell commands which will be run on a cluster. Can also be used to run those shell commands locally.

  • TS_PATH(string)-path to file which has the rowwise multiline time series data

  • NAME_PATH(string)-path to file with name of the variables

  • LOG_PATH(string)-path to log file for xgenesess inference

  • BEG(int) & END(int)- XgenESeSS run parameters

  • NUM(int)-number of restarts (20 is good)

  • PARTITION(float)-partition sequence

  • XgenESeSS_PATH(str)-path to XgenESeSS

  • RUN_LOCAL(bool)- whether to run XgenESeSS locally or produce a list of commands


This function is intended to be called by the run method in xgModels. This
function uses the subprocess module to execute a XgenESeSS command
and wait for its completion.
  command(str): the XgenESeSS command to be executed.
  command_count(int): the command number of this command.

run(self, calls_name='program_calls.txt', workers = 4):

Here we run XgenESeSS. This either happens locally or this function
will output the program calls text file to run on a cluster.
  calls_name(str)-Name of file containing program_calls. Only used if
    RUN_LOCAL = 0.
  workers(int)- Number of workers to use in pool. If none, then will
    default to number of cores in the system.
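
The two modes described above (execute in a worker pool, or write the calls to a text file for a cluster) can be sketched as follows. This is not the xgModels code. The function run_commands is hypothetical, and the commands are harmless placeholders that start the Python interpreter, since real XgenESeSS command lines are not shown here.

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing.pool import ThreadPool


def run_commands(commands, run_local=True, calls_name="program_calls.txt",
                 workers=4):
    """Execute shell commands in a worker pool, or write them to a text file."""
    if not run_local:
        with open(calls_name, "w") as handle:
            handle.write("\n".join(commands) + "\n")
        return None

    def execute(command):
        # Block until the command finishes and report its exit status.
        return subprocess.run(command, shell=True).returncode

    # workers=None lets the pool default to the number of available CPUs.
    with ThreadPool(workers) as pool:
        return pool.map(execute, commands)


# Placeholder commands standing in for the real program calls.
commands = [f'"{sys.executable}" -c "pass"' for _ in range(3)]
exit_codes = run_commands(commands, run_local=True, workers=2)

with tempfile.TemporaryDirectory() as tmp:
    calls_file = os.path.join(tmp, "program_calls.txt")
    run_commands(commands, run_local=False, calls_name=calls_file)
    with open(calls_file) as handle:
        written = handle.read().splitlines()
```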
viscynet library classes:

visualization library for Network Models produced by uNetworkModels based on matplotlib

draw_screen_poly(lats, lons, m, ax, val, cmap, ALPHA=0.6)
    utility function to draw polygons on basemap

    Inputs -
        lats (list of floats): mpl_toolkits.basemap lat parameters
        lons (list of floats): mpl_toolkits.basemap lon parameters
        m (mpl.mpl_toolkits.Basemap): mpl instance for plotting
        ax (axis parent handle)
        cax (colorbar parent handle)
        val (Matplotlib color)
        cmap (string): colormap cmap parameter
        ALPHA (float): alpha value to use for plot

    Outputs -
        (No outputs - modifies objects in place)

getalpha(arr, index, F=0.9)
    function to normalize transparency of quiver

    Inputs -
        arr (iterable): list of input values
        index (int): index position from which alpha value should be taken from
        F (float): multiplier
        M (float): minimum alpha value

    Outputs -
        v (float): alpha value

showGlobalPlot(coords, ts=None, fsize=[14, 14], cmap='jet',
              m=None, figname='fig', F=2):
    plot global distribution of events within time period specified

    Inputs -
        coords (string): filename with coord list as lat1.lat2.lon1.lon2
        ts (string): time series filename with data in rows, space separated
        fsize (list):
        cmap (string):
        m (mpl.mpl_toolkits.Basemap): mpl instance for plotting
        figname (string): Name of the Plot
        F (int)

    Output -
       num (np.array): data values
       fig (mpl.figure): heatmap of events from fitted data
       ax (axis handler): output axis handler
       cax (colorbar axis handler): output colorbar axis handler


    Utility function to visualize spatio temporal interaction networks

    Inputs -
        unet (string): json filename
        unet (python dict):
        jsonfile (bool): True if unet is string  specifying json filename
        colormap (string): colormap
        res (string): 'c' or 'f'
        drawpoly (bool): if True draws transparent patch showing srcs
        figname  (string): prefix of pdf image file
    Outputs -
        m (Basemap handle)
        fig (figure handle)
        ax (axis handle)
        cax (colorbar handle)

    normalize array for plotting

    Inputs -
        a (ndarray): input array
    Output -
        a (ndarray): output array

    COLORMAP,Horizon,model_nums, newmodel_name='newmodel.json',

    For use after model.json files are produced via XgenESeSS. Will produce a
    network interaction map of all the models.
      model_path(str)- path to the model.json files.
      MAX_DIST(int)- max distance cutoff in render network.
      MIN_DIST(int)- min distance cutoff in render network.
      MAX_GAMMA(float)- max gamma cutoff in render network.
      MIN_GAMMA(float)- min gamma cutoff in render network.
      COLORMAP(str)- colormap in render network.
      Horizon(int)- prediction horizons to test in unit of temporal quantization
      model_nums(int)- number of models to use in prediction.
      newmodel_name(str): Name to save the newmodel as. This new model
          will be loaded in by viz.
      figname(str)-Name of figure drawn

    COLORMAP,Horizon,model_nums, newmodel_name='newmodel.json',

    This function aims to achieve the same thing as render_network but in
    parallel. Outputs json files and combines them at the end.
      model_path(str)- path to the model.json files.
      MAX_DIST(int)- max distance cutoff in render network.
      MIN_DIST(int)- min distance cutoff in render network.
      MAX_GAMMA(float)- max gamma cutoff in render network.
      MIN_GAMMA(float)- min gamma cutoff in render network.
      COLORMAP(str)- colormap in render network.
      Horizon(int)- prediction horizons to test in unit of temporal quantization
      model_nums(int)- number of models to use in prediction.
      newmodel_name(str): Name to save the newmodel as. This new model
        will be loaded in by viz.
      figname(str)-Name of figure drawn


    A rendering for a single file. This function is called by render_network_parallel.
    arguments(list)-list of arguments for the rendering.
    arguments[0]:model_path(str)- path to the model.json files.
    arguments[1]:MIN_DIST(int)- min distance cutoff in render network.
    arguments[2]:MAX_DIST(int)- max distance cutoff in render network.
    arguments[3]:MIN_GAMMA(float)- min gamma cutoff in render network.
    arguments[4]:MAX_GAMMA(float)- max gamma cutoff in render network.
    arguments[5]:Horizon(int)- prediction horizons to test in unit of temporal quantization
    arguments[6]:model_nums(int)- number of models to use in prediction.
    arguments[7]:counter(int)- a number to keep track of model progress.
bokeh_pipe library:

visualization library for Network Models produced by uNetworkModels based on bokeh

Process overview:

This code starts from the point when the json data files have been obtained.

To get the neighborhood plot:
  1. run json_to_csv on the batch of json files to get the batch of csv files.

  2. run combine_merc to combine the batch of csv files into one csv file in mercator coordinates.

  3. run neighbor_plot on the combined csv file to get the neighborhood plot.

To get the streamline plot:
  1. same as step 1 of neighborhood plot (can be skipped if already done)

  2. run streamheat_combine to combine the batch of csv files into one csv file. THIS IS IN A FORMAT DIFFERENT FROM THAT OF THE NEIGHBORHOOD PLOT.

  3. run on the combined file.

To get the heatplot:
  1. same as streamline plot.

  2. same as streamline plot.

  3. run heat_map on the combined file.

We have provided two sample datasets for use. ‘crime_filtered_data.csv’ can be considered the combined file for the neighborhood plot. ‘contourmerc.csv’ can be considered the combined file for the streamline plot and the heatplot.

json_to_csv(FILEPATH, DEST):
    This function takes a group of json data files and transforms
    them into csv files for use. Edit the selection variables as
    you see fit. It is very important that you initialize DEST to a folder,
    as it generates many csv files. WARNING: Run this function in
    python2. The rest of the code should use python3.

  Inputs -
      FILEPATH (string): the filepath to the json files. Example: 'jsons/'
      DEST (string): the place for the csv files to be stored. Example: 'csvs/'

combine_merc(DIR, filename, N = 20):
    This function combines the csv's into a single file. At the same time,
    this function converts the coordinates from longitude and latitude to
    mercator, which is necessary to make our neighborhood plot, because our
    tileset accepts mercator coordinates. This generates one combined csv in
    the current directory. USE PYTHON 3.

        DIR (string): The location(filepath) of the csvs to be combined.
            Example 'csvs/'
        filename (string): the desired name for the combined csv file.
            Example: 'combined.csv'
        N (int): the max number of sources selected for in json_to_csv:
            high argument is N.
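
For readers unfamiliar with the coordinate change mentioned above, here is a small sketch of the standard spherical Web Mercator projection. It assumes that "mercator" here means Web Mercator (EPSG:3857), which is what common web map tilesets use; the text above does not state this explicitly. The function name to_web_mercator is hypothetical and is not part of bokeh_pipe.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by Web Mercator


def to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates of Chicago, used only as sample input.
x, y = to_web_mercator(-87.6298, 41.8781)
```

The projection is undefined at the poles, so latitudes should stay strictly between -90 and 90 degrees.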

neighbor_plot(filepath= 'crime_filtered_data.csv'):
    This is the first implementation of our Bokeh plot. The function takes
    the filepath of the data and opens the bokeh plot in a browser. Chrome
    seems to be the best browser for bokeh plots. The datafile must be a csv
    file in the correct format. See the file 'crime_filtered_data.csv' for an
    example. Each row represents a point, all the lines(sources) connected to
    it and the gammas and delays associated with the lines. The current
    implementation results in the bokeh plot, and a linked table of the data.
    IMPORTANT: Points are in MERCATOR Coordinates. This is because the current
    tileset for the map is in mercator coordinates.
    Example file is 'crime_filtered_data.csv'

    Inputs -
      filepath (string): input data file

streamheat_combine(DIR, filename):
    We need to once again combine the csvs, this time into a format
    appropriate for the streamplots. This function does that and produces two
    files. File 1 will be in longitude and latitude. File 2 will be in mercator
    coordinates. We will be primarily working with file 2.

    Inputs -
        DIR (string): The filepath to the csvs. Ex: 'csvs/'
        filename (string): The filename for the combined csv file.
          Ex: 'contourmerc.csv'

crime_stream(datafile='contourmerc.csv',density=4, npoints=10,
    output_name='streamplot.html', method = 'cubic'):
    This function takes a csv datafile of crime vectors, reads it into
    a pandas dataframe and plots the streamplot using Delaunay
    interpolation. Function will open the plot in a new browser. Use chrome.
        datafile: name of the csv file. Example file is 'contourmerc.csv'
        density: desired line density of the plot. Ex: 4.
        npoints: The dimensions used for the streamplot. The grid will
            have npoints**2 number of grids. It is not advised to have
            npoints > 200.
            Recommended: npoints = 10.
        output_name: name to save plot to.
        method: method for interpolation. 'cubic','linear', or 'nearest'

heat_map(datafile='contourmerc.csv', npoints=300, output_name='heatmap.html',
        method = 'linear'):
    Makes a heatmap from the same datafile that crime_stream uses.
    datafile: name of the datafile. Example file is 'contourmerc.csv'.
    npoints: dimension for plot. number of squares = npoints**2.
        Recommended: 100-300

    Inputs -
      output_name (string): output file name for the plot.
      method (string): method for interpolation. 'cubic','linear', or 'nearest'

VERSION 1.1.76
