picalo 4.94

A data analysis/structure library with tables, type-aware columns, records, and cells.

Package Documentation

Picalo

Data Analysis Library

http://www.picalo.org/

Picalo is a Python library to help anyone who works with data files,
especially those who work with data in relational/spreadsheet format.
It was created primarily for investigators and auditors who search through
data sets for anomalies, trends, and other information, but it is
generally useful for any type of data or text files.

Picalo is different from NumPy/Numarray in that it is meant for
heterogeneous data rather than homogeneous data. In NumPy, you
have an array (table) of the same type--all ints, for example.
In Picalo, you have a table made up of different column types,
very similar to a database.
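
The difference can be pictured with plain Python. The sketch below is only
an illustration and does not use Picalo or show its internals; the `columns`
schema and the `coerce` helper are hypothetical names invented for this
example. The sample values come from the CSV used later in this file.

```python
# Illustration only: plain Python, not Picalo's internals.
# A homogeneous array holds a single type.
homogeneous = [35, 34, 8, 10]          # every element is an int

# A heterogeneous table gives each column its own type, like a database.
# Hypothetical schema: one (name, type) pair per column.
columns = [("Name", str), ("Age", int)]

raw_rows = [
    ["Homer", "35"],
    ["Marge", "34"],
]

def coerce(row):
    """Convert each raw string to the type declared for its column."""
    return [col_type(value) for (_, col_type), value in zip(columns, row)]

table = [coerce(row) for row in raw_rows]
print(table)
```

Each row mixes a string and an integer, which a single-typed array would
not allow without falling back to a generic object type.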

One of Picalo's primary purposes is making relational
databases easier to work with. Once you have a Picalo table,
you can add, move, or delete columns; work with records (horizontal
slices of the data); select and group records in various ways;
and run analyses on tables. Picalo includes adapters for popular
databases, and it provides a Query object that makes queries seem
just like regular Tables (except they are live from the database).

If you work with relational databases, delimited (CSV/TSV) files,
EBCDIC files, MS Excel files, log files, text files, or other
heterogeneous datasets, Picalo might make your life easier.

Picalo is programmed to be as Pythonic as possible. Its core objects
(tables, columns, records) act like lists. A column is a list of cells.
A record is a list of cells. A table is a list of records. Tables can be
sorted via the sort function, just like the Sorting HowTo shows. The return
values of almost all functions are new tables, so functions can be chained
together like pipes in Unix.
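
As a rough sketch of that design, the following uses ordinary lists and
named tuples as stand-ins for Picalo's objects. The helpers `select`,
`sort_by`, and `column` are hypothetical and are not Picalo functions; they
only show why returning a new table from every call makes chaining natural.

```python
# Sketch only: plain lists and tuples standing in for Picalo objects.
from collections import namedtuple

# A record is a sequence of cells.
Record = namedtuple("Record", ["Name", "Age"])

# A table is a list of records.
table = [Record("Homer", 35), Record("Marge", 34),
         Record("Lisa", 8), Record("Bart", 10)]

def select(tbl, predicate):
    """Return a NEW table holding the records that satisfy predicate."""
    return [rec for rec in tbl if predicate(rec)]

def sort_by(tbl, key):
    """Return a NEW sorted table; the input table is left untouched."""
    return sorted(tbl, key=key)

def column(tbl, name):
    """Return one column as a list of cells."""
    return [getattr(rec, name) for rec in tbl]

# Because each step returns a new table, calls nest like a pipeline.
children = sort_by(select(table, lambda r: r.Age < 18), key=lambda r: r.Name)
print(column(children, "Name"))   # prints ['Bart', 'Lisa']
```

The original `table` still has four records afterwards, since none of the
helpers modify their input.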

Picalo includes an optional Project object that stores tables in
Zope Object DB files. When Projects are used, Picalo automatically
swaps records in and out of memory as needed to ensure efficient use of
resources. Projects allow Picalo to work with essentially an unlimited
amount of data.
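
The general idea of keeping records on disk and loading them on demand can
be sketched with the standard library. This is a stand-in only: it uses the
`shelve` module rather than ZODB, it does not reproduce Picalo's Project
object, and the file name `project_demo` is made up for the example.

```python
# Stand-in sketch: standard-library shelve, NOT ZODB or Picalo's Project.
import os
import shelve
import tempfile

# Hypothetical storage location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "project_demo")

rows = [("Homer", 35), ("Marge", 34), ("Lisa", 8), ("Bart", 10)]

# Write records to disk one at a time, keyed by their position.
db = shelve.open(path)
try:
    for index, row in enumerate(rows):
        db[str(index)] = row
finally:
    db.close()

# Later, reopen the store and load only the record that is needed.
db = shelve.open(path)
try:
    third = db["2"]
finally:
    db.close()

print(third)   # prints ('Lisa', 8)
```

Only the requested record is unpickled on the second pass, which is the
property that lets a disk-backed store hold more data than fits in memory.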

The project was started in 2003 by Conan C. Albrecht, a professor
in Information Systems at Brigham Young University. Conan remains
the primary developer of Picalo.

Here's an example of Picalo code loading a CSV and working with it:

# import the picalo libraries and turn off visual progress bars
import picalo, StringIO
picalo.use_progress_indicators(False)

# load the csv, could have been from a filename
csv = '''Name,Age
Homer,35
Marge,34
Lisa,8
Bart,10
'''
table = picalo.load_csv(StringIO.StringIO(csv))

table.set_type('Age', int) # set the type of the Age column (csv defaults types to str)
table.view() # prints a formatted table
print table[0].Age # prints 35
print table[0]['Age'] # also prints 35
print table[0][1] # again prints 35
print table[-1].Name # prints Bart
table2 = table[0:2] # get a slice of records
for name in table.column('Name'):
    print name # prints the names, one by one

# insert a column, which defaults cells to None
table.insert_column(1, 'DoubleAge', int)
# change cells using an expression
table.replace_column_values('DoubleAge', 'record.Age * 2')

# sort by Name, then Age
picalo.Simple.sort(table, True, 'Name', 'Age')
# sort in more Pythonic way (only by Name this time)
table.sort(key=lambda r: r.Name)

# print the std. dev. of the age column
print picalo.stdev(table.column('Age'))

# select records by regex, those containing 'a'
table2 = picalo.Simple.select_by_regex(table, Name='^.*a.*$')

# filter the existing table, then clear the filter
table.filter('record.Age > 20')
print len(table) # prints 2
table.clear_filter()
print len(table) # prints 4

# reorder the columns
table.reorder_columns(['Age', 'Name', 'DoubleAge'])

# add a live, calculated column
table.append_calculated('ReverseName', unicode, 'record.Name[::-1].capitalize()')
print table[0][3] # prints 'Trab'
table[0].Name = 'Maggie'
print table[0][3] # prints 'Eiggam'

# split into multiple tables by value
table.append_column('FavNum', int, [ 5, 5, 2, 2 ])
tablelist = picalo.Grouping.stratify_by_value(table, 'FavNum')
tablelist[0].view() # view first table in list (has two records)
tablelist[1].view() # view second table in list (has two records)
# any operation on a list of tables is applied to all tables in the list
# this sets the type of the FavNum column in *both* tables
tablelist.set_type('FavNum', float)
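
For readers without Picalo installed, the split-by-value step at the end of
the example can be approximated in plain Python. This is a minimal sketch
and not the code behind picalo.Grouping.stratify_by_value: the helper
`split_by_value` is hypothetical, the rows are dictionaries rather than
Picalo records, and the ordering of the resulting groups (first seen) is an
assumption that may differ from Picalo's.

```python
# Sketch only: approximates splitting a table by value in plain Python.
from collections import OrderedDict

# Rows loosely modeled on the state of the example table above.
records = [
    {"Name": "Maggie", "FavNum": 5},
    {"Name": "Homer", "FavNum": 5},
    {"Name": "Lisa", "FavNum": 2},
    {"Name": "Marge", "FavNum": 2},
]

def split_by_value(rows, key):
    """Return one list of rows per distinct row[key], in first-seen order."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return list(groups.values())

tables = split_by_value(records, "FavNum")
for tbl in tables:
    print([row["Name"] for row in tbl])
```

Each element of `tables` is an ordinary list, so anything that works on one
table can be applied to every group with a simple loop.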


Picalo is released in two formats:
1) As a pure-Python library that is used by issuing one of the
following:

    from picalo import *
    # or #
    import picalo

Python programmers will be primarily interested in the library
version.

This format is installed in the typical Python fashion, either
as an .egg via setuptools, or via "python setup.py install" from
the source.

2) As a standalone, wxPython-based GUI environment that allows
end users to access the Picalo libraries.

This version is packaged as a Windows setup.exe file, Mac
application bundle, and Linux rpm and deb files. The user
may not realize Python is even being used when running the
full application environment.

Please see the following:
- HOW TO RUN at the bottom of this file for running the source
distribution or compiling a new bundle.

- CHANGELOG.TXT has good information about what's changed in recent
versions.

- LICENSE.TXT for the GNU General Public License that Picalo is released under.
For those who don't want to read the license, here are the highlights:

1. You may use Picalo free of charge. I hope it is helpful to you.
Please improve the code and share back with the community.

2. Picalo has NO warranty. I don't guarantee it will do anything
correctly or even incorrectly. It may do unsightly things to your
machine. It may munch your data or even corrupt your hard drive.
Picalo might fry your computer or ruin your marriage. You take
all risks upon yourself.

3. You must release any additions to Picalo under the GPL.

4. Picalo source code cannot be included or used in any products that
are licensed with something other than the GPL.

5. More information on these issues can be found in LICENSE.TXT


- doc/PicaloCookbook.pdf has some of the best information right now.


- doc/Manual.pdf for installation instructions (see the Installation section)


- doc/Manual.pdf for detailed usage instructions, tutorial, etc.

Enjoy! Please report any bugs to me. I also welcome additions to the toolkit.

Dr. Conan C. Albrecht
conan@warp.byu.edu


=======================================
HOW TO RUN/COMPILE THE SOURCE
=======================================

### TO INSTALL THE PICALO LIBRARY ONLY:

If you want to install the library version for use in your Python environment and
you have setuptools installed, you can simply use easy_install:

easy_install picalo

If you don't have setuptools or want to install manually, expand the
picalo-x.xx.tar.gz file and run the traditional Python setup.py file:

(first install ZODB from the Zope libraries)
(this assumes you downloaded picalo-5.12.tar.gz)
tar xvfz picalo-5.12.tar.gz
cd picalo-5.12
python setup.py install



### TO BUILD THE FULL GUI APPLICATION:

Note that this section is primarily for developers. If you simply want to
install and run Picalo, visit http://www.picalo.org/ and download a pre-built
bundle for Windows, Mac, or Linux.

Picalo has several dependencies that you'll need to ensure your Python
installation has. These include the following:

NOTE: Don't install eggs on Windows because py2exe chokes on them. When doing a
manual setup, use "python setup.py install_lib" to disable the egg building.
This only applies to people wanting to compile the Windows exe files.
NOTE: To build on Mac, you need to be using the Framework version of Python. This
is the version from python.org, not the one that ships with Mac OS X. Be sure
to explicitly install Python and ensure it is being used.

REQUIRED:
- Python 2.5+ (http://www.python.org) - It probably runs on version 2.4 and earlier,
but all testing is now being done on Python 2.5+.
We have not made the jump to Python 3 because some libraries aren't there yet
(especially wxPython).
- wxPython (http://www.wxpython.org) - We're on version 2.8.x.x right now.
We try to keep current with wxPython, so try the most recent version of wxPython.
If you hit GUI snags, email Conan and ask what version we're currently on. wxPython
often changes the API from one version to another, so you'll know right away if
it says some wx method doesn't exist. Note that for the command-line version of
Picalo, wxPython is not required -- the code can run entirely without any dependencies
here.
- ZODB - Zope Object Database
- zc.blist - A tree-based list optimized for storage in ZODB.
- pyODBC (http://pyodbc.sourceforge.net/) - This allows you to access ODBC databases.
Picalo should be able to run without it, although the database GUI dialogs will fail.
- psycopg2 - This allows you to access PostgreSQL directly.
The Windows build is at Stickpeople.com.
Picalo should be able to run without it, although the database GUI dialogs will fail.
- pygresql - An alternative driver to access PostgreSQL directly.
Picalo should be able to run without it, although the database GUI dialogs will fail.
- MySQLdb - This allows you to access MySQL directly.
Picalo should be able to run without it, although the database GUI dialogs will fail.
- cx_Oracle - Allows you to connect to Oracle 10.
Picalo should be able to run without it, although the database GUI dialogs will fail.
- MX Base distribution - I'm not sure if this is still required or not. Picalo itself
doesn't use it, but some of the dependencies above might.
- chardet.universaldetector (http://chardet.feedparser.org/)
- Windows Only: py2exe - if you want to compile Picalo on Windows
- Windows Only: InnoSetup - if you want to compile Picalo on Windows
- Mac OS X Only: py2app - if you want to compile Picalo on Mac OS X (installs easily
with easy_install).
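
One quick way to see which of these packages are present is to try
importing each one. The sketch below is not part of Picalo. The import
names are my best mapping from the package names listed above (for example
wxPython to `wx` and pygresql to `pgdb`), so treat them as assumptions and
adjust them for your installation.

```python
# Sketch: report which of the listed dependencies can be imported.
# The import names are assumed mappings from the package names above.
import importlib

MODULES = ["wx", "ZODB", "zc.blist", "pyodbc", "psycopg2",
           "pgdb", "MySQLdb", "cx_Oracle", "chardet"]

def available(name):
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(name)
        return True
    except Exception:
        # Covers ImportError as well as failures raised while a
        # partially installed package initializes itself.
        return False

status = dict((name, available(name)) for name in MODULES)
for name in MODULES:
    print("%-10s %s" % (name, "ok" if status[name] else "missing"))
```

A "missing" entry is not necessarily fatal: as noted above, the database
drivers are needed only for the corresponding database features.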

Once you are sure the above are running, change to the trunk/ directory. Run the following:

python Picalo.pyw

Alternatively, to run the command line version, execute the following from within the
Python interpreter:

>>> from picalo import *  

File                    Type         Py Version   Uploaded on   Size
picalo-4.94-py2.6.egg   Python Egg   2.6          2011-03-16    596KB
picalo-4.94.tar.gz      Source                    2011-03-16    291KB

  • Author: Conan C. Albrecht
  • Documentation: picalo package documentation
  • Home Page: http://www.picalo.org/
  • Keywords: data log analysis spreadsheet database
  • License:
    Picalo itself is released under the GNU General Public License (GPL).
    It is NOT public domain software.  It has some limitations on your
    use of the product.  In short, here are the rules:
    
     * I give this software to the analysis and fraud detection community
       in good faith that those using it will contribute modules and additions
       back to me and other users.  This is enforced by the license.  But
       beyond the legal issues, please support the community by developing
       your routines in general ways that all can use.
       
     * You can use Picalo free of charge (commercial or otherwise)
     
     * You can modify the source code as you wish for personal use.
     
     * If you release, sell, or otherwise distribute Picalo to others,
       you MUST release the product and any changes you make to
       the software under the GPL.  This means you must provide the source
       code to whoever asks for it, including your changes.  You cannot
       change the license, even on derivative works.
       (This prevents companies or individuals from benefiting from this
       product without allowing others to use their modifications.)
    
    Although I release the software for free under the GPL, I do make money
    using the software in consulting.  I am also often willing to make 
    additions that companies or individuals need for consulting fees.
    Finally, I actively consult in training people on how to use Picalo.
    Contact me if you wish to discuss these options.
    
    Picalo uses the following extra packages.  Picalo's license
    is compatible with all of the licenses of these products. Thanks
    to the host of individuals who worked many long hours to release these
    products.  Picalo is my contribution to the open source community.
    
     * Python (Core language, http://www.python.org/)
     * wxPython (Widget set, http://www.wxpython.org/)
     * pstat.py (Statistics library, Gary Strangman)
     * Nuvola icon set (David Vignoni, ICON KING - www.icon-king.com)
     * pyodbc (ODBC driver)
     * psycopg2 (PostgreSQL driver)
     * MySQLdb (MySQL driver)
     
    
    =======================================================================
    
    
    		    GNU GENERAL PUBLIC LICENSE
    		       Version 2, June 1991
    
     Copyright (C) 1989, 1991 Free Software Foundation, Inc.
                           59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.
    
    			    Preamble
    
      The licenses for most software are designed to take away your
    freedom to share and change it.  By contrast, the GNU General Public
    License is intended to guarantee your freedom to share and change free
    software--to make sure the software is free for all its users.  This
    General Public License applies to most of the Free Software
    Foundation's software and to any other program whose authors commit to
    using it.  (Some other Free Software Foundation software is covered by
    the GNU Library General Public License instead.)  You can apply it to
    your programs, too.
    
      When we speak of free software, we are referring to freedom, not
    price.  Our General Public Licenses are designed to make sure that you
    have the freedom to distribute copies of free software (and charge for
    this service if you wish), that you receive source code or can get it
    if you want it, that you can change the software or use pieces of it
    in new free programs; and that you know you can do these things.
    
      To protect your rights, we need to make restrictions that forbid
    anyone to deny you these rights or to ask you to surrender the rights.
    These restrictions translate to certain responsibilities for you if you
    distribute copies of the software, or if you modify it.
    
      For example, if you distribute copies of such a program, whether
    gratis or for a fee, you must give the recipients all the rights that
    you have.  You must make sure that they, too, receive or can get the
    source code.  And you must show them these terms so they know their
    rights.
    
      We protect your rights with two steps: (1) copyright the software, and
    (2) offer you this license which gives you legal permission to copy,
    distribute and/or modify the software.
    
      Also, for each author's protection and ours, we want to make certain
    that everyone understands that there is no warranty for this free
    software.  If the software is modified by someone else and passed on, we
    want its recipients to know that what they have is not the original, so
    that any problems introduced by others will not reflect on the original
    authors' reputations.
    
      Finally, any free program is threatened constantly by software
    patents.  We wish to avoid the danger that redistributors of a free
    program will individually obtain patent licenses, in effect making the
    program proprietary.  To prevent this, we have made it clear that any
    patent must be licensed for everyone's free use or not licensed at all.
    
      The precise terms and conditions for copying, distribution and
    modification follow.
    
    		    GNU GENERAL PUBLIC LICENSE
       TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
    
      0. This License applies to any program or other work which contains
    a notice placed by the copyright holder saying it may be distributed
    under the terms of this General Public License.  The "Program", below,
    refers to any such program or work, and a "work based on the Program"
    means either the Program or any derivative work under copyright law:
    that is to say, a work containing the Program or a portion of it,
    either verbatim or with modifications and/or translated into another
    language.  (Hereinafter, translation is included without limitation in
    the term "modification".)  Each licensee is addressed as "you".
    
    Activities other than copying, distribution and modification are not
    covered by this License; they are outside its scope.  The act of
    running the Program is not restricted, and the output from the Program
    is covered only if its contents constitute a work based on the
    Program (independent of having been made by running the Program).
    Whether that is true depends on what the Program does.
    
      1. You may copy and distribute verbatim copies of the Program's
    source code as you receive it, in any medium, provided that you
    conspicuously and appropriately publish on each copy an appropriate
    copyright notice and disclaimer of warranty; keep intact all the
    notices that refer to this License and to the absence of any warranty;
    and give any other recipients of the Program a copy of this License
    along with the Program.
    
    You may charge a fee for the physical act of transferring a copy, and
    you may at your option offer warranty protection in exchange for a fee.
    
      2. You may modify your copy or copies of the Program or any portion
    of it, thus forming a work based on the Program, and copy and
    distribute such modifications or work under the terms of Section 1
    above, provided that you also meet all of these conditions:
    
        a) You must cause the modified files to carry prominent notices
        stating that you changed the files and the date of any change.
    
        b) You must cause any work that you distribute or publish, that in
        whole or in part contains or is derived from the Program or any
        part thereof, to be licensed as a whole at no charge to all third
        parties under the terms of this License.
    
        c) If the modified program normally reads commands interactively
        when run, you must cause it, when started running for such
        interactive use in the most ordinary way, to print or display an
        announcement including an appropriate copyright notice and a
        notice that there is no warranty (or else, saying that you provide
        a warranty) and that users may redistribute the program under
        these conditions, and telling the user how to view a copy of this
        License.  (Exception: if the Program itself is interactive but
        does not normally print such an announcement, your work based on
        the Program is not required to print an announcement.)
    
    These requirements apply to the modified work as a whole.  If
    identifiable sections of that work are not derived from the Program,
    and can be reasonably considered independent and separate works in
    themselves, then this License, and its terms, do not apply to those
    sections when you distribute them as separate works.  But when you
    distribute the same sections as part of a whole which is a work based
    on the Program, the distribution of the whole must be on the terms of
    this License, whose permissions for other licensees extend to the
    entire whole, and thus to each and every part regardless of who wrote it.
    
    Thus, it is not the intent of this section to claim rights or contest
    your rights to work written entirely by you; rather, the intent is to
    exercise the right to control the distribution of derivative or
    collective works based on the Program.
    
    In addition, mere aggregation of another work not based on the Program
    with the Program (or with a work based on the Program) on a volume of
    a storage or distribution medium does not bring the other work under
    the scope of this License.
    
      3. You may copy and distribute the Program (or a work based on it,
    under Section 2) in object code or executable form under the terms of
    Sections 1 and 2 above provided that you also do one of the following:
    
        a) Accompany it with the complete corresponding machine-readable
        source code, which must be distributed under the terms of Sections
        1 and 2 above on a medium customarily used for software interchange; or,
    
        b) Accompany it with a written offer, valid for at least three
        years, to give any third party, for a charge no more than your
        cost of physically performing source distribution, a complete
        machine-readable copy of the corresponding source code, to be
        distributed under the terms of Sections 1 and 2 above on a medium
        customarily used for software interchange; or,
    
        c) Accompany it with the information you received as to the offer
        to distribute corresponding source code.  (This alternative is
        allowed only for noncommercial distribution and only if you
        received the program in object code or executable form with such
        an offer, in accord with Subsection b above.)
    
    The source code for a work means the preferred form of the work for
    making modifications to it.  For an executable work, complete source
    code means all the source code for all modules it contains, plus any
    associated interface definition files, plus the scripts used to
    control compilation and installation of the executable.  However, as a
    special exception, the source code distributed need not include
    anything that is normally distributed (in either source or binary
    form) with the major components (compiler, kernel, and so on) of the
    operating system on which the executable runs, unless that component
    itself accompanies the executable.
    
    If distribution of executable or object code is made by offering
    access to copy from a designated place, then offering equivalent
    access to copy the source code from the same place counts as
    distribution of the source code, even though third parties are not
    compelled to copy the source along with the object code.
    
      4. You may not copy, modify, sublicense, or distribute the Program
    except as expressly provided under this License.  Any attempt
    otherwise to copy, modify, sublicense or distribute the Program is
    void, and will automatically terminate your rights under this License.
    However, parties who have received copies, or rights, from you under
    this License will not have their licenses terminated so long as such
    parties remain in full compliance.
    
      5. You are not required to accept this License, since you have not
    signed it.  However, nothing else grants you permission to modify or
    distribute the Program or its derivative works.  These actions are
    prohibited by law if you do not accept this License.  Therefore, by
    modifying or distributing the Program (or any work based on the
    Program), you indicate your acceptance of this License to do so, and
    all its terms and conditions for copying, distributing or modifying
    the Program or works based on it.
    
      6. Each time you redistribute the Program (or any work based on the
    Program), the recipient automatically receives a license from the
    original licensor to copy, distribute or modify the Program subject to
    these terms and conditions.  You may not impose any further
    restrictions on the recipients' exercise of the rights granted herein.
    You are not responsible for enforcing compliance by third parties to
    this License.
    
      7. If, as a consequence of a court judgment or allegation of patent
    infringement or for any other reason (not limited to patent issues),
    conditions are imposed on you (whether by court order, agreement or
    otherwise) that contradict the conditions of this License, they do not
    excuse you from the conditions of this License.  If you cannot
    distribute so as to satisfy simultaneously your obligations under this
    License and any other pertinent obligations, then as a consequence you
    may not distribute the Program at all.  For example, if a patent
    license would not permit royalty-free redistribution of the Program by
    all those who receive copies directly or indirectly through you, then
    the only way you could satisfy both it and this License would be to
    refrain entirely from distribution of the Program.
    
    If any portion of this section is held invalid or unenforceable under
    any particular circumstance, the balance of the section is intended to
    apply and the section as a whole is intended to apply in other
    circumstances.
    
    It is not the purpose of this section to induce you to infringe any
    patents or other property right claims or to contest validity of any
    such claims; this section has the sole purpose of protecting the
    integrity of the free software distribution system, which is
    implemented by public license practices.  Many people have made
    generous contributions to the wide range of software distributed
    through that system in reliance on consistent application of that
    system; it is up to the author/donor to decide if he or she is willing
    to distribute software through any other system and a licensee cannot
    impose that choice.
    
    This section is intended to make thoroughly clear what is believed to
    be a consequence of the rest of this License.
    
      8. If the distribution and/or use of the Program is restricted in
    certain countries either by patents or by copyrighted interfaces, the
    original copyright holder who places the Program under this License
    may add an explicit geographical distribution limitation excluding
    those countries, so that distribution is permitted only in or among
    countries not thus excluded.  In such case, this License incorporates
    the limitation as if written in the body of this License.
    
      9. The Free Software Foundation may publish revised and/or new versions
    of the General Public License from time to time.  Such new versions will
    be similar in spirit to the present version, but may differ in detail to
    address new problems or concerns.
    
    Each version is given a distinguishing version number.  If the Program
    specifies a version number of this License which applies to it and "any
    later version", you have the option of following the terms and conditions
    either of that version or of any later version published by the Free
    Software Foundation.  If the Program does not specify a version number of
    this License, you may choose any version ever published by the Free Software
    Foundation.
    
      10. If you wish to incorporate parts of the Program into other free
    programs whose distribution conditions are different, write to the author
    to ask for permission.  For software which is copyrighted by the Free
    Software Foundation, write to the Free Software Foundation; we sometimes
    make exceptions for this.  Our decision will be guided by the two goals
    of preserving the free status of all derivatives of our free software and
    of promoting the sharing and reuse of software generally.
    
    			    NO WARRANTY
    
      11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
    FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
    OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
    PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
    OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
    MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
    TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
    PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
    REPAIR OR CORRECTION.
    
      12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
    WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
    REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
    INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
    OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
    TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
    YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
    PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGES.
    
    		     END OF TERMS AND CONDITIONS
  • Package Index Owner: doconix