Skip to main content

An AST based navigation system.

Project description

=======
Redhawk
=======

Redhawk is a code navigation system built on the idea of a language agnostic
parse tree. It currently supports C & Python.

Code navigation systems are few and far. They are either too tied to a
language, or are very heuristic in nature --- using regex based parsers.
Redhawk attempts to acheive the best of both worlds. It uses standard, robust,
parsers each of the languages, and converts the resulting AST to a language
agnostic AST, or LAST.

The resulting LAST can be queried by using either Selectors (similar to
JQuery), or an xpath like syntax. A Typical use of Redhawk is as shown below::

redhawkquery/DefineFunctionfile1.pyfile2.cRedhawkiscurrentlyunderheavydevelopment.Thecodecanbefoundongithub.Redhawkcurrentlyrequirespython2.6or2.7.ProjectNewsAnintroductorysetofvideos,havebeenuploadedtoYoutube.AVimpluginreleasedinversion1.1.5,forquery,andreplace(usinganeditablequickfixlist).FromVersion1.1.2onwards,Redhawksupportsparallelqueryingusingtheparallelpython(pp)module.ThisspeedsupRedhawksqueryingonlargecodebases.QueryingforclosuresanywhereinDjango( 2200files)cannowbedonein 20secondsonaceleronnetbook.ProjectObjectives(orwhatscomingup)1.Allowuserstoeffectivelyfindandtherebynavigatecodeinaneditorindependentmanner.2.BetterdocumentationforAPIusage,andalonglistofexamples,withexamplesscriptsusingtheSelectorAPI.3.Allowcrosslanguageanalysisinthefuture,therebybenefittingprojectsinmultiplelanguages.4.ExposetheLASTinasimplemannerviatheRedhawkAPIforothertools.Thesetoolscouldinvolveindentingcode,suggestingcompletions,orstaticanalysis.5.EventuallyalloweditingoftheLAST,andtherebypowerfulrefactoring.DependenciesRuntimeDependencies:pycparserisrequiredtoparseCcodeintoASTs.ThisinturndependsonPythonPLY(pythonplyondebianubuntu).OptionalbuthighlyrecommendedDependencies:ppParallelPythonisrequiredforrunningqueriesinparallel.Thisspeedsupqueriesbymorethan2x.Thisishighlyrecommendedifyouaregoingtoquerylargeprojects.ThewholeofDjangocanbequeriedinlessthan20seconds,byusingparallelpython(passingptothequerycommand).PythonGraphvizisrequiredforgeneratingprettyASTgraphs.Thispackageisanoptionaldependency,buthighlyrecommended.ThispackagegoesbythenamepythonpygraphvizonUbuntu,anddependsongraphviz,anddot.(Pipseemstohaveahardtimeinstallpygraphviz.Eithereasyinstallorinstallingfromyourdistributionspackagemanagershouldwork).Development(Compiletime)Dependencies:PythonYAMLisrequiredforgeneratingtheASTclassesinnode.pyformasimpleconfigurationfile.Thisgoesbythenamepythonyamlondebian/ubuntu.nosetestsisrequiredforrunningthetestsuite.DevelopmentCycle:Usebin/startsimplebashwithredhawkinpythonpath.shRunnosetestsfromredhawk/asroot.Ensuretestspass.Makeawesomechanges!Submitapullrequesttotheprojectongithub.InstallingpipistherecommendedtooltoinstallRedhawk.Itgoesbypythonpipondebian/ubuntuandpiponthePythonPackageIndex.Thecommand:: sudo pip install redhawk

should install redhawk, along with its dependency - pycparser.

It is however recommended that you install the other packages also::

sudoeasyinstallpygraphviz sudo pip install nose 'PyYAML>=3.09' 'nose>=0.11'

or by using your distribution's package manager. On Ubuntu/Debian (Ubuntu
Lucid seems to have new enough packages)::

sudoaptitudeinstallpythonpygraphvizpythonyamlpythonnoseNotes1.Runbuildtables.pyinthepycparserdirectory,topregeneratethelexandyacctables.ThiswillenablequickerparsingofCfiles.Ifpycparserwasinstalledforallusers,thenRootpriviligesmayberequiredtorunbuildtables.pyPermissionsfortheresultinglextab.pyandyacctab.pymustbechangedtoallowalluserstoread(755).Youcanfindmoreaboutthishere.UsingRedhawkRedhawkcanbeusedfromeithertheredhawkexecutable,orviatheredhawkAPI.1.Usingtheredhawkprogram.2.UsingtheRedhawkAPIviaimportredhawkUsingtheredhawkexecutableTheredhawkprogramsupportseightcommands:================================================================CommandPurpose================================================================addAddfilestoanASTindex.initCreateanEMPTYASTindex.listfilesListallthefilesintheASTindex.promptDropintoapythonpromptwithhelpfulfunctionsforexploringtheparsetree.queryQueryforapatterninalistoffiles,orintheindex.removeRemovefilesfromtheASTindex.showShow(visualize)afileeitherastext,orasanimage.wherePrintthelocationofthecurrentredhawkindex(ifthereisone).================================================================Thesimplestwaytorunredhawkistosimplyuseaquerycommandonafile(ordirectory).Thequerycommandasdescribedabovetakesanxpathlikequery,andalistoffiles(ordirectories),andsearchesformatches.Inthecasethatthesetoffilesislargeandistoberepeatedlyqueried,aredhawkLanguageAgnosticTree(LAST)databasecanbecreatedusingtheredhawkinitcommand.Filesintheprojectcanbeaddedtothedatabaseusingtheredhawkaddcommand.TheshowcommandhelpsvisualisetheinternalLASTstructureused.Thecommand:: redhawk show file.c

will show the LAST of `file.c` in a lisp/scheme like (sexp) syntax. A more
descriptive helpful visualisation can be obtained using the `-i` (or `-e`)
flags, which show graphs (generated using `graphviz` using the
`python-graphviz` module). This *requires* the pygraphviz module, an optional
though recommended, dependency. The command::

redhawkshowfile.cishowsagraphusingthedefaultimagepythonlibraries.ThepromptcommanddropsyouintoapromptforexploringandqueryingtheLAST.Thisenablestheuseofselectors,averypowerfulmethodforfindingwhatyouwant.Formoreinformationonselectors,see:: pydoc redhawk.common.selector

for detailed documentation.

Introduction to the Query Language
----------------------------------

The `query` command supports an XPATH-like language for querying. We describe
examples below. In querying for a particular construct, the name of that Node
in the LAST has to be known. (Thorough documentation about this is coming up.
For now, one can refer to the `node`_ and `types`_ yaml configuration files on
github.) [1]_

For the examples below, we shall use the `counter.py`_ file. It is to be noted
that the same queries will work with other languages also (only C is supported
for now).::

1 def CounterClosure(init=0):
2 value = [init]
3 def Inc():
4 value[0] += 1
5 return value[0]
6 return Inc
7
8 class CounterClass:
9 def __init__(self, init=0):
10 self.value = init
11
12 def Bump(self):
13 self.value += 1
14 return self.value
15
16 def CounterIter(init = 0):
17 while True:
18 init += 1
19 yield init
20
21 if __name__ == '__main__':
22 c1 = CounterClosure()
23 c2 = CounterClass()
24 c3 = CounterIter()
25 assert(c1() == c2.Bump() == c3.next())
26 assert(c1() == c2.Bump() == c3.next())
27 assert(c1() == c2.Bump() == c3.next())
28


Try `redhawk show` on the above file, to get a feel of its structure. You can
view the graphviz generated graph at `imgur`_.

*Example 1*:
Let us find all functions at the module level in `counter.py`::

redhawkqueryDefineFunctioncounter.pyThisgivesus::counter.py:16:defCounterIter(init=0):counter.py:1:defCounterClosure(init=0):NOTE:1.Theresultsarenotnecessarilyinasortedorder,withrespecttolinenumber.ThisdoesnothampertheuseofRedhawkforsearchingandnavigation.(Theresultswillalwaysbeguaranteedtobesortedwithrespecttothefiles).Ontheplusside,thismakesRedhawkalittlebitfaster.Iforderisrequired,asimpleinvocationoftheunixsortprogramshouldfixthis.2.TheabovequerywouldworkonaCprogramaswell.Runningthesamequeryonstats.cgivesus::stats.c:17:floatVariance(floatp,intlen)stats.c:5:floatMean(floatp,intlen)stats.c:34:intmain()Example2:Letusfindallfunctionsonelevelbelowthemodulelevelincounter.py:: redhawk query '*/DefineFunction' counter.py

This gives us::

counter.py:9:def __init__(self, init=0):
counter.py:3:def Inc():
counter.py:12:def Bump(self):


*Example 3*:
Let us find all functions *anywhere* in the program.::

Missing open brace for subscript redhawk query '**/DefineFunction/**/DefineFunction' counter.py

This gives us::

counter.py:3:def Inc():

*Example 5*:
Let us find all functions whose name starts with 'Counter'. Looking at the
`node` yaml configuration tells us that `DefineFunction` has an argument called
name. Now we simply need to test whether the first 7 letters of the name are
"Counter"::

redhawkquery/DefineFunction@n.name[:7]=="Counter"counter.pyThisgivesus:counter.py:16:defCounterIter(init=0):counter.py:1:defCounterClosure(init=0):The@..representsapythonlambdafunction,withthedefaultvariablen.Thus,itisanotherwayofprovidingarbitraryfunctionstomatchwith.[2]Toremindthereaderthatallthesequeriesarelangaugeagnostic,runningtheabovecommand,butinsteadsearchforallfunctionsthathavethelettereinthethem,inthestats.cfile.:: redhawk query '**/DefineFunction@{n.name.find("e") != -1}' stats.c

gives us::

stats.c:17:float Variance(float *p, int len)
stats.c:5:float Mean(float *p, int len)

*Example 7*:
Find all assignments where init is involved. Looking again at the `node`
configuration file, we realise that we are looking for `Assignment` Nodes, which
have a `ReferVariable` descendent, whose name is 'init'::

redhawkquery/Assignment//ReferVariable@[name="init"]counter.pyThisgivesus::counter.py:2:value=[init]counter.py:18:init+=1counter.py:10:self.value=initNotethe@[..]syntaxsimilartoXPATH,forreferringtoanattribute.Example8:Whatifwewantedassignmentswereinitwasbeingset,andnotreferredto?WewoulduseacodeblocktolookatthelvalueoftheAssignment.:: redhawk query '**/Assignment@{n.lvalue.name == "init"}' counter.py

This gives us::

counter.py:18:init += 1

*Example 9*:
Let us find all Function calls that start with 'Counter'. Looking again at the
`node`_ yaml configuration, we see that we want to find 'CallFunction's, where
the function object has a name starting with "Counter". [3]_ ::

redhawkquery/CallFunction@n.function.name[:7]=="Counter"counter.pyThisgivesus::counter.py:24:c3=CounterIter()counter.py:22:c1=CounterClosure()counter.py:23:c2=CounterClass()Example10LetusfindallFunctiondefinitionswhosefirstargumentisself[4]:: redhawk query '**/DefineFunction/FunctionArguments/@[name="self"][0]' counter.py

This gives us::

counter.py:12: def Bump(self):
counter.py:9: def __init__(self, init=0):

The last `[0]` is square brackets, indicates the position of that node with
respect to its parent.

*Example 11*
Let us find all Function definitions whose last argument is `self`. The
following query is *WRONG*::

redhawkquery/DefineFunction/FunctionArguments/@[name="self"][1]counter.pyTheabovequerygivesusnooutput.Why?Lookingatthenodeconfigurationfile,weseethat,FunctionArgumentshasthreechildrenarguments,vararguments,kwdarguments,thelattertwoofwhichareNoneeverywhereinthefileasnovariableorkeywordargumentsareused.Thus,thechildrenofFunctionArgumentseverywhereinthecounter.pyfiletakestheform[[..],None,None].Whatwereallywant,isthelastelementofthefirstelement,theargumentslist.Thiscanbeexpressedasfollows[4]:: redhawk query '**/DefineFunction/FunctionArguments/@[name="self"][0, -1]' counter.py

This gives us::

counter.py:12: def Bump(self):

In hindsight, the query in the previous example could have also been expressed
as::

redhawkquery/DefineFunction/FunctionArguments/@[name="self"][0,0]counter.pyNote:Forconveniencessake,even[0,1,0],or[0,1,0,0,..,0]isdefinedtoreturnthesameresult.ReadthePositionSyntaxsectioninthedocumentationofredhawk.common.xpathformoreinformation.Anabstractgrammarofthequerylanguagecanbefoundvia:: pydoc redhawk.common.xpath

Much more is possible, using the Selector API.

Using the API
-------------

The `redhawk` package can also be used as an API by importing
`redhawk.common.selector` and related packages. Some of the useful packages
are already imported for the user in `redhawk prompt` and are a good place to
start things at.

*Example 1*:
Suppose in the above file we wanted to find all generators, i.e, function
definitions, which had a yield as a descendent. We shall see how easy, and
logical this query becomes using selectors.

We first go into a redhawk prompt::

$ redhawk prompt counter.py


We are greeted with a help banner::

Built in Variables:
trees - contains the parse trees of the files passed in the command line

Built in Functions:
ConvertFileToAst - Converts a file into a language agnostic AST.
ConvertCodeToAst - Converts a code snippet into a language agnostic AST.
Help - Displays this prompt.
ShowASTAsImage - Shows the AST as a graph using dot.

Built in Modules:
S - redhawk.common.selector
X - redhawk.common.xpath
F - redhawk.common.format_position

To view this again, use the Help() function.


In the prompt, we define our selectors. (See `pydoc redhawk.common.selector`
for what selectors are, and how they can be composed)::

In [1]: function_def = S.S(node_type='DefineFunction')
In [2]: yield_stmt = S.S(node_type='Yield')
In [3]: reqd_selector = function_def.HasDescendant(yield_stmt)


We then apply the selector on the file. The asts of the files passed are in
the `trees argument`. Since this file was the first, it is in `trees[0]`::

In [4]: results = list(reqd_selector(trees))
In [5]: results[0]

gives us::

Out[5]: DefineFunction


This is indeed the function we wanted. Just to be sure, we use the
`F.PrintContextInFile` function to print the context of the tree.::

In [6]: F.PrintContextInFile(results[0], context=6)
counter.py:10: self.value = init
counter.py:11:
counter.py:12: def Bump(self):
counter.py:13: self.value += 1
counter.py:14: return self.value
counter.py:15:
counter.py:16: > def CounterIter(init = 0):
counter.py:17: while True:
counter.py:18: init += 1
counter.py:19: yield init
counter.py:20:
counter.py:21: if __name__ == '__main__':
counter.py:22: c1 = CounterClosure()


It is easy to see from this example that selectors are highly composable, and
thus are very powerful. It is hoped that using selectors becomes a natural way
to write powerful custom scripts, for querying code.

License
-------
Redhawk is distributed under the terms of the 2-clause BSD license. You are
free to use it for commercial or non-commercial projects with little or no
restriction. For a complete text of the license see the LICENSE.txt file in
the source distribution.

Change List
------------
*v1.2.3*

* Previous release introduced a regression in C where a top level node was
returning an empty AST. This version contains this small but important fix.

*v1.2.2*

* Support for running Redhawk via Python 3. Great thanks to
bkabrda@ and ncoghlan@ for their changes!

* Compatibility for new PyCParser changes.

*v1.2.1*

* The Lua source code written in ANSI-compliant C, can now be Redhawk-ed!

*v1.2.0*

* Added new position functionality to xpath.py! (See Examples 10, and 11 aboev for
example usage).
* Added to the default imports in prompt.py: redhawk.common.nodes, redhawk.common.types, redhawk.common.xpath
* Added a --show-parsed-query option to redhawk query.
* Made only critical messages appear in the default verbose level.

*v1.1.6*

* Major internal refactoring involving get_ast.py
* Prompt command accepts directories, and can be told not to use IPython.
* A new selector function called Apply to make prompt usage easier.

* Bug fixes wrt IPython shell and error handling.

*v1.1.5*

* `Vim plugin`_ released.

* Patch to FormatPosition to not strip lines when context = 0.

*v1.1.4*

* Bugs fixed in xpath.py and pickling of NodeMatchQuery class for Parallel
Python.

*v1.1.3*

Bugs in the README's RST syntax fixed.

*v1.1.2*

* Redhawk can now use parallel python (on the same machine), to perform
queries on codebases. This speeds up Redhawk (almost) proportionally to the
number of cores you have on your computer. Redhawk can now query for
closures in Django in just ~20 seconds.

* Friendlier usage strings and help messages.

*v1.1.1*

* Python2.7 compatibility: ast.parse (Thanks to Nafai77)

* Profiled, performance improvements by 15% by shifting to deque, and caching
flattened children.

* Provided a bin/start_simple_bash_with_redhawk_in_pythonpath.sh to enter a
temporary shell with redhawk in PYTHONPATH (for devs).

*v1.1.0*

* Fast enough to work on Django - Querying DefineClass anywhere in the
codebase (~2300 python files), takes just 45 seconds on a celeron netbook.
Thats 19ms per file!

* Uses the shelve module instead of the pickle module, to decrease read and
write times for the redhawk database.

* Redhawk supports three new commands - `listfiles`, `remove`, `where`

* The `query`, and `show`, commands take an extra argument `-s`, to decide if
new trees should be added to the database.

* Skip a file if there is a parser error.

.. [1] `ast_gen.py`_ generates `node.py`_ and `types.py`_ using these YAML configuration files.

.. [2] In fact the portion inside the `@{..}` is just appended to a 'lambda n:' and `eval`-ed to get a function.

.. [3] Note that 'CallFunction's do not directly have a name. This is because the function object, unlike that of a function definition, can be a value. It is possible to do (f.g[x])(y), and such.

.. [4] These queries actually finds us the argument, and not the function itself. But this shouldn't matter when we have the definition on the same line.



.. _Vim plugin: http://www.vim.org/scripts/script.php?script_id=3586
.. _imgur: http://imgur.com/CBHCX
.. _counter.py: https://github.com/spranesh/Redhawk/tree/master/redhawk/test/files/examples/counter.py
.. _stats.c: https://github.com/spranesh/Redhawk/tree/master/redhawk/test/files/examples/stats.c
.. _ast_gen.py: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/_ast_gen.py
.. _node.py: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/node.py
.. _types.py: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/types.py
.. _node: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/_node_cfg.yaml
.. _types: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/_types_cfg.yaml
.. _here: http://pycparser.googlecode.com/hg/README.html#installation-process
.. _pip: http://pypi.python.org/pypi/pip
.. _github: http://www.github.com/spranesh/Redhawk
.. _Python Graphviz: http://networkx.lanl.gov/pygraphviz/
.. _pycparser: http://code.google.com/p/pycparser/
.. _pp: http://pypi.python.org/pypi/pp
.. _Python YAML: http://www.pyyaml.org
.. _nosetests: http://somethingaboutorange.com/mrl/projects/nose/1.0.0/
.. _Youtube: http://www.youtube.com/watch?v=azaXpahrxww

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page