Easy to use parser for simple XML

Helper module that parses a simple XML buffer and stores it as a (mostly)
read-only dictionary-type object (MyXml). This dictionary can hold other
dictionaries, node lists, or leaf nodes. Nodes are accessed as attributes.

>>> xml = parse("<Foo><Bar>Val</Bar></Foo>")
>>> xml.Foo.Bar == "Val"
True
>>> xml.Foo.Bar
<Bar>Val</Bar>
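
The attribute-style access shown above can be illustrated with a small dictionary subclass. This is only a sketch of the general idea, not the module's implementation; the class name AttrDict is hypothetical and the real MyXml object does considerably more.

```python
class AttrDict(dict):
    """Hypothetical stand-in for MyXml: a dict whose keys are also attributes."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so dict methods such as .get() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


tree = AttrDict(Foo=AttrDict(Bar="Val"))
print(tree.Foo.Bar)                        # Val
print(tree["Foo"]["Bar"] == tree.Foo.Bar)  # True
```

Because the object is still a dict, key access and attribute access reach the same data.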

I don't like to use the built-in Python DOM parsers for simple XML data, but
this module is good only for simple XML! Namespaces, CDATA, and other fancy
features are not supported.

There are three factory functions, "parse", "parse_file" and "parse_object".

- parse takes an XML string and builds a MyXml object from it.

- parse_file takes a file name, reads the file, and does the same.

Both functions take an optional list of tag names to ignore at the beginning
of the XML data.

- parse_object takes a complex Python object (of dictionaries, sequences and
scalars) and creates a MyXml object from it.

It is possible, but not convenient, to construct XML trees using this module.

Usage Examples:

>>> xml = parse('''
... <?xml bla bla bla>
... <!-- Comment -->
... <Main>
... <Text>One Two &amp; Three</Text>
... <List>
... <!-- This is a list of items -->
... <Item aaa="bbb" ></Item>
... <Item ccc = "ab&#43;c" />
... <Item>Bla Bla Bla</Item>
... </List>
... <BoolNum num="3.5" bool="Yes">No</BoolNum>
... <Double><Double>Value</Double></Double>
... </Main>
... ''')

- An XML node is an attribute of the MyXml object

>>> xml.Main.Text
<Text>One Two &amp; Three</Text>

- A node compares equal to its decoded text, while "value" holds the raw text

>>> xml.Main.Text == "One Two & Three"
True

>>> xml.Main.Text.value == "One Two &amp; Three"
True
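
The difference between the raw and the decoded text can be reproduced with the standard library. This sketch uses html.unescape as a stand-in; it is an assumption that the result matches the module's own decoding for these inputs, and html.unescape also accepts HTML entities that plain XML does not define.

```python
from html import unescape

raw = "One Two &amp; Three"
print(unescape(raw))         # One Two & Three
print(unescape("ab&#43;c"))  # ab+c
```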

A node can also be accessed with an "nd_" prefix (so node names that are
Python reserved words can be reached). This form returns EMPY_NODE if the node
doesn't exist.

>>> xml.nd_Main.nd_Text
<Text>One Two &amp; Three</Text>

- A node can be looked at as a list with one item

>>> xml.Main.Double.Double[0] is xml.Main.Double.Double
True
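
Treating a single node as a one-item list lets calling code handle "one or many" the same way. The sketch below shows one way to get that behavior; the Node class is hypothetical and is not taken from the module's source.

```python
class Node:
    """Hypothetical node that behaves like a list containing only itself."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __len__(self):
        return 1

    def __getitem__(self, index):
        # Index 0 (or -1) is the node itself; anything else is out of range,
        # which also ends a for-loop over the node after one step.
        if index in (0, -1):
            return self
        raise IndexError(index)


single = Node("Double", "Value")
many = [Node("Item", "a"), Node("Item", "b")]

for group in (single, many):
    print([node.value for node in group])
```

The loop prints the values of the single node and of the real list without checking which kind it received.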

- Node lists are regular lists

>>> len(xml.Main.List.Item)
3
>>> unicode(xml.Main.List.Item[2])
u'Bla Bla Bla'

- A MyXml object is a dictionary

>>> xml["Main"]["Text"] == xml.Main["Text"]
True
>>> xml.Main.get("Text") == xml["Main"].Text
True

- There is also a very simple XPath-like method

>>> xml.xpath("Main/List/Item")[2]
<Item>Bla Bla Bla</Item>
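
A path lookup of this kind amounts to splitting the path on "/" and walking one level per part. The function below is a minimal sketch over plain nested dictionaries; simple_xpath and its default parameter are hypothetical names, and the module's own method returns an empty node rather than None when nothing is found.

```python
def simple_xpath(tree, path, default=None):
    """Walk nested dicts along a '/'-separated path (hypothetical helper)."""
    current = tree
    for part in path.split("/"):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


tree = {"Main": {"List": {"Item": ["", "", "Bla Bla Bla"]}}}
print(simple_xpath(tree, "Main/List/Item")[2])  # Bla Bla Bla
print(simple_xpath(tree, "Main/foo"))           # None
```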

- Attributes can be accessed with an "at_" prefix

>>> xml.Main.List.Item[1].at_ccc
u'ab&#43;c'

- Access the attributes dictionary with "at_dict"

>>> xml.Main.List.Item[0].at_dict["aaa"]
u'bbb'
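
The "nd_" and "at_" prefixes can be pictured as name-based dispatch inside __getattr__. The sketch below assumes a hypothetical Node class and EMPTY sentinel; it shows the dispatch idea only and is not the module's code.

```python
class Node:
    """Hypothetical node: children and XML attributes resolved by name prefix."""

    def __init__(self, children=None, attributes=None):
        self.children = children or {}
        self.at_dict = attributes or {}

    def __getattr__(self, name):
        # Only called when normal lookup fails, so real attributes
        # such as at_dict and children are found first.
        if name.startswith("at_"):
            return self.at_dict.get(name[3:])
        if name.startswith("nd_"):
            # The prefix makes reserved words reachable and never raises.
            return self.children.get(name[3:], EMPTY)
        if name in self.children:
            return self.children[name]
        raise AttributeError(name)


EMPTY = Node()  # hypothetical stand-in for the module's empty node

item = Node(attributes={"ccc": "ab&#43;c"})
items = Node(children={"Item": item, "class": Node(attributes={"id": "7"})})

print(items.Item.at_ccc)          # ab&#43;c
print(items.nd_class.at_id)       # 7
print(items.nd_Missing is EMPTY)  # True
```

Writing items.class would be a syntax error in Python, which is why a prefix form is useful for such node names.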

- Every value can be read as a number or as a boolean

>>> xml.Main.BoolNum.boolean
False

- Attributes can also be read as booleans or numbers

>>> xml.Main.BoolNum.at_num.number * 2
7.0
>>> xml.xpath("Main/BoolNum").at_bool.boolean
True

- But if the value is not a number or a boolean (yes, no, true, false, 1, 0),
the return value is None

>>> xml.Main.List.Item[0].at_aaa.number

- "get" and "xpath" return an empty node by default, so we can still use the
number/boolean attributes.

>>> bool(xml.get("foo").boolean)
False

>>> xml.xpath("Main/foo").number is None
True
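
The number and boolean rules described above can be written as two small converters that return None for anything they do not recognize. This is a sketch under assumptions: the function names are hypothetical, and case-insensitive matching is inferred from the "Yes" and "No" values in the examples.

```python
def to_boolean(text):
    """Return True/False for yes/no, true/false, 1/0 (any case), else None."""
    lowered = text.strip().lower()
    if lowered in ("yes", "true", "1"):
        return True
    if lowered in ("no", "false", "0"):
        return False
    return None


def to_number(text):
    """Return an int or float parsed from text, or None if it is not numeric."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return None


print(to_number("3.5") * 2)  # 7.0
print(to_boolean("Yes"))     # True
print(to_number("bbb"))      # None
print(bool(to_boolean("")))  # False
```

Returning None instead of raising is what lets an empty node be queried safely, as in the examples above.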

- Printing MyXml objects keeps the original order and adds indentation.
The indentation is not thread safe though.

>>> print xml.Main.List
<List>
<Item aaa="bbb" />
<Item ccc="ab&#43;c" />
<Item>Bla Bla Bla</Item>
</List>

- Constructing a MyXml object from a complex Python object:

>>> xml = parse_object({
... "foo1": "bar",
... "foo2": ["bar1", "bar2", "bar3"],
... "foo3": {"bar": "foo"},
... "foo4": 5
... }, "Main") # "Main" is the name of the top most node

>>> xml.xpath("Main/foo4").number
5

- Nodes that hold the items of a sequence are named after the type of the
sequence (list, tuple, set, generator).

>>> xml.xpath("Main/foo2/list")[1] == "bar2"
True
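
The naming rule for sequence items can be sketched with the standard library's ElementTree. This is not how parse_object is implemented; the build function is hypothetical, it handles only dicts, lists, tuples, sets, and scalars, and it produces an ElementTree element rather than a MyXml object.

```python
import xml.etree.ElementTree as ET


def build(name, obj):
    """Convert dicts, sequences, and scalars into an Element (hypothetical helper)."""
    element = ET.Element(name)
    if isinstance(obj, dict):
        for key, value in obj.items():
            element.append(build(key, value))
    elif isinstance(obj, (list, tuple, set)):
        # Each item becomes a child named after the sequence type, e.g. <list>.
        for item in obj:
            element.append(build(type(obj).__name__, item))
    else:
        element.text = str(obj)
    return element


root = build("Main", {"foo2": ["bar1", "bar2", "bar3"], "foo4": 5})
print(ET.tostring(root, encoding="unicode"))
print(root.find("foo2").findall("list")[1].text)  # bar2
```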

- Finally, although it is not very useful, you can modify a MyXml object

>>> add_returns_self = xml.add(MyNode("bar5", "foo5")) # MyNode(value, name)
>>> xml.foo5.at_dict["attr"] = "attr value"
>>> xml.xpath("Main/foo5").at_attr == "attr value"
True

One can also use the other built-in dictionary and list methods, but this is
not recommended.

>>> xml # Here the order is not preserved because of the python dictionary
<Main>
<foo4>5</foo4>
<foo1>bar</foo1>
<foo2>
<list>bar1</list>
<list>bar2</list>
<list>bar3</list>
</foo2>
<foo3>
<bar>foo</bar>
</foo3>
<foo5 attr="attr value">bar5</foo5>
</Main>

Please note that this module is not efficient in parsing large XML buffers. It
uses string slicing heavily.

Erez Bibi

Please send comments and questions to
erezbibi AT users DOT sourceforge DOT net

Source Distribution

my_xml-0.1.2.zip (12.7 kB)

Built Distribution

my_xml-0.1.2-py2.6.egg (18.6 kB)