
CaboCha output-XML accessor

Project description

Parses XML output from CaboCha and provides simple tree accessors.

Usage

Typical usage focuses on chunk surfaces and dependency links:

>>> import xmlpumpkin
>>> aisansan = xmlpumpkin.parse_to_tree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... )
>>> len(aisansan.chunks)
8
>>> print(aisansan.root.surface)
流したりして
>>> print(aisansan.root.func_surface)
て
>>> for dep in aisansan.root.linked:
...     print(dep.surface)
...
降って
涙を
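
Because every chunk exposes surface and linked, the dependency structure can be walked recursively from the root. The following is a minimal sketch rather than part of xmlpumpkin: walk is a hypothetical helper, and FakeChunk is a stand-in with the same two attributes so that the snippet runs without CaboCha installed. The hand-built data is a simplified fragment of the sentence above.

```python
from collections import namedtuple

# Stand-in for xmlpumpkin.Chunk (hypothetical): it carries only the two
# documented attributes used below, so the sketch runs without CaboCha.
FakeChunk = namedtuple('FakeChunk', ['surface', 'linked'])


def walk(chunk, depth=0):
    """Yield (depth, surface) for a chunk and everything depending on it."""
    yield depth, chunk.surface
    for dependent in chunk.linked:
        for item in walk(dependent, depth + 1):
            yield item


# Simplified, hand-built fragment of the example sentence.
namida = FakeChunk(u'涙を', [FakeChunk(u'うれしい', [])])
root = FakeChunk(u'流したりして', [FakeChunk(u'降って', []), namida])

for depth, surface in walk(root):
    print(u'  ' * depth + surface)
```

With a real tree, walk(aisansan.root) should behave the same way, assuming linked works as listed under Properties.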

parse_to_tree needs the CaboCha command in your PATH. Alternatively, build a tree directly from XML you have already prepared:

>>> tree = xmlpumpkin.Tree(xml_as_unicode)
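
For orientation, the sketch below shows the kind of XML involved. The sample is hand-written to resemble CaboCha's XML output (one chunk element per chunk, tok children for tokens); the exact attribute set is an assumption, feature attributes are left out, and the standard library is used instead of xmlpumpkin. Whether xmlpumpkin.Tree accepts this trimmed sample has not been checked.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on CaboCha XML output (assumed layout);
# per-token feature attributes are omitted for brevity.
SAMPLE_XML = u"""<sentence>
 <chunk id="0" link="1" rel="D" head="0" func="1">
  <tok id="0">涙</tok>
  <tok id="1">を</tok>
 </chunk>
 <chunk id="1" link="-1" rel="D" head="2" func="3">
  <tok id="2">流し</tok>
  <tok id="3">て</tok>
 </chunk>
</sentence>"""

sentence = ET.fromstring(SAMPLE_XML)
for chunk in sentence.findall('chunk'):
    # A chunk's surface is the concatenation of its token texts.
    surface = u''.join(tok.text for tok in chunk.findall('tok'))
    # link="-1" marks the chunk that depends on nothing (the root).
    print(chunk.get('id'), surface, '->', chunk.get('link'))
```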

If you only need a simple interface from Python to CaboCha:

>>> from xmlpumpkin import cabocha
>>> print(cabocha.txttree(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
    愛燦々と-----D
          この-D |
            身に-D
            降って-------D
            心密かな---D |
              うれしい-D |
                    涙を-D
              流したりして
EOS
>>> print(cabocha.as_xml(
...     u'愛燦々とこの身に降って心密かなうれしい涙を流したりして'
... ))
<sentence>
  ...
</sentence>
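
Helpers like these presumably wrap the cabocha executable. A rough sketch of such a wrapper follows, under the assumption that the command reads text from standard input and writes its result to standard output. pipe_through is a hypothetical name, and the CaboCha command line appears only in a comment because it is not taken from this package.

```python
import subprocess
import sys


def pipe_through(command, text, encoding='utf-8'):
    """Send unicode text to an external command and return unicode output."""
    result = subprocess.run(
        command,
        input=text.encode(encoding),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode(encoding)


# Stand-in command that echoes stdin back, so the sketch runs without
# CaboCha. With CaboCha installed, something like ['cabocha', '-f3']
# would go here instead (assumption, not verified against this package).
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

print(pipe_through(ECHO, u'涙を流す'))
```

Encoding on the way in and decoding on the way out keeps the Python side purely unicode, which matches the all-unicode interface described here.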

All inputs and outputs are unicode strings. If you prefer an encoding other than UTF-8, modify the following constants directly:

>>> import xmlpumpkin.runner
>>> xmlpumpkin.runner.CABOCHA_ENCODING = 'SJIS'
>>>
>>> import xmlpumpkin.tree
>>> xmlpumpkin.tree.XML_ENCODING = 'SJIS'
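
To see what such a setting changes at the byte level, the sketch below round-trips a string through Shift_JIS using only Python's built-in codecs. It assumes the constants are passed to ordinary encode and decode calls, which this description does not confirm.

```python
import codecs

text = u'涙を流す'

# 'SJIS' is an alias that Python resolves to one of its built-in codecs.
print(codecs.lookup('SJIS').name)

sjis_bytes = text.encode('SJIS')
utf8_bytes = text.encode('UTF-8')

# Same characters, different byte sequences and lengths.
print(len(sjis_bytes), len(utf8_bytes))

# Decoding with the matching codec restores the original string.
assert sjis_bytes.decode('SJIS') == text
```

Decoding with the wrong codec either raises an error or yields garbled text, so both constants should match the encoding CaboCha itself is configured to use.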

Properties

Tree and Chunk objects provide a small, not exhaustive, set of properties.

class xmlpumpkin.Tree(cabocha_xml)
  • chunks - tuple of chunks

  • root - the root Chunk object (the chunk that does not depend on any other chunk)

  • chunk_by_id(chunk_id) - returns the Chunk object with the given id, as generated by CaboCha

  • _element - the original XML as an lxml Element object

class xmlpumpkin.Chunk(element, parent)
  • id - chunk id

  • link_to_id - id of the chunk this chunk depends on

  • linked_from_ids - tuple of ids of the chunks that depend on this chunk

  • func_id - functional token id of this chunk

  • dep - the Chunk object this chunk depends on

  • linked - list of all Chunk objects that depend on this chunk

  • surface - surface of this chunk

  • func_surface - surface of this chunk’s functional token

  • _tokens() - the tokens this chunk contains, as lxml Element objects
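
To make the relationships between these properties concrete, here is a minimal sketch of how such accessors could be derived from CaboCha-style XML. It is not xmlpumpkin's implementation: MiniTree and MiniChunk are hypothetical names, the standard library stands in for lxml, and the XML layout (chunk elements with id, link and func attributes, tok children with id) is assumed.

```python
import xml.etree.ElementTree as ET


class MiniChunk(object):
    """Hypothetical stand-in mirroring some of the Chunk properties."""

    def __init__(self, element, parent):
        self._element = element
        self._parent = parent  # the owning MiniTree
        self.id = int(element.get('id'))
        self.link_to_id = int(element.get('link'))  # -1 for the root
        self.func_id = int(element.get('func'))

    @property
    def surface(self):
        return u''.join(tok.text for tok in self._element.findall('tok'))

    @property
    def func_surface(self):
        for tok in self._element.findall('tok'):
            if int(tok.get('id')) == self.func_id:
                return tok.text
        return None

    @property
    def dep(self):
        # None for the root, since no chunk has id -1.
        return self._parent.chunk_by_id(self.link_to_id)

    @property
    def linked(self):
        return [c for c in self._parent.chunks if c.link_to_id == self.id]


class MiniTree(object):
    """Hypothetical stand-in mirroring some of the Tree properties."""

    def __init__(self, xml_text):
        sentence = ET.fromstring(xml_text)
        self.chunks = tuple(
            MiniChunk(e, self) for e in sentence.findall('chunk'))
        self._by_id = dict((c.id, c) for c in self.chunks)
        self.root = next(c for c in self.chunks if c.link_to_id == -1)

    def chunk_by_id(self, chunk_id):
        return self._by_id.get(chunk_id)


# Hand-written two-chunk sample (assumed layout, not real CaboCha output).
XML = (u'<sentence>'
       u'<chunk id="0" link="1" func="1">'
       u'<tok id="0">涙</tok><tok id="1">を</tok></chunk>'
       u'<chunk id="1" link="-1" func="3">'
       u'<tok id="2">流し</tok><tok id="3">て</tok></chunk>'
       u'</sentence>')

tree = MiniTree(XML)
print(tree.root.surface, tree.root.func_surface)
print([c.surface for c in tree.root.linked])
```

In this sketch dep and linked are two views of the same link attribute: dep follows it forward, while linked collects every chunk whose link points back at this one.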

This version

0.1
