'BigramSplitter' is an add-on search product for Plone 3.x. It supports non-English languages, especially East and Southeast Asian languages.

Introduction

Specification: Text normalization uses Python's unicodedata module. Full-width numerals and alphabetic characters are converted to their half-width equivalents, and half-width Katakana is converted to its full-width equivalent. As a result, all of these character variants are recognized as the same characters.
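The width folding described above can be illustrated with a minimal sketch. It assumes the NFKC normalization form, which performs both conversions; the README does not state which form the product actually uses, and normalize_text is a hypothetical helper name. The sketch is written for Python 3 rather than the Python 2 interpreter that Plone 3 runs on.

```python
import unicodedata


def normalize_text(text):
    """Fold width variants into one canonical form.

    Hypothetical helper for illustration; NFKC is an assumption,
    not necessarily the form used by BigramSplitter itself.
    """
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters and digits become half-width (ASCII).
print(normalize_text("ＰＬＯＮＥ３"))

# Half-width Katakana becomes full-width Katakana.
print(normalize_text("ｶﾀｶﾅ"))
```

After this step, a query typed with full-width digits matches text indexed with half-width digits, because both sides are reduced to the same characters before indexing and searching.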

Language Specifications:

  • Chinese

    • No spaces between words

    • Uses only Kanji (Chinese) characters

    • Processed with the Bigram (2-gram) model

  • Japanese

    • No spaces between words

    • Combination of Kanji (Chinese), Katakana, and Hiragana characters

  • Korean

    • Words are separated by spaces, but a word may include an attached particle

    • Combination of the Korean alphabet and Kanji (Chinese) characters

    • Korean alphabet and Kanji (Chinese) characters are distinguished and processed with the Bigram (2-gram) model

  • Thai

    • No spaces between words

    • Very difficult to process by computer

    • Vowels and consonants are encoded separately in Unicode, which makes it difficult to recognize them as one word

    • However, Thai text can possibly be handled with the Bigram (2-gram) model

  • Other languages (including English)

    • Words are separated by spaces

    • Each word is indexed
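The rules in this list can be sketched as a small splitter. This is an illustrative sketch under stated assumptions, not the product's actual code: the names split_terms and bigrams are hypothetical, the Unicode ranges are a rough approximation chosen for the example, NFKC normalization is assumed, and the Korean particle handling mentioned above is not modeled.

```python
import re
import unicodedata

# Assumed ranges for illustration only: Hiragana and Katakana,
# CJK unified ideographs, Hangul syllables, and Thai.
_BIGRAM_RUN = re.compile(
    r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3\u0e00-\u0e7f]+"
)


def bigrams(run):
    """Return the overlapping 2-character windows of a run."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def split_terms(text):
    """Split text into index terms.

    Space-separated words are kept whole; runs of characters from
    scripts handled by the 2-gram model are cut into bigrams.
    """
    text = unicodedata.normalize("NFKC", text)
    terms = []
    for token in text.split():
        pos = 0
        for match in _BIGRAM_RUN.finditer(token):
            if match.start() > pos:
                # Characters before the run form an ordinary word.
                terms.append(token[pos:match.start()])
            terms.extend(bigrams(match.group()))
            pos = match.end()
        if pos < len(token):
            terms.append(token[pos:])
    return terms


print(split_terms("Plone 検索エンジン"))
```

Because the windows overlap, a query such as 検索 is itself a single bigram that also occurs in the indexed text, so it matches without any knowledge of where the word boundaries are.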

Notes:

  • Source Code

    Since no documentation was available on how to develop a ‘word splitter’, we referred to the source code of other splitters. We still have a number of open questions. If you have any more information, please feel free to let us know.

  • Hotfix to Plone 3.0 source code

    Because the Plone 3.x catalog configuration, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute. We believe the Plone 3 XML definition mechanism is simple and clear, which is why we took this approach. We appreciate any comments.

Installation

Use zc.buildout

  • Add Products.BigramSplitter to the list of eggs to install, e.g.:

    [buildout]
    ...
    eggs =
        ...
        Products.BigramSplitter

  • Tell the plone.recipe.zope2instance recipe to install a ZCML slug:

    [instance]
    recipe = plone.recipe.zope2instance
    ...
    zcml =
        Products.BigramSplitter

  • Re-run buildout, e.g. with:

    $ ./bin/buildout

  • Restart Zope

  • In Plone, go to Site Setup – Add-on Products and install BigramSplitter with Quick install

Old Style

  • Untar the downloaded file, then copy it to the ‘Products’ directory of your Plone instance.

  • Restart Zope

  • In Plone, go to Site Setup – Add-on Products and install BigramSplitter with Quick install

Required

  • Plone 3.0.x or higher

License

  • See docs/LICENSE.txt

Author

  • Manabu Terada (e-mail: terada@cmscom.jp)

  • Mikio Hokari

  • Naoki Nakanishi

  • Naotaka Hotta

  • Takashi Nagai

Changelog

1.0 (2010-12-06)

  • Added uninstall script

1.0b4 (2010-06-07)

  • Fixed missing skin folder name

1.0b3 (2010-03-20)

  • Added keyword highlighting (JavaScript)

1.0a2 (2010-01-29)

  • Fixed handling of full-width spaces in AND searches

1.0a1 (2009-12-05)

  • Initial release
