Convert SHIFT-JIS to UTF-8
iconv -f SHIFT-JIS -t UTF-8 input-shift-jis-file.txt -o output-utf8-file.txt
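If iconv is not available, the same conversion can be sketched in Python. This is a minimal illustration and not part of the original notes: the function name convert_shift_jis_to_utf8 is hypothetical, and it assumes the input really is Shift-JIS (files produced on Windows may need the "cp932" codec instead).

```python
# -*- coding: utf-8 -*-
import io


def convert_shift_jis_to_utf8(src_path, dst_path, src_encoding="shift_jis"):
    """Re-encode a text file from Shift-JIS to UTF-8.

    src_encoding is an assumption about the input; pass "cp932" for
    files that use Microsoft's Shift-JIS extensions.
    """
    # newline="" keeps the original line endings untouched.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

It would be called with the same file names as the iconv command, for example convert_shift_jis_to_utf8("input-shift-jis-file.txt", "output-utf8-file.txt"). The whole file is read into memory, which is fine for small text files but may not suit very large ones.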
MeCab
See this post.
Japanese Morphological Analyzer with IGO
Package: https://pypi.python.org/pypi/igo-python/0.9.8
Sample code:
>>> from igo.Tagger import Tagger
>>> t = Tagger() # use bundled dictionary
>>> for m in t.parse(u'すもももももももものうち'):
...     print m.surface, m.feature
...
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
>>>
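Each feature string in the output above is a comma-separated record. The sketch below splits it into named fields and filters nouns. It assumes the bundled dictionary follows the nine-column IPADIC layout seen in the sample (part of speech, three subcategories, conjugation type, conjugation form, base form, reading, pronunciation). The names Feature, parse_feature, and nouns are hypothetical helpers, not part of igo-python.

```python
# -*- coding: utf-8 -*-
from collections import namedtuple

# Illustrative labels for the nine IPADIC-style columns (an assumption
# based on the sample output, not an igo-python API).
Feature = namedtuple("Feature", [
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
])


def parse_feature(feature):
    """Split a feature string such as u'名詞,一般,*,*,*,*,もも,モモ,モモ'."""
    fields = feature.split(",")
    # Some entries may carry fewer columns; pad the missing ones with "*".
    fields += ["*"] * (9 - len(fields))
    return Feature(*fields[:9])


def nouns(pairs):
    """Return the surfaces whose part of speech is 名詞 (noun).

    pairs is an iterable of (surface, feature) tuples.
    """
    return [surface for surface, feature in pairs
            if parse_feature(feature).pos == u"名詞"]
```

With the Tagger from the sample, the pairs could be built as [(m.surface, m.feature) for m in t.parse(text)]. Keeping the helpers independent of igo means they can be tried on plain strings without the package installed.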
[TO BE UPDATED]