Discussion:
Chinese Wordbreaker
Martin
2006-04-27 20:09:54 UTC
I am looking for a commercially available Chinese word breaker for Index
Server. I need a word breaker that performs actual Chinese word
segmentation, rather than treating each Chinese character as a word the
way the current Index Server Chinese word breaker does.
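The difference between per-character tokenizing and real segmentation can be shown with a small sketch. The code below uses greedy forward maximum matching, a common textbook dictionary-based technique, to illustrate it. This is not Index Server's algorithm or any commercial product's. The toy lexicon and the function names are hypothetical, and a real word breaker would rely on a large lexicon plus statistical disambiguation.

```python
# Hypothetical toy lexicon; a real word breaker would use a large dictionary.
LEXICON = {"中华", "人民", "共和国", "中华人民共和国", "北京", "大学", "北京大学"}


def per_character(text):
    """Unigram tokenizing: every Chinese character becomes its own token."""
    return list(text)


def forward_max_match(text, lexicon):
    """Greedy forward maximum matching (illustrative sketch only).

    At each position, take the longest lexicon entry that matches;
    if nothing matches, fall back to a single character.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    sample = "我在北京大学"  # "I am at Peking University"
    print(per_character(sample))
    print(forward_max_match(sample, LEXICON))
```

Per-character tokenizing indexes 北, 京, 大 and 学 separately, while the dictionary-based pass keeps 北京大学 (Peking University) together as one index term. Greedy matching is simple but can mis-segment ambiguous strings, which is why production word breakers add further analysis.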

Does anyone know where I could find it?
Hilary Cotter
2006-05-01 13:13:33 UTC
The Chinese word breaker appears to look at each character, detect radicals
and subcharacters, and then parse the token looking for compound characters.

The actual process they use is described in a patent. Unfortunately, I
can't find it right now, but I did find it through Google some time ago.

At one time Oracle, Sybase, Microsoft, and IBM all used word breakers from
the same company, Infosoft. I am not sure who uses what now.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com