WALDO
2007-05-30 19:07:00 UTC
I am developing a custom library for word breaking and stemming using
Microsoft's IWordBreaker, IWordSink, IWordFormSink, IStemmer, and
IPhraseSink interfaces. I am reasonably certain my code works, but I can't
verify the IPhraseSink implementation.
I feed my library bits of text that just roll off the top of my head, which
parse very well (breaking and generating alternate words), like:
"I am the very model of a modern major general."
"You're either with me or against me."
but the word breaker never steps into the IPhraseSink. I can't distinguish
whether the text I'm feeding it is just not phraseable [I realize that's not
a word :)], or if I've implemented the IPhraseSink incorrectly. There isn't
much out there on the web about IPhraseSink specifically.
Does anyone out there have a test bit of text that they know will be broken
into phrase(s) using the default en-US locale (1033)?
Or should I just trust my code?
Or should I just keep feeding it more text?
Any help is appreciated.
Thanks in advance.
WALDO