public class HMMChineseTokenizer extends SegmentingTokenizerBase
The analyzer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
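The two stages described above can be illustrated with a small standalone sketch. It is not the tokenizer's actual implementation: the class name `SegmentationSketch`, the toy probability table, and the unigram dynamic-programming scorer are all assumptions made for illustration, standing in for the hidden Markov model and dictionary that the real class relies on. Only `java.text.BreakIterator` from the JDK is used, so the sketch runs without Lucene on the classpath.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a toy two-stage segmenter, NOT Lucene's HMM model.
class SegmentationSketch {

    // Hypothetical unigram probabilities, invented for this example.
    static final Map<String, Double> TOY_PROBS = new HashMap<>();
    static {
        TOY_PROBS.put("我", 0.05);
        TOY_PROBS.put("是", 0.05);
        TOY_PROBS.put("人", 0.03);
        TOY_PROBS.put("中国", 0.02);
        TOY_PROBS.put("中国人", 0.01);
    }
    static final double UNKNOWN_PROB = 1e-8; // score for a single unknown character
    static final int MAX_WORD_LEN = 4;       // longest candidate word considered

    // Stage 1: break the text into sentences with the JDK sentence iterator.
    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    // Stage 2: pick the segmentation with the highest total log-probability.
    static List<String> segment(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i] = best score for the first i chars
        int[] back = new int[n + 1];       // back[i] = start index of the last word
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - MAX_WORD_LEN); start < end; start++) {
                String candidate = sentence.substring(start, end);
                Double p = TOY_PROBS.get(candidate);
                if (p == null) {
                    if (candidate.length() > 1) continue; // unknown multi-char string
                    p = UNKNOWN_PROB;                     // unknown single character
                }
                double score = best[start] + Math.log(p);
                if (score > best[end]) {
                    best[end] = score;
                    back[end] = start;
                }
            }
        }
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            words.add(0, sentence.substring(back[end], end)); // walk the back-pointers
        }
        return words;
    }

    public static void main(String[] args) {
        for (String sentence : splitSentences("我是中国人。我是人。")) {
            System.out.println(segment(sentence.trim()));
        }
    }
}
```

In this toy table, `segment("我是中国人")` prefers the single word 中国人 (probability 0.01) over 中国 followed by 人 (0.02 × 0.03), which is the kind of trade-off a probabilistic segmenter resolves. Unlike the sketch, a production tokenizer would also decide how punctuation and unknown words are handled.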
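Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Fields inherited from class SegmentingTokenizerBase: buffer, BUFFERMAX, offset
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| HMMChineseTokenizer() Creates a new HMMChineseTokenizer |
| HMMChineseTokenizer(AttributeFactory factory) Creates a new HMMChineseTokenizer, supplying the AttributeFactory |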

| Modifier and Type | Method and Description |
|---|---|
| protected boolean | incrementWord() Returns true if another word is available |
| void | reset() This method is called by a consumer before it begins consumption using TokenStream.incrementToken(). |
| protected void | setNextSentence(int sentenceStart, int sentenceEnd) Provides the next input sentence for analysis |

Methods inherited from class SegmentingTokenizerBase: end, incrementToken, isSafeEnd
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public HMMChineseTokenizer()
public HMMChineseTokenizer(AttributeFactory factory)
protected void setNextSentence(int sentenceStart,
int sentenceEnd)
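Description copied from class: SegmentingTokenizerBase
Provides the next input sentence for analysis
Specified by: setNextSentence in class SegmentingTokenizerBase
protected boolean incrementWord()
Description copied from class: SegmentingTokenizerBase
Returns true if another word is available
Specified by: incrementWord in class SegmentingTokenizerBase
public void reset()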
throws IOException
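Description copied from class: TokenStream
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().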
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise
some internal state will not be correctly reset (e.g., Tokenizer will
throw IllegalStateException on further usage).
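The reuse contract described above can be sketched without Lucene on the classpath. Everything in the block below is a hypothetical stand-in: `SketchStream` and `WordStream` are invented for illustration and are not Lucene types. They only mimic the pattern in which the base class owns some state that `reset()` restores, and a subclass override has to call `super.reset()` before clearing its own state.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the reset()/super.reset() contract with invented classes.
class ResetContractSketch {

    // Hypothetical base class; NOT a Lucene type.
    static class SketchStream {
        private boolean ready = false; // internal state owned by the base class

        public void reset() {
            ready = true;
        }

        public final boolean incrementToken() {
            if (!ready) {
                throw new IllegalStateException("reset() must be called before incrementToken()");
            }
            return advance();
        }

        protected boolean advance() {
            return false; // the base class produces no tokens
        }

        public void end() {
            ready = false; // a new reset() is required before the next pass
        }
    }

    // Hypothetical stateful subclass that replays a fixed list of words.
    static class WordStream extends SketchStream {
        private final String[] words;
        private int pos = 0;
        String current;

        WordStream(String... words) {
            this.words = words;
        }

        @Override
        public void reset() {
            super.reset(); // restore the base class state first
            pos = 0;       // then rewind this class's own state
            current = null;
        }

        @Override
        protected boolean advance() {
            if (pos >= words.length) return false;
            current = words[pos++];
            return true;
        }
    }

    // Consumer workflow: reset, iterate until exhausted, then end.
    static List<String> consume(WordStream stream) {
        List<String> out = new ArrayList<>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(stream.current);
        }
        stream.end();
        return out;
    }

    public static void main(String[] args) {
        WordStream stream = new WordStream("中国", "人");
        System.out.println(consume(stream)); // first pass
        System.out.println(consume(stream)); // second pass over the same instance
    }
}
```

If the `super.reset()` call were removed from `WordStream.reset()`, the first `incrementToken()` call in this sketch would throw `IllegalStateException`, which mirrors the failure mode the description warns about.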
Overrides: reset in class SegmentingTokenizerBase
Throws: IOException
Copyright © 2000-2017 The Apache Software Foundation. All Rights Reserved.