Interface Segments
- All Known Implementing Classes:
SegmentsImpl
Represents the segmentation results obtained by passing an input CharSequence to a Segmenter.
The segmentation results can be provided either as the segmentation boundary indices
(ints) or as segments, which are represented by the Segment class. In turn, the
Segment object can also provide the subsequence of the original input that it
represents.
Example:
Segmenter wordSeg =
    LocalizedSegmenter.builder()
        .setLocale(ULocale.forLanguageTag("de"))
        .setSegmentationType(SegmentationType.WORD)
        .build();
Segments segments = wordSeg.segment("Das 21ste Jahrh. ist das beste.");
List<CharSequence> words = segments.subSequences().collect(Collectors.toList());
Method Summary
Modifier and Type | Method | Description
default IntStream | boundaries() | Returns all segmentation boundaries, starting from the beginning and moving forwards.
IntStream | boundariesAfter(int i) | Returns all segmentation boundaries after the provided index.
IntStream | boundariesBackFrom(int i) | Returns all segmentation boundaries on or before the provided index.
boolean | isBoundary(int i) | Returns whether offset i is a segmentation boundary.
Segment | segmentAt(int i) | Returns the segment that contains index i.
Stream<Segment> | segments() | Returns a Stream of all Segments in the source sequence.
Stream<Segment> | segmentsBefore(int i) | Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i.
Stream<Segment> | segmentsFrom(int i) | Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l.
default Stream<CharSequence> | subSequences() | Returns a Stream of the CharSequences for all of the segments in the source sequence.
-
Method Details
-
subSequences
Returns a Stream of the CharSequences for all of the segments in the source sequence. Starts from the beginning of the sequence and iterates forwards until the end.
- Returns:
a Stream of the CharSequences of all Segments in the source sequence.
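The relationship between boundary indices and subsequences can be sketched without the library. The following is a minimal model, not the ICU4J implementation: the class name SubSequencesSketch, the input "ab-cd", and the boundary array are all hypothetical, and the boundaries are assumed to be sorted, to start at 0, and to end at the input length.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

/** Hypothetical model: derives subsequences from a sorted array of boundary offsets. */
class SubSequencesSketch {

    /** Each pair of adjacent boundaries (start, limit) yields input.subSequence(start, limit). */
    static Stream<CharSequence> subSequences(CharSequence input, int[] boundaries) {
        return IntStream.range(0, boundaries.length - 1)
                .mapToObj(k -> input.subSequence(boundaries[k], boundaries[k + 1]));
    }

    public static void main(String[] args) {
        // Assumed boundaries for illustration only; a real Segmenter decides these.
        int[] boundaries = {0, 2, 3, 5};
        List<CharSequence> parts =
                subSequences("ab-cd", boundaries).collect(Collectors.toList());
        System.out.println(parts); // prints [ab, -, cd]
    }
}
```

The stream is produced in forward order, mirroring the documented direction of subSequences().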
-
segmentAt
Returns the segment that contains index i. Containment is inclusive of the start index and exclusive of the limit index.
Specifically, the containing segment is defined as the segment with start s and limit l such that s ≤ i < l.
- Parameters:
i - index in the input CharSequence to the Segmenter
- Returns:
A segment that either starts at or contains index i
- Throws:
IndexOutOfBoundsException - if i is less than 0 or greater than or equal to the length of the input CharSequence to the Segmenter
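The containment rule s ≤ i < l can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the library's code: SegmentAtSketch and its boundary array are hypothetical, and a segment is represented simply as an int pair {start, limit}.

```java
import java.util.Arrays;

/** Hypothetical model: segments described by a sorted array of boundary offsets. */
class SegmentAtSketch {

    /**
     * Returns {start, limit} of the segment containing index i, i.e. start <= i < limit.
     * The boundaries array is assumed sorted, beginning with 0 and ending with the input length.
     */
    static int[] segmentAt(int[] boundaries, int i) {
        int length = boundaries[boundaries.length - 1];
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + " is outside [0, " + length + ")");
        }
        int pos = Arrays.binarySearch(boundaries, i);
        // Exact hit: i is itself a boundary, so it is the start of its segment.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the start is the
        // boundary immediately before the insertion point.
        int startIdx = (pos >= 0) ? pos : -pos - 2;
        return new int[] {boundaries[startIdx], boundaries[startIdx + 1]};
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9}; // assumed boundaries for a 9-char input
        System.out.println(Arrays.toString(segmentAt(boundaries, 3))); // prints [3, 4]
        System.out.println(Arrays.toString(segmentAt(boundaries, 2))); // prints [0, 3]
    }
}
```

Index 3 is both the limit of [0, 3) and the start of [3, 4); because the limit is exclusive, it belongs to the latter.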
-
segments
Returns a Stream of all Segments in the source sequence.
-
segmentsFrom
Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l. Iteration moves forwards.
This means that the first segment in the stream is the same as what is returned by segmentAt(i).
The word "from" is used here to mean "at or after", with the semantics of "at" for a Segment defined by segmentAt(int). We cannot describe the segments all as being "after" since the first segment might contain i in the middle, meaning that in the forward direction, its start position precedes i.
segmentsFrom and segmentsBefore(int) create a partitioning of the space of all Segments.
- Parameters:
i - index in the input CharSequence to the Segmenter
- Returns:
a Stream of all Segments at or after i
-
segmentsBefore
Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i. Iteration moves backwards.
This means that all segments in the stream come before the one that is returned by segmentAt(i). A segment is not considered to contain index i if i is equal to limit l. Thus, "before" encapsulates the invariant l ≤ i.
- Parameters:
i - index in the input CharSequence to the Segmenter
- Returns:
a Stream of all Segments before i
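The way segmentsFrom and segmentsBefore split the segments around an index can be sketched as follows. This is an illustrative model only: SegmentPartitionSketch, its helper format, and the boundary array are hypothetical, and segments are rendered as strings of the form [start,limit) rather than as Segment objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the segmentsFrom / segmentsBefore partition. */
class SegmentPartitionSketch {

    static String format(int start, int limit) {
        return "[" + start + "," + limit + ")";
    }

    /** Segments whose limit l satisfies i < l, in forward order. */
    static List<String> segmentsFrom(int[] boundaries, int i) {
        List<String> out = new ArrayList<>();
        for (int k = 0; k + 1 < boundaries.length; k++) {
            if (i < boundaries[k + 1]) {
                out.add(format(boundaries[k], boundaries[k + 1]));
            }
        }
        return out;
    }

    /** Segments whose limit l satisfies l <= i, in backward order. */
    static List<String> segmentsBefore(int[] boundaries, int i) {
        List<String> out = new ArrayList<>();
        for (int k = boundaries.length - 2; k >= 0; k--) {
            if (boundaries[k + 1] <= i) {
                out.add(format(boundaries[k], boundaries[k + 1]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9}; // assumed boundaries for a 9-char input
        System.out.println(segmentsFrom(boundaries, 5));   // prints [[4,9)]
        System.out.println(segmentsBefore(boundaries, 5)); // prints [[3,4), [0,3)]
    }
}
```

Every segment lands in exactly one of the two lists, because its limit l satisfies either i < l or l ≤ i, never both.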
-
isBoundary
boolean isBoundary(int i)
Returns whether offset i is a segmentation boundary. Throws an exception when i is not a valid index position for the source sequence.
- Parameters:
i - index in the input CharSequence to the Segmenter
- Returns:
Returns whether offset i is a segmentation boundary.
- Throws:
IllegalArgumentException - if i is less than 0 or greater than the length of the input CharSequence to the Segmenter
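The valid range for isBoundary differs from that of segmentAt: an offset equal to the input length is accepted here. A minimal sketch of that contract, assuming a sorted boundary array that ends at the input length (IsBoundarySketch and the sample boundaries are hypothetical, not part of the library):

```java
import java.util.Arrays;

/** Hypothetical model of the isBoundary contract over a sorted boundary array. */
class IsBoundarySketch {

    static boolean isBoundary(int[] boundaries, int i) {
        int length = boundaries[boundaries.length - 1];
        // Unlike segmentAt, i == length is a valid offset here.
        if (i < 0 || i > length) {
            throw new IllegalArgumentException("offset " + i + " is outside [0, " + length + "]");
        }
        return Arrays.binarySearch(boundaries, i) >= 0;
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9}; // assumed boundaries for a 9-char input
        System.out.println(isBoundary(boundaries, 9)); // prints true
        System.out.println(isBoundary(boundaries, 5)); // prints false
    }
}
```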
-
boundaries
Returns all segmentation boundaries, starting from the beginning and moving forwards.
Note: boundaries() != boundariesAfter(0). This difference naturally results from the strict inequality condition in boundariesAfter, and the fact that 0 is the first boundary returned from the start of an input sequence.
- Returns:
An IntStream of all segmentation boundaries, starting at the first boundary with index 0, and moving forwards in the input sequence.
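The note about boundaries() and boundariesAfter(0) can be made concrete with a small model. This sketch is not the ICU4J implementation; BoundariesSketch and the boundary array are hypothetical, and boundariesAfter is modeled purely from its documented condition b > i.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical model contrasting boundaries() with boundariesAfter(0). */
class BoundariesSketch {

    /** All boundaries in forward order, including the initial boundary 0. */
    static IntStream boundaries(int[] sorted) {
        return Arrays.stream(sorted);
    }

    /** Boundaries b with b > i (strict inequality), in forward order. */
    static IntStream boundariesAfter(int[] sorted, int i) {
        return Arrays.stream(sorted).filter(b -> b > i);
    }

    public static void main(String[] args) {
        int[] sorted = {0, 3, 4, 9}; // assumed boundaries for a 9-char input
        System.out.println(Arrays.toString(boundaries(sorted).toArray()));         // prints [0, 3, 4, 9]
        System.out.println(Arrays.toString(boundariesAfter(sorted, 0).toArray())); // prints [3, 4, 9]
    }
}
```

The only difference between the two results is the leading 0, which the strict inequality excludes.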
-
boundariesAfter
Returns all segmentation boundaries after the provided index. Iteration moves forwards.
- Parameters:
i - index in the input CharSequence to the Segmenter
- Returns:
An IntStream of all boundaries b such that b > i
-
boundariesBackFrom
Returns all segmentation boundaries on or before the provided index. Iteration moves backwards.
The phrase "back from" is used to indicate both that: 1) boundaries are "on or before" the input index; 2) the direction of iteration is backwards (towards the beginning). "On or before" indicates that the result set is b where b ≤ i, which is a weak inequality, while "before" might suggest the strict inequality b < i.
boundariesBackFrom and boundariesAfter(int) create a partitioning of the space of all boundaries.
- Parameters:
i - index in the input CharSequence to the Segmenter
- Returns:
An IntStream of all boundaries b such that b ≤ i
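The partitioning of boundaries by boundariesBackFrom and boundariesAfter can be sketched in the same style. As before, this is a model built only from the documented inequalities: BoundaryPartitionSketch and the boundary array are hypothetical and do not reflect how the library computes its results.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical model of the boundariesBackFrom / boundariesAfter partition. */
class BoundaryPartitionSketch {

    /** Boundaries b with b > i, iterating forwards. */
    static IntStream boundariesAfter(int[] sorted, int i) {
        return Arrays.stream(sorted).filter(b -> b > i);
    }

    /** Boundaries b with b <= i, iterating backwards (towards the beginning). */
    static IntStream boundariesBackFrom(int[] sorted, int i) {
        return IntStream.range(0, sorted.length)
                .map(k -> sorted[sorted.length - 1 - k]) // walk the array from the end
                .filter(b -> b <= i);
    }

    public static void main(String[] args) {
        int[] sorted = {0, 3, 4, 9}; // assumed boundaries for a 9-char input
        System.out.println(Arrays.toString(boundariesBackFrom(sorted, 4).toArray())); // prints [4, 3, 0]
        System.out.println(Arrays.toString(boundariesAfter(sorted, 4).toArray()));    // prints [9]
    }
}
```

Because one condition is b ≤ i and the other is b > i, a boundary equal to i appears only in the boundariesBackFrom result.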
-