If a substring of the line matches the regexp,
it appends " featurename=1" to the string buffer.
This is useful for producing external datasets in Minorthird format.
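A minimal Java sketch of this behavior. It assumes that "a substring matches" means Matcher.find() (a match anywhere in the line) rather than a whole-line Matcher.matches(). The class and method names are hypothetical, not Minorthird's.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the described behavior; not Minorthird's API.
class RegexFeatureAppender {
    private final Pattern pattern;
    private final String featureName;

    RegexFeatureAppender(String regex, String featureName) {
        this.pattern = Pattern.compile(regex);
        this.featureName = featureName;
    }

    // Appends " featureName=1" when any substring of the line matches the regexp.
    void appendIfMatches(String line, StringBuilder buffer) {
        if (pattern.matcher(line).find()) {
            buffer.append(' ').append(featureName).append("=1");
        }
    }
}
```

Calling appendIfMatches once per regexp on the same buffer builds up one sparse feature line per example.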
Train an AnnotatorLearner and return the learned
Annotator, using some unspecified source of information to
get AnnotationExamples to train the learner.
Adjust the precision-recall tradeoff of a CMM that is based on an array of hyperplanes,
by adjusting the bias term of the hyperplane associated with the background (NEG)
class.
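A minimal sketch of why shifting the background bias trades precision for recall. It assumes one linear score per class (w_c . x + b_c) with prediction by argmax and the NEG class at index 0; these names and conventions are hypothetical, not the CMM classes themselves.

```java
// Hypothetical sketch: one linear scorer per class, NEG assumed at index 0.
class BiasAdjustedClassifier {
    static final int NEG = 0;

    // Returns the index of the highest-scoring class after adding
    // negBiasShift to the background (NEG) class's bias term.
    static int predict(double[][] weights, double[] biases, double[] x, double negBiasShift) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double score = biases[c] + (c == NEG ? negBiasShift : 0.0);
            for (int j = 0; j < x.length; j++) {
                score += weights[c][j] * x[j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

A positive shift makes the model label more tokens as background, so it predicts fewer (but typically more reliable) positives: precision tends to rise and recall to fall. A negative shift does the opposite.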
Sequential learner based on the perceptron algorithm, as described
in "Discriminative Training Methods for Hidden Markov Models: Theory
and Experiments with Perceptron Algorithms", Mike Collins, EMNLP
2002.
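A minimal sketch of the training loop from that paper: decode with Viterbi under the current weights and, on each mistake, add the gold feature vector and subtract the predicted one. It assumes only (token, label) emission and (previous label, label) transition features, integer labels, and unaveraged weights (the paper also describes an averaged variant). The class is hypothetical, not the Minorthird learner.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Collins-style perceptron training; not Minorthird's class.
class SequencePerceptron {
    private final int numLabels;
    private final Map<String, Double> w = new HashMap<>();

    SequencePerceptron(int numLabels) { this.numLabels = numLabels; }

    private double weight(String feature) { return w.getOrDefault(feature, 0.0); }
    private static String emit(String tok, int y) { return "E:" + tok + ":" + y; }
    // prev == -1 stands for the start-of-sequence state.
    private static String trans(int prev, int y) { return "T:" + prev + ":" + y; }

    // Viterbi decoding of the highest-scoring label sequence (toks must be non-empty).
    int[] decode(String[] toks) {
        int n = toks.length;
        double[][] score = new double[n][numLabels];
        int[][] back = new int[n][numLabels];
        for (int y = 0; y < numLabels; y++) {
            score[0][y] = weight(trans(-1, y)) + weight(emit(toks[0], y));
        }
        for (int i = 1; i < n; i++) {
            for (int y = 0; y < numLabels; y++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < numLabels; p++) {
                    double s = score[i - 1][p] + weight(trans(p, y));
                    if (s > best) { best = s; back[i][y] = p; }
                }
                score[i][y] = best + weight(emit(toks[i], y));
            }
        }
        int[] path = new int[n];
        for (int y = 1; y < numLabels; y++) {
            if (score[n - 1][y] > score[n - 1][path[n - 1]]) path[n - 1] = y;
        }
        for (int i = n - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }

    // Adds delta times the global feature vector of (toks, labels) to w.
    private void update(String[] toks, int[] labels, double delta) {
        for (int i = 0; i < toks.length; i++) {
            int prev = (i == 0) ? -1 : labels[i - 1];
            w.merge(emit(toks[i], labels[i]), delta, Double::sum);
            w.merge(trans(prev, labels[i]), delta, Double::sum);
        }
    }

    // On each mistake: w += phi(gold) - phi(predicted).
    void train(String[][] sentences, int[][] gold, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int k = 0; k < sentences.length; k++) {
                int[] pred = decode(sentences[k]);
                if (!Arrays.equals(pred, gold[k])) {
                    update(sentences[k], gold[k], 1.0);
                    update(sentences[k], pred, -1.0);
                }
            }
        }
    }
}
```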
Return a measure of confidence that the correct ClassLabels for
indices lo...hi-1 of the 'sequence' are in fact the ones in
'predictedClasses[lo..hi-1]', rather than the ones given
in 'alternateClasses'.
Split into k disjoint folds, then return k train/test splits,
where each train set is the union of k-1 folds and the test set
is the remaining (k-th) fold.
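A minimal sketch of this splitting scheme over example indices 0..n-1. It assumes examples are dealt into folds round-robin without shuffling (a practical splitter would usually shuffle first with a seeded Random); the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical k-fold splitter over indices 0..n-1; no shuffling (an assumption).
class KFoldSplitter {
    static final class Split {
        final List<Integer> train;
        final List<Integer> test;
        Split(List<Integer> train, List<Integer> test) { this.train = train; this.test = test; }
    }

    static List<Split> split(int n, int k) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(i);  // disjoint folds

        List<Split> splits = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < k; g++) {
                if (g != f) train.addAll(folds.get(g));  // union of the other k-1 folds
            }
            splits.add(new Split(train, folds.get(f)));  // fold f is the test set
        }
        return splits;
    }
}
```

Every example appears in exactly one test set, so the k test sets together cover the data once.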
Ensure that the given span contains no spans whatsoever of the given
type, other than those described by the given span looper.
DEFSEED1 -
Static variable in class edu.cmu.minorthird.classify.algorithms.random.RandomElement
DEFSEED2 -
Static variable in class edu.cmu.minorthird.classify.algorithms.random.RandomElement
degreeHelp -
Static variable in class edu.cmu.minorthird.classify.algorithms.svm.SVMLearner
Currently, this method executes the mixup program associated with this annotator, and the
caller is expected to read the results directly from the labels set that was originally
passed in.
Finds a mapping path from the source text base to the destination text base and translates
the specified span through each successive mapping until the corresponding span in the
destination text base is located.
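A minimal sketch of this lookup: a breadth-first search finds a chain of mappings, and the span is then passed through each mapping in order. It assumes a span is just a {start, end} offset pair and that each mapping is a function on such pairs; all names here are hypothetical, not Minorthird's text-base classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch; spans are {start, end} pairs, mappings are span functions.
class TextBaseMapper {
    private final Map<String, Map<String, UnaryOperator<int[]>>> edges = new HashMap<>();

    void addMapping(String from, String to, UnaryOperator<int[]> mapping) {
        edges.computeIfAbsent(from, x -> new HashMap<>()).put(to, mapping);
    }

    // Returns the translated span, or null when no mapping path exists.
    int[] translate(int[] span, String source, String dest) {
        Map<String, String> parent = new HashMap<>();  // BFS tree; source has parent null
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);
        while (!queue.isEmpty() && !parent.containsKey(dest)) {
            String cur = queue.poll();
            for (String next : edges.getOrDefault(cur, Collections.emptyMap()).keySet()) {
                if (!parent.containsKey(next)) {
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(dest)) return null;
        List<String> path = new ArrayList<>();
        for (String tb = dest; tb != null; tb = parent.get(tb)) path.add(tb);
        Collections.reverse(path);  // path now runs source -> ... -> dest
        int[] result = span;
        for (int i = 0; i + 1 < path.size(); i++) {
            result = edges.get(path.get(i)).get(path.get(i + 1)).apply(result);  // one mapping step
        }
        return result;
    }
}
```

Breadth-first search returns a path with the fewest mapping steps, which keeps the number of successive translations small.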
Load data from the given location according to the configuration and to whether
the location is a directory or not.
Calling load a second time will load into the same text base (thus the
second call returns documents from both the first and second locations).
Usage: programFile textFile/directory [outfile]
Evaluates the given program file against the specified data (either a file or a directory of files).
If an outfile is specified, it writes the types as operators to that file.
Without constraints, the maximum number of times a mixup
expression can extract something from a document of length N is
O(N*N), since any token can be the beginning or end of an extracted
span.
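The quadratic bound can be checked by counting every candidate span directly. This sketch assumes spans are half-open token ranges [start, end) with at least one token; there are exactly N(N+1)/2 of them.

```java
// Counts every non-empty candidate span [start, end) in a document of n tokens.
class SpanCount {
    static long countSpans(int n) {
        long count = 0;
        for (int start = 0; start < n; start++) {
            for (int end = start + 1; end <= n; end++) {
                count++;  // one candidate span per (start, end) pair
            }
        }
        return count;  // equals n * (n + 1) / 2
    }
}
```

For example, a 4-token document has 4 + 3 + 2 + 1 = 10 candidate spans.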
Compute the method-of-moments estimates of the rate 'mu' and of the parameter
'delta', which controls the variability, of a Negative-Binomial model, using
integer counts x[] from examples with different lengths omega[].
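A sketch of one possible moment estimator, under an assumed parameterization that the description above does not spell out: x[i] has mean omega[i]*mu and variance omega[i]*mu*(1+delta), so delta = 0 is the Poisson case. The estimator below matches total counts for mu and summed squared residuals for delta; it is illustrative and not necessarily what the library computes.

```java
// Method-of-moments sketch. ASSUMPTION: E[x_i] = omega_i*mu and
// Var[x_i] = omega_i*mu*(1+delta); this parameterization is illustrative only.
class NegBinMoments {
    // Returns {mu, delta}.
    static double[] estimate(int[] x, double[] omega) {
        double sumX = 0, sumOmega = 0;
        for (int i = 0; i < x.length; i++) {
            sumX += x[i];
            sumOmega += omega[i];
        }
        double mu = sumX / sumOmega;  // total observed count = total expected count
        if (mu == 0) return new double[]{0.0, 0.0};  // all counts zero: nothing to estimate
        double sumSq = 0;
        for (int i = 0; i < x.length; i++) {
            double r = x[i] - omega[i] * mu;
            sumSq += r * r;
        }
        // Match summed squared residuals to the summed model variance
        // mu*(1+delta)*sum(omega); clip at 0, the Poisson limit.
        double delta = Math.max(0.0, sumSq / (mu * sumOmega) - 1.0);
        return new double[]{mu, delta};
    }
}
```

Counts that are more spread out than a Poisson model allows yield delta > 0; counts proportional to length yield delta = 0.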
Marker interface for SpanFeatureExtractor objects which allow one
to attach a type of required annotations that must be present
before feature extraction starts.
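Reads a file and converts it to a String via a byte array sized by inputStream.available().
Note: InputStream.available() returns only an estimate of the bytes that can be read without blocking, so it is not a reliable way to size a buffer for an entire file, with or without multi-threading.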
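A sketch of reading a whole file without relying on available(), using java.nio.file.Files.readAllBytes, which reads until end of file. The UTF-8 charset is an assumption; the original description does not name an encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Reads the entire file; assumes UTF-8 text (an assumption, not from the original).
class FileText {
    static String readFile(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);  // reads to end of file, no size estimate needed
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```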
Run an annotation-learning experiment based on pre-labeled text, using a
sequence learning method, and show the evaluation results at the
sequence-classification level.
Attach an annotatorLoader to the SpanFeatureExtractor, which is
used to find the required Annotation (and any other Annotations
that it might recursively require).
A correct implementation of a MixupCompatible
SpanFeatureExtractor will call
textLabels.require(annotation,null,loader) before
extracting features relative to textLabels.
tokenProperties depends on the requiredAnnotation, so this class overrides the
default setRequiredAnnotation() method to reset
tokenPropertyFeatures to null whenever the required annotation changes.
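A sketch of the override-and-invalidate pattern described above. The base class, field types, and placeholder computation are hypothetical stand-ins, not the actual extractor classes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical base class standing in for the real extractor hierarchy.
class BaseExtractor {
    protected String requiredAnnotation;
    void setRequiredAnnotation(String annotation) { this.requiredAnnotation = annotation; }
}

class TokenPropertyExtractor extends BaseExtractor {
    private List<String> tokenPropertyFeatures;  // cached value derived from requiredAnnotation

    @Override
    void setRequiredAnnotation(String annotation) {
        super.setRequiredAnnotation(annotation);
        tokenPropertyFeatures = null;  // invalidate so the cache is rebuilt for the new annotation
    }

    List<String> getTokenPropertyFeatures() {
        if (tokenPropertyFeatures == null) {
            // Placeholder computation; a real extractor would inspect the annotated text.
            tokenPropertyFeatures = Collections.singletonList("prop:" + requiredAnnotation);
        }
        return tokenPropertyFeatures;
    }
}
```

Resetting to null and recomputing lazily avoids stale features without recomputing on every call.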
A Span.Looper which also passes out two additional types
of information about each returned span s:
whether s is a FALSE_POS, FALSE_NEG, or TRUE_POS,
relative to the original spans.
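A sketch of how each span can be classified against the original spans. It assumes spans are encoded as "start:end" strings and compared by exact boundary match; both choices, and the class itself, are hypothetical rather than the Span.Looper implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: exact-boundary comparison of "start:end" span keys.
class SpanComparison {
    enum Status { TRUE_POS, FALSE_POS, FALSE_NEG }

    static Map<String, Status> label(Set<String> original, Set<String> predicted) {
        Map<String, Status> out = new LinkedHashMap<>();
        for (String s : predicted) {
            out.put(s, original.contains(s) ? Status.TRUE_POS : Status.FALSE_POS);
        }
        for (String s : original) {
            if (!predicted.contains(s)) out.put(s, Status.FALSE_NEG);  // original span that was missed
        }
        return out;
    }
}
```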
Create a new dataset in which each instance has been augmented
with the features constructed from the *predicted* labels
of neighbor examples, where the prediction is made using
cross-validation.
Create a new dataset in which each instance has been augmented
with the history features constructed from the *predicted* labels
of previous examples, where the prediction is made using
cross-validation.
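A sketch of the cross-validated augmentation described in the two entries above. Each example gets an out-of-fold prediction from a model that never saw it, and the predicted labels of neighbors (or only of the previous example, for history features) are added as features. It assumes one token feature per example and a stand-in base learner that predicts the label most often seen with the token in the training folds ("O" if unseen); these are illustrative choices, not the library's learner.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cross-validated prediction features with a stand-in learner.
class StackedFeatures {
    // Out-of-fold predictions over k contiguous folds.
    static String[] crossValPredict(String[] tokens, String[] labels, int k) {
        int n = tokens.length;
        String[] pred = new String[n];
        for (int f = 0; f < k; f++) {
            int lo = f * n / k, hi = (f + 1) * n / k;  // held-out fold [lo, hi)
            Map<String, Map<String, Integer>> counts = new HashMap<>();
            for (int i = 0; i < n; i++) {
                if (i >= lo && i < hi) continue;  // train only on the other folds
                counts.computeIfAbsent(tokens[i], t -> new HashMap<>()).merge(labels[i], 1, Integer::sum);
            }
            for (int i = lo; i < hi; i++) {
                String best = "O";
                int bestCount = 0;
                Map<String, Integer> c = counts.get(tokens[i]);
                if (c != null) {
                    // Ties are broken by HashMap iteration order (arbitrary).
                    for (Map.Entry<String, Integer> e : c.entrySet()) {
                        if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
                    }
                }
                pred[i] = best;
            }
        }
        return pred;
    }

    // Original feature plus predicted labels of neighbors; with includeNext = false
    // only the previous example's prediction is used (history features).
    static List<List<String>> augment(String[] tokens, String[] pred, boolean includeNext) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            List<String> feats = new ArrayList<>();
            feats.add("tok=" + tokens[i]);
            feats.add("pred-1=" + (i > 0 ? pred[i - 1] : "START"));
            if (includeNext) feats.add("pred+1=" + (i + 1 < tokens.length ? pred[i + 1] : "END"));
            out.add(feats);
        }
        return out;
    }
}
```

Using out-of-fold predictions rather than the true labels makes the training-time features resemble the noisy predictions the model will see at test time.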
Wraps the svm.svm_train algorithm from libsvm
(http://www.csie.ntu.edu.tw/~cjlin/libsvm/).
Parameterization is done via an SVM object (see the libsvm docs for examples and details).
A complicated splitter that stratifies samples according to an
arbitrary "profile" property, and restricts train/test splits to
not cross boundaries defined by "user" and "request" properties.
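A sketch of one way to build such a splitter. It assigns whole groups to folds so that no group straddles train and test, dealing groups round-robin within each profile so every fold gets a similar profile mix. It assumes every request belongs to exactly one user (so grouping by user also keeps requests together) and that a group's profile is that of its first instance; the class and these rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical group-aware, profile-stratified fold assignment.
class GroupStratifiedSplitter {
    // Returns fold[i] in 0..k-1 for each instance i.
    static int[] assignFolds(String[] profile, String[] user, int k) {
        Map<String, List<String>> usersByProfile = new TreeMap<>();  // profile -> users, first-seen order
        Map<String, Integer> foldOfUser = new HashMap<>();
        for (int i = 0; i < user.length; i++) {
            if (!foldOfUser.containsKey(user[i])) {
                foldOfUser.put(user[i], -1);
                usersByProfile.computeIfAbsent(profile[i], p -> new ArrayList<>()).add(user[i]);
            }
        }
        int next = 0;
        for (List<String> users : usersByProfile.values()) {
            for (String u : users) {
                foldOfUser.put(u, next % k);  // round-robin keeps profiles spread across folds
                next++;
            }
        }
        int[] fold = new int[user.length];
        for (int i = 0; i < user.length; i++) fold[i] = foldOfUser.get(user[i]);  // whole group shares a fold
        return fold;
    }
}
```

Fold f then serves as the test set and the remaining folds as the training set, as in ordinary k-fold splitting.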