java.lang.Object
edu.cmu.minorthird.text.mixup.Mixup
public class Mixup
A simple pattern-matching and information extraction language.
EXAMPLE:
... in('begin') @number? [ any{2,5} in('end') ] ... && [!in('begin')*] && [!in('end')*]
BNF:
simplePrim -> [!] simplePrim1
simplePrim1 -> id | a(DICT) | ai(DICT) | eq(CONST) | eqi(CONST) | re(REGEX)
               | any | ... | PROPERTY:VALUE | PROPERTY:a(foo)
prim -> < simplePrim [,simplePrim]* > | simplePrim
repeatedPrim -> [L] prim [R] repeat | @type | @type?
repeat -> {int,int} | {,int} | {int,} | {int} | ? | * | +
pattern -> (empty) | repeatedPrim pattern
basicExpr -> pattern [ pattern ] pattern
basicExpr -> (expr)
expr -> basicExpr "||" expr
expr -> basicExpr "&&" expr
In simplePrim, prim and repeatedPrim, square brackets mark optional elements; in basicExpr they are literal and delimit the span to extract.
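To make the repeat production concrete, here is a minimal sketch that turns each repeat suffix into inclusive minimum and maximum token counts. The class RepeatBounds is hypothetical and not part of Minorthird. The meanings of ?, *, + and {3,7} follow the SEMANTICS section of this page; reading {,int}, {int,} and {int} as 0..int, int..unbounded and exactly int is an assumption by analogy with regular-expression quantifiers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of Minorthird: maps a mixup repeat suffix
 * to inclusive {min, max} bounds. Integer.MAX_VALUE stands for "unbounded".
 * The brace forms {,int}, {int,} and {int} are interpreted by analogy with regex.
 */
final class RepeatBounds {
    // {int,int}, {,int}, {int,} and {int}: optional lower bound, optional comma, optional upper bound
    private static final Pattern BRACES = Pattern.compile("\\{(\\d*)(,?)(\\d*)\\}");

    private RepeatBounds() {}

    static int[] parse(String repeat) {
        switch (repeat) {
            case "?": return new int[] {0, 1};
            case "*": return new int[] {0, Integer.MAX_VALUE};
            case "+": return new int[] {1, Integer.MAX_VALUE};
            default: break;
        }
        Matcher m = BRACES.matcher(repeat);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a repeat: " + repeat);
        }
        String lo = m.group(1);
        boolean hasComma = !m.group(2).isEmpty();
        String hi = m.group(3);
        if (lo.isEmpty() && hi.isEmpty()) {
            throw new IllegalArgumentException("no bound given: " + repeat);
        }
        if (!hasComma) {
            // {int}: exactly int repetitions
            int n = Integer.parseInt(lo);
            return new int[] {n, n};
        }
        // {,int} -> 0..int, {int,} -> int..unbounded, {int,int} -> int..int
        int min = lo.isEmpty() ? 0 : Integer.parseInt(lo);
        int max = hi.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(hi);
        if (min > max) {
            throw new IllegalArgumentException("min exceeds max: " + repeat);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        for (String r : new String[] {"?", "*", "+", "{3,7}", "{,4}", "{2,}", "{5}"}) {
            int[] b = parse(r);
            String max = b[1] == Integer.MAX_VALUE ? "unbounded" : Integer.toString(b[1]);
            System.out.println(r + " -> min " + b[0] + ", max " + max);
        }
    }
}
```

Representing every suffix as a (min, max) pair means a matcher only needs one code path for all seven repeat forms.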
SEMANTICS:
basicExpr is a pattern match, like a regex, except that it returns all matches, not just the longest one
token-level tests:
eq('foo') checks that the token is exactly 'foo'
'foo' is short for eq('foo')
re('regex') checks that the token matches the regex
eqi('foo') checks that the lowercased token is 'foo'
a(bar) checks that the token is in dictionary 'bar'
ai(bar) checks that the token is in dictionary 'bar', ignoring case
color:red checks that the token has property 'color' set to 'red'
color:a(primaryColor) checks that the token's property 'color' is in dictionary 'primaryColor'
!test is the negation of test
<test1,test2> conjoins token-level tests (any number may be listed, separated by commas); the token must pass all of them
any is true for any token
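The token-level tests above behave like boolean predicates on a single token, which the following sketch makes explicit. TokenTests and its method names are hypothetical, not Minorthird's implementation; whole-token matching for re(...) is an assumption, and property tests such as color:red are omitted because they need token metadata.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Minorthird's implementation: each mixup
 * token-level test modeled as a Predicate over a token string.
 */
final class TokenTests {
    private TokenTests() {}

    // eq('foo'): the token is exactly foo
    static Predicate<String> eq(String s) {
        return tok -> tok.equals(s);
    }

    // eqi('foo'): the lowercased token is foo
    static Predicate<String> eqi(String s) {
        return tok -> tok.toLowerCase(Locale.ROOT).equals(s);
    }

    // re('regex'): assumed here to require the whole token to match
    static Predicate<String> re(String regex) {
        Pattern p = Pattern.compile(regex);
        return tok -> p.matcher(tok).matches();
    }

    // a(dict): the token is in the dictionary
    static Predicate<String> a(Set<String> dict) {
        return dict::contains;
    }

    // ai(dict): the token is in the dictionary, ignoring case
    static Predicate<String> ai(Set<String> dict) {
        Set<String> lower = new HashSet<>();
        for (String w : dict) {
            lower.add(w.toLowerCase(Locale.ROOT));
        }
        return tok -> lower.contains(tok.toLowerCase(Locale.ROOT));
    }

    // !test: negation
    static Predicate<String> not(Predicate<String> test) {
        return test.negate();
    }

    // <test1,test2>: conjunction of any number of tests; the token must pass every one
    @SafeVarargs
    static Predicate<String> all(Predicate<String>... tests) {
        return tok -> Arrays.stream(tests).allMatch(t -> t.test(tok));
    }

    // any: true for every token
    static Predicate<String> any() {
        return tok -> true;
    }

    public static void main(String[] args) {
        Set<String> colors = new HashSet<>(Arrays.asList("red", "green"));
        // Roughly the mixup test <ai(colors),!eq('green')>
        Predicate<String> test = all(ai(colors), not(eq("green")));
        for (String tok : new String[] {"Red", "green", "blue"}) {
            System.out.println(tok + " -> " + test.test(tok));
        }
    }
}
```

Because every test has the same shape, negation and conjunction compose freely, mirroring the simplePrim and prim productions in the BNF.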
token-sequences:
test? is 0 or 1 tokens matching test
test+ is 1 or more tokens matching test
test* is 0 or more tokens matching test
test{3,7} is between 3 and 7 tokens matching test
... is equivalent to any*
@foo matches a span of type foo
@foo? matches a span of type foo or the empty sequence
L means the sequence can't be extended to the left and still match
R means the sequence can't be extended to the right and still match
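The sketch below enumerates every span matching a single test{min,max} in isolation, which shows both "all matches, not just the longest one" and what the L and R constraints remove. RepeatMatcher is hypothetical, not Minorthird code; treating L and R as "one more neighbouring token would still satisfy the test and the upper bound" is an assumption for a lone repeated test, not a statement of how the full matcher combines them with the rest of a pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch, not Minorthird code: every span matching test{min,max}
 * on its own, with optional L / R constraints. Spans are half-open token
 * ranges [start, end), so an empty span has start == end.
 */
final class RepeatMatcher {
    private RepeatMatcher() {}

    static List<int[]> matches(String[] toks, Predicate<String> test, int min, int max,
                               boolean leftMaximal, boolean rightMaximal) {
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start <= toks.length; start++) {
            for (int end = start; end <= toks.length; end++) {
                int len = end - start;
                if (len > max) {
                    break;
                }
                // Each step adds token end-1; once it fails, no longer span from start can match.
                if (len > 0 && !test.test(toks[end - 1])) {
                    break;
                }
                if (len < min) {
                    continue;
                }
                // L / R: reject spans that could absorb one more neighbouring token and still match.
                boolean growsLeft = len < max && start > 0 && test.test(toks[start - 1]);
                boolean growsRight = len < max && end < toks.length && test.test(toks[end]);
                if ((leftMaximal && growsLeft) || (rightMaximal && growsRight)) {
                    continue;
                }
                out.add(new int[] {start, end});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] toks = {"a", "1", "2", "3", "b"};
        Predicate<String> digits = t -> t.matches("[0-9]+");
        // Like re('[0-9]+')+ : all matches, including overlapping and nested ones
        for (int[] s : matches(toks, digits, 1, Integer.MAX_VALUE, false, false)) {
            System.out.println("all: [" + s[0] + ", " + s[1] + ")");
        }
        // With L and R: only the span that cannot grow in either direction
        for (int[] s : matches(toks, digits, 1, Integer.MAX_VALUE, true, true)) {
            System.out.println("L,R: [" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

On the tokens a 1 2 3 b, the unconstrained call returns six overlapping spans, while the L/R call keeps only the single span covering 1 2 3.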
expr1 || expr2 is union: every span matched by either expression
expr1 && expr2 is piping: generate spans with expr1, then filter them with expr2
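The two expression combinators can be pictured as set operations on spans that have already been generated. In this sketch ExprOps is hypothetical, not Minorthird's API, and modeling the second expression of && as a simple accept/reject predicate is a simplifying assumption based on the "generate, then filter" description above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch, not Minorthird's API: || and && applied to
 * already-computed sets of spans. S is any span type with value-based
 * equals/hashCode.
 */
final class ExprOps {
    private ExprOps() {}

    // expr1 || expr2: every span produced by either expression, without duplicates
    static <S> Set<S> union(Set<S> fromExpr1, Set<S> fromExpr2) {
        Set<S> out = new LinkedHashSet<>(fromExpr1);
        out.addAll(fromExpr2);
        return out;
    }

    // expr1 && expr2: spans generated by expr1, kept only if expr2 accepts them
    static <S> Set<S> pipe(Set<S> fromExpr1, Predicate<S> expr2Accepts) {
        Set<S> out = new LinkedHashSet<>();
        for (S span : fromExpr1) {
            if (expr2Accepts.test(span)) {
                out.add(span);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] toks = {"a", "b", "c", "d"};
        // Spans as [start, end) pairs; List.equals compares element-wise.
        Set<List<Integer>> generated = new LinkedHashSet<>(Arrays.asList(
                Arrays.asList(0, 2), Arrays.asList(1, 3), Arrays.asList(2, 4)));
        // A filter in the spirit of [!eq('b')*]: no token in the span equals "b"
        Predicate<List<Integer>> noB = span -> {
            for (int i = span.get(0); i < span.get(1); i++) {
                if (toks[i].equals("b")) {
                    return false;
                }
            }
            return true;
        };
        System.out.println("piped: " + pipe(generated, noB));
        Set<List<Integer>> other = new LinkedHashSet<>(Arrays.asList(Arrays.asList(0, 1)));
        System.out.println("union: " + union(generated, other));
    }
}
```

Chaining several && stages, as in the EXAMPLE at the top of this page, corresponds to applying pipe repeatedly with one filter per stage.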
The name's an acronym for My Information eXtraction and Understanding Package.
| Nested Class Summary | |
|---|---|
| static class | Mixup.MixupTokenizer |
| static class | Mixup.ParseException: Signals an error in parsing a mixup document. |
| Field Summary | |
|---|---|
| static int | maxNumberOfMatches: Without constraints, the maximum number of times a mixup expression can extract something from a document of length N is O(N*N), since any token can be the begin or end of an extracted span. |
| static int | maxNumberOfMatchesPerToken: Without constraints, the maximum number of times a mixup expression can extract something from a document of length N is O(N*N), since any token can be the begin or end of an extracted span. |
| static int | minMatchesToApplyConstraints: Without constraints, the maximum number of times a mixup expression can extract something from a document of length N is O(N*N). |
| static java.util.regex.Pattern | tokenizerPattern |
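The O(N*N) figure in these field descriptions comes from counting candidate spans: with N tokens, a span can begin at any token and end at that token or any later one, giving N(N+1)/2 non-empty spans. The hypothetical class SpanCount below checks that closed form against brute-force enumeration; it says nothing about how Minorthird actually applies maxNumberOfMatches or the related fields.

```java
/**
 * Hypothetical worked check of the O(N*N) bound: a document of N tokens
 * has N*(N+1)/2 non-empty contiguous spans.
 */
final class SpanCount {
    private SpanCount() {}

    // Closed form: sum over begin positions of the number of possible end positions.
    static long closedForm(long n) {
        return n * (n + 1) / 2;
    }

    // Brute force: count every (begin, end) token pair with begin <= end.
    static long bruteForce(int n) {
        long count = 0;
        for (int begin = 0; begin < n; begin++) {
            for (int end = begin; end < n; end++) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.println("N=" + n + " spans=" + closedForm(n) + " brute=" + bruteForce(n));
        }
    }
}
```

For N = 1000 both methods give 500500 spans, which illustrates why an unconstrained expression can produce far more matches than there are tokens.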
| Constructor Summary | |
|---|---|
| Mixup(Mixup.MixupTokenizer tok) | |
| Mixup(java.lang.String pattern) | Create a new mixup query. |
| Method Summary | |
|---|---|
| java.util.Iterator<Span> | extract(TextLabels labels, java.util.Iterator<Span> spanLooper): Extract subspans from each generated span using the mixup expression. |
| static void | main(java.lang.String[] args) |
| java.lang.String | toString() |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static int minMatchesToApplyConstraints
public static int maxNumberOfMatchesPerToken
public static int maxNumberOfMatches
public static final java.util.regex.Pattern tokenizerPattern
| Constructor Detail |
|---|
public Mixup(java.lang.String pattern)
throws Mixup.ParseException
Throws:
Mixup.ParseException
public Mixup(Mixup.MixupTokenizer tok)
throws Mixup.ParseException
Throws:
Mixup.ParseException
| Method Detail |
|---|
public java.util.Iterator<Span> extract(TextLabels labels,
java.util.Iterator<Span> spanLooper)
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
public static void main(java.lang.String[] args)