Table 6 Reject and accept rules consecution for bigrams (2-grams)

From: Terminology spectrum analysis of natural-language chemical documents: term-like phrases retrieval routine

Description

Examples

GeneralChemTermRule (accept rule) (the same rule as for 1-grams)

StrictFilteringTagRule (reject rule) (the same rule as for 1-grams)

ShortTokensRule (reject rule) True if a 2-gram consists only of short tokens (fewer than 3 characters)

IdenticalTokensRule (reject rule) True if a 2-gram contains at least two identical tokens

UnitsRule (reject rule)

True if any token in a 2-gram ends with a measurement unit string from the dictionary (Table 1)

Note that a measurement unit may consist of several tokens; for example, “g/h” consists of three tokens [“g”, “/”, “h”]

Examples: PPM C7H14, 70ML MIN-1, CM3MIN-1 H2, MIN-1 FLOW, H-1 GAS, PPM N2O/AR, ML G-1MIN-1, MOL-1 HYDROLYSIS, PPM NOX/5%O2/N2

BiGramPOSRule (accept rule with exception)

True if the first token is tagged with one of the following POS tags: JJ, JJR, FW, VBG, VBD, VBN, NN, NNP, NNPS, NNS

and the second token is tagged with one of: FW, VBG, NN, NNP, NNPS, NNS

Exception: the following combinations are not allowed: «VBG, VBG», «VBG, FW», «NNP, FW»

Term-like: Andronov bifurcation, Na2CO3 impregnation, nickel catalyst; supported MgO, anchored lysine, stirred glass; carbonaceous particle, temperature-programmed adsorption, Fischer–Tropsch catalyst; in situ EXAF, UV–VIS spectroscopy, Raman spectroscopy

Filtered due to exception: involving reforming, reforming minimizing, using in, Shimada etc
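
The bigram-specific rules in this table can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things in it are assumptions made for illustration: the function names, the small `SAMPLE_UNITS` tuple (a stand-in for the unit dictionary of Table 1), case-insensitive matching, the POS tags assigned in the usage lines, and the way the rules are chained (any reject rule that fires rejects the 2-gram, otherwise BiGramPOSRule decides). GeneralChemTermRule and StrictFilteringTagRule are defined for 1-grams elsewhere and are omitted here.

```python
# Hypothetical sample of unit strings; the real dictionary is the paper's Table 1.
SAMPLE_UNITS = ("ppm", "ml", "min-1", "h-1", "mol-1")

# POS tag sets and forbidden combinations as listed for BiGramPOSRule.
FIRST_TAGS = {"JJ", "JJR", "FW", "VBG", "VBD", "VBN", "NN", "NNP", "NNPS", "NNS"}
SECOND_TAGS = {"FW", "VBG", "NN", "NNP", "NNPS", "NNS"}
FORBIDDEN_PAIRS = {("VBG", "VBG"), ("VBG", "FW"), ("NNP", "FW")}


def short_tokens_rule(tokens):
    """Reject rule: every token is shorter than 3 characters."""
    return all(len(t) < 3 for t in tokens)


def identical_tokens_rule(tokens):
    """Reject rule: a token is repeated (case-insensitive, an assumption)."""
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) < len(lowered)


def units_rule(tokens, units=SAMPLE_UNITS):
    """Reject rule: any token ends with a measurement unit string."""
    suffixes = tuple(units)  # str.endswith accepts a tuple of suffixes
    return any(t.lower().endswith(suffixes) for t in tokens)


def bigram_pos_rule(tags):
    """Accept rule with exception: allowed tag pair that is not forbidden."""
    first, second = tags
    if (first, second) in FORBIDDEN_PAIRS:
        return False
    return first in FIRST_TAGS and second in SECOND_TAGS


def classify_bigram(tagged):
    """tagged: [(token, pos), (token, pos)] -> 'accept' or 'reject'.

    Assumed chaining: reject rules first, then the POS accept rule.
    """
    tokens = [tok for tok, _ in tagged]
    tags = [pos for _, pos in tagged]
    if (short_tokens_rule(tokens)
            or identical_tokens_rule(tokens)
            or units_rule(tokens)):
        return "reject"
    return "accept" if bigram_pos_rule(tags) else "reject"


# Usage with hand-assigned (assumed) POS tags:
print(classify_bigram([("nickel", "NN"), ("catalyst", "NN")]))       # accept
print(classify_bigram([("involving", "VBG"), ("reforming", "VBG")]))  # reject
print(classify_bigram([("PPM", "NN"), ("C7H14", "NN")]))              # reject
```

A plain suffix check is only safe here because the sample units are several characters long; single-letter units such as “g” or “h” would match ordinary words, so a real unit dictionary needs stricter matching than this sketch provides.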