Rule | Description | Examples |
---|---|---|
GeneralChemTermRule (accept rule) | True if a 1-gram is a general chemistry scientific term. | |
StrictFilteringTagRule (reject rule) | True if a 1-gram consists of a token with the strict filtering tag «rubbish:true». | |
ShortTokensRule (reject rule) | True if a 1-gram consists of a short token of fewer than three characters. This rule excludes document noise such as axis labels. | |
UnitsRule (reject rule) | True if a 1-gram contains a string that is a measurement unit from the dictionary (Table 1). | |
ChemUnigramRule (accept rule) | True if a 1-gram is tagged with any OSCAR tag and with one of the POS tags FW or NNP, or is tagged COMP. Selected unigrams are assumed to have a chemical sense and are marked accordingly. | Term-like: barium, phenanthrene, pentanol, xanes |
GeneralEnglishDictRule (reject rule) | True if a 1-gram is in the General English Dictionary (Table 1). | Filtered: topography, paint, plateau, pool, searching, file, addenda, improvement, theme …; Term-like: hydrocalcite, acetylacetone, cracking, ageing |
UnigramPOSRule (reject rule) | True if a 1-gram is neither a noun nor a gerund. A term-like 1-gram must be tagged with one of the POS tags VBG, NN, NNPS, NNS. | Filtered: schematized, suddenly, skeletal, behind; Term-like: ethylene, hydrocalcite, leaching, 12n-decylhexadecanamide, sulfamethoxazole, anchoring |
UnigramAddRules (reject rules) | Set of regular expressions that filter unigrams denoting various ions, signs, captions, etc. | Filtered: M(O2), GA15.6, PW91, V2.1, G(D), TI(V), PD(I), PT0, P(X), BA2+, CE(3+), cm3, CH3, AA, Cu2+, Mo6+, Et-CP, GC–MS, Zn-Al |
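
The accept and reject rules in the table can be read as a filtering cascade. The following Python fragment is a minimal sketch of that reading, not the authors' implementation. It assumes that rules are tried in table order, that the first rule that fires decides the outcome, and that a 1-gram matching no rule is kept; the table itself states none of this. The `Unigram` and `Rule` types, the `classify` function, and the tiny word sets are hypothetical stand-ins for the real taggers and dictionaries, and only four of the rules are shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Unigram:
    text: str
    pos: str  # Penn Treebank POS tag, e.g. "NN" or "VBG"


@dataclass(frozen=True)
class Rule:
    name: str
    accept: bool  # True for an accept rule, False for a reject rule
    fires: Callable[[Unigram], bool]


# Hypothetical toy dictionaries; the real ones are those listed in Table 1.
GENERAL_CHEM_TERMS = {"catalyst"}
GENERAL_ENGLISH = {"paint", "pool", "theme"}
NOUN_OR_GERUND_TAGS = {"VBG", "NN", "NNPS", "NNS"}

# Assumed ordering: the same order as the rows of the table.
RULES: List[Rule] = [
    Rule("GeneralChemTermRule", True,
         lambda u: u.text.lower() in GENERAL_CHEM_TERMS),
    Rule("ShortTokensRule", False,
         lambda u: len(u.text) < 3),
    Rule("GeneralEnglishDictRule", False,
         lambda u: u.text.lower() in GENERAL_ENGLISH),
    Rule("UnigramPOSRule", False,
         lambda u: u.pos not in NOUN_OR_GERUND_TAGS),
]


def classify(u: Unigram, rules: List[Rule] = RULES) -> Tuple[bool, Optional[str]]:
    """Return (kept, name of the deciding rule or None if no rule fired)."""
    for rule in rules:
        if rule.fires(u):
            return rule.accept, rule.name
    # Assumption: a 1-gram that no rule rejects stays term-like.
    return True, None
```

With these toy sets, `classify(Unigram("paint", "NN"))` is rejected by `GeneralEnglishDictRule`, while `classify(Unigram("ethylene", "NN"))` passes every rule and is kept. Putting the accept rule first lets a known chemistry term bypass the later reject rules, which is one plausible reason for distinguishing accept rules from reject rules.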