
Table 4 The consolidated list of all tags assigned to tokens at different steps of the text preprocessing stage

From: Terminology spectrum analysis of natural-language chemical documents: term-like phrases retrieval routine

| Group of tags | Tag | Explanation | Strict filtering | Morphological pattern |
| --- | --- | --- | --- | --- |
| POS | JJ | Adjective | | Yes (n-grams n > 1) |
| | JJR | Adjective, comparative | | Yes (n-grams n > 1) |
| | VBG | Verb, gerund or present participle | | Yes (n-grams n ≥ 1) |
| | VBD | Verb, past tense; includes the conditional form of the verb "to be" | | Yes (n-grams n > 1) |
| | VBN | Verb, past participle | | Yes (n-grams n > 1) |
| | NNP | Proper noun, singular | | Yes (n-grams n > 1) |
| | NN | Noun, singular or mass | | Yes (n-grams n ≥ 1) |
| | NNPS | Proper noun, plural | | Yes (n-grams n ≥ 1) |
| | NNS | Noun, plural | | Yes (n-grams n ≥ 1) |
| | IN | Preposition or subordinating conjunction | | Yes (n-grams n > 1) |
| | DT | Determiner | | Yes (n-grams n > 1) |
| | RB | Adverb | | Yes (n-grams n > 2) |
| | RBS | Adverb, superlative | | Yes (n-grams n > 2) |
| | FW | Foreign word | | Yes (n-grams n > 1) |
| OSCAR | CM | Chemical matter | Yes | Yes (all n-grams) |
| | ONT | Ontological term | Yes | Yes (all n-grams) |
| Own tags | COMP | Chemical composition | | Yes (all n-grams) |
| | rubbish | Token to which strict filtering is to be applied | Yes | Yes (all n-grams) |
| | GCST | General Chemistry Scientific Term | | Yes (all n-grams) |

  1. The table also indicates whether each tag is used in strict filtering or in the retrieval of term-like phrases with the help of POS-based rules
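
The n-gram conditions listed in Table 4 can be read as a lookup from tag to the shortest n-gram in which that tag may take part. The following is a minimal sketch of that reading, not the article's implementation. The names `MIN_NGRAM_LENGTH`, `STRICT_FILTERING_TAGS` and `tags_allowed` are hypothetical. It assumes that "n > 1" means a minimum length of 2, "n > 2" a minimum length of 3, and that "n ≥ 1" and "all n-grams" both mean a minimum length of 1. It also assumes that a tag absent from the table is never allowed.

```python
# Shortest n-gram length at which each tag may appear in a morphological
# pattern, transcribed from the "Morphological pattern" column of Table 4.
# Assumed mapping: "n > 1" -> 2, "n > 2" -> 3, "n >= 1" / "all n-grams" -> 1.
MIN_NGRAM_LENGTH = {
    # POS tags
    "JJ": 2, "JJR": 2, "VBG": 1, "VBD": 2, "VBN": 2,
    "NNP": 2, "NN": 1, "NNPS": 1, "NNS": 1,
    "IN": 2, "DT": 2, "RB": 3, "RBS": 3, "FW": 2,
    # OSCAR tags
    "CM": 1, "ONT": 1,
    # Own tags
    "COMP": 1, "rubbish": 1, "GCST": 1,
}

# Tags marked "Yes" in the "Strict filtering" column of Table 4.
STRICT_FILTERING_TAGS = {"CM", "ONT", "rubbish"}


def tags_allowed(tags):
    """Return True if every tag is listed for an n-gram of this length.

    `tags` is the sequence of tags of one candidate n-gram, one tag per token.
    Tags that do not occur in the table are treated as never allowed
    (an assumption of this sketch).
    """
    n = len(tags)
    return all(MIN_NGRAM_LENGTH.get(tag, float("inf")) <= n for tag in tags)
```

Under these assumptions, `tags_allowed(["NN"])` accepts a single noun, while `tags_allowed(["JJ"])` rejects a lone adjective because JJ is listed only for n-grams with n > 1. The check covers tag admissibility by length only; it does not encode the POS-based rules themselves, which the table does not spell out.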