Table 3 Description of common categories of textual features with some examples, summarized from [23]

From: Chemical named entities recognition: a review on approaches and applications

Feature category

Objectives and examples

Linguistic

to find the prefix that is common to all variations of a term,

to find the root term of a variant word,

to assign each token to a grammatical category, or

to divide the text into syntactically correlated groups of words

(e.g. chunking, lemmatization, stemming and part-of-speech (POS) tagging)

Orthographic

to capture knowledge about word formation from the presence of orthographic features (e.g. capitalization and symbols)

Morphological

to reflect common structures and/or sub-sequences of characters among entities (e.g. suffixes and prefixes, character n-grams and word shape patterns)

Context

to establish a higher-level relationship between the tokens and the extracted features (e.g. windows and conjunctions)

Lexicons

to add domain knowledge to the feature set in order to optimize the NER system. Dictionaries of domain terms are matched against the entity names in the text, and the resulting tags are used as features. Examples of the dictionary types used are target entity names and trigger names.
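
As a rough illustration of how several of the feature categories in the table might be computed for a single token, the following Python sketch builds a small feature dictionary. It is not taken from the reviewed paper or from [23]: the function names (`word_shape`, `char_ngrams`, `token_features`), the feature keys, the padding symbol and the three-entry `CHEMICAL_LEXICON` are all hypothetical choices made for this example. Linguistic features (stemming, lemmatization, POS tagging, chunking) are left out because they normally rely on an external NLP toolkit.

```python
import re

# Hypothetical toy lexicon (assumption); a real system would load a curated
# dictionary of chemical names instead.
CHEMICAL_LEXICON = {"aspirin", "ethanol", "benzene"}


def word_shape(token: str) -> str:
    """Morphological feature: uppercase -> 'A', lowercase -> 'a', digit -> '0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def char_ngrams(token: str, n: int = 3) -> list:
    """Morphological feature: all contiguous character n-grams of the token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def token_features(tokens: list, i: int, window: int = 1) -> dict:
    """Build an illustrative feature dictionary for tokens[i]."""
    token = tokens[i]
    feats = {
        # Orthographic: capitalization, digits and symbols
        "has_upper": any(c.isupper() for c in token),
        "has_digit": any(c.isdigit() for c in token),
        "has_symbol": any(not c.isalnum() for c in token),
        # Morphological: prefix, suffix and word shape
        "prefix3": token[:3].lower(),
        "suffix3": token[-3:].lower(),
        "shape": word_shape(token),
        # Lexicon: dictionary match used as a feature
        "in_lexicon": token.lower() in CHEMICAL_LEXICON,
    }
    # Context (window): neighbouring tokens, padded at sentence boundaries
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        in_range = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if in_range else "<PAD>"
    # Context (conjunction): two features combined into a single one
    feats["shape|in_lexicon"] = f"{feats['shape']}|{feats['in_lexicon']}"
    return feats


if __name__ == "__main__":
    sentence = ["Patients", "received", "Aspirin", "and", "H2O"]
    for key, value in token_features(sentence, 2).items():
        print(key, "=", value)
```

Dictionaries of this kind are commonly passed, one per token, to a sequence labeller such as a conditional random field. Which features actually help depends on the corpus and the entity types being targeted.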