Table 3 Description of common categories of textual features with some examples, summarized from [23]

From: Chemical named entities recognition: a review on approaches and applications

| Feature category | Objectives and examples |
| --- | --- |
| Linguistic | To find the prefix that is common to all variations of a term, to find the root term of a variant word, to assign each token to a grammatical category, or to divide the text into syntactically correlated groups of words (e.g. stemming, lemmatization, part-of-speech (POS) tagging and chunking) |
| Orthographic | To capture knowledge about word formation from the presence of particular characters (e.g. capitalization and symbols) |
| Morphological | To reflect common structures and/or sub-sequences of characters among entities (e.g. suffixes and prefixes, character n-grams and word shape patterns) |
| Context | To establish a higher-level relationship between the tokens and the extracted features (e.g. windows and conjunctions) |
| Lexicons | To add domain knowledge to the feature set in order to optimize the NER system. Dictionaries of domain terms are matched against the entity names in the text, and the resulting tags are used as features (e.g. dictionaries of target entity names and trigger names) |
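
As a rough illustration of how several of these feature categories might be computed for a single token, the following Python sketch builds a feature dictionary using only the standard library. It is not taken from the reviewed systems: the function names (`word_shape`, `char_ngrams`, `token_features`), the feature keys, the window size and the toy lexicon are all hypothetical choices made for this example. Linguistic features such as POS tags, lemmas and chunks are left out because they usually come from an external tagger.

```python
import re


def word_shape(token):
    """Map characters to a coarse shape pattern, e.g. 'NaCl' -> 'AaAa'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def char_ngrams(token, n=3):
    """Return all contiguous character n-grams of the token."""
    return [token[k:k + n] for k in range(len(token) - n + 1)]


def token_features(tokens, i, lexicon, window=1):
    """Build a feature dict for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    feats = {
        # Orthographic features: capitalization, digits, symbols
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_symbol": any(not c.isalnum() for c in tok),
        # Morphological features: affixes, char n-grams, word shape
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "char_3grams": char_ngrams(tok.lower(), 3),
        "shape": word_shape(tok),
        # Lexicon feature: exact match against a domain dictionary
        "in_lexicon": tok.lower() in lexicon,
    }
    # Context features: neighbouring tokens inside a fixed window
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


# Toy sentence and toy lexicon, made up for this example
tokens = ["Add", "2-propanol", "and", "NaCl", "slowly"]
lexicon = {"nacl", "2-propanol"}
nacl_feats = token_features(tokens, 3, lexicon)
propanol_feats = token_features(tokens, 1, lexicon)
```

Feature dictionaries of this kind are a common input format for sequence labellers such as conditional random fields, where each token in a sentence contributes one dictionary. A real chemical NER system would typically use larger context windows, conjunctions of features and curated dictionaries rather than the toy lexicon above.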