Chemical named entities recognition: a review on approaches and applications

Table 3 Description of common categories of textual features with some examples, summarized from [23]

Feature category	Objectives and examples
Linguistic	to find the prefix common to all variants of a term (stemming),
	to find the root form of a variant word (lemmatization),
	to assign each token to a grammatical category (part-of-speech (POS) tagging), or
	to divide the text into syntactically correlated groups of words (chunking)
Orthographic	to capture knowledge about word formation from a token's surface form (e.g., capitalization and the presence of symbols)
Morphological	to reflect common structures and/or sub-sequences of characters among entities (e.g., suffixes and prefixes, character n-grams and word shape patterns)
Context	to establish a higher-level relationship between the tokens and the extracted features (e.g., windows and conjunctions)
Lexicons	to add domain knowledge to the feature set in order to improve the NER system. Dictionaries of domain terms are matched against the entity names in the text, and the resulting tags are used as features (e.g., dictionaries of target entity names and of trigger names).
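
The feature categories in Table 3 can be illustrated with a minimal Python sketch that builds a feature dictionary for one token. This is not the method of [23] or of any particular NER system. The names `CHEMICAL_LEXICON`, `crude_stem`, `word_shape`, `char_ngrams` and `token_features` are hypothetical, the lexicon is a toy stand-in for a curated chemical dictionary, and the suffix stripper is only a rough substitute for a real stemmer.

```python
import re

# Hypothetical toy lexicon; a real system would load a curated chemical dictionary.
CHEMICAL_LEXICON = {"aspirin", "ethanol", "benzene"}


def word_shape(token):
    """Map uppercase letters to 'A', lowercase letters to 'a' and digits to '0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def char_ngrams(token, n=3):
    """Return all contiguous character n-grams of the token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def crude_stem(token):
    """Strip a few common English suffixes (assumed stand-in for a real stemmer)."""
    lowered = token.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if lowered.endswith(suffix) and len(lowered) > len(suffix) + 2:
            return lowered[: -len(suffix)]
    return lowered


def token_features(tokens, i, window=1):
    """Build one feature dictionary for tokens[i], grouped by the categories of Table 3."""
    token = tokens[i]
    features = {
        # Linguistic
        "stem": crude_stem(token),
        # Orthographic
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "has_symbol": any(not c.isalnum() for c in token),
        # Morphological
        "prefix3": token[:3],
        "suffix3": token[-3:],
        "char_3grams": char_ngrams(token.lower(), 3),
        "shape": word_shape(token),
        # Lexicon: a simple dictionary-match flag
        "in_lexicon": token.lower() in CHEMICAL_LEXICON,
    }
    # Context (window): neighbouring tokens, padded at the sentence boundaries
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        features[f"token[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    # Context (conjunction): previous token combined with the current word shape
    features["prev|shape"] = features.get("token[-1]", "<PAD>") + "|" + features["shape"]
    return features


tokens = ["Aspirin", "dissolves", "in", "ethanol"]
features = token_features(tokens, 0)
```

In a feature-based NER pipeline, one such dictionary per token would typically be passed to a sequence classifier; the choice of window size, n-gram length and dictionaries here is illustrative only.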

ISSN: 1758-2946