Table 1 Comparison of Model 1 and Model 2.

From: tmChem: a high performance approach for chemical named entity recognition and normalization

| Aspect | Model 1 | Model 2 |
| --- | --- | --- |
| System adapted | BANNER [22] | tmVar [24] |
| Unicode transliteration | No | Yes |
| Tokenization | whitespace; lowercase to uppercase | lowercase to uppercase; uppercase to lowercase |
| Sentence segmentation | Java BreakIterator | None |
| Conditional random field configuration and settings | | |
| Implementation | MALLET [25] | CRF++ [23] |
| Order | 1 | 2 |
| Label model | IOB with one entity label | IOB with one entity label |
| Regularization | L2 | L2 |
| Gaussian prior variance (σ) | 1.0 | 4.0 |
| Feature frequency threshold | 0 | 3 |
| Individual tokens | Yes | Yes |
| Morphology | Lemmatization | Stemming |
| Part of speech | Yes | No |
| Word shapes | Yes | Yes |
| Characters | N-grams length 2-4 | Prefixes and suffixes length 2-5 |
| Character counts | None | Total characters, digits, uppercase, lowercase |
| ChemSpot [4] | Yes | No |
| Semantic affixes | None | Suffixes, alkane stems, trivial rings, simple multipliers, etc. |
| Chemical elements | Name and symbol | Name |
| Amino acids | Name, 3-char abbreviation, 1-char abbreviation | None |
| Chemical formulas | Within a single token | None |
| Amino acid sequences | Across tokens | None |
| Context window | 2 | 3 |
| Post processing | | |
| Consistency | Yes | No |
| Abbreviation resolution | Yes | Yes |
| Parenthesis balancing | Yes | Yes |
| Chemical identifiers | Yes | Yes |
  1. This table compares the setup and configuration of Model 1 and Model 2.
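
To make a few of the Model 2 entries concrete, the following is a minimal Python sketch (Python 3.7 or later) of the case-change token boundaries, the prefix and suffix features of length 2 to 5, and the character counts listed in the table. It is an illustration only, not the tmChem implementation: the function names `tokenize`, `affixes`, `char_counts`, and `word_shape` are hypothetical, splitting on whitespace first is an assumption, and the word-shape mapping is one common convention assumed here because the table does not define it.

```python
import re

# Zero-width boundaries at lowercase->uppercase and uppercase->lowercase
# transitions, the two case changes listed for Model 2.
_CASE_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[a-z])")


def tokenize(text):
    """Split on whitespace (assumed), then at every case change."""
    tokens = []
    for chunk in text.split():
        tokens.extend(part for part in _CASE_BOUNDARY.split(chunk) if part)
    return tokens


def affixes(token, min_len=2, max_len=5):
    """Prefixes and suffixes of length min_len..max_len, where the token is long enough."""
    feats = {}
    for n in range(min_len, max_len + 1):
        if len(token) >= n:
            feats[f"prefix{n}"] = token[:n]
            feats[f"suffix{n}"] = token[-n:]
    return feats


def char_counts(token):
    """Total characters, digits, uppercase and lowercase letters."""
    return {
        "total": len(token),
        "digits": sum(c.isdigit() for c in token),
        "upper": sum(c.isupper() for c in token),
        "lower": sum(c.islower() for c in token),
    }


def word_shape(token):
    """Assumed shape convention: uppercase -> 'A', lowercase -> 'a', digit -> '0'."""
    shape = []
    for c in token:
        if c.isupper():
            shape.append("A")
        elif c.islower():
            shape.append("a")
        elif c.isdigit():
            shape.append("0")
        else:
            shape.append(c)  # punctuation and other symbols are kept as-is
    return "".join(shape)


if __name__ == "__main__":
    for tok in tokenize("NaCl in methanol"):
        print(tok, word_shape(tok), char_counts(tok), affixes(tok))
```

Splitting at both case transitions produces very fine-grained tokens, which is why a sketch like this would rely on a context window of neighbouring tokens (3 for Model 2 in the table) to recover longer chemical names.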