From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization
Task [model] | Context | [Repetition pattern] output |
---|---|---|
Sentence completion [GPT-2] ref. [10] | Prefix: Lyrically the song has excerpts of different languages including French, Spanish, | [Single-word repetition] German, Italian, Portuguese, Spanish, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, . |
Summarization [Transformer] ref. [52] | Article | [Phrase-level repetition] ...cristiano ronaldo’s best footballer whereas barcelona talisman lionel messi’s best footballer whereas barcelona’s best footballer whereas barcelona talisman lionel lee whereas’s best footballer’s. (Truncated) |
Generation from an initial tag line [GPT-2] ref. [9] | Tag: So what’s new in my life? 09/11/18 - Just got back from vacation | [Sentence-level repetition] Just got back from vacation. 09/10/18 - Just got back from vacation. Just got back from vacation. 09/09/18 - Just got back from vacation. Just got back from vacation. 09/08/18 - Just got back from vacation. Just got back from vacation. |
Product review generation [GPT-2] ref. [53] | Initial context | [Structural repetition] Great movie, although took a while to see at first it held my interest and kept me interested, plus i thought it was extremely good. also it was very good. |
Protein sequence generation [ProtGPT-2] ref. [54] | No context | [Subsequential repetition] MSNDTPTHDPTPPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPE. |
Molecule captioning [Transformer] ref. [55] | SMILES: CC[N+](CC)=C1C=CC2=NC3=C(OC2=C1)C=C(N)C(C)=C3 | [Single-word repetition] the molecule is a deuterated compound that is is is is is an isotopologue of chloroform in which the four hydrogen atoms have been replaced by deuterium. (Truncated) |
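
The repetition patterns in the table can be quantified in a simple way. The following is a minimal sketch, not the metric used in any of the cited works: the function names `repeated_ngram_fraction` and `longest_consecutive_run` are hypothetical, and whitespace or per-character splitting stands in for whatever tokenizer a given model actually uses.

```python
def repeated_ngram_fraction(tokens, n):
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 means no repetition)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def longest_consecutive_run(tokens, n=1):
    """Largest number of back-to-back copies of any n-gram (n must be >= 1)."""
    best = 1 if len(tokens) >= n else 0
    for start in range(len(tokens) - n + 1):
        unit = tokens[start:start + n]
        count = 1
        pos = start + n
        # Keep stepping forward by n while the next window equals the unit.
        while tokens[pos:pos + n] == unit:
            count += 1
            pos += n
        best = max(best, count)
    return best


# Single-word repetition from the molecule-captioning row (whitespace tokens).
caption = "the molecule is a deuterated compound that is is is is is an isotopologue".split()
print(longest_consecutive_run(caption, n=1))  # 5

# Subsequential repetition: three copies of the 8-residue unit from the protein row.
protein = list("PAPAPAPE" * 3)
print(longest_consecutive_run(protein, n=8))  # 3
```

The consecutive-run count catches the single-word and subsequential rows, while the duplicate n-gram fraction is better suited to phrase-level and sentence-level repetition, where repeated spans are separated by other tokens.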