{"id":717,"date":"2018-01-17T15:02:16","date_gmt":"2018-01-17T15:02:16","guid":{"rendered":"http:\/\/www.marekrei.com\/blog\/?p=717"},"modified":"2019-09-27T23:22:28","modified_gmt":"2019-09-27T23:22:28","slug":"paper-summaries","status":"publish","type":"post","link":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/","title":{"rendered":"57 Summaries of Machine Learning and NLP Research"},"content":{"rendered":"<p>Staying on top of recent work is an important part of being a good researcher, but this can be quite difficult. Thousands of new papers are published <a href=\"https:\/\/www.marekrei.com\/blog\/ml-nlp-publications-in-2017\/\">every year at the main ML and NLP conferences<\/a>, not to mention all the specialised workshops and everything that shows up on ArXiv. Going through all of them, even just to find the papers that you want to read in more depth, can be very time-consuming.<\/p>\n<p>In this post, I have summarised 50 papers. After going through a paper, if I had the chance, I would write down a few notes and summarise the work in a couple of sentences. These are not meant as reviews &#8211; I&#8217;m not commenting on whether I think the paper is good or not. But I do try to present the crux of the paper as bluntly as possible, without unnecessary sales tactics. Hopefully this can give you the general idea of 50 papers, in roughly 20 minutes of reading time.<\/p>\n<p>The papers are not selected or ordered based on any criteria. It is not a list of the best papers I have read, more like a random sample. The only filter that I applied was to exclude papers older than 2016, as the goal is to give an overview of the more recent work.<\/p>\n<p>I set out to summarise 50 papers. Once I was done, I thought this would be a sensible place to summarise my own work as well. So at the end of the list you will also find brief summaries of the papers I published in 2017.<\/p>\n<p>Let&#8217;s get started.<\/p>\n<p><strong>1. 
A Thorough Examination of the CNN\/Daily Mail Reading Comprehension Task<\/strong><br \/>\nDanqi Chen, Jason Bolton, Christopher D. Manning. Stanford. ACL 2016.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1606.02858.pdf\">https:\/\/arxiv.org\/pdf\/1606.02858.pdf<\/a><\/p>\n<p>Hermann et al (2015) created a dataset for testing reading comprehension by extracting summarised bullet points from CNN and Daily Mail. All the entities in the text are anonymised and the task is to place correct entities into empty slots\u00a0based on the news article.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail.png\" rel=\"attachment wp-att-1058\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png\" alt=\"cnn_daily_mail\" width=\"300\" height=\"151\" class=\"aligncenter size-medium wp-image-1058\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-150x76.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-768x388.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail.png 951w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>This paper has hand-reviewed 100 samples from the dataset and concludes that around 25% of the questions are difficult or impossible to answer even for a human, mostly due to the anonymisation process. They present a simple classifier that achieves unexpectedly good results, and a neural network based on attention that beats all previous results by quite a margin.<\/p>\n<p><strong>2. Word Translation Without Parallel Data<\/strong><br \/>\nAlexis Conneau, Guillaume Lample, Marc&#8217;Aurelio Ranzato, Ludovic Denoyer, Herv\u00e9 J\u00e9gou. Facebook, Le Mans, Sorbonne. 
ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1710.04087.pdf\">https:\/\/arxiv.org\/pdf\/1710.04087.pdf<\/a><\/p>\n<p>Inducing word translations using only monolingual corpora for two languages. Separate embeddings are trained for each language and a mapping is learned through an adversarial objective, along with an orthogonality constraint on the most frequent words. A strategy for an unsupervised stopping criterion is also proposed.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Word-Translation-Without-Parallel-Data.png\" rel=\"attachment wp-att-1095\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Word-Translation-Without-Parallel-Data-300x64.png\" alt=\"Word Translation Without Parallel Data\" width=\"300\" height=\"64\" class=\"aligncenter size-medium wp-image-1095\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Word-Translation-Without-Parallel-Data-300x64.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Word-Translation-Without-Parallel-Data-150x32.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Word-Translation-Without-Parallel-Data-768x164.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Word-Translation-Without-Parallel-Data.png 813w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><!--more--><\/p>\n<p><strong>3. A Nested Attention Neural Hybrid Model for Grammatical Error Correction<\/strong><br \/>\nJianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong, Jianfeng Gao. ACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1070.pdf\">http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1070.pdf<\/a><\/p>\n<p>Proposing character-based extensions to a neural MT system for grammatical error correction. OOV words are represented in the encoder and decoder using character-based RNNs. 
They evaluate on the CoNLL-14 dataset, integrate probabilities from a large language model, and achieve good results.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/A-Nested-Attention-Neural-Hybrid-Model-for-Grammatical-Error-Correction.png\" rel=\"attachment wp-att-1083\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/A-Nested-Attention-Neural-Hybrid-Model-for-Grammatical-Error-Correction-300x201.png\" alt=\"A Nested Attention Neural Hybrid Model for Grammatical Error Correction\" width=\"300\" height=\"201\" class=\"aligncenter size-medium wp-image-1083\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/A-Nested-Attention-Neural-Hybrid-Model-for-Grammatical-Error-Correction-300x201.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/A-Nested-Attention-Neural-Hybrid-Model-for-Grammatical-Error-Correction-150x100.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/A-Nested-Attention-Neural-Hybrid-Model-for-Grammatical-Error-Correction.png 603w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>4. On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems<br \/>\n<\/strong>Pei-Hao\u00a0Su,\u00a0Milica\u00a0Gasic,\u00a0Nikola\u00a0Mrksic,\u00a0Lina\u00a0Rojas-Barahona,\u00a0Stefan\u00a0Ultes,\u00a0David\u00a0Vandyke,\u00a0Tsung-Hsien\u00a0Wen,\u00a0Steve\u00a0Young. Cambridge. ACL 2016.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P16\/P16-1230.pdf\">http:\/\/aclweb.org\/anthology\/P\/P16\/P16-1230.pdf<\/a><\/p>\n<p>The goal\u00a0is to improve the training process for a spoken dialogue system, more specifically a telephone-based system providing restaurant information for the Cambridge (UK) area. 
They train a supervised system which tries to predict the success of the current dialogue &#8211; if the model is certain about the outcome, the predicted label is used for training the dialogue system; if the model is uncertain, the user is asked to provide a label. Essentially it reduces the amount of annotation that is required, by choosing which examples should be annotated through active learning.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue.png\" rel=\"attachment wp-att-1065\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue-300x158.png\" alt=\"dialogue\" width=\"300\" height=\"158\" class=\"aligncenter size-medium wp-image-1065\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue-300x158.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue-150x79.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue.png 710w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>The dialogue is mapped to a vector representation using a bidirectional LSTM\u00a0trained like an autoencoder, and a Gaussian Process\u00a0is used for modelling dialogue success.<\/p>\n<p><strong>5.\u00a0Vision and Feature Norms: Improving automatic feature norm learning through cross-modal maps<\/strong><br \/>\nLuana Bulat, Douwe Kiela, Stephen Clark. Cambridge. NAACL 2016.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/N\/N16\/N16-1071.pdf\">http:\/\/aclweb.org\/anthology\/N\/N16\/N16-1071.pdf<\/a><\/p>\n<p>The task is to predict feature norms &#8211; object properties, for example <em>is_yellow<\/em> and <em>is_edible<\/em> for the word\u00a0<em>banana<\/em>. 
They experiment with adding in image recognition features, in addition to using distributional word vectors.<\/p>\n<p>An input word is used to retrieve 10 images from Google, these are passed through an ImageNet classifier to get feature vectors, and then averaged to get a vector representation for that word. A supervised\u00a0model (partial least-squares regression) is then trained to predict vectors of feature norms based on the input vectors (image-based, distributional, or a combination). Including the image information helps quite a bit, especially for detecting properties like colour and shape.<\/p>\n<figure id=\"attachment_734\" aria-describedby=\"caption-attachment-734\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/feature-norms.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-734 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/feature-norms-300x200.png\" alt=\"Examples of predicted feature norms using the visual features.\" width=\"300\" height=\"200\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/feature-norms-300x200.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/feature-norms-150x100.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/feature-norms-768x512.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/feature-norms.png 828w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-734\" class=\"wp-caption-text\">Examples of predicted feature norms using the visual features.<\/figcaption><\/figure>\n<p><strong>6.\u00a0Adversarial examples in the physical world<\/strong><br \/>\nAlexey Kurakin, Ian J. Goodfellow, Samy Bengio. Google, OpenAI. 
ArXiv.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1607.02533\">https:\/\/arxiv.org\/abs\/1607.02533<\/a><\/p>\n<p>Adversarial examples are datapoints that are designed to fool a classifier. For example, we can take an image that is classified correctly using a neural network, then backprop through the model to find which changes we need to make in order for it to be classified as something else. And these changes can be quite small, such that a human would hardly notice a difference.<\/p>\n<figure id=\"attachment_737\" aria-describedby=\"caption-attachment-737\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/adversarial.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-737 size-large\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/adversarial-1024x292.png\" alt=\"Examples of adversarial image\" width=\"1024\" height=\"292\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/adversarial-1024x292.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/adversarial-150x43.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/adversarial-300x86.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/adversarial-768x219.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-737\" class=\"wp-caption-text\">Examples of adversarial images.<\/figcaption><\/figure>\n<p>In this paper, they show that much of this property holds even when the images are fed into the classifier from the real world &#8211; after being photographed with a cell phone camera. While the accuracy goes from 85.3% to 36.3% when adversarial modifications are applied on the source images, the performance still drops from 79.8% to 36.4% when the images are photographed. 
They also propose two modifications to the process of generating adversarial images \u00a0&#8211; making it into a more gradual iterative process, and optimising for a specific adversarial class.<\/p>\n<p><strong>7.\u00a0Extracting token-level signals of syntactic processing from fMRI &#8211; with an application to POS induction<\/strong><br \/>\nJoachim\u00a0Bingel,\u00a0Maria Barrett,\u00a0Anders\u00a0S\u00f8gaard. Copenhagen. ACL 2016.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P16\/P16-1071.pdf\">http:\/\/aclweb.org\/anthology\/P\/P16\/P16-1071.pdf<\/a><\/p>\n<p>They incorporate fMRI features into POS tagging, under the assumption that reading semantically\/functionally different words will activate the brain in different ways. For this they use a dataset of fMRI recordings, where the subjects were reading a chapter of Harry Potter. The main issue is that fMRI has very low temporal resolution &#8211; there is only one fMRI reading per 4 tokens, and\u00a0in general it takes around 4-14 seconds for something to show up in fMRI. Nevertheless, they construct token-level vectors by using a Gaussian weighted average, integrate them into an unsupervised POS tagger, and show that it is able to improve performance.<\/p>\n<figure id=\"attachment_741\" aria-describedby=\"caption-attachment-741\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/brain.png\" rel=\"attachment wp-att-741\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-741 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/brain-300x218.png\" alt=\"Neural activity by brain region, from Wehbe et al. 
(2014).\" width=\"300\" height=\"218\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/brain-300x218.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/brain-150x109.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/brain.png 740w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-741\" class=\"wp-caption-text\">Neural activity by brain region, from Wehbe et al. (2014).<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p><strong>8. Joint Extraction of Events and Entities within a Document Context<\/strong><br \/>\nBishan Yang, Tom\u00a0Mitchell. Carnegie Mellon. NAACL 2016.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/N\/N16\/N16-1033.pdf\">http:\/\/aclweb.org\/anthology\/N\/N16\/N16-1033.pdf<\/a><\/p>\n<p>They propose a joint model for 1) identifying event keywords in a text, 2) identifying entities, and 3) identifying the connections between these events and entities. They also do this\u00a0across different sentences, jointly for the whole text.<\/p>\n<figure id=\"attachment_743\" aria-describedby=\"caption-attachment-743\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16.png\" rel=\"attachment wp-att-743\"><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-743\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16-1024x82.png\" alt=\"Example of the entity and event annotation that the system is modelling.\" width=\"1024\" height=\"82\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16-1024x82.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16-150x12.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16-300x24.png 300w, 
https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16-768x61.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/joint_event_naacl16.png 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-743\" class=\"wp-caption-text\">Example of the entity and event annotation that the system is modelling.<\/figcaption><\/figure>\n<p>The entity detection part is done with a CRF; the structure of an event is learned with a probabilistic graphical model; information is integrated\u00a0from surrounding sentences using a Stanford coreference system; and these are all tied together across the whole document using Integer Linear Programming.<\/p>\n<p><strong>9.\u00a0Candidate re-ranking for SMT-based grammatical error correction<\/strong><br \/>\nZheng Yuan, Ted\u00a0Briscoe,\u00a0Mariano Felice. Cambridge. BEA Workshop 2016.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/W\/W16\/W16-0530.pdf\">https:\/\/www.aclweb.org\/anthology\/W\/W16\/W16-0530.pdf<\/a><\/p>\n<p>They improve\u00a0an existing error correction system by re-ranking its predictions. The basic approach uses machine translation to perform error correction on learner texts &#8211; the incorrect text is essentially translated into correct text. Here, they include a ranking SVM to score and reorder the n-best lists from the translation model.<\/p>\n<p>The reranking features include various internal scores from the translation model, the rank in the original ordering, language model probabilities trained on large corpora, language model scores based on only the n-best list, word-level translation probabilities, and sentence length features. 
They show improvement on two error correction datasets.<\/p>\n<figure id=\"attachment_745\" aria-describedby=\"caption-attachment-745\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530.png\" rel=\"attachment wp-att-745\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-745 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530-300x129.png\" alt=\"Example output from the models.\" width=\"300\" height=\"129\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530-300x129.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530-150x65.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530-768x331.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530-1024x441.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/09\/W16-0530.png 1200w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-745\" class=\"wp-caption-text\">Example output from the models.<\/figcaption><\/figure>\n<p><strong>10.\u00a0Variational Neural Machine Translation<\/strong><br \/>\nBiao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, Min Zhang.\u00a0Soochow University,\u00a0Xiamen University. ArXiv.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1605.07869\">https:\/\/arxiv.org\/abs\/1605.07869<\/a><\/p>\n<p>They start with the neural machine translation model using alignment, by Bahdanau et al. 
(2014), and add an extra variational component.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt.png\" rel=\"attachment wp-att-751\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-751\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt-1024x410.png\" alt=\"vnmt\" width=\"1024\" height=\"410\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt-1024x410.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt-150x60.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt-300x120.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt-768x308.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/vnmt.png 1250w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>The authors use two neural variational components to model a distribution over latent variables z that captures the semantics of a sentence being translated. First, they model the posterior probability of z, conditioned on both input and output. Then they also model the prior of z, conditioned only on the input. During training, these two distributions are optimised to be similar by minimising the Kullback-Leibler divergence between them, and during testing the prior is used. They report improvements on Chinese-English and English-German translation, compared to\u00a0using the original encoder-decoder NMT framework.<\/p>\n<p><strong>11.\u00a0Numerically Grounded Language Models for Semantic Error Correction<\/strong><br \/>\nGeorgios P. Spithourakis,\u00a0Isabelle Augenstein,\u00a0Sebastian Riedel. UCL. 
EMNLP 2016.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1608.04147\">https:\/\/arxiv.org\/abs\/1608.04147<\/a><\/p>\n<p>They create an LSTM neural language model that 1) has better handling of numerical values, and 2) is conditioned on a knowledge base.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/numerical_grounding.png\" rel=\"attachment wp-att-756\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-756 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/numerical_grounding-300x257.png\" alt=\"numerical_grounding\" width=\"300\" height=\"257\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/numerical_grounding-300x257.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/numerical_grounding-150x128.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/numerical_grounding.png 582w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>First, the numerical value of each token is given as an additional signal to the network at each time step. While we normally represent token &#8220;25&#8221; as a normal word embedding, we now also have an extra feature with numerical value float(25).\u00a0Second, they condition the language model on text in a knowledge base. All the information in the KB is converted to a string, passed through an LSTM and then used to condition the main LM.<\/p>\n<p>They evaluate on a dataset of 16,003 clinical records which come paired with small KB tuples of 20 possible attributes. The numerical grounding helps quite a bit, and the best results are obtained when the KB conditioning is also added.<\/p>\n<p><strong>12.\u00a0Black Holes and White Rabbits : Metaphor Identification with Visual Features<\/strong><br \/>\nEkaterina Shutova,\u00a0Douwe Kiela,\u00a0Jean Maillard. Cambridge. 
NAACL 2016.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/N\/N16\/N16-1020.pdf\">https:\/\/www.aclweb.org\/anthology\/N\/N16\/N16-1020.pdf<\/a><\/p>\n<p>They build a system for distinguishing metaphorical word pairs (&#8220;blind alley&#8221;, &#8220;honest meal&#8221;, etc) from literal ones.<\/p>\n<figure id=\"attachment_760\" aria-describedby=\"caption-attachment-760\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1020.png\" rel=\"attachment wp-att-760\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-760 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1020-300x251.png\" alt=\"Annotated metaphor examples from Tsvetkov et al. (2014), used in this work.\" width=\"300\" height=\"251\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1020-300x251.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1020-150x126.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1020.png 537w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-760\" class=\"wp-caption-text\">Annotated metaphor examples from Tsvetkov et al. (2014), used in this work.<\/figcaption><\/figure>\n<p>The basic system uses word embedding similarity &#8211; cosine between the word embeddings. Then they explore variations using phrase embeddings, cos(phrase-word2, word2), which is similar to the operations with word regularities by Mikolov.<\/p>\n<p>Finally, they create vector representations for words and phrases using visual information. The words are used as queries in Google Image Search, and the returned images are passed through an image detection network in order to obtain vector representations. 
The best final system performs the task separately using linguistic and visual vectors, and then combines the resulting scores.<\/p>\n<p><strong>13.\u00a0Counter-fitting Word Vectors to Linguistic Constraints<\/strong><br \/>\nNikola Mrk\u0161i\u0107,\u00a0Diarmuid \u00d3 S\u00e9aghdha,\u00a0Blaise Thomson,\u00a0Milica Ga\u0161i\u0107,\u00a0Lina Rojas-Barahona,\u00a0Pei-Hao Su,\u00a0David Vandyke,\u00a0Tsung-Hsien Wen,\u00a0Steve Young. Cambridge, Apple. NAACL 2016.<br \/>\n<a href=\"http:\/\/www.aclweb.org\/anthology\/N16-1018\">http:\/\/www.aclweb.org\/anthology\/N16-1018<\/a><\/p>\n<p>They describe a method for augmenting existing word embeddings with knowledge of semantic constraints. The idea is similar to retrofitting by Faruqui et al. (2015), but using additional constraints and a different optimisation function.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/1603.00892v1.png\" rel=\"attachment wp-att-763\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-763 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/1603.00892v1-300x192.png\" alt=\"1603-00892v1\" width=\"300\" height=\"192\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/1603.00892v1-300x192.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/1603.00892v1-150x96.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/1603.00892v1.png 600w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\nExisting word vectors are further optimised to 1) have high similarity for known synonyms, 2) have low similarity for known antonyms, and 3) have high similarity to words that were highly similar in the original space. They evaluate on SimLex-999, showing state-of-the-art performance. 
Also, they use the method to improve a dialogue tracking system.<\/p>\n<p><strong>14.\u00a0Bidirectional RNN for Medical Event Detection in Electronic Health Records<\/strong><br \/>\nAbhyuday N. Jagannatha,\u00a0Hong Yu. University of Massachusetts. NAACL 2016.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/N\/N16\/N16-1056.pdf\">https:\/\/www.aclweb.org\/anthology\/N\/N16\/N16-1056.pdf<\/a><\/p>\n<p>The authors have a dataset of 780 electronic health records and they use it to detect various medical events such as adverse drug events, drug dosage, etc. The task is done by assigning a label to each word in the document.<\/p>\n<figure id=\"attachment_765\" aria-describedby=\"caption-attachment-765\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1056.png\" rel=\"attachment wp-att-765\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-765\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1056-300x220.png\" alt=\"Annotation statistics for the corpus of health records.\" width=\"300\" height=\"220\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1056-300x220.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1056-150x110.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/N16-1056.png 479w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-765\" class=\"wp-caption-text\">Annotation statistics for the corpus of health records.<\/figcaption><\/figure>\n<p>They look at CRFs, LSTMs and GRUs. 
Both LSTMs and GRUs outperform the CRF, but the best performance is achieved by a GRU trained on whole documents.<\/p>\n<p><strong>15.\u00a0Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives<\/strong><br \/>\nRoy Schwartz,\u00a0Roi Reichart,\u00a0Ari Rappoport.\u00a0The Hebrew University, IIT. NAACL 2016.<br \/>\n<a href=\"http:\/\/www.aclweb.org\/anthology\/N16-1060\">http:\/\/www.aclweb.org\/anthology\/N16-1060<\/a><\/p>\n<p>They train word2vec skip-gram embeddings using coordinations as context. They use 11 manual patterns to extract coordinations (eg &#8220;X and Y&#8221;, &#8220;either X or Y&#8221;, etc). From &#8220;boats or planes&#8221;, &#8220;boats&#8221; will be a context of &#8220;planes&#8221; and &#8220;planes&#8221; will be a context of &#8220;boats&#8221;.<\/p>\n<p>They evaluate on SimLex-999 and find that this performs badly on nouns. However, it beats normal skip-gram and dependency-based skip-gram on verbs and adjectives.<\/p>\n<p><strong>16.\u00a0Comparing Data Sources and Architectures for Deep Visual Representation Learning in Semantics<\/strong><br \/>\nDouwe Kiela,\u00a0Anita L. Ver\u0151,\u00a0Stephen Clark. Cambridge. 
EMNLP 2016.<br \/>\n<a href=\"https:\/\/aclweb.org\/anthology\/D\/D16\/D16-1043.pdf\">https:\/\/aclweb.org\/anthology\/D\/D16\/D16-1043.pdf<br \/>\n<\/a><\/p>\n<p>The authors compare different image recognition models and image data sources for multimodal word representation learning.<\/p>\n<figure id=\"attachment_771\" aria-describedby=\"caption-attachment-771\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/D16-1043.png\" rel=\"attachment wp-att-771\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-771 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/D16-1043-300x101.png\" alt=\"d16-1043\" width=\"300\" height=\"101\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/D16-1043-300x101.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/D16-1043-150x51.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/10\/D16-1043.png 568w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-771\" class=\"wp-caption-text\">Image recognition models used for vector generation<\/figcaption><\/figure>\n<p>Experiments are performed on SimLex-999 (similarity) and MEN (relatedness). The performance of different models (AlexNet, GoogLeNet, VGGNet) is found to be quite similar, with VGGNet performing slightly better at the cost of requiring more computation. Using search engines for image sources gives good coverage; ImageNet performs quite well with VGGNet; ESP Game dataset gave the lowest performance. Combining visual and linguistic vectors was found to be beneficial on both English and Italian.<\/p>\n<p><strong>17.\u00a0Named Entity Recognition for Novel Types by Transfer Learning<\/strong><br \/>\nLizhen Qu,\u00a0Gabriela Ferraro,\u00a0Liyuan Zhou,\u00a0Weiwei Hou,\u00a0Timothy Baldwin. Melbourne. 
EMNLP 2016.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/D16-1087\">http:\/\/aclweb.org\/anthology\/D16-1087<\/a><\/p>\n<p>The authors tackle the problem of domain adaptation for NER, where the label set of the target domain is different from the source domain.<\/p>\n<p>They first train a CRF model on the source domain. Next, they train an LR classifier to predict labels in the target domain, based on predicted label scores from the source model. Finally, the weights from the classifier are used to initialise another CRF model, which is then fine-tuned on the target domain data.<\/p>\n<figure id=\"attachment_774\" aria-describedby=\"caption-attachment-774\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation.png\" rel=\"attachment wp-att-774\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-774 size-large\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation-1024x267.png\" alt=\"Performance of the proposed model (TransInit) compared to baselines\" width=\"1024\" height=\"267\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation-1024x267.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation-150x39.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation-300x78.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation-768x201.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/ner_adaptation.png 1896w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"caption-attachment-774\" class=\"wp-caption-text\">Performance of the proposed model (TransInit) compared to baselines<\/figcaption><\/figure>\n<p><strong>18.\u00a0Hybrid computing using a neural network with dynamic external memory<\/strong><br \/>\nAlex Graves,\u00a0Greg Wayne,\u00a0Malcolm Reynolds 
et al. DeepMind. Nature.<br \/>\n<a href=\"http:\/\/www.nature.com\/nature\/journal\/v538\/n7626\/full\/nature20101.html\">http:\/\/www.nature.com\/nature\/journal\/v538\/n7626\/full\/nature20101.html<\/a><\/p>\n<p>The DeepMind guys present an extension to the Neural Turing Machine architecture.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/nature20101.png\" rel=\"attachment wp-att-777\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-777\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/nature20101-300x157.png\" alt=\"nature20101\" width=\"300\" height=\"157\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/nature20101-300x157.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/nature20101-150x79.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/nature20101-768x402.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/nature20101.png 888w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>They call it a Differentiable Neural Computer (DNC) and it uses 1) an attention mechanism to access information in a matrix that acts as a memory, 2) an attention mechanism to save information to that memory, and 3) a transition matrix that stores information about the order in which rows in the memory are modified, in order to better handle sequential information. They test on the bAbI question answering dataset, a graph inference task, and on solving a puzzle of arranging blocks.<\/p>\n<p><strong>19.\u00a0A Neural Approach to Automated Essay Scoring<\/strong><br \/>\nKaveh Taghipour,\u00a0Hwee Tou Ng. Singapore. 
EMNLP 2016.<br \/>\n<a href=\"https:\/\/aclweb.org\/anthology\/D\/D16\/D16-1193.pdf\">https:\/\/aclweb.org\/anthology\/D\/D16\/D16-1193.pdf<\/a><\/p>\n<p>The authors construct a neural network for automated essay scoring.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1193.png\" rel=\"attachment wp-att-779\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-779\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1193-300x127.png\" alt=\"d16-1193\" width=\"300\" height=\"127\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1193-300x127.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1193-150x63.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1193-768x324.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1193.png 940w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>A convolution with a window of 3 is passed over the text, and its output is used as input to an LSTM. The output of the LSTM is averaged over all timesteps and then a single value in the range of [0,1] is predicted as a scaled-down score for the essay. They evaluate by measuring quadratic weighted Kappa on the Kaggle essay scoring dataset.<\/p>\n<p><strong>20.\u00a0Globally Coherent Text Generation with Neural Checklist Models<\/strong><br \/>\nChloe Kiddon,\u00a0Luke Zettlemoyer,\u00a0Yejin Choi. Washington. 
EMNLP 2016.<br \/>\n<a href=\"https:\/\/aclweb.org\/anthology\/D\/D16\/D16-1032.pdf\">https:\/\/aclweb.org\/anthology\/D\/D16\/D16-1032.pdf<\/a><\/p>\n<p>They describe a neural model for text generation, which keeps track of a checklist of items that need to be mentioned in the text.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1032.png\" rel=\"attachment wp-att-783\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-783\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1032-300x206.png\" alt=\"\" width=\"300\" height=\"206\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1032-300x206.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1032-150x103.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1032.png 633w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\nThe basic system is an encoder-decoder GRU model for text generation. On top of that, the model uses attention over items that need to be mentioned and items that have already been mentioned, both of which are encoded as vectors. An additional cost objective encourages the checklist to be\u00a0filled by the end of the text. Evaluation is performed on recipe and dialogue generation.<\/p>\n<p><strong>21.\u00a0Automatic Features for Essay Scoring \u2013 An Empirical Study<\/strong><br \/>\nFei Dong,\u00a0Yue Zhang. Singapore. EMNLP 2016.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/D\/D16\/D16-1115.pdf\">https:\/\/www.aclweb.org\/anthology\/D\/D16\/D16-1115.pdf<\/a><\/p>\n<p>The authors investigate convolutional networks for essay scoring. They use a two-level convolution &#8211; first over words and then over sentences. 
Evaluation is performed on the Kaggle ASAP dataset, training separate models on individual topics, and also reporting some cross-topic results.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115.png\" rel=\"attachment wp-att-786\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-786\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115-300x102.png\" alt=\"\" width=\"300\" height=\"102\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115-300x102.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115-150x51.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115-768x262.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115-1024x349.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2016\/11\/D16-1115.png 1584w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><strong>22.\u00a0Learning Deep Structure-Preserving Image-Text Embeddings<\/strong><br \/>\nLiwei Wang,\u00a0Yin Li,\u00a0Svetlana Lazebnik.\u00a0University of Illinois, Georgia Tech. 
CVPR 2016.<br \/>\n<a href=\"http:\/\/www.cv-foundation.org\/openaccess\/content_cvpr_2016\/papers\/Wang_Learning_Deep_Structure-Preserving_CVPR_2016_paper.pdf\">http:\/\/www.cv-foundation.org\/&#8230;CVPR_2016_paper.pdf<\/a><\/p>\n<p>The authors present a neural model that maps images and sentences into the same space, in order to perform cross-modal retrieval &#8211; find images based on a sentence or find sentences based on an image.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1.png\" rel=\"attachment wp-att-969\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-971 size-large\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1-1024x580.png\" alt=\"\" width=\"1024\" height=\"580\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1-1024x580.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1-150x85.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1-300x170.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1-768x435.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/01\/1511.06078v2-1.png 1155w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>The image vectors come from a pre-trained VGG image detection network. The sentence vectors are constructed using Fisher vectors, but they also explore simpler options, such as mean word2vec vectors and tfidf. Both are then mapped through nonlinearities and normalised, and Euclidean distance is used to measure vector similarity. 
They also investigate the task of mapping noun phrases from the image caption to specific areas of the image.<\/p>\n<p><strong>23.\u00a0Understanding deep learning requires rethinking generalization<\/strong><br \/>\nChiyuan Zhang,\u00a0Samy Bengio,\u00a0Moritz Hardt,\u00a0Benjamin Recht,\u00a0Oriol Vinyals. Google Brain, DeepMind. ICLR 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1611.03530.pdf\">https:\/\/arxiv.org\/pdf\/1611.03530.pdf<\/a><\/p>\n<p>The authors investigate the generalisation properties of several well-known image recognition networks.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530.png\" rel=\"attachment wp-att-974\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-974\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530-1024x367.png\" alt=\"1611.03530\" width=\"1024\" height=\"367\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530-1024x367.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530-150x54.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530-300x108.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530-768x276.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.03530.png 1296w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>They show that these networks are able to overfit to the training set with 100% accuracy even if the labels on the images are random, or if the pixels are randomly generated. Regularisation, such as weight decay and dropout, doesn&#8217;t stop overfitting as much as expected, still resulting in ~90% accuracy on random training data. 
They then argue that these models likely make use of massive memorization, in combination with learning low-complexity patterns, in order to perform well on these tasks.<\/p>\n<p><strong>24.\u00a0Reinforcement Learning with Unsupervised Auxiliary Tasks<\/strong><br \/>\nMax Jaderberg,\u00a0Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver &amp; Koray Kavukcuoglu. DeepMind. ICLR 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1611.05397\">https:\/\/arxiv.org\/abs\/1611.05397<\/a><\/p>\n<p>They describe a version of reinforcement learning where the system also learns to solve some auxiliary tasks, which helps with the main objective.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.05397.png\" rel=\"attachment wp-att-976\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-976 size-medium\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.05397-300x186.png\" alt=\"1611.05397\" width=\"300\" height=\"186\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.05397-300x186.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.05397-150x93.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.05397-768x476.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/1611.05397.png 993w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>In addition to normal Q-learning, which predicts the downstream reward, they have the system learn 1) a separate policy for maximally changing the pixels on the screen, 2) a separate policy for maximally activating units in a hidden layer, and 3) to predict the reward at the next step, using biased sampling. 
They show that this improves learning speed and performance on Atari games and Labyrinth (a Quake-like 3D game).<\/p>\n<p><strong>25.\u00a0Modelling metaphor with attribute-based semantics<\/strong><br \/>\nLuana Bulat,\u00a0Stephen Clark,\u00a0Ekaterina Shutova. Cambridge. EACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/E\/E17\/E17-2084.pdf\">http:\/\/aclweb.org\/anthology\/E\/E17\/E17-2084.pdf<\/a><\/p>\n<p>They propose using attribute-based vectors for detecting metaphorical word pairs.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/eacl2017.png\" rel=\"attachment wp-att-979\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-979\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/eacl2017.png\" alt=\"eacl2017\" width=\"1013\" height=\"435\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/eacl2017.png 1013w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/eacl2017-150x64.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/eacl2017-300x129.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/02\/eacl2017-768x330.png 768w\" sizes=\"auto, (max-width: 1013px) 100vw, 1013px\" \/><\/a><br \/>\nTraditional embeddings (word2vec and count-based) are mapped to attribute vectors, using a supervised system trained on McRae norms. These vectors for a word pair are then given as input to an SVM classifier and trained to detect metaphorical (black humour) vs literal (black dress) word pairs. They show that using the attribute vectors gives a higher F-score than using the original vector space.<\/p>\n<p><strong>26.\u00a0Enriching Word Vectors with Subword Information<\/strong><br \/>\nPiotr Bojanowski,\u00a0Edouard Grave,\u00a0Armand Joulin,\u00a0Tomas Mikolov. Facebook. 
ArXiv 2016.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1607.04606\">https:\/\/arxiv.org\/abs\/1607.04606<\/a><\/p>\n<p>They extend skip-grams for word embeddings to use character n-grams. Each word is represented as a bag of character n-grams, 3-6 characters long, plus the word itself. Each of these has its own embedding, which is optimised to predict the surrounding context words using skip-gram optimisation. They evaluate on word similarity and analogy tasks, in different languages, and show improvement on most benchmarks.<\/p>\n<p><strong>27.\u00a0Learning to Compose Words into Sentences with Reinforcement Learning<\/strong><br \/>\nDani Yogatama,\u00a0Phil Blunsom,\u00a0Chris Dyer,\u00a0Edward Grefenstette,\u00a0Wang Ling. DeepMind. ICLR 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1611.09100\">https:\/\/arxiv.org\/abs\/1611.09100<\/a><\/p>\n<p>The aim is to have the system discover a method for parsing that would benefit a downstream task.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/03\/1611.09100.png\" rel=\"attachment wp-att-981\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-981\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/03\/1611.09100.png\" alt=\"1611.09100\" width=\"813\" height=\"258\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/03\/1611.09100.png 813w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/03\/1611.09100-150x48.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/03\/1611.09100-300x95.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2017\/03\/1611.09100-768x244.png 768w\" sizes=\"auto, (max-width: 813px) 100vw, 813px\" \/><\/a><\/p>\n<p>They construct a neural shift-reduce parser &#8211; as it&#8217;s moving through the sentence, it can either shift the word to the stack or reduce two words on top of the stack by combining them. 
A Tree-LSTM is used for composing the nodes recursively. The whole system\u00a0is trained using reinforcement learning, based on an objective function of the downstream task. The model learns parse rules that are beneficial for that specific task, either without any prior knowledge of parsing or by initially training it to act as a regular parser.<\/p>\n<p><strong>28. Identifying beneficial task relations for multi-task learning in deep neural networks<\/strong><br \/>\nJoachim Bingel, Anders S\u00f8gaard. Copenhagen. EACL 2017.<br \/>\n<a href=\"http:\/\/www.aclweb.org\/anthology\/E17-2026\">http:\/\/www.aclweb.org\/anthology\/E17-2026<\/a><\/p>\n<p>The authors investigate the benefit of different task combinations when performing multi-task learning.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/multitask_tagging.png\" rel=\"attachment wp-att-1069\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/multitask_tagging-300x288.png\" alt=\"multitask_tagging\" width=\"300\" height=\"288\" class=\"aligncenter size-medium wp-image-1069\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/multitask_tagging-300x288.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/multitask_tagging-150x144.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/multitask_tagging.png 666w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>They experiment with all possible pairs of 10 sequence labeling datasets, switching between the datasets during training. They find that multi-task learning helps more when the main task quickly plateaus while the auxiliary task does not, likely helping the model out of local minima.<br \/>\nThere does not seem to be any auxiliary task that would help on all main tasks, but chunking and semantic tagging seem to perform best.<\/p>\n<p><strong>29. 
Literal and Metaphorical Senses in Compositional Distributional Semantic Models<\/strong><br \/>\nE. Dar\u00edo Guti\u00e9rrez, Ekaterina Shutova, Tyler Marghetis, Benjamin K. Bergen. UCSD, Cambridge, Bloomington. ACL 2016.<br \/>\n<a href=\"http:\/\/www.aclweb.org\/anthology\/P16-1018\">http:\/\/www.aclweb.org\/anthology\/P16-1018<\/a><\/p>\n<p>The paper investigates compositional semantic models specialised for metaphors.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/metaphor_composition.png\" rel=\"attachment wp-att-1071\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/metaphor_composition-300x195.png\" alt=\"metaphor_composition\" width=\"300\" height=\"195\" class=\"aligncenter size-medium wp-image-1071\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/metaphor_composition-300x195.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/metaphor_composition-150x98.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/metaphor_composition-768x500.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/metaphor_composition.png 811w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>They construct a dataset of 8592 adjective-noun phrases, covering 23 different adjectives, annotated for being metaphorical or literal. They then train compositional models to predict the phrase vector based on the noun vector, as a linear combination with an adjective-specific weight matrix. They show that it&#8217;s better to learn separate adjective matrices for literal and metaphorical uses of each adjective, even though the amount of training data is smaller.<\/p>\n<p><strong>30. Data Noising as Smoothing in Neural Network Language Models<\/strong><br \/>\nZiang Xie, Sida I. Wang, Jiwei Li, Daniel Levy, Aiming Nie, Daniel Jurafsky, Andrew Y. Ng. Stanford. 
ICLR 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1703.02573\">https:\/\/arxiv.org\/abs\/1703.02573<\/a><\/p>\n<p>The paper investigates better noising techniques for RNN language models.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising.png\" rel=\"attachment wp-att-1073\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising-300x122.png\" alt=\"lm_noising\" width=\"300\" height=\"122\" class=\"aligncenter size-medium wp-image-1073\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising-300x122.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising-150x61.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising-768x311.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising-1024x415.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/lm_noising.png 1037w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Noising techniques from previous work randomly replace words in the context, either with other words or with a blank token. Here they investigate better ways of choosing which words to replace, and of drawing the replacements from a better distribution, inspired by methods in n-gram smoothing. They show improvement on language modeling (PTB and text8) and machine translation (English-German).<\/p>\n<p><strong>31. Neural Belief Tracker: Data-Driven Dialogue State Tracking<\/strong><br \/>\nNikola Mrk\u0161i\u0107, Diarmuid \u00d3 S\u00e9aghdha, Tsung-Hsien Wen, Blaise Thomson, Steve Young. Cambridge, Apple. 
ACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1163.pdf\">http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1163.pdf<\/a><\/p>\n<p>They propose neural models for dialogue state tracking, making a binary decision for each possible slot-value pair, based on the latest context from the user and the system. The context utterances and the slot-value option are encoded into vectors, either by summing word representations or using a convnet. These vectors are then further combined to produce a binary output. The systems are evaluated on two dialogue datasets and show improvement over baselines that use hand-constructed lexicons.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue_state_tracking.png\" rel=\"attachment wp-att-1075\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue_state_tracking-300x148.png\" alt=\"dialogue_state_tracking\" width=\"300\" height=\"148\" class=\"aligncenter size-medium wp-image-1075\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue_state_tracking-300x148.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue_state_tracking-150x74.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue_state_tracking-768x378.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/dialogue_state_tracking.png 894w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>32. Neural Architectures for Fine-grained Entity Type Classification<\/strong><br \/>\nSonse Shimaoka, Pontus Stenetorp, Kentaro Inui, Sebastian Riedel. Tohoku, UCL. EACL 2017.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/E17-1119\">https:\/\/www.aclweb.org\/anthology\/E17-1119<\/a><\/p>\n<p>They propose a neural architecture for assigning fine-grained labels to detected entity types. 
The model combines bidirectional LSTMs, attention over the context sequence, hand-engineered features, and the label hierarchy. They evaluate on the FIGER and OntoNotes datasets, showing improvements from each of the extensions.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/neural_arch_for_ne.png\" rel=\"attachment wp-att-1077\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/neural_arch_for_ne-300x243.png\" alt=\"neural_arch_for_ne\" width=\"300\" height=\"243\" class=\"aligncenter size-medium wp-image-1077\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/neural_arch_for_ne-300x243.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/neural_arch_for_ne-150x121.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/neural_arch_for_ne.png 712w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>33. Recurrent Additive Networks<\/strong><br \/>\nKenton Lee, Omer Levy, Luke Zettlemoyer. Washington, Allen Institute. ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1705.07393\">https:\/\/arxiv.org\/abs\/1705.07393<\/a><\/p>\n<p>The authors propose a simplified version of LSTMs. Some non-linearities and weighted components are removed, in order to arrive at the recurrent additive network (RAN). The model is evaluated on 3 language modeling datasets: PTB, the billion word benchmark, and character-level Text8.<\/p>\n<p><strong>34. A Sensitivity Analysis of (and Practitioners&#8217; Guide to) Convolutional Neural Networks for Sentence Classification<\/strong><br \/>\nYe Zhang, Byron Wallace. UT Austin. 
IJCNLP 2017.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/I\/I17\/I17-1026.pdf\">https:\/\/www.aclweb.org\/anthology\/I\/I17\/I17-1026.pdf<\/a><\/p>\n<p>The authors perform a hyperparameter search for a single-layer CNN on 9 different sentence classification datasets.<br \/>\nThey find that the optimal embedding initialisation, filter size and number of feature maps depend on the dataset and should be chosen through a search; ReLU and tanh are the best activation functions; 1-max pooling is the best pooling method; dropout may help when the number of feature maps gets large.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sensitivity_analysis.png\" rel=\"attachment wp-att-1078\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sensitivity_analysis-283x300.png\" alt=\"sensitivity_analysis\" width=\"283\" height=\"300\" class=\"aligncenter size-medium wp-image-1078\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sensitivity_analysis-283x300.png 283w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sensitivity_analysis-142x150.png 142w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sensitivity_analysis.png 598w\" sizes=\"auto, (max-width: 283px) 100vw, 283px\" \/><\/a><\/p>\n<p><strong>35. On Using Monolingual Corpora in Neural Machine Translation<\/strong><br \/>\nCaglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio. Montreal, METech, Maine. Computer Speech and Language 2016.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1503.03535\">https:\/\/arxiv.org\/abs\/1503.03535<\/a><\/p>\n<p>The authors extend a seq2seq model for MT with a language model. 
They first pre-train a seq2seq model and a neural language model, then train a separate feedforward component that takes the hidden states from both and combines them together to make a prediction. They compare to simply combining the output probabilities from both models (shallow fusion) and show improvement on different MT datasets.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/monolingual_corpora_for_mt.png\" rel=\"attachment wp-att-1079\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/monolingual_corpora_for_mt-300x178.png\" alt=\"monolingual_corpora_for_mt\" width=\"300\" height=\"178\" class=\"aligncenter size-medium wp-image-1079\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/monolingual_corpora_for_mt-300x178.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/monolingual_corpora_for_mt-150x89.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/monolingual_corpora_for_mt.png 764w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>36. Semi-supervised sequence tagging with bidirectional language models<\/strong><br \/>\nMatthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power. Allen Institute. ACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1161.pdf\">http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1161.pdf<\/a><\/p>\n<p>The paper proposes integrating a pre-trained language model into a sequence labeling model. The baseline model for sequence labeling is a two-layer LSTM\/GRU. They concatenate the hidden states from pre-trained language models onto the output of the first LSTM layer. 
This provides an improvement on NER and chunking tasks.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-sequence-tagging-with-bidirectional-language-models.png\" rel=\"attachment wp-att-1082\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-sequence-tagging-with-bidirectional-language-models-300x156.png\" alt=\"Semi-supervised sequence tagging with bidirectional language models\" width=\"300\" height=\"156\" class=\"aligncenter size-medium wp-image-1082\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-sequence-tagging-with-bidirectional-language-models-300x156.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-sequence-tagging-with-bidirectional-language-models-150x78.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-sequence-tagging-with-bidirectional-language-models-768x400.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-sequence-tagging-with-bidirectional-language-models.png 920w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>37.\u00a0Weakly Supervised Part-of-speech Tagging Using Eye-tracking Data<br \/>\n<\/strong>Maria Barrett,\u00a0Joachim\u00a0Bingel,\u00a0Frank Keller,\u00a0Anders S\u00f8gaard. Copenhagen. 
ACL 2016.<br \/>\n<a href=\"https:\/\/www.aclweb.org\/anthology\/P\/P16\/P16-2094.pdf\">https:\/\/www.aclweb.org\/anthology\/P\/P16\/P16-2094.pdf<\/a><\/p>\n<p>The paper explores\u00a0the usefulness of eye tracking for the task of POS tagging.\u00a0The assumption is that readers skip quickly over closed class words, and fixate longer on rare or ambiguous words.<\/p>\n<p>The experiments are performed on unsupervised POS tagging &#8211; a second-order HMM uses constraints on possible tags for each word (based on a dictionary), but no explicit annotated data is required. They show that including the eye tracking features improves performance by quite a bit. Surprisingly, it seems to be better to average eye tracking features over all training tokens of the same type, as opposed to using the data for each individual token, which means eye tracking is only used during the training stage.<\/p>\n<p><strong>38. Massive Exploration of Neural Machine Translation Architectures<\/strong><br \/>\nDenny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le. Google Brain. EMNLP 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/D17-1151\">http:\/\/aclweb.org\/anthology\/D17-1151<\/a><\/p>\n<p>Investigates different parameter choices for encoder-decoder NMT models. They find that LSTM is better than GRU, 2 bidirectional layers is enough, additive attention is the best, and a well-tuned beam search is important. 
They achieve good results on the WMT15 English->German task and release the code.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Massive-Exploration-of-Neural-Machine-Translation-Architectures.png\" rel=\"attachment wp-att-1085\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Massive-Exploration-of-Neural-Machine-Translation-Architectures-300x158.png\" alt=\"Massive Exploration of Neural Machine Translation Architectures\" width=\"300\" height=\"158\" class=\"aligncenter size-medium wp-image-1085\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Massive-Exploration-of-Neural-Machine-Translation-Architectures-300x158.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Massive-Exploration-of-Neural-Machine-Translation-Architectures-150x79.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Massive-Exploration-of-Neural-Machine-Translation-Architectures-768x405.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Massive-Exploration-of-Neural-Machine-Translation-Architectures.png 774w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>39. Learning to Reason: End-to-End Module Networks for Visual Question Answering<\/strong><br \/>\nRonghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko. Berkeley, Facebook, Boston. ICCV 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1704.05526.pdf\">https:\/\/arxiv.org\/pdf\/1704.05526.pdf<\/a><\/p>\n<p>A modular neural architecture for visual question answering. A seq2seq component predicts the sequence of neural modules (eg find() and compare()) based on the textual question, which are then dynamically combined and trained end-to-end. 
Achieves good results on three separate benchmarks that focus on reasoning about the image.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-to-Reason-End-to-End-Module-Networks-for-Visual-Question-Answering.png\" rel=\"attachment wp-att-1087\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-to-Reason-End-to-End-Module-Networks-for-Visual-Question-Answering-300x122.png\" alt=\"Learning to Reason: End-to-End Module Networks for Visual Question Answering\" width=\"300\" height=\"122\" class=\"aligncenter size-medium wp-image-1087\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-to-Reason-End-to-End-Module-Networks-for-Visual-Question-Answering-300x122.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-to-Reason-End-to-End-Module-Networks-for-Visual-Question-Answering-150x61.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-to-Reason-End-to-End-Module-Networks-for-Visual-Question-Answering-768x312.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-to-Reason-End-to-End-Module-Networks-for-Visual-Question-Answering.png 942w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>40. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction<\/strong><br \/>\nChristopher Bryant, Mariano Felice, Ted Briscoe. Cambridge. ACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1074.pdf\">http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1074.pdf<\/a><\/p>\n<p>A toolkit for automatically annotating error correction data with error types. It takes original and corrected sentences as input, aligns them to infer error spans, and uses rules to assign error types. They use the tool to perform fine-grained evaluation of CoNLL-14 shared task participants.<\/p>\n<p><strong>41. 
Dynamic Evaluation of Neural Sequence Models<\/strong><br \/>\nBen Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals. Edinburgh. ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1709.07432\">https:\/\/arxiv.org\/abs\/1709.07432<\/a><\/p>\n<p>Updating the parameters of an LSTM language model based on the observed sequence during testing. A slice of text is first processed and then used for a gradient descent update step. A regularisation term is also proposed which draws the parameters back towards the original model.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Dynamic-Evaluation-of-Neural-Sequence-Models.png\" rel=\"attachment wp-att-1089\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Dynamic-Evaluation-of-Neural-Sequence-Models-300x225.png\" alt=\"Dynamic Evaluation of Neural Sequence Models\" width=\"300\" height=\"225\" class=\"aligncenter size-medium wp-image-1089\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Dynamic-Evaluation-of-Neural-Sequence-Models-300x225.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Dynamic-Evaluation-of-Neural-Sequence-Models-150x113.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Dynamic-Evaluation-of-Neural-Sequence-Models.png 587w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>42. Unsupervised Machine Translation Using Monolingual Corpora Only<\/strong><br \/>\nGuillaume Lample, Ludovic Denoyer, Marc&#8217;Aurelio Ranzato. Facebook, Sorbonne. 
ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/1711.00043\">https:\/\/arxiv.org\/abs\/1711.00043<\/a><\/p>\n<p>The model learns to translate using a seq2seq model, an autoencoder objective, and an adversarial objective for language identification.<br \/>\nThe system is trained to correct noisy versions of its own output and iteratively improves performance.<br \/>\nDoes not require parallel corpora, but relies on a separate method for inducing a parallel dictionary that bootstraps the translation.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Unsupervised-Machine-Translation-Using-Monolingual-Corpora-Only.png\" rel=\"attachment wp-att-1090\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Unsupervised-Machine-Translation-Using-Monolingual-Corpora-Only-300x72.png\" alt=\"Unsupervised Machine Translation Using Monolingual Corpora Only\" width=\"300\" height=\"72\" class=\"aligncenter size-medium wp-image-1090\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Unsupervised-Machine-Translation-Using-Monolingual-Corpora-Only-300x72.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Unsupervised-Machine-Translation-Using-Monolingual-Corpora-Only-150x36.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Unsupervised-Machine-Translation-Using-Monolingual-Corpora-Only-768x183.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Unsupervised-Machine-Translation-Using-Monolingual-Corpora-Only.png 834w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>43. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies<\/strong><br \/>\nTal Linzen, Emmanuel Dupoux, Yoav Goldberg. ENS, Bar Ilan. 
TACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/Q\/Q16\/Q16-1037.pdf\">http:\/\/aclweb.org\/anthology\/Q\/Q16\/Q16-1037.pdf<\/a><\/p>\n<p>Investigation of how well LSTMs capture long-distance dependencies. The task is to predict verb agreement (singular or plural) when the subject noun is separated by different numbers of distractors. They find that an LSTM trained explicitly for this task manages to handle even most of the difficult cases, but a regular language model is more prone to being misled by the distractors.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Assessing-the-Ability-of-LSTMs-to-Learn-Syntax-Sensitive-Dependencies.png\" rel=\"attachment wp-att-1092\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Assessing-the-Ability-of-LSTMs-to-Learn-Syntax-Sensitive-Dependencies-300x137.png\" alt=\"Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies\" width=\"300\" height=\"137\" class=\"aligncenter size-medium wp-image-1092\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Assessing-the-Ability-of-LSTMs-to-Learn-Syntax-Sensitive-Dependencies-300x137.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Assessing-the-Ability-of-LSTMs-to-Learn-Syntax-Sensitive-Dependencies-150x68.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Assessing-the-Ability-of-LSTMs-to-Learn-Syntax-Sensitive-Dependencies-768x350.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Assessing-the-Ability-of-LSTMs-to-Learn-Syntax-Sensitive-Dependencies.png 931w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>44. Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis<\/strong><br \/>\nStefanos Angelidis, Mirella Lapata. Edinburgh. 
ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1711.09645.pdf\">https:\/\/arxiv.org\/pdf\/1711.09645.pdf<\/a><\/p>\n<p>A model for document sentiment classification which can also return sentence-level sentiment predictions. They construct sentence-level representations using a convnet, use this to predict a sentence-level probability distribution over possible sentiment labels, and then combine these over all sentences either with a fixed weight vector or using an attention mechanism. They release a new dataset of 200 documents annotated on the level of sentences and discourse units.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Multiple-Instance-Learning-Networks-for-Fine-Grained-Sentiment-Analysis.png\" rel=\"attachment wp-att-1093\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Multiple-Instance-Learning-Networks-for-Fine-Grained-Sentiment-Analysis-300x155.png\" alt=\"Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis\" width=\"300\" height=\"155\" class=\"aligncenter size-medium wp-image-1093\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Multiple-Instance-Learning-Networks-for-Fine-Grained-Sentiment-Analysis-300x155.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Multiple-Instance-Learning-Networks-for-Fine-Grained-Sentiment-Analysis-150x78.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Multiple-Instance-Learning-Networks-for-Fine-Grained-Sentiment-Analysis-768x398.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Multiple-Instance-Learning-Networks-for-Fine-Grained-Sentiment-Analysis.png 846w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>45. Learning how to Active Learn: A Deep Reinforcement Learning Approach<\/strong><br \/>\nMeng Fang, Yuan Li, Trevor Cohn. Melbourne. 
EMNLP 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/D\/D17\/D17-1063.pdf\">http:\/\/aclweb.org\/anthology\/D\/D17\/D17-1063.pdf<\/a><\/p>\n<p>Active learning (choosing which examples to annotate for training) is proposed as a reinforcement learning problem. The Q-learning network predicts for each sentence whether it should be annotated, and is trained based on the performance improvement from the main task. Evaluation is done on NER, with experiments on transferring the trained Q-learning function to other languages.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-how-to-Active-Learn-A-Deep-Reinforcement-Learning-Approach.png\" rel=\"attachment wp-att-1094\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-how-to-Active-Learn-A-Deep-Reinforcement-Learning-Approach-300x300.png\" alt=\"Learning how to Active Learn: A Deep Reinforcement Learning Approach\" width=\"300\" height=\"300\" class=\"aligncenter size-medium wp-image-1094\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-how-to-Active-Learn-A-Deep-Reinforcement-Learning-Approach-300x300.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-how-to-Active-Learn-A-Deep-Reinforcement-Learning-Approach-150x150.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Learning-how-to-Active-Learn-A-Deep-Reinforcement-Learning-Approach.png 537w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>46. On the State of the Art of Evaluation in Neural Language Models<\/strong><br \/>\nG\u00e1bor Melis, Chris Dyer, Phil Blunsom. Deepmind, Oxford. 
ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1707.05589.pdf\">https:\/\/arxiv.org\/pdf\/1707.05589.pdf<\/a><\/p>\n<p>Comparison of three recurrent architectures for language modelling: LSTMs, Recurrent Highway Networks and the NAS architecture. Each model goes through a substantial hyperparameter search, under the constraint that the total number of parameters is kept constant. They conclude that basic LSTMs still outperform other architectures and achieve state-of-the-art perplexities on two datasets.<\/p>\n<p><strong>47. Dynamic Routing Between Capsules<\/strong><br \/>\nSara Sabour, Nicholas Frosst, Geoffrey E Hinton. Google Brain. NIPS 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1710.09829.pdf\">https:\/\/arxiv.org\/pdf\/1710.09829.pdf<\/a><\/p>\n<p>An attention-based architecture for combining information from different convolutional layers. The attention values are calculated using an iterative process, making use of a custom squashing function. The evaluations on MNIST show robustness to affine transformations.<\/p>\n<p><strong>48. Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss<\/strong><br \/>\nBarbara Plank,\u00a0Anders S\u00f8gaard,\u00a0Yoav Goldberg. Groningen, Copenhagen, Bar-Ilan. ACL 2016.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P16\/P16-2067.pdf\">http:\/\/aclweb.org\/anthology\/P\/P16\/P16-2067.pdf<\/a><\/p>\n<p>Doing\u00a0POS tagging using a bidirectional LSTM with word- and character-based embeddings. They add an extra component to the loss function &#8211; predicting a frequency class for each word, together with their POS tag. 
Results show that overall performance remains\u00a0similar, but there&#8217;s an improvement in tagging accuracy for low-frequency words.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sequence_tagging.png\" rel=\"attachment wp-att-1062\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sequence_tagging-300x219.png\" alt=\"sequence_tagging\" width=\"300\" height=\"219\" class=\"aligncenter size-medium wp-image-1062\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sequence_tagging-300x219.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sequence_tagging-150x110.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/sequence_tagging.png 754w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>49. Emergent Translation in Multi-Agent Communication<\/strong><br \/>\nJason Lee, Kyunghyun Cho, Jason Weston, Douwe Kiela. Facebook. ArXiv 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1710.06922.pdf\">https:\/\/arxiv.org\/pdf\/1710.06922.pdf<\/a><\/p>\n<p>Learning to translate using two monolingual image captioning datasets and pivoting through images. The model encodes an image and generates a caption in language A, this is then encoded into the same space as language B and the representation is optimised to be similar to the correct image. 
The model is trained end-to-end using Gumbel-softmax.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Emergent-Translation-in-Multi-Agent-Communication.png\" rel=\"attachment wp-att-1096\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Emergent-Translation-in-Multi-Agent-Communication-300x144.png\" alt=\"Emergent Translation in Multi-Agent Communication\" width=\"300\" height=\"144\" class=\"aligncenter size-medium wp-image-1096\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Emergent-Translation-in-Multi-Agent-Communication-300x144.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Emergent-Translation-in-Multi-Agent-Communication-150x72.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Emergent-Translation-in-Multi-Agent-Communication-768x369.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Emergent-Translation-in-Multi-Agent-Communication.png 940w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>50. Efficient softmax approximation for GPUs<\/strong><br \/>\nEdouard Grave, Armand Joulin, Moustapha Ciss\u00e9, David Grangier, Herv\u00e9 J\u00e9gou. Facebook. ICML 2017.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/1609.04309.pdf\">https:\/\/arxiv.org\/pdf\/1609.04309.pdf<\/a><\/p>\n<p>Modification of the 2-level hierarchical softmax for better efficiency. An equation of computational complexity is used to find the optimal number of words in each class. 
In addition, the most common words are considered on the same level as other classes.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Efficient-softmax-approximation-for-GPUs.png\" rel=\"attachment wp-att-1097\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Efficient-softmax-approximation-for-GPUs-300x155.png\" alt=\"Efficient softmax approximation for GPUs\" width=\"300\" height=\"155\" class=\"aligncenter size-medium wp-image-1097\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Efficient-softmax-approximation-for-GPUs-300x155.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Efficient-softmax-approximation-for-GPUs-150x78.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Efficient-softmax-approximation-for-GPUs.png 609w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>51. Semi-supervised Multitask Learning for Sequence Labeling<\/strong><br \/>\nMarek Rei. Cambridge. ACL 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1194.pdf\">http:\/\/aclweb.org\/anthology\/P\/P17\/P17-1194.pdf<\/a><\/p>\n<p>Incorporating an unsupervised language modeling objective to help train a bidirectional LSTM for sequence labeling. At the same time as training the tagger, the forward-facing LSTM is optimised to predict the next word and the backward-facing LSTM is optimised to predict the previous word. 
The model learns a better composition function and improves performance on NER, error detection, chunking and POS-tagging, without using additional data.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-Multitask-Learning-for-Sequence-Labeling.png\" rel=\"attachment wp-att-1100\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-Multitask-Learning-for-Sequence-Labeling.png\" alt=\"Semi-supervised Multitask Learning for Sequence Labeling\" width=\"936\" height=\"279\" class=\"aligncenter size-full wp-image-1100\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-Multitask-Learning-for-Sequence-Labeling.png 936w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-Multitask-Learning-for-Sequence-Labeling-150x45.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-Multitask-Learning-for-Sequence-Labeling-300x89.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Semi-supervised-Multitask-Learning-for-Sequence-Labeling-768x229.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/a><\/p>\n<p><strong>52. Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection<\/strong><br \/>\nMarek Rei, Luana Bulat, Douwe Kiela, Ekaterina Shutova. Cambridge, Facebook. EMNLP 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/D\/D17\/D17-1162.pdf\">http:\/\/aclweb.org\/anthology\/D\/D17\/D17-1162.pdf<\/a><\/p>\n<p>A specialised architecture for detecting metaphorical phrases. Uses a gating mechanism to condition one word based on the other, a neural version of weighted cosine similarity to make a prediction and hinge loss to optimise the model. 
Achieves high results on detecting metaphorical adjective-noun, verb-object and verb-subject phrases.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Grasping-the-Finer-Point-A-Supervised-Similarity-Network-for-Metaphor-Detection.png\" rel=\"attachment wp-att-1102\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Grasping-the-Finer-Point-A-Supervised-Similarity-Network-for-Metaphor-Detection-300x100.png\" alt=\"Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection\" width=\"300\" height=\"100\" class=\"aligncenter size-medium wp-image-1102\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Grasping-the-Finer-Point-A-Supervised-Similarity-Network-for-Metaphor-Detection-300x100.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Grasping-the-Finer-Point-A-Supervised-Similarity-Network-for-Metaphor-Detection-150x50.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Grasping-the-Finer-Point-A-Supervised-Similarity-Network-for-Metaphor-Detection.png 653w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>53. Neural Sequence-Labelling Models for Grammatical Error Correction<\/strong><br \/>\nHelen Yannakoudakis, Marek Rei, \u00d8istein E. Andersen, Zheng Yuan. Cambridge. EMNLP 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/D\/D17\/D17-1297.pdf\">http:\/\/aclweb.org\/anthology\/D\/D17\/D17-1297.pdf<\/a><\/p>\n<p>Using error detection to improve error correction. A neural sequence labeling model is used to find correctness probabilities for every token, which are then used to rerank possible correction candidates. 
The process consistently improves the performance of different correction systems.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Neural-Sequence-Labelling-Models-for-Grammatical-Error-Correction.png\" rel=\"attachment wp-att-1103\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Neural-Sequence-Labelling-Models-for-Grammatical-Error-Correction-255x300.png\" alt=\"Neural Sequence-Labelling Models for Grammatical Error Correction\" width=\"255\" height=\"300\" class=\"aligncenter size-medium wp-image-1103\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Neural-Sequence-Labelling-Models-for-Grammatical-Error-Correction-255x300.png 255w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Neural-Sequence-Labelling-Models-for-Grammatical-Error-Correction-127x150.png 127w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Neural-Sequence-Labelling-Models-for-Grammatical-Error-Correction.png 322w\" sizes=\"auto, (max-width: 255px) 100vw, 255px\" \/><\/a><\/p>\n<p><strong>54. Artificial Error Generation with Machine Translation and Syntactic Patterns<\/strong><br \/>\nMarek Rei, Mariano Felice, Zheng Yuan, Ted Briscoe. Cambridge. BEA 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5032.pdf\">http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5032.pdf<\/a><\/p>\n<p>Investigating methods for generating artificial data in order to train better systems for detecting grammatical errors. The first approach uses regular machine translation, essentially translating from correct English to incorrect English. 
The second method uses local patterns with slots and POS tags to insert errors into new text.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Artificial-Error-Generation-with-Machine-Translation-and-Syntactic-Patterns.png\" rel=\"attachment wp-att-1105\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Artificial-Error-Generation-with-Machine-Translation-and-Syntactic-Patterns.png\" alt=\"Artificial Error Generation with Machine Translation and Syntactic Patterns\" width=\"770\" height=\"117\" class=\"aligncenter size-full wp-image-1105\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Artificial-Error-Generation-with-Machine-Translation-and-Syntactic-Patterns.png 770w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Artificial-Error-Generation-with-Machine-Translation-and-Syntactic-Patterns-150x23.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Artificial-Error-Generation-with-Machine-Translation-and-Syntactic-Patterns-300x46.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Artificial-Error-Generation-with-Machine-Translation-and-Syntactic-Patterns-768x117.png 768w\" sizes=\"auto, (max-width: 770px) 100vw, 770px\" \/><\/a><\/p>\n<p><strong>55. Auxiliary Objectives for Neural Error Detection Models<\/strong><br \/>\nMarek Rei, Helen Yannakoudakis. Cambridge. BEA 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5004.pdf\">http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5004.pdf<\/a><\/p>\n<p>Investigating a range of auxiliary objectives for training a sequence labeling system for error detection. Automatically generated dependency relations and POS tags perform surprisingly well as gold labels for multi-task learning. 
Learning different objectives at the same time works better than doing them in sequence or switching.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Auxiliary-Objectives-for-Neural-Error-Detection-Models.png\" rel=\"attachment wp-att-1106\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Auxiliary-Objectives-for-Neural-Error-Detection-Models-300x192.png\" alt=\"Auxiliary Objectives for Neural Error Detection Models\" width=\"300\" height=\"192\" class=\"aligncenter size-medium wp-image-1106\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Auxiliary-Objectives-for-Neural-Error-Detection-Models-300x192.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Auxiliary-Objectives-for-Neural-Error-Detection-Models-150x96.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Auxiliary-Objectives-for-Neural-Error-Detection-Models.png 438w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>56. An Error-Oriented Approach to Word Embedding Pre-Training<\/strong><br \/>\nYoumna Farag, Marek Rei, Ted Briscoe. Cambridge. BEA 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5016.pdf\">http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5016.pdf<\/a><\/p>\n<p>Introduces a process for pre-training word embeddings with an objective that optimises them to distinguish between grammatical and ungrammatical sequences. This is then extended to also distinguish between correct and incorrect versions of the same sentence. 
The embeddings are then used in a network for essay scoring, improving performance compared to previous methods.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/An-Error-Oriented-Approach-to-Word-Embedding-Pre-Training.png\" rel=\"attachment wp-att-1107\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/An-Error-Oriented-Approach-to-Word-Embedding-Pre-Training-300x183.png\" alt=\"An Error-Oriented Approach to Word Embedding Pre-Training\" width=\"300\" height=\"183\" class=\"aligncenter size-medium wp-image-1107\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/An-Error-Oriented-Approach-to-Word-Embedding-Pre-Training-300x183.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/An-Error-Oriented-Approach-to-Word-Embedding-Pre-Training-150x91.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/An-Error-Oriented-Approach-to-Word-Embedding-Pre-Training.png 735w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><strong>57. Detecting Off-topic Responses to Visual Prompts<\/strong><br \/>\nMarek Rei. Cambridge. BEA 2017.<br \/>\n<a href=\"http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5020.pdf\">http:\/\/aclweb.org\/anthology\/W\/W17\/W17-5020.pdf<\/a><\/p>\n<p>A neural architecture for detecting off-topic written responses, with respect to visual prompts. The text is composed with an LSTM and then used to condition the image representation. 
The two representations are then compared to calculate a confidence score for the text being written in response to the prompt image.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Detecting-Off-topic-Responses-to-Visual-Prompts.png\" rel=\"attachment wp-att-1108\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Detecting-Off-topic-Responses-to-Visual-Prompts.png\" alt=\"Detecting Off-topic Responses to Visual Prompts\" width=\"774\" height=\"327\" class=\"aligncenter size-full wp-image-1108\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Detecting-Off-topic-Responses-to-Visual-Prompts.png 774w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Detecting-Off-topic-Responses-to-Visual-Prompts-150x63.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Detecting-Off-topic-Responses-to-Visual-Prompts-300x127.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/Detecting-Off-topic-Responses-to-Visual-Prompts-768x324.png 768w\" sizes=\"auto, (max-width: 774px) 100vw, 774px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Staying on top of recent work is an important part of being a good researcher, but this can be quite difficult. 
Thousands of new papers&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-717","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>57 Summaries of Machine Learning and NLP Research - Marek Rei<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"57 Summaries of Machine Learning and NLP Research - Marek Rei\" \/>\n<meta property=\"og:description\" content=\"Staying on top of recent work is an important part of being a good researcher, but this can be quite difficult. Thousands of new papers&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/\" \/>\n<meta property=\"og:site_name\" content=\"Marek Rei\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-17T15:02:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-09-27T23:22:28+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png\" \/>\n<meta name=\"author\" content=\"Marek\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marek\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"30 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/\",\"url\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/\",\"name\":\"57 Summaries of Machine Learning and NLP Research - Marek Rei\",\"isPartOf\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png\",\"datePublished\":\"2018-01-17T15:02:16+00:00\",\"dateModified\":\"2019-09-27T23:22:28+00:00\",\"author\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#primaryimage\",\"url\":\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png\",\"contentUrl\":\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.marekrei.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"57 Summaries of Machine Learning and NLP 
Research\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/#website\",\"url\":\"https:\/\/www.marekrei.com\/blog\/\",\"name\":\"Marek Rei\",\"description\":\"Thoughts on Machine Learning and Natural Language Processing\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.marekrei.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191\",\"name\":\"Marek\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g\",\"caption\":\"Marek\"},\"url\":\"https:\/\/www.marekrei.com\/blog\/author\/marek\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"57 Summaries of Machine Learning and NLP Research - Marek Rei","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/","og_locale":"en_US","og_type":"article","og_title":"57 Summaries of Machine Learning and NLP Research - Marek Rei","og_description":"Staying on top of recent work is an important part of being a good researcher, but this can be quite difficult. 
Thousands of new papers&hellip;","og_url":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/","og_site_name":"Marek Rei","article_published_time":"2018-01-17T15:02:16+00:00","article_modified_time":"2019-09-27T23:22:28+00:00","og_image":[{"url":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png"}],"author":"Marek","twitter_misc":{"Written by":"Marek","Est. reading time":"30 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/","url":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/","name":"57 Summaries of Machine Learning and NLP Research - Marek Rei","isPartOf":{"@id":"https:\/\/www.marekrei.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#primaryimage"},"image":{"@id":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#primaryimage"},"thumbnailUrl":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png","datePublished":"2018-01-17T15:02:16+00:00","dateModified":"2019-09-27T23:22:28+00:00","author":{"@id":"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191"},"breadcrumb":{"@id":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.marekrei.com\/blog\/paper-summaries\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#primaryimage","url":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png","contentUrl":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2018\/01\/cnn_daily_mail-300x151.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.marekrei.com\/blog\/paper-summaries\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.marekrei.com\/blog\/"},{"@type":"ListIt
em","position":2,"name":"57 Summaries of Machine Learning and NLP Research"}]},{"@type":"WebSite","@id":"https:\/\/www.marekrei.com\/blog\/#website","url":"https:\/\/www.marekrei.com\/blog\/","name":"Marek Rei","description":"Thoughts on Machine Learning and Natural Language Processing","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.marekrei.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191","name":"Marek","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g","caption":"Marek"},"url":"https:\/\/www.marekrei.com\/blog\/author\/marek\/"}]}},"_links":{"self":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts\/717","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/comments?post=717"}],"version-history":[{"count":94,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts\/717\/revisions"}],"predecessor-version":[{"id":1293,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts\/717\/revisions\/1293"}],"wp:attachment":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/media?parent=717"}],"wp:ter
m":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/categories?post=717"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/tags?post=717"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}