{"id":1932,"date":"2024-11-04T11:47:27","date_gmt":"2024-11-04T11:47:27","guid":{"rendered":"https:\/\/www.marekrei.com\/blog\/?p=1932"},"modified":"2024-11-04T11:47:27","modified_gmt":"2024-11-04T11:47:27","slug":"68-summaries-of-machine-learning-and-nlp-research","status":"publish","type":"post","link":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/","title":{"rendered":"68 Summaries of Machine Learning and NLP Research"},"content":{"rendered":"<p>I have written short summaries of 68 different research papers published in the areas of Machine Learning and Natural Language Processing. They cover a wide range of topics, authors and venues. These are not meant to be reviews showing my subjective opinion; instead, I aim to provide a blunt and concise overview of the core contribution of each publication. At the end of the list I have also included a selection of my own papers, published together with my students and collaborators.<\/p>\n<p>Given how many papers are published in our area every year, it is getting more and more difficult to keep track of them all. The goal of this post is to save some time for both new and experienced readers in the field and allow them to get a quick overview of 68 research papers.<\/p>\n<p>These summaries are written in a way that tries to relay the core idea and main takeaway of each paper without any overhype or marketing wrapper.<\/p>\n<p>It is probably good to also mention that I wrote all of these summaries myself and they are not generated by any language models.<\/p>\n<p>Here we go.<\/p>\n<h4>1. PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts<\/h4>\n<p>Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie. Microsoft Research, CAS, CMU, Peking University, Westlake University, Duke University. 
ArXiv 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2306.04528\">https:\/\/arxiv.org\/abs\/2306.04528<\/a><\/p>\n<p>The paper investigates LLM robustness to prompt perturbations, measuring how much task performance drops for different models under different attacks. Prompts are changed by introducing spelling errors, replacing words with synonyms, concatenating irrelevant information or translating from a different language. Word replacement attacks are found to be the most effective, with an average 33% performance drop. Character-level attacks rank second. GPT-4 and UL2 outperformed the other investigated models in terms of robustness.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png\" alt=\"\" width=\"300\" height=\"214\" class=\"aligncenter size-medium wp-image-1933\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-1024x732.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-150x107.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-768x549.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-1536x1098.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench.png 1680w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>2. System 2 Attention (is something you might need too)<\/h4>\n<p>Jason Weston, Sainbayar Sukhbaatar. Meta. ArXiv 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2311.11829\">https:\/\/arxiv.org\/abs\/2311.11829<\/a><\/p>\n<p>The paper proposes query rewriting as a solution to the problem of LLMs being overly affected by irrelevant information in the prompts. 
The first step asks the LLM to rewrite the prompt to remove the irrelevant parts. This edited prompt is then given to the LLM to get the final answer, which improves robustness when the prompts include irrelevant information.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention-1024x359.png\" alt=\"\" width=\"843\" height=\"296\" class=\"aligncenter size-large wp-image-1940\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention-1024x359.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention-300x105.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention-150x53.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention-768x269.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention-1536x538.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/system2-attention.png 1773w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>3. Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval<\/h4>\n<p>Jo\u00e3o Coelho, Bruno Martins, Jo\u00e3o Magalh\u00e3es, Jamie Callan, Chenyan Xiong. CMU, University of Lisbon, NOVA School of Science and Technology. ArXiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2404.04163\">https:\/\/arxiv.org\/abs\/2404.04163<\/a><\/p>\n<p>The paper investigates positional biases when encoding long documents into a vector for similarity-based retrieval. They start with a pre-trained T5-based model and show that this by itself isn&#8217;t biased. 
However, when using contrastive training (either unsupervised or supervised) to optimize the model for retrieval, the model starts to perform much better when the evidence is at the beginning of the text. They find evidence indicating that this bias is part of the task itself &#8211; important information tends to be towards the beginning of the texts, so that is where the model learns to look.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell-300x161.png\" alt=\"\" width=\"300\" height=\"161\" class=\"aligncenter size-medium wp-image-1944\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell-300x161.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell-1024x548.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell-150x80.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell-768x411.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/dwell.png 1531w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>4. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention<\/h4>\n<p>Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. Google. ArXiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2404.07143\">https:\/\/arxiv.org\/abs\/2404.07143<\/a><\/p>\n<p>The paper extends the attention mechanism in transformers (which has inefficient quadratic complexity) with a memory component that considerably increases the effective context length. The memory approximates a key-value store by recursively adding previous values into a matrix of parameters as the model moves through context. 
They show state-of-the-art results on long-context language modelling, retrieving a hidden passkey from a 1M-token context, and summarizing books of around 500K tokens.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-1024x259.png\" alt=\"\" width=\"843\" height=\"213\" class=\"aligncenter size-large wp-image-1946\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-1024x259.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-300x76.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-150x38.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-768x194.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-1536x389.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/leavenocontext-2048x519.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>5. BooookScore: A systematic exploration of book-length summarization in the era of LLMs<\/h4>\n<p>Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer. UMass Amherst, AllenAI, Princeton. ICLR 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2310.00785\">https:\/\/arxiv.org\/abs\/2310.00785<\/a><\/p>\n<p>The paper investigates two strategies for summarizing full-length books with LLMs: hierarchically combining chunk-level summaries and incrementally building up a summary while going through the book. They focus on coherence, as opposed to correctness, and develop an automated LLM-based score (BooookScore) for assessing summaries. They first have humans assess each sentence of a sample of generated summaries, then check that the automated metric correlates with the human assessment. 
The results indicate that hierarchical summarisation produces more coherent summaries while incremental summarisation leads to more details being retained in the summary.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-1024x438.png\" alt=\"\" width=\"843\" height=\"361\" class=\"aligncenter size-large wp-image-1950\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-1024x438.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-300x128.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-150x64.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-768x329.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-1536x658.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/booookscore-2048x877.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>6. DE-COP: Detecting Copyrighted Content in Language Models Training Data<\/h4>\n<p>Andr\u00e9 V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li. ULisboa, UCSB, CMU. ICML 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2402.09910\">https:\/\/arxiv.org\/abs\/2402.09910<\/a><\/p>\n<p>The paper proposes a simple approach to determining whether a particular book has been used for training an LLM. A paragraph from the book is presented to the model, along with multiple paraphrases. The model is then asked to choose which paragraph came from the book. 
The results are surprisingly good, improving over baselines using probabilities and perplexities.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop-1024x593.png\" alt=\"\" width=\"843\" height=\"488\" class=\"aligncenter size-large wp-image-1953\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop-1024x593.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop-300x174.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop-150x87.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop-768x445.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop-1536x890.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/decop.png 2044w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>7. STaR-GATE: Teaching Language Models to Ask Clarifying Questions<\/h4>\n<p>Chinmaya Andukuri, Jan-Philipp Fr\u00e4nken, Tobias Gerstenberg, Noah D. Goodman. Stanford. ArXiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2403.19154\">https:\/\/arxiv.org\/abs\/2403.19154<\/a><\/p>\n<p>The paper focuses on teaching LLMs to ask clarifying questions instead of trying to immediately answer user questions and instructions in one turn. They set up two LLMs to chat with each other &#8211; the Roleplayer has a secret persona and asks a scripted question, while the Questioner is tasked with answering the question after asking clarifying questions. They sample alternative dialogue traces, identify the one that led to the best final answer, then supervise the Questioner with this best dialogue. 
The results show that the trained Questioner is able to provide better persona-specific answers.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate-300x195.png\" alt=\"\" width=\"300\" height=\"195\" class=\"aligncenter size-medium wp-image-1955\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate-300x195.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate-1024x665.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate-150x97.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate-768x499.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/stargate.png 1298w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>8. Are Emergent Abilities of Large Language Models a Mirage?<\/h4>\n<p>Rylan Schaeffer, Brando Miranda, Sanmi Koyejo. Stanford. NeurIPS 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2304.15004\">https:\/\/arxiv.org\/abs\/2304.15004<\/a><\/p>\n<p>This paper questions the claim that LLMs have emergent abilities &#8211; unexpected skills that suddenly appear only with models that are sufficiently large. They show that this property is mostly due to the use of discontinuous metrics, which only credit models once they reach sufficiently high levels of abilities. 
When using smooth continuous metrics, and increasing test sets to sufficient size, the &#8220;sudden appearance&#8221; of abilities is replaced by gradual improvements.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage-1024x624.png\" alt=\"\" width=\"843\" height=\"514\" class=\"aligncenter size-large wp-image-1957\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage-1024x624.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage-300x183.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage-150x91.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage-768x468.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage-1536x936.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mirage.png 1622w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>9. Distinguishing the Knowable from the Unknowable with Language Models<\/h4>\n<p>Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman. Harvard. ICML 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2402.03563\">https:\/\/arxiv.org\/abs\/2402.03563<\/a><\/p>\n<p>The paper investigates the possibility of distinguishing epistemic uncertainty (due to lack of knowledge) from aleatoric uncertainty (due to entropy in the underlying distribution) in LLMs. They make the assumption that a large LLM has no (or less) epistemic uncertainty and train a probe to predict the uncertainty of a large LLM based on the frozen activations of a smaller LLM. This is largely successful, indicating that the model activations in the smaller model are able to differentiate between the different uncertainty types. 
They also propose that in-context information affects the LLM probabilities more in the case of epistemic uncertainty, and less with aleatoric uncertainty.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing-1024x454.png\" alt=\"\" width=\"843\" height=\"374\" class=\"aligncenter size-large wp-image-1959\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing-1024x454.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing-300x133.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing-150x67.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing-768x341.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/distinguishing.png 1298w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>10. Do Large Language Models Latently Perform Multi-Hop Reasoning?<\/h4>\n<p>Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel. DeepMind, UCL, Google Research, Tel Aviv University. ArXiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2402.16837\">https:\/\/arxiv.org\/abs\/2402.16837<\/a><\/p>\n<p>The paper investigates the ability of LLMs to perform latent multi-hop reasoning, by completing sentences such as &#8220;The author of the novel Ubik was born in the city of &#8230;&#8221;. They construct experiments to investigate each hop separately: 1) whether the internal representation for the intermediate entity (&#8220;Philip K. Dick&#8221;) required for the first hop strengthens, and 2) whether increased recall of the intermediate entity improves the consistency of the final answer. In these experiments they find strong evidence for the first hop and moderate evidence for the second hop. 
They construct a dataset of 45,595 pairs of one-hop and two-hop prompts, to be released. <\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently-274x300.png\" alt=\"\" width=\"274\" height=\"300\" class=\"aligncenter size-medium wp-image-1961\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently-274x300.png 274w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently-935x1024.png 935w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently-137x150.png 137w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently-768x841.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/latently.png 1317w\" sizes=\"auto, (max-width: 274px) 100vw, 274px\" \/><\/a><\/p>\n<h4>11. PhaseEvo: Towards Unified In-Context Prompt Optimization for Large Language Models<\/h4>\n<p>Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley Malin, Sricharan Kumar. Intuit, Vanderbilt, Cambridge. ArXiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2402.11347\">https:\/\/arxiv.org\/abs\/2402.11347<\/a><\/p>\n<p>The paper describes an evolution strategy for finding optimal LLM prompts for specific tasks. Prompts are first initialised either by experts or by asking an LLM to recover the prompt based on example input-output pairs. These initial prompts then go through multiple mutation steps, by having the LLM rewrite the prompts according to different strategies. 
The fitness of the prompts is measured by performance on the dev set and the best prompt is then evaluated on the test set.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo-300x262.png\" alt=\"\" width=\"300\" height=\"262\" class=\"aligncenter size-medium wp-image-1963\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo-300x262.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo-1024x893.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo-150x131.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo-768x670.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/phaseevo.png 1197w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>12. Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs<\/h4>\n<p>Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren. Fudan University, University of Washington, USC, AllenAI. ArXiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2402.11442\">https:\/\/arxiv.org\/abs\/2402.11442<\/a><\/p>\n<p>The paper first constructs a dataset of 14000 commonsense logical rules, using LLMs to generate and check the rules. They then assess LLM understanding of these rules by turning the rules into a binary entailment classification task, showing that all models decrease in performance as the complexity of the rules increases. 
Finally, they train an LLM with these rules and show benefit on downstream tasks that require commonsense reasoning.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-1024x366.png\" alt=\"\" width=\"843\" height=\"301\" class=\"aligncenter size-large wp-image-1966\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-1024x366.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-300x107.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-150x54.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-768x275.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-1536x549.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/scaffolding-2048x732.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>13. An LLM Compiler for Parallel Function Calling<\/h4>\n<p>Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami. UC Berkeley, ICSI, LBNL. ICML 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2312.04511\">https:\/\/arxiv.org\/abs\/2312.04511<\/a><\/p>\n<p>While most LLMs perform tool calls sequentially, in a linear iteration loop, this paper investigates executing tool calls in parallel. The planner stage first predicts which tool calls will be needed and what the dependencies between them are. These tools are then called in parallel and the results are combined for the final answer. They show performance improvements in some settings and speed improvements in all evaluated settings, with particular usefulness in settings where the LLM needs to retrieve information about multiple entities (e.g. 
do background research) in order to reach the final solution.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-1024x607.png\" alt=\"\" width=\"843\" height=\"500\" class=\"aligncenter size-large wp-image-1968\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-1024x607.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-300x178.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-150x89.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-768x455.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-1536x911.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/llm_compiler-2048x1214.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>14. Interpreting Language Models with Contrastive Explanations<\/h4>\n<p>Kayo Yin, Graham Neubig. UC Berkeley, CMU. EMNLP 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.emnlp-main.14\/\">https:\/\/aclanthology.org\/2022.emnlp-main.14\/<\/a><\/p>\n<p>Proposes an explainability method for language modelling that explains why one word was predicted instead of a specific other word. Adapts three different explainability methods to this contrastive approach and evaluates on a dataset of minimally different sentences. 
The method is shown to better highlight the differences between these specific words, instead of just assigning most of the focus to the previous word.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting-300x293.png\" alt=\"\" width=\"300\" height=\"293\" class=\"aligncenter size-medium wp-image-1970\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting-300x293.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting-1024x1000.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting-150x146.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting-768x750.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/interpreting.png 1302w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>15. Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset<\/h4>\n<p>Ashish V. Thapliyal, Jordi Pont-Tuset, Xi Chen, Radu Soricut. Google Research. EMNLP 2022.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2205.12522\">https:\/\/arxiv.org\/abs\/2205.12522<\/a><\/p>\n<p>Multilingual image captioning dataset containing captions in 36 languages for 3600 images. An annotation process has been designed to make sure all the annotations resemble the same style, similar to an automated captioning output. 
Results show that the dataset provides a much higher correlation to human judgements, compared to the silver annotations from COCO-dev.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-1024x535.png\" alt=\"\" width=\"843\" height=\"440\" class=\"aligncenter size-large wp-image-1971\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-1024x535.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-300x157.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-150x78.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-768x401.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-1536x803.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/Crossmodal-3600-2048x1070.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>16. Locating and Editing Factual Associations in GPT<\/h4>\n<p>Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov. MIT, Northeastern, Technion IIT. NeurIPS 2022.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2202.05262\">https:\/\/arxiv.org\/abs\/2202.05262<\/a><\/p>\n<p>Proposes a method for editing a specific relational fact in a pre-trained language model. The specific feedforward layer that is responsible for recalling the target of a specific fact is identified using noisy permutations of the model activations. 
The feedforward layer is then directly trained to produce a more optimal output when given the subject of that fact as input.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-1024x631.png\" alt=\"\" width=\"843\" height=\"519\" class=\"aligncenter size-large wp-image-1972\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-1024x631.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-300x185.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-150x92.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-768x474.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-1536x947.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/locating-2048x1263.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>17. M2D2: A Massively Multi-Domain Language Modeling Dataset<\/h4>\n<p>Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer. The University of Tokyo, University of Washington. EMNLP 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.emnlp-main.63\/\">https:\/\/aclanthology.org\/2022.emnlp-main.63\/<\/a><\/p>\n<p>Assembling a dataset for language model domain adaptation, from Wikipedia and ArXiv, containing 22 broad domains and 145 fine-grained domains. Experiments show that when fine-tuning a model for in-domain data, it is best to tune on related broad domain data first, then further only on the specific fine-grained domain data. 
Out-of-domain performance is shown to strongly correlate with vocabulary overlap between the different domains.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2-285x300.png\" alt=\"\" width=\"285\" height=\"300\" class=\"aligncenter size-medium wp-image-1973\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2-285x300.png 285w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2-973x1024.png 973w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2-143x150.png 143w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2-768x808.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/m2d2.png 1206w\" sizes=\"auto, (max-width: 285px) 100vw, 285px\" \/><\/a><\/p>\n<h4>18. Binding Language Models in Symbolic Languages<\/h4>\n<p>Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu. The University of Hong Kong, Shanghai Jiao Tong University, University of Washington, AllenAI, University of Waterloo, Salesforce Research, Yale University, Meta AI. ICLR 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2210.02875\">https:\/\/arxiv.org\/abs\/2210.02875<\/a><\/p>\n<p>Uses a large pre-trained language model (GPT-3 Codex) to translate a natural language question into an SQL query. This query can then contain API calls, which also get executed by the language model, in order to populate additional columns of information in a database, over which the SQL can then operate. 
Outperforms previous methods on datasets of questions about data tables, using only contextual examples without fine-tuning the language model.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-1024x518.png\" alt=\"\" width=\"843\" height=\"426\" class=\"aligncenter size-large wp-image-1974\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-1024x518.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-300x152.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-150x76.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-768x389.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-1536x777.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/binding-2048x1037.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>19. Hungry Hungry Hippos: Towards Language Modeling with State Space Models<\/h4>\n<p>Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher R\u00e9. Stanford, University at Buffalo. ICLR 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2212.14052\">https:\/\/arxiv.org\/abs\/2212.14052<\/a><\/p>\n<p>Proposes a modification for state space models, a version of RNN with linear operations that can be separated into components of cumulative sums. The modification gives them abilities similar to attention, being able to copy and compare tokens across the sequence. 
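<\/p>\n<p>As a rough illustration of the cost argument (not the paper&#8217;s actual H3 layer), a one-dimensional state space recurrence unrolls into a convolution that can be evaluated with an FFT; all values here are toy numbers:<\/p>

```python
import numpy as np

# A 1-D state space layer x_t = a*x_{t-1} + b*u_t unrolls into a causal
# convolution with kernel k_t = b * a^t, so the whole sequence can be
# computed in O(N log N) with an FFT instead of a sequential loop.
a, b, N = 0.9, 1.0, 8
u = np.arange(N, dtype=float)

# Reference: the naive sequential recurrence
x = np.zeros(N)
state = 0.0
for t in range(N):
    state = a * state + b * u[t]
    x[t] = state

# Same computation as an FFT-based convolution (zero-padded to length 2N
# so the circular convolution equals the linear one)
k = b * a ** np.arange(N)
conv = np.fft.irfft(np.fft.rfft(u, 2 * N) * np.fft.rfft(k, 2 * N), 2 * N)[:N]

assert np.allclose(x, conv)
```

<p>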
The architecture scales O(N log N) with sequence length N, as opposed to N^2 for regular attention.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos-1024x497.png\" alt=\"\" width=\"843\" height=\"409\" class=\"aligncenter size-large wp-image-1976\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos-1024x497.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos-300x146.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos-150x73.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos-768x373.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos-1536x746.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/hippos.png 2026w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>20. Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers<\/h4>\n<p>Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei. ACL 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.findings-acl.247\/\">https:\/\/aclanthology.org\/2023.findings-acl.247\/<\/a><\/p>\n<p>Shows how the equations for attention during in-context learning (showing the model examples in the input) can be thought of as a form of gradient descent. 
Experiments indicate that there are also similarities in how these methods affect the model in practice.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt-220x300.png\" alt=\"\" width=\"220\" height=\"300\" class=\"aligncenter size-medium wp-image-1977\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt-220x300.png 220w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt-752x1024.png 752w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt-110x150.png 110w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt-768x1046.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/whycangpt.png 991w\" sizes=\"auto, (max-width: 220px) 100vw, 220px\" \/><\/a><\/p>\n<h4>21. PAL: Program-aided Language Models<\/h4>\n<p>Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig. CMU, Inspired Cognition. ICML 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2211.10435\">https:\/\/arxiv.org\/abs\/2211.10435<\/a><\/p>\n<p>Question answering with a language model while generating a chain-of-thought, outputting the necessary reasoning steps to get to the answer. In addition to natural language reasoning steps, the model generates python syntax that is then executed in order to output the final answer. 
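<\/p>\n<p>A minimal sketch of the execute-the-generated-program step; the model output is hard-coded here, whereas PAL samples it from an LLM given few-shot prompts:<\/p>

```python
# Hard-coded stand-in for the program an LLM would generate as its
# chain-of-thought for a word problem.
model_output = """
loaves_baked = 200
loaves_sold_morning = 93
loaves_sold_afternoon = 39
loaves_returned = 6
answer = loaves_baked - loaves_sold_morning - loaves_sold_afternoon + loaves_returned
"""

namespace = {}
exec(model_output, namespace)  # run the generated reasoning program
print(namespace["answer"])  # 74
```

<p>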
This is shown to improve results, particularly when the result requires mathematical arithmetic over large numbers.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal-300x220.png\" alt=\"\" width=\"300\" height=\"220\" class=\"aligncenter size-medium wp-image-1980\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal-300x220.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal-1024x750.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal-150x110.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal-768x562.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal-1536x1125.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/pal.png 1674w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>22. Explaining black box text modules in natural language with language models<\/h4>\n<p>Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao. MSR, UC Berkeley, UT Austin. NeurIPS XAIA 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2305.09863\">https:\/\/arxiv.org\/abs\/2305.09863<\/a><\/p>\n<p>A method for providing natural language explanations to black-box modules that take neurons as input and return a score as output. A large number of ngrams are passed through the model in order to identify ngrams that result in the largest score. These ngrams are sampled and fed into an LLM for summarization, generating explanation candidates. 
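<\/p>\n<p>In miniature, the ngram-scoring step might look like this, with a stub standing in for the black-box module:<\/p>

```python
# Toy version of the ngram-scoring step: find the ngrams that maximally
# activate a black-box text module. The module here is a stub; the real
# one maps text to a neuron's activation.
def module_score(ngram):
    return ngram.count("dog") + 0.5 * ngram.count("puppy")

corpus_ngrams = ["the dog barked", "a red car", "puppy at play",
                 "dog and puppy", "stock prices rose"]

# The highest-scoring ngrams would then be summarized by an LLM into
# candidate explanations.
top = sorted(corpus_ngrams, key=module_score, reverse=True)[:2]
print(top)  # ['dog and puppy', 'the dog barked']
```

<p>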
An LLM is then used for generating positive and negative example sentences based on each candidate explanation, and the score differences between these examples are used for selecting the best explanation.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-1024x595.png\" alt=\"\" width=\"843\" height=\"490\" class=\"aligncenter size-large wp-image-1981\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-1024x595.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-300x174.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-150x87.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-768x446.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-1536x892.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/explaining-black-box-2048x1190.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>23. Do CoNLL-2003 Named Entity Taggers Still Work Well in 2023?<\/h4>\n<p>Shuheng Liu, Alan Ritter. Georgia Institute of Technology. ACL 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2212.09747\">https:\/\/arxiv.org\/abs\/2212.09747<\/a><\/p>\n<p>A thorough investigation of well-known NER models and how their performance is affected on modern data when trained on CoNLL 2003. They annotate a new test set of news data from 2020 and find that performance of certain models holds up very well and the field luckily hasn&#8217;t overfitted to the CoNLL 2003 test set. 
For best results on modern data, large models pre-trained on contemporary corpora come out on top, with RoBERTa, T5 and LUKE standing out.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-1024x583.png\" alt=\"\" width=\"843\" height=\"480\" class=\"aligncenter size-large wp-image-1982\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-1024x583.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-300x171.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-150x85.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-768x437.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-1536x874.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/conll03-2048x1166.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>24. The False Promise of Imitating Proprietary LLMs<\/h4>\n<p>Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song. UC Berkeley. ArXiv 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2305.15717\">https:\/\/arxiv.org\/abs\/2305.15717<\/a><\/p>\n<p>Analyzing the performance of open-source LLMs fine-tuned on the outputs of proprietary LLMs. They conclude that when fine-tuned on general-purpose dialogues, the models learn to mimic the style of the teacher model and can fool human assessors, but lack core knowledge and can more easily generate factually incorrect claims. 
However, when fine-tuning only for a specific task, then this imitation strategy can reach near-parity with the teacher models.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise.png\" alt=\"\" width=\"2409\" height=\"1178\" class=\"aligncenter size-full wp-image-1983\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise.png 2409w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise-300x147.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise-1024x501.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise-150x73.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise-768x376.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise-1536x751.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/false-promise-2048x1001.png 2048w\" sizes=\"auto, (max-width: 2409px) 100vw, 2409px\" \/><\/a><\/p>\n<h4>25. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models<\/h4>\n<p>Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu. Misc. EMNLP 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.emnlp-main.39\">https:\/\/aclanthology.org\/2022.emnlp-main.39<\/a><\/p>\n<p>Unifies 21 generation tasks that require accessing structured information sources into a general sequence-to-sequence benchmark for language models. 
It includes datasets on semantic parsing, question answering, data-to-text, dialogue and fact verification. The inputs, structured data and outputs are linearised for these datasets, so that a general-purpose language model can be used for all of them. A fine-tuned T5 model is shown to outperform existing SOTA models on many of these datasets.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-1024x497.png\" alt=\"\" width=\"843\" height=\"409\" class=\"aligncenter size-large wp-image-1984\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-1024x497.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-300x146.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-150x73.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-768x373.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-1536x746.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/unifiedskg-2048x994.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>26. Toolformer: Language Models Can Teach Themselves to Use Tools<\/h4>\n<p>Timo Schick, Jane Dwivedi-Yu, Roberto Dess\u00ec, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom. Meta, Universitat Pompeu Fabra. NeurIPS 2023.<br \/>\n<a href=\"https:\/\/openreview.net\/forum?id=Yacmpz84TH\">https:\/\/openreview.net\/forum?id=Yacmpz84TH<\/a><\/p>\n<p>The paper describes a method for teaching large language models to use external tools. A supervised dataset is created by trying to insert results from API calls into various points in the text, then only retaining the cases where doing that improves perplexity. 
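<\/p>\n<p>The filtering criterion can be sketched as follows; the loss function here is a toy stand-in for the LM&#8217;s weighted cross-entropy:<\/p>

```python
# Toolformer-style filtering sketch: keep an inserted API call only if it
# lowers the loss on the continuation. lm_loss is a stub; the paper uses
# the language model's own weighted cross-entropy.
def lm_loss(prefix, continuation):
    # Toy proxy: pretend the inserted result "0.29]" helps the model
    # predict "0.29" in the continuation.
    return 0.5 if "0.29]" in prefix and "0.29" in continuation else 1.0

text = "Out of 1400 participants, 400"
continuation = " (0.29) passed the test."
candidate = text + " [Calculator(400/1400)->0.29]"

keep = lm_loss(candidate, continuation) < lm_loss(text, continuation)
print(keep)  # True
```

<p>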
The LM is then fine-tuned on this dataset and manages to improve performance on several tasks by using tools such as a QA system, Wikipedia search and a calculator.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-1024x419.png\" alt=\"\" width=\"843\" height=\"345\" class=\"aligncenter size-large wp-image-1987\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-1024x419.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-300x123.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-150x61.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-768x314.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-1536x628.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolformer-2048x838.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>27. ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings<\/h4>\n<p>Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu. UC San Diego, Mohamed bin Zayed University of Artificial Intelligence. NeurIPS 2023.<br \/>\n<a href=\"https:\/\/openreview.net\/forum?id=BHXsb69bSx\">https:\/\/openreview.net\/forum?id=BHXsb69bSx<\/a><\/p>\n<p>The paper presents ToolkenGPT, a framework for extending LMs with tool use. For each new tool, a new token is added to the output vocabulary of the LM and the embedding for that token is trained with annotated or synthetic examples. 
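<\/p>\n<p>A sketch of the core idea with toy random weights: the frozen output vocabulary is extended by a single trainable toolken embedding:<\/p>

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 16, 100

# Frozen LM output embeddings plus one new, trainable "toolken" row.
W_lm = rng.normal(size=(vocab, d))   # frozen
w_tool = rng.normal(size=(d,))       # the only trained parameter

def next_token_logits(h):
    # Score the tool token alongside the frozen vocabulary; if the
    # toolken wins the argmax, the runtime would switch to a prompted
    # argument-generation mode for that tool.
    return np.concatenate([W_lm @ h, [w_tool @ h]])

h = rng.normal(size=(d,))
logits = next_token_logits(h)
is_tool_call = int(np.argmax(logits)) == vocab
print(logits.shape)  # (101,)
```

<p>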
When the LM generates that token, the model is switched to a different mode and prompted with examples for that particular tool in order to generate the necessary arguments for that tool call.\u00a0<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-1024x603.png\" alt=\"\" width=\"843\" height=\"496\" class=\"aligncenter size-large wp-image-1988\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-1024x603.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-300x177.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-150x88.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-768x453.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-1536x905.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolkengpt-2048x1207.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>28. Voyager: An Open-Ended Embodied Agent with Large Language Models<\/h4>\n<p>Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar. NVIDIA, Caltech, UT Austin, Stanford, UW Madison. TMLR 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2305.16291\">https:\/\/arxiv.org\/abs\/2305.16291<\/a><\/p>\n<p>Combining together black-box LLMs through different prompting strategies to create a capable agent for playing Minecraft. One component receives information about the current state, reasons about the next possible goals and formulates a suitable task in natural language. 
Another component receives the desired task along with descriptions of existing API functions and skills, then iteratively generates and improves a code function for performing that task.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager-1024x609.png\" alt=\"\" width=\"843\" height=\"501\" class=\"aligncenter size-large wp-image-1990\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager-1024x609.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager-300x178.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager-150x89.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager-768x457.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager-1536x913.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/voyager.png 1779w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>29. Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction<\/h4>\n<p>Steven Coyne, Keisuke Sakaguchi, Diana Galvan-Sosa, Michael Zock, Kentaro Inui. Tohoku University, RIKEN, Aix-Marseille University. ArXiv 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2303.14342\">https:\/\/arxiv.org\/abs\/2303.14342<\/a><\/p>\n<p>Evaluating well-known LLMs on two established benchmarks for grammatical error correction (GEC). They find that the specific prompts matter quite a bit and that lower temperature is better for GEC. The results show that LLMs tend to over-correct and perform fluency edits, achieving state-of-the-art performance on a dataset designed for this type of edit (JFLEG). 
However, performance is considerably lower on a benchmark that focuses on minimal edits and only fixing grammaticality (BEA-2019).<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-1024x388.png\" alt=\"\" width=\"843\" height=\"319\" class=\"aligncenter size-large wp-image-1991\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-1024x388.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-300x114.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-150x57.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-768x291.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-1536x581.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gpt-gec-2048x775.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>30. Backpack Language Models<\/h4>\n<p>John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang. Stanford. ACL 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.acl-long.506\">https:\/\/aclanthology.org\/2023.acl-long.506<\/a><!-- Backpack Language Models - ACL Anthology --><\/p>\n<p>Proposes a language model architecture that maps each word to multiple sense vectors, uses a separate model to predict the weights for these senses, then directly outputs a prediction as a log-linear function. Experiments show that performance degrades, as many more parameters are required to reach a perplexity comparable to a transformer LM. 
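<\/p>\n<p>The sense-vector mixing can be illustrated with toy numbers; here the sense weights are fixed rather than predicted by a contextual network:<\/p>

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_senses, vocab = 8, 4, 50

# Each word owns several sense vectors; a contextual network (stubbed
# here as fixed weights) mixes them, and the output is log-linear.
sense_vecs = rng.normal(size=(n_senses, d))
alpha = np.array([0.7, 0.1, 0.1, 0.1])   # toy predicted sense weights
E_out = rng.normal(size=(vocab, d))      # output embeddings

word_repr = alpha @ sense_vecs           # weighted sum of senses
logits = E_out @ word_repr
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.shape)  # (50,)
```

<p>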
They find that editing the weights of these sense vectors can be used for mitigating gender bias or editing specific information.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack-300x232.png\" alt=\"\" width=\"300\" height=\"232\" class=\"aligncenter size-medium wp-image-1992\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack-300x232.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack-1024x792.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack-150x116.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack-768x594.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/backpack.png 1211w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>31. What the DAAM: Interpreting Stable Diffusion Using Cross Attention<\/h4>\n<p>Raphael Tang, Linqing Liu, Akshat Pandey, Zhiying Jiang, Gefei Yang, Karun Kumar, Pontus Stenetorp, Jimmy Lin, Ferhan Ture. Comcast Applied AI, UCL, University of Waterloo. ACL 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.acl-long.310\">https:\/\/aclanthology.org\/2023.acl-long.310<\/a><\/p>\n<p>A method for analysing text-to-image models, indicating which areas of the generated image are attributed to a particular word in the input. They use the attention scores in Stable Diffusion between convolutional blocks and word embeddings, upscaling and aggregating them between different heads, layers and time steps. The result is competitive to supervised image segmentation models and they use it for linguistic analysis. For example, they show that cohyponyms (e.g. 
a giraffe and a zebra) in the input can have their concepts merged, resulting in only one of the objects being generated.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-1024x268.png\" alt=\"\" width=\"843\" height=\"221\" class=\"aligncenter size-large wp-image-1995\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-1024x268.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-300x78.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-150x39.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-768x201.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-1536x401.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/daam-2048x535.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>32. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs<\/h4>\n<p>Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun. Tsinghua University, ModelBest, Renmin University of China, Yale University, WeChat AI, Tencent, Zhihu. 
ICLR 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2307.16789\">https:\/\/arxiv.org\/abs\/2307.16789<\/a><\/p>\n<p>They crawl documentation for 16K APIs from RapidAPI and synthesize an instruction tuning dataset for using these APIs.<br \/>\nInstruction examples are generated using ChatGPT, by asking it to generate examples that make use of one or multiple sample APIs.<br \/>\nThe LLM is allowed multiple rounds of API calls and responses, which are explored using a depth-first search strategy until a final answer or termination is generated.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-1024x369.png\" alt=\"\" width=\"843\" height=\"304\" class=\"aligncenter size-large wp-image-1996\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-1024x369.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-300x108.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-150x54.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-768x277.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-1536x553.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/toolllm-2048x738.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>33. ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding<\/h4>\n<p>Uri Shaham, Maor Ivgi, Avia Efrat, Jonathan Berant, Omer Levy. Tel Aviv University, Meta AI. EMNLP 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.findings-emnlp.536\">https:\/\/aclanthology.org\/2023.findings-emnlp.536<\/a><\/p>\n<p>Constructing a benchmark for zero-shot understanding of long texts with LLMs. 
Includes existing datasets (summarization, QA) from the Scrolls benchmark, along with two new tasks: determining the ratio of positive reviews in a set of reviews, and sorting a shuffled list of book chapter summaries. Results indicate that GPT-4 is best overall, even though it loses points in automatic evaluation as it doesn&#8217;t follow the instructed format.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls-1024x435.png\" alt=\"\" width=\"843\" height=\"358\" class=\"aligncenter size-large wp-image-1997\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls-1024x435.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls-300x127.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls-150x64.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls-768x326.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls-1536x653.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/zeroscrolls.png 2003w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>34. Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling<\/h4>\n<p>Subhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal. Indian Institute of Technology Kharagpur, Goldman Sachs. NAACL 2024.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2024.naacl-long.410\/\">https:\/\/aclanthology.org\/2024.naacl-long.410\/<\/a><\/p>\n<p>The paper approaches the task of tagging entities with a large number of labels using a generative model. The LLM is instruction-tuned to generate the description of the correct label. 
The generated description is then mapped to the closest real label description using sentence embeddings and cosine similarity. The method outperforms traditional tagging models on this task.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-1024x488.png\" alt=\"\" width=\"843\" height=\"402\" class=\"aligncenter size-large wp-image-1999\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-1024x488.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-300x143.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-150x71.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-768x366.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-1536x732.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/extremefinancial-1-2048x975.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>35. CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction<\/h4>\n<p>Tara Safavi, Doug Downey, Tom Hope. University of Michigan, Allen Institute for Artificial Intelligence, Northwestern University, The Hebrew University of Jerusalem. AKBC 2022.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2205.08012\">https:\/\/arxiv.org\/abs\/2205.08012<\/a><\/p>\n<p>Proposes a pipeline of increasingly complex models for predicting links in a knowledge graph. Simpler models perform the initial filtering, and more computationally expensive models are then applied only to a small chosen sample. The number of candidates to retain at each stage is learned in a supervised way. 
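<\/p>\n<p>An illustrative two-stage cascade, with toy scorers and a fixed cutoff standing in for the learned components:<\/p>

```python
# Illustrative cascade: a cheap scorer prunes candidates before an
# expensive scorer re-ranks the survivors. Scorers and the cutoff are
# toy stand-ins for the learned components.
candidates = list(range(1000))

def cheap_score(c):      # e.g. a fast embedding-based KGE model
    return -abs(c - 500)

def expensive_score(c):  # e.g. a cross-encoder LM, run on few items
    return -abs(c - 503)

k = 10  # per-stage cutoff, learned in the real system
shortlist = sorted(candidates, key=cheap_score, reverse=True)[:k]
best = max(shortlist, key=expensive_score)
print(best)  # 503
```

<p>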
The model makes it feasible to apply very large models to large tasks while also improving the SOTA results.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-1024x325.png\" alt=\"\" width=\"843\" height=\"268\" class=\"aligncenter size-large wp-image-2003\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-1024x325.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-300x95.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-150x48.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-768x243.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-1536x487.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/cascader-2048x649.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>36. Revisiting Transformer-based Models for Long Document Classification<\/h4>\n<p>Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott. CSIRO Data61, University of Copenhagen. EMNLP 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.findings-emnlp.534.pdf\">https:\/\/aclanthology.org\/2022.findings-emnlp.534.pdf<\/a><\/p>\n<p>Comparing sparse attention (Longformer) and hierarchical transformers for long document classification, focusing on electronic health records. They find that the performance is close, with Longformer slightly better out-of-the-box and hierarchical models better after tuning hyperparameters. 
Results also indicate that splitting long text into overlapping sections and using Label-Wise Attention Network helps improve performance.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting-300x272.png\" alt=\"\" width=\"300\" height=\"272\" class=\"aligncenter size-medium wp-image-2006\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting-300x272.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting-1024x929.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting-150x136.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting-768x696.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/revisiting.png 1234w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>37. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language<\/h4>\n<p>Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence. Google. ICLR 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2204.00598\">https:\/\/arxiv.org\/abs\/2204.00598<\/a><\/p>\n<p>Constructs a pipeline of pre-trained language models with different modalities for scene understanding. Visual LMs rank possible locations and objects, audio LMs rank possible sounds, regular LMs take this information in a filled-out template and generate summaries or answers for QA. 
LMs can also be used to generate candidate activities, based on the detected places and objects, then these activities are re-ranked by a visual LM to choose ones that match the scene.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic-300x227.png\" alt=\"\" width=\"300\" height=\"227\" class=\"aligncenter size-medium wp-image-2007\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic-300x227.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic-1024x776.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic-150x114.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic-768x582.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/socratic.png 1368w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>38. LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks<\/h4>\n<p>Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee. University of Wisconsin-Madison. 
NeurIPS 2022.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2206.06565\">https:\/\/arxiv.org\/abs\/2206.06565<\/a><\/p>\n<p>The paper investigates the application of pre-trained language models to the task of classifying non-textual data, without any architecture changes.<br \/>\nThe input features are linearized into a text-like sequence and given as context, and the output is then collected as a language model prediction.<br \/>\nWhile it doesn&#8217;t perform best overall, it does achieve surprisingly competitive performance.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-1024x404.png\" alt=\"\" width=\"843\" height=\"333\" class=\"aligncenter size-large wp-image-2008\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-1024x404.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-300x118.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-150x59.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-768x303.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-1536x606.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/lift-2048x807.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>39. Few-Shot Tabular Data Enrichment Using Fine-Tuned Transformer Architectures<\/h4>\n<p>Asaf Harari, Gilad Katz. Ben-Gurion University of the Negev. ACL 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.acl-long.111.pdf\">https:\/\/aclanthology.org\/2022.acl-long.111.pdf<\/a><\/p>\n<p>System for creating additional feature columns for a tabular dataset, which can then be useful for classification. Entities (rows) in the data are matched to Wikipedia articles in order to retrieve plain text descriptions. 
Binary classifiers are then trained to classify this text according to properties from the tabular data, and the output probabilities are included as new features in the dataset. Evaluation is performed on the main classification task by using these new extra features.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste-1024x597.png\" alt=\"\" width=\"843\" height=\"491\" class=\"aligncenter size-large wp-image-2009\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste-1024x597.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste-300x175.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste-150x87.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste-768x448.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste-1536x896.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/feste.png 1960w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>40. Counterfactual Memorization in Neural Language Models<\/h4>\n<p>Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tram\u00e8r, Nicholas Carlini. Google Research, Carnegie Mellon University, Google DeepMind, ETH Z\u00fcrich. 
NeurIPS 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2112.12938\">https:\/\/arxiv.org\/abs\/2112.12938<\/a><\/p>\n<p>The paper proposes a measure of counterfactual memorization for pre-trained language models.<br \/>\nA large number of different models are trained on subsets of the training set.<br \/>\nFor each sentence of interest, the expected performance on that sentence is then compared between models whose training subsets contained it and models whose subsets did not.<br \/>\nA high score difference indicates that the model tends to memorize this sentence when it is included in the training data.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual-1024x334.png\" alt=\"\" width=\"843\" height=\"275\" class=\"aligncenter size-large wp-image-2010\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual-1024x334.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual-300x98.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual-150x49.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual-768x251.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual-1536x502.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/counterfactual.png 1715w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>41. Relation-Constrained Decoding for Text Generation<\/h4>\n<p>Xiang Chen, Zhixian Yang, Xiaojun Wan. Peking University. 
NeurIPS 2022.<br \/>\n<a href=\"https:\/\/openreview.net\/forum?id=dIUQ5haSOI\">https:\/\/openreview.net\/forum?id=dIUQ5haSOI<\/a><\/p>\n<p>The paper describes a model for text generation, based on target dependency relations that should be in the output.<br \/>\nThe word-level output probabilities are modified to increase the likelihood of generating words that match the target relation.<br \/>\nDuring beam decoding, the candidate construction method also takes the target relations into account.<br \/>\nEvaluation is performed on several datasets, formulating the task as text generation based on dependency relations.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-1024x622.png\" alt=\"\" width=\"843\" height=\"512\" class=\"aligncenter size-large wp-image-2011\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-1024x622.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-300x182.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-150x91.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-768x467.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-1536x933.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/relation-constrained-2048x1244.png 2048w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>42. Interpretability for Language Learners Using Example-Based Grammatical Error Correction<\/h4>\n<p>Masahiro Kaneko, Sho Takase, Ayana Niwa, Naoaki Okazaki. Tokyo Institute of Technology. 
ACL 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.acl-long.496\">https:\/\/aclanthology.org\/2022.acl-long.496<\/a><\/p>\n<p>Describes a system for performing error correction, while also returning examples of similar corrections from the training set. Each token in each sentence is encoded with the GEC model and the representation is used for finding other similar correction examples. The kNN-based similarity is also incorporated into the output distribution of the error correction model, improving performance on closed-class errors while reducing some performance on open-class errors.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based-300x170.png\" alt=\"\" width=\"300\" height=\"170\" class=\"aligncenter size-medium wp-image-2013\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based-300x170.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based-1024x580.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based-150x85.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based-768x435.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/usingexample-based.png 1196w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>43. Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction<\/h4>\n<p>Maksym Tarnavskyi, Artem Chernodub, Kostiantyn Omelianchuk. Ukrainian Catholic University, Grammarly. 
ACL 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.acl-long.266\">https:\/\/aclanthology.org\/2022.acl-long.266<\/a><\/p>\n<p>The paper extends the GECToR sequence tagging architecture for grammatical error correction.<br \/>\nThe models are scaled up to bigger versions, multiple versions are ensembled together and then distilled back to a single model through generated training data. Improvements are shown on the BEA-2019 dataset both for the ensembled configuration and the single best model.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling-300x232.png\" alt=\"\" width=\"300\" height=\"232\" class=\"aligncenter size-medium wp-image-2014\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling-300x232.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling-1024x793.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling-150x116.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling-768x595.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/ensembling.png 1518w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>44. Linguistic Parameters of Spontaneous Speech for Identifying Mild Cognitive Impairment and Alzheimer Disease<\/h4>\n<p>Veronika Vincze, Martina Katalin Szab\u00f3, Ildik\u00f3 Hoffmann, L\u00e1szl\u00f3 T\u00f3th, Magdolna P\u00e1k\u00e1ski, J\u00e1nos K\u00e1lm\u00e1n, G\u00e1bor Gosztolya. University of Szeged. 
Computational Linguistics 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.cl-1.5\">https:\/\/aclanthology.org\/2022.cl-1.5<\/a><\/p>\n<p>Developing a system for the detection of cognitive impairment based on linguistic features.<br \/>\nPatients were recorded when answering free-text questions, their answers transcribed, and a large number of features extracted for classification.<br \/>\nMorphological and statistical features perform well across different tasks and the overall classifier achieves 2-class F1 of 84-86%.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters-1024x417.png\" alt=\"\" width=\"843\" height=\"343\" class=\"aligncenter size-large wp-image-2015\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters-1024x417.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters-300x122.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters-150x61.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters-768x312.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/linguisticparameters.png 1052w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>45. Black-box Prompt Learning for Pre-trained Language Models<\/h4>\n<p>Shizhe Diao, Zhichao Huang, Ruijia Xu, Xuechun Li, Yong Lin, Xiao Zhou, Tong Zhang. The Hong Kong University of Science and Technology, University of California San Diego. 
TMLR 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2201.08531\">https:\/\/arxiv.org\/abs\/2201.08531<\/a><\/p>\n<p>The paper proposes the task of tuning prompts for pre-trained models in a setting where the model weights or activations are not available and need to be treated as a black box.<br \/>\nA first stage of white-box fine-tuning using a small dataset is assumed, followed by a black-box tuning stage using additional data just for updating the prompts.<br \/>\nThe prompts are updated by randomly sampling perturbations of the existing prompts, then approximating the gradient using the natural evolution strategies (NES) algorithm.<br \/>\nEvaluation shows that having the extra black-box training step on additional data is beneficial over only doing white-box prompt tuning using a smaller dataset, but is outperformed by using the full dataset for white-box prompt tuning.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning-300x239.png\" alt=\"\" width=\"300\" height=\"239\" class=\"aligncenter size-medium wp-image-2016\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning-300x239.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning-1024x817.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning-150x120.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning-768x613.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/blackboxpromptlearning.png 1371w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>46. Mind the gap: Challenges of deep learning approaches to Theory of Mind<\/h4>\n<p>Jaan Aru, Aqeel Labash, Oriol Corcoll, Raul Vicente. University of Tartu. 
ArXiv 2022.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2203.16540\">https:\/\/arxiv.org\/abs\/2203.16540<\/a><\/p>\n<p>An opinion paper on deep learning models in connection to the Theory of Mind &#8211; the human ability to understand the minds of others and to recognise that they may hold hidden knowledge or emotions. Gives a summary of the different stages in which this skill develops in humans, along with a review of related work in the deep learning field. Proposes that this is not a skill that will emerge from a single task, and that it should be evaluated through the interpretation of neural networks (for example, whether a specific neuron can be identified that detects the emotions of others).<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap-300x211.png\" alt=\"\" width=\"300\" height=\"211\" class=\"aligncenter size-medium wp-image-2017\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap-300x211.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap-1024x719.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap-150x105.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap-768x539.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/mindthegap.png 1209w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>47. Atomic Inference for NLI with Generated Facts as Atoms<\/h4>\n<p>Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Oana-Maria Camburu, Marek Rei. Imperial, Edinburgh, QMUL, UCL. EMNLP 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2305.13214\">https:\/\/arxiv.org\/abs\/2305.13214<\/a><\/p>\n<p>Long texts are broken down into individual self-contained facts using an LLM. 
A special architecture then learns to make decisions about the text by making individual entailment decisions about those facts. The resulting system can point to specific facts that explain the overall decision, and these explanations are guaranteed to be faithful to the final prediction.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased-1024x373.png\" alt=\"\" width=\"843\" height=\"307\" class=\"aligncenter size-large wp-image-2027\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased-1024x373.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased-300x109.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased-150x55.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased-768x280.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/factbased.png 1213w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>48. Continuous Predictive Modeling of Clinical Notes and ICD Codes in Patient Health Records<\/h4>\n<p>Mireia Hernandez Caralt, Clarence Boon Liang Ng, Marek Rei. Imperial. BioNLP 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/2405.11622\">https:\/\/arxiv.org\/pdf\/2405.11622<\/a><\/p>\n<p>Investigates the task of early prediction of hospital diagnoses and necessary procedures, based on textual notes in the electronic health records. A causal hierarchical model is created, which is able to make predictions about overall ICD codes at every timestep of the hospital stay. As the note sequences are very long, an extended context algorithm is proposed, which samples a subset of notes during training but is able to iteratively use the whole sequence during testing. 
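<\/p>\n<p>The note-sampling idea can be sketched roughly as follows (a hypothetical illustration of random subset selection, not the paper&#8217;s implementation):<\/p>

```python
import random

def select_notes(notes, max_notes, training):
    """Pick which clinical notes to encode for one hospital stay.

    During training, a random subset (always keeping the most recent
    note) bounds memory usage; at test time the full sequence is used.
    """
    if training and len(notes) > max_notes:
        sampled = random.sample(notes[:-1], max_notes - 1)
        # restore chronological order and keep the latest note
        return sorted(sampled, key=notes.index) + [notes[-1]]
    return notes
```

<p>At test time the selector simply returns the entire note sequence in order.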
<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/lahst.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/lahst-300x210.png\" alt=\"\" width=\"300\" height=\"210\" class=\"aligncenter size-medium wp-image-2030\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/lahst-300x210.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/lahst-150x105.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/lahst-768x537.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/lahst.png 987w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>49. Modelling Temporal Document Sequences for Clinical ICD Coding<\/h4>\n<p>Clarence Boon Liang Ng, Diogo Santos, Marek Rei. Imperial, Transformative AI. EACL 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.eacl-main.120.pdf\">https:\/\/aclanthology.org\/2023.eacl-main.120.pdf<\/a><\/p>\n<p>Assigning ICD codes to discharge summaries in electronic health records, which indicate the diagnoses and procedures for each patient. The model is designed to integrate additional information from the previous notes in the health record. Additive embeddings are used for representing metadata about each note. 
The system achieves state-of-the-art results on the task of ICD coding.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds-1024x522.png\" alt=\"\" width=\"843\" height=\"430\" class=\"aligncenter size-large wp-image-2032\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds-1024x522.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds-300x153.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds-150x76.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds-768x392.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/htds.png 1265w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>50. When and Why Does Bias Mitigation Work ?<\/h4>\n<p>Abhilasha Ravichander, Joe Stacey, Marek Rei. Ai2, Imperial. EMNLP 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.findings-emnlp.619\">https:\/\/aclanthology.org\/2023.findings-emnlp.619<\/a><\/p>\n<p>Targeted testing of different model debiasing methods, in order to investigate their effect on the model. Creating six datasets that contain very specific controlled biases for probing debiasing methods. 
Experiments show that specifically debiasing against one bias actually increases reliance on another bias.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing-300x185.png\" alt=\"\" width=\"300\" height=\"185\" class=\"aligncenter size-medium wp-image-2034\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing-300x185.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing-1024x633.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing-150x93.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing-768x475.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/debiasing.png 1165w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>51. Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models<\/h4>\n<p>Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye. Imperial College London, Sense Street. USENIX Security Symposium 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2310.15007\">https:\/\/arxiv.org\/abs\/2310.15007<\/a><\/p>\n<p>Introducing the task of document-level membership inference for LLMs &#8211; determining whether a particular document (e.g. book or article) has been used during LLM training, while only having query access to the resulting model. All token-level probabilities are collected from the language model, these are normalised by how rare each token is overall, and then aggregated into features and given to a supervised classifier. 
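<\/p>\n<p>The general recipe can be sketched as follows (a hypothetical simplification of the feature extraction, with made-up normalisation details):<\/p>

```python
import math

def doc_features(token_logprobs, token_freqs, total_count):
    """Turn token-level LM log-probabilities into document features.

    Each token's log-probability under the target LM is compared to a
    reference log-probability based on how common the token is in a
    general corpus, and summary statistics over the document become
    features for a supervised membership classifier.
    """
    ratios = []
    for tok, lp in token_logprobs:
        ref_lp = math.log(token_freqs.get(tok, 1) / total_count)
        ratios.append(lp - ref_lp)  # high: more predictable than its rarity suggests
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    return {"mean": mean, "var": var, "min": min(ratios), "max": max(ratios)}
```

<p>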
Experiments show that most documents can be detected, with the detection working better for longer documents.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook-1024x342.png\" alt=\"\" width=\"843\" height=\"282\" class=\"aligncenter size-large wp-image-2021\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook-1024x342.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook-300x100.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook-150x50.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook-768x256.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook-1536x513.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/readmybook.png 1915w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>52. An Extended Sequence Tagging Vocabulary for Grammatical Error Correction<\/h4>\n<p>Stuart Mesham, Christopher Bryant, Marek Rei, Zheng Yuan. University of Cambridge, Imperial College London, King&#8217;s College London. EACL 2023.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2023.findings-eacl.119\">https:\/\/aclanthology.org\/2023.findings-eacl.119<\/a><\/p>\n<p>Introducing tool usage into a tagging-based grammatical error correction model. Instead of training the model to correct every type of error itself, the model detects when a word should be sent to a separate spellcheck or an inflection system. 
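<\/p>\n<p>The routing idea can be sketched as follows (the tag names and the two tool functions are hypothetical placeholders, not the paper&#8217;s tag set):<\/p>

```python
def apply_tag(token, tag):
    """Apply one sequence-tagging edit, delegating some tags to tools.

    Instead of a dedicated tag for every correction, special tags route
    the token to an external spellchecker or inflection system.
    """
    if tag == "$KEEP":
        return token
    if tag == "$DELETE":
        return ""
    if tag == "$SPELL":    # hand the token to an external spellchecker
        return spellcheck(token)
    if tag == "$INFLECT":  # hand the token to a morphological inflector
        return inflect(token)
    if tag.startswith("$REPLACE_"):
        return tag[len("$REPLACE_"):]
    return token

# toy stand-ins for the external tools
def spellcheck(token):
    return {"teh": "the", "recieve": "receive"}.get(token, token)

def inflect(token):
    return token if token.endswith("s") else token + "s"
```

<p>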
The modification improves performance on the targeted error types and overall, also leaving room for introducing additional tools for other error types.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab-300x191.png\" alt=\"\" width=\"300\" height=\"191\" class=\"aligncenter size-medium wp-image-2023\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab-300x191.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab-1024x650.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab-150x95.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab-768x488.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/gecvocab.png 1110w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>53. On the application of Large Language Models for language teaching and assessment technology<\/h4>\n<p>Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Oeistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane Nicholls, Paula Buttery. Cambridge, KCL, CUPA, Writer Inc, Imperial, ELiT. AIED LLM 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2307.08393\">https:\/\/arxiv.org\/abs\/2307.08393<\/a><\/p>\n<p>Discussing the possible applications of generative language models in the area of language teaching. Covering tasks such as automated test creation, question difficulty estimation, automated essay scoring and feedback generation. The paper concludes that there is a lot of potential but for best results the automated systems need to be paired with human intervention.<\/p>\n<h4>54. 
Prompting open-source and commercial language models for grammatical error correction of English learner text<\/h4>\n<p>Christopher Davis, Andrew Caines, \u00d8istein Andersen, Shiva Taslimipoor, Helen Yannakoudakis, Zheng Yuan, Christopher Bryant, Marek Rei, Paula Buttery. Cambridge, Writer Inc, KCL, Imperial. ACL 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/2401.07702\">https:\/\/arxiv.org\/pdf\/2401.07702<\/a><\/p>\n<p>Investigating the abilities of LLMs to perform grammatical error correction. 7 open-source and 3 commercial models are evaluated with 11 different prompts on established error correction benchmarks. LLMs perform the best on fluency edits but do not come close to state-of-the-art performance on minimal corrections of grammatical errors.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec-300x186.png\" alt=\"\" width=\"300\" height=\"186\" class=\"aligncenter size-medium wp-image-2037\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec-300x186.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec-1024x636.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec-150x93.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec-768x477.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/llmgec.png 1068w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h4>55. Control Prefixes for Parameter-Efficient Text Generation<\/h4>\n<p>Jordan Clive, Kris Cao, Marek Rei. Imperial, DeepMind, Cambridge. GEM 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.gem-1.31\">https:\/\/aclanthology.org\/2022.gem-1.31<\/a><\/p>\n<p>Introducing control prefixes, which are plug-and-play modules for influencing text generation into a particular direction. 
By switching on a particular control prefix, the model can generate text in a particular domain, in a particular style or at a specific length. Experiments also include predicting values for a previously unseen generation category. <\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning-1024x492.png\" alt=\"\" width=\"843\" height=\"405\" class=\"aligncenter size-large wp-image-2039\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning-1024x492.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning-300x144.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning-150x72.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning-768x369.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning-1536x738.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/prefix-tuning.png 1572w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>56. Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models<\/h4>\n<p>Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Marek Rei. Imperial College London, University of Edinburgh, UCL, Queen Mary University of London. EMNLP 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.emnlp-main.251\">https:\/\/aclanthology.org\/2022.emnlp-main.251<\/a><\/p>\n<p>A neural architecture for entailment detection (NLI) that has guarantees on the faithfulness of its explanations. Text is broken into shorter spans, the model makes predictions about each span separately and the final overall decision is found deterministically based on the span-level predictions. 
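<\/p>\n<p>One way such a deterministic aggregation can look (an illustrative rule, not necessarily the paper&#8217;s exact one):<\/p>

```python
def aggregate(span_labels):
    """Combine span-level NLI predictions into one sentence-level label.

    Because the final label follows mechanically from the span
    predictions, those predictions are a faithful explanation of it.
    """
    if "contradiction" in span_labels:
        return "contradiction"
    if all(label == "entailment" for label in span_labels):
        return "entailment"
    return "neutral"
```

<p>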
The updated model retains performance while being more explainable, and also outperforms all previous logical architectures for entailment detection.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning-1024x498.png\" alt=\"\" width=\"843\" height=\"410\" class=\"aligncenter size-large wp-image-2020\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning-1024x498.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning-300x146.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning-150x73.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning-768x374.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning-1536x748.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/logical-reasoning.png 1929w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>57. Probing for targeted syntactic knowledge through grammatical error detection<\/h4>\n<p>Christopher Davis, Christopher Bryant, Andrew Caines, Marek Rei, Paula Buttery. Cambridge, Imperial. CoNLL 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.conll-1.25\">https:\/\/aclanthology.org\/2022.conll-1.25<\/a><\/p>\n<p>Investigating how much pre-trained language models capture syntactic information and how well are they able to detect syntactic errors out-of-the-box. Different language models are frozen and small probes are trained on top of them to identify specific errors in text. 
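<\/p>\n<p>The probing setup can be sketched with a minimal logistic-regression probe trained on frozen feature vectors (an illustrative stand-in for probes over frozen LM layers, not the paper&#8217;s model):<\/p>

```python
import math

def train_probe(features, labels, lr=0.1, epochs=200):
    """Train a tiny logistic-regression probe on frozen representations.

    The encoder stays fixed and only the probe is trained, so any skill
    the probe shows must already be encoded in the frozen features.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

<p>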
Analysis shows that the final layers of ELECTRA and BERT capture subject-verb agreement errors best.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes-1024x419.png\" alt=\"\" width=\"843\" height=\"345\" class=\"aligncenter size-large wp-image-2041\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes-1024x419.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes-300x123.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes-150x61.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes-768x314.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/10\/gecprobes.png 1321w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>58. Memorisation versus Generalisation in Pre-trained Language Models<\/h4>\n<p>Michael T\u00e4nzer, Sebastian Ruder, Marek Rei. Imperial, Google Research. ACL 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.acl-long.521\">https:\/\/aclanthology.org\/2022.acl-long.521<\/a><\/p>\n<p>Investigating the learning abilities of language models under controlled experiments. Results show that LMs are surprisingly resilient to noise in the training data, with the resulting performance being nearly unaffected as long as the learning is stopped at an optimal time. However, the models are not able to differentiate between label noise and low-resource classes, with overall performance deteriorating just as the rare classes start to be learned. 
The paper proposes a model based on class prototypes to get the best of both worlds.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization-1024x509.png\" alt=\"\" width=\"843\" height=\"419\" class=\"aligncenter size-large wp-image-2045\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization-1024x509.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization-300x149.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization-150x75.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization-768x382.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/memorization.png 1389w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>59. Multimodal Conversation Modelling for Topic Derailment Detection<\/h4>\n<p>Zhenhao Li, Marek Rei, Lucia Specia. Imperial. EMNLP 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.findings-emnlp.376\">https:\/\/aclanthology.org\/2022.findings-emnlp.376<\/a><\/p>\n<p>Creating and releasing a dataset of Reddit threads that contain images. Posts that derail the conversation are then identified and annotated with the derailment type, such as starting a new topic, spamming or making toxic comments. 
A multimodal architecture for this task is also described, which encodes the post, the image and the context in order to make accurate decisions.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv-1024x614.png\" alt=\"\" width=\"843\" height=\"505\" class=\"aligncenter size-large wp-image-2043\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv-1024x614.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv-300x180.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv-150x90.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv-768x461.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/mmconv.png 1077w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>60. DiffuseDef: Improved Robustness to Adversarial Attacks<\/h4>\n<p>Zhenhao Li, Marek Rei, Lucia Specia. Imperial. ArXiv.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2407.00248\">https:\/\/arxiv.org\/abs\/2407.00248<\/a><\/p>\n<p>Introducing a diffusion module as a denoising step for text representations in order to defend against adversarial attacks.<br \/>\nThe diffusion layer is trained on top of a frozen encoder to predict the randomly inserted noise in the representation.<br \/>\nDuring inference, this noise is then subtracted over multiple iterations to get a clean representation.<br \/>\nResults show state-of-the-art performance in resisting adversarial attacks. 
<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef-1024x604.png\" alt=\"\" width=\"843\" height=\"497\" class=\"aligncenter size-large wp-image-2048\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef-1024x604.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef-300x177.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef-150x88.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef-768x453.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/diffusedef.png 1199w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>61. Supervising Model Attention with Human Explanations for Robust Natural Language Inference<\/h4>\n<p>Joe Stacey, Yonatan Belinkov, Marek Rei. Imperial, Technion. AAAI 2022.<br \/>\n<a href=\"https:\/\/cdn.aaai.org\/ojs\/21386\/21386-13-25399-1-2-20220628.pdf\">https:\/\/cdn.aaai.org\/ojs\/21386\/21386-13-25399-1-2-20220628.pdf<\/a><\/p>\n<p>Investigating how natural language explanations of particular label assignments can be used to improve model performance.<br \/>\nThe method identifies important words in the input, either based on explanation text or highlights, and then trains the model to assign higher self-attention weights to those tokens. 
Experiments show that this method consistently improves performance, making the model focus more on the relevant evidence.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/explanations.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/explanations-260x300.png\" alt=\"\" width=\"260\" height=\"300\" class=\"aligncenter size-medium wp-image-2049\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/explanations-260x300.png 260w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/explanations-130x150.png 130w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/explanations.png 676w\" sizes=\"auto, (max-width: 260px) 100vw, 260px\" \/><\/a><\/p>\n<h4>62. Guiding Visual Question Generation<\/h4>\n<p>Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia. Imperial. NAACL 2022.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2022.naacl-main.118.pdf\">https:\/\/aclanthology.org\/2022.naacl-main.118.pdf<\/a><\/p>\n<p>Investigates guiding the process of visual question generation, where the system generates questions about a given image.<br \/>\nThe architecture allows the user to specify categories or objects in the image, which the system will then ask about.<br \/>\nThese choices can also be modeled as latent variables inside the model, leading to state-of-the-art results in regular VQG.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg-1024x301.png\" alt=\"\" width=\"843\" height=\"248\" class=\"aligncenter size-large wp-image-2051\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg-1024x301.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg-300x88.png 300w, 
https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg-150x44.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg-768x226.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg-1536x452.png 1536w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/vqg.png 1563w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>63. Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers<\/h4>\n<p>Kamil Bujel, Andrew Caines, Helen Yannakoudakis, Marek Rei. Imperial, Cambridge, KCL. Arxiv 2023.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/2303.07991\">https:\/\/arxiv.org\/pdf\/2303.07991<\/a><\/p>\n<p>Designing a hierarchical transformer model that is able to explain itself by pointing to relevant tokens in the input.<br \/>\nIt indirectly supervises the attention on a certain number of tokens at each turn, in order to make the model behave like a binary importance classifier on the token level. Previous methods that work well on shorter texts (e.g. 
individual sentences) are shown to perform poorly when applied to longer texts.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack-1024x595.png\" alt=\"\" width=\"843\" height=\"490\" class=\"aligncenter size-large wp-image-2053\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack-1024x595.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack-300x174.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack-150x87.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack-768x446.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/haystack.png 1116w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>64. Distilling Robustness into Natural Language Inference Models with Domain-Targeted Augmentation<\/h4>\n<p>Joe Stacey, Marek Rei. Imperial. ACL 2024.<br \/>\n<a href=\"https:\/\/aclanthology.org\/2024.findings-acl.132.pdf\">https:\/\/aclanthology.org\/2024.findings-acl.132.pdf<\/a><\/p>\n<p>Investigating model distillation methods that would be better able to generalise to previously unseen domains. The methods either generate new unlabeled examples or upweight existing examples from the training data that are more similar to a particular domain, then use these as input during the distillation process. 
Experiments show that this increases robustness even towards domains that were not considered during distillation.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/distillation.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/distillation.png\" alt=\"\" width=\"994\" height=\"639\" class=\"aligncenter size-full wp-image-2055\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/distillation.png 994w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/distillation-300x193.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/distillation-150x96.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/distillation-768x494.png 768w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><\/a><\/p>\n<h4>65. The alignment of companies&#8217; sustainability behavior and emissions with global climate targets<\/h4>\n<p>Simone Cenci, Matteo Burato, Marek Rei, Maurizio Zollo. Imperial. 
Nature Communications 2024.<\/p>\n<p>Applying NLP systems to analyse thousands of company reports and the sustainability initiatives described in them.<br \/>\nThe system crawls public reports online, extracts sentences that refer to sustainability initiatives implemented by the reporting company, and classifies them based on type, stakeholder and one of the 17 Sustainable Development Goals established by the UN.<br \/>\nAnalysis indicates that companies are mostly investing in risk-prevention initiatives, as opposed to innovation and cooperation.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg-1024x399.png\" alt=\"\" width=\"843\" height=\"328\" class=\"aligncenter size-large wp-image-2058\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg-1024x399.png 1024w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg-300x117.png 300w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg-150x58.png 150w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg-768x299.png 768w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/sdg.png 1132w\" sizes=\"auto, (max-width: 843px) 100vw, 843px\" \/><\/a><\/p>\n<h4>66. Predicting cell type-specific epigenomic profiles accounting for distal genetic effects<\/h4>\n<p>Alan E Murphy, William Beardall, Marek Rei, Mike Phuycharoen, Nathan G Skene. Imperial, Manchester. 
Nature Communications 2024.<br \/>\n<a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2024.02.15.580484\">https:\/\/www.biorxiv.org\/content\/10.1101\/2024.02.15.580484<\/a><\/p>\n<p>Using architectures based on pre-trained transformer language models and extending them for the domain of genome modeling.<br \/>\nProposing and releasing Enformer Celltyping, which can incorporate long-range context of DNA base pairs and predict epigenetic signals while being cell type-agnostic.<\/p>\n<p><a href=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/celltype.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/celltype-273x300.png\" alt=\"\" width=\"273\" height=\"300\" class=\"aligncenter size-medium wp-image-2062\" srcset=\"https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/celltype-273x300.png 273w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/celltype-136x150.png 136w, https:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/11\/celltype.png 731w\" sizes=\"auto, (max-width: 273px) 100vw, 273px\" \/><\/a><\/p>\n<h4>67. SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)<\/h4>\n<p>Matthieu Meeus, Igor Shilov, Shubham Jain, Manuel Faysse, Marek Rei, Yves-Alexandre de Montjoye. Imperial, Sense Street, Universit\u00e9 Paris-Saclay. Arxiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/2406.17975\">https:\/\/arxiv.org\/pdf\/2406.17975<\/a><\/p>\n<p>In order to develop methods for detecting whether specific copyrighted work has been used for training a particular LLM, researchers have collected datasets of known documents that are (or are not) part of specific training sets. However, these datasets are normally collected post-hoc, leading to distribution differences. The experiments show that many detection models misleadingly react to those differences, instead of truly detecting the documents in the training data. 
Suggestions are provided for preventing this issue in future experiments.<\/p>\n<h4>68. StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models<\/h4>\n<p>Nikolai Rozanov, Marek Rei. Imperial. Arxiv 2024.<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2410.02810\">https:\/\/arxiv.org\/abs\/2410.02810<\/a><\/p>\n<p>Improving LLMs for long-range planning and reasoning, for tasks which require a large number of steps to complete.<br \/>\nAs the length of the few-shot examples and the generated trace grows longer, LLMs can lose track of what they are supposed to do and what they have done already. Periodically reminding them of the task and explicitly keeping track of their state provides consistent improvements in performance, establishing a new state-of-the-art result for the Alfworld benchmark.<\/p>\n<script>(function() {\n\twindow.mc4wp = window.mc4wp || {\n\t\tlisteners: [],\n\t\tforms: {\n\t\t\ton: function(evt, cb) {\n\t\t\t\twindow.mc4wp.listeners.push(\n\t\t\t\t\t{\n\t\t\t\t\t\tevent   : evt,\n\t\t\t\t\t\tcallback: cb\n\t\t\t\t\t}\n\t\t\t\t);\n\t\t\t}\n\t\t}\n\t}\n})();\n<\/script><!-- Mailchimp for WordPress v4.9.18 - https:\/\/wordpress.org\/plugins\/mailchimp-for-wp\/ --><form id=\"mc4wp-form-1\" class=\"mc4wp-form mc4wp-form-1930\" method=\"post\" data-id=\"1930\" data-name=\"Sign up\" ><div class=\"mc4wp-form-fields\"><br\/>\r\n\r\n<div style=\"background-color:#e7e7e7; padding: 5%; font-size: small; margin:0 10%\">\r\n<p style=\"color:#686868; ;\">If you would like to receive email updates about the latest articles, leave your details below.<\/p>\r\n\r\n<p>\r\n    <label style=\"color:#686868\">Name:<\/label>\r\n    <input type=\"text\" name=\"MMERGE6\">\r\n<\/p>\r\n<p>\r\n  <label style=\"color:#686868\">Email address:<\/label><br\/>\r\n  <input type=\"email\" name=\"EMAIL\" placeholder=\"\" required \/>\r\n<\/p>\r\n<input type=\"submit\" value=\"Sign up\" \/>\r\n<\/div><\/div><label style=\"display: none 
!important;\">Leave this field empty if you're human: <input type=\"text\" name=\"_mc4wp_honeypot\" value=\"\" tabindex=\"-1\" autocomplete=\"off\" \/><\/label><input type=\"hidden\" name=\"_mc4wp_timestamp\" value=\"1776015784\" \/><input type=\"hidden\" name=\"_mc4wp_form_id\" value=\"1930\" \/><input type=\"hidden\" name=\"_mc4wp_form_element_id\" value=\"mc4wp-form-1\" \/><div class=\"mc4wp-response\"><\/div><\/form><!-- \/ Mailchimp for WordPress Plugin -->\n","protected":false},"excerpt":{"rendered":"<p>I have written short summaries of 68 different research papers published in the areas of Machine Learning and Natural Language Processing. They cover a wide&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1932","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>68 Summaries of Machine Learning and NLP Research - Marek Rei<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"68 Summaries of Machine Learning and NLP Research - Marek Rei\" \/>\n<meta property=\"og:description\" content=\"I have written short summaries of 68 different research papers published in the areas of Machine Learning and Natural Language Processing. 
They cover a wide&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/\" \/>\n<meta property=\"og:site_name\" content=\"Marek Rei\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-04T11:47:27+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png\" \/>\n<meta name=\"author\" content=\"Marek\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Marek\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"44 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/\",\"url\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/\",\"name\":\"68 Summaries of Machine Learning and NLP Research - Marek 
Rei\",\"isPartOf\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png\",\"datePublished\":\"2024-11-04T11:47:27+00:00\",\"dateModified\":\"2024-11-04T11:47:27+00:00\",\"author\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#primaryimage\",\"url\":\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png\",\"contentUrl\":\"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.marekrei.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"68 Summaries of Machine Learning and NLP Research\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/#website\",\"url\":\"https:\/\/www.marekrei.com\/blog\/\",\"name\":\"Marek Rei\",\"description\":\"Thoughts on Machine Learning and Natural Language 
Processing\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.marekrei.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191\",\"name\":\"Marek\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g\",\"caption\":\"Marek\"},\"url\":\"https:\/\/www.marekrei.com\/blog\/author\/marek\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"68 Summaries of Machine Learning and NLP Research - Marek Rei","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/","og_locale":"en_US","og_type":"article","og_title":"68 Summaries of Machine Learning and NLP Research - Marek Rei","og_description":"I have written short summaries of 68 different research papers published in the areas of Machine Learning and Natural Language Processing. They cover a wide&hellip;","og_url":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/","og_site_name":"Marek Rei","article_published_time":"2024-11-04T11:47:27+00:00","og_image":[{"url":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png"}],"author":"Marek","twitter_misc":{"Written by":"Marek","Est. 
reading time":"44 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/","url":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/","name":"68 Summaries of Machine Learning and NLP Research - Marek Rei","isPartOf":{"@id":"https:\/\/www.marekrei.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#primaryimage"},"image":{"@id":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#primaryimage"},"thumbnailUrl":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png","datePublished":"2024-11-04T11:47:27+00:00","dateModified":"2024-11-04T11:47:27+00:00","author":{"@id":"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191"},"breadcrumb":{"@id":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#primaryimage","url":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png","contentUrl":"http:\/\/www.marekrei.com\/blog\/wp-content\/uploads\/2024\/07\/promptbench-300x214.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.marekrei.com\/blog\/68-summaries-of-machine-learning-and-nlp-research\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.marekrei.com\/blog\/"},{"@type":"ListItem","position":2,"name":"68 Summaries of Machine Learning and NLP 
Research"}]},{"@type":"WebSite","@id":"https:\/\/www.marekrei.com\/blog\/#website","url":"https:\/\/www.marekrei.com\/blog\/","name":"Marek Rei","description":"Thoughts on Machine Learning and Natural Language Processing","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.marekrei.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/a145eb0a06ed4acf5b0f84a24b7a1191","name":"Marek","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.marekrei.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/48a65414bfda6485aaa0703e548de0ed25292b5fe0d979ed8c28ad83cf5a82c0?s=96&d=mm&r=g","caption":"Marek"},"url":"https:\/\/www.marekrei.com\/blog\/author\/marek\/"}]}},"_links":{"self":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts\/1932","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/comments?post=1932"}],"version-history":[{"count":68,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts\/1932\/revisions"}],"predecessor-version":[{"id":2070,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/posts\/1932\/revisions\/2070"}],"wp:attachment":[{"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/media?parent=1932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/
\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/categories?post=1932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.marekrei.com\/blog\/wp-json\/wp\/v2\/tags?post=1932"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}