
Showing 1–42 of 42 results for author: Mikolov, T

  1. arXiv:2404.02305

    cs.CL cs.AI

    Collapse of Self-trained Language Models

    Authors: David Herel, Tomas Mikolov

    Abstract: In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of language models. Specifically, we explore the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our re…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  2. arXiv:2402.06196

    cs.CL cs.AI

    Large Language Models: A Survey

    Authors: Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao

    Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks since the release of ChatGPT in November 2022. LLMs' capability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffman…

    Submitted 20 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2401.14423

  3. arXiv:2312.03735

    cs.CL cs.AI

    Advancing State of the Art in Language Modeling

    Authors: David Herel, Tomas Mikolov

    Abstract: Generalization is arguably the most important goal of statistical language modeling research. Publicly available benchmarks and papers published with open-source code have been critical to advancing the field. However, it is often very difficult, and sometimes even impossible, to reproduce the results fully as reported in publications. In this paper, we propose a simple framework that should he…

    Submitted 28 November, 2023; originally announced December 2023.

  4. Preserving Semantics in Textual Adversarial Attacks

    Authors: David Herel, Hugo Cisneros, Tomas Mikolov

    Abstract: The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically and anonymously. Even though some form of automatic detection is already achieved through text classifiers in NLP, these classifiers can be fooled by adversarial attacks. To strengthen existing systems and stay ahead…

    Submitted 5 October, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 8 pages, 4 figures

    Journal ref: ECAI 2023

  5. arXiv:2210.02549

    cs.LG

    Benchmarking Learning Efficiency in Deep Reservoir Computing

    Authors: Hugo Cisneros, Josef Sivic, Tomas Mikolov

    Abstract: It is common to evaluate the performance of a machine learning model by measuring its predictive power on a test dataset. This approach favors complicated models that can smoothly fit complex functions and generalize well from training data points. Although they are essential components of intelligence, the speed and data efficiency of this learning process are rarely reported or compared between different can…

    Submitted 29 September, 2022; originally announced October 2022.

    Comments: Conference on Lifelong Learning Agents, Aug 2022, Montreal, Canada

  6. arXiv:2207.04857

    cs.NE cs.AI cs.LG

    Emergence of Novelty in Evolutionary Algorithms

    Authors: David Herel, Dominika Zogatova, Matej Kripner, Tomas Mikolov

    Abstract: One of the main problems of evolutionary algorithms is the convergence of the population to local minima. In this paper, we explore techniques that can avoid this problem by encouraging diverse behavior among the agents through a shared reward system. The rewards are randomly distributed in the environment, and the agents are only rewarded for collecting them first. This leads to an emergence of a…

    Submitted 3 August, 2022; v1 submitted 27 June, 2022; originally announced July 2022.

    Comments: ALIFE 2022

    Journal ref: Artificial Life Conference Proceedings 2022. MIT Press

  7. arXiv:2111.15588

    cs.CL

    SimpleTRON: Simple Transformer with O(N) Complexity

    Authors: Uladzislau Yorsh, Alexander Kovalenko, Vojtěch Vančura, Daniel Vašata, Pavel Kordík, Tomáš Mikolov

    Abstract: In this paper, we propose that the dot-product pairwise matching attention layer, which is widely used in Transformer-based models, is redundant for model performance. Attention, in its original formulation, should rather be seen as a human-level tool to explore and/or visualize relevancy scores in sequential data. However, the way it is constructed leads to significant computational compl…

    Submitted 28 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  8. arXiv:2108.01573

    cs.AI cs.NE nlin.CD nlin.CG

    Classification of Discrete Dynamical Systems Based on Transients

    Authors: Barbora Hudcová, Tomáš Mikolov

    Abstract: In order to develop systems capable of artificial evolution, we need to identify which systems can produce complex behavior. We present a novel classification method applicable to any class of deterministic discrete space and time dynamical systems. The method is based on classifying the asymptotic behavior of the average computation time in a given system before entering a loop. We were able to i…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: 15 pages. arXiv admin note: substantial text overlap with arXiv:2008.13503

  9. arXiv:2108.00415

    cs.NE cs.AI nlin.CG

    Computational Hierarchy of Elementary Cellular Automata

    Authors: Barbora Hudcová, Tomáš Mikolov

    Abstract: The complexity of cellular automata is traditionally measured by their computational capacity. However, it is difficult to choose a challenging set of computational tasks suitable for the parallel nature of such systems. We study the ability of automata to emulate one another, and we use this notion to define such a set of naturally emerging tasks. We present the results for elementary cellular au…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: 8 pages

    Journal ref: The 2021 Conference on Artificial Life Proceedings, 2021, 353--360

  10. Visualizing computation in large-scale cellular automata

    Authors: Hugo Cisneros, Josef Sivic, Tomas Mikolov

    Abstract: Emergent processes in complex systems such as cellular automata can perform computations of increasing complexity, and could possibly lead to artificial evolution. Such a feat would require scaling up current simulation sizes to allow for enough computational capacity. Understanding complex computations happening in cellular automata and other systems capable of emergence poses many challenges, es…

    Submitted 1 April, 2021; originally announced April 2021.

    Journal ref: Artificial Life Conference Proceedings 2020 (pp. 239-247). MIT Press

  11. arXiv:2103.08245

    nlin.AO cs.NE

    Emergence of Self-Reproducing Metabolisms as Recursive Algorithms in an Artificial Chemistry

    Authors: Germán Kruszewski, Tomas Mikolov

    Abstract: One of the main goals of Artificial Life is to research the conditions for the emergence of life, not necessarily as it is, but as it could be. Artificial Chemistries are one of the most important tools for this purpose because they provide us with a basic framework to investigate under which conditions metabolisms capable of reproducing themselves, and ultimately, of evolving, can emerge. While t…

    Submitted 7 December, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: text overlap with arXiv:2003.07916

  12. Classification of Complex Systems Based on Transients

    Authors: Barbora Hudcova, Tomas Mikolov

    Abstract: In order to develop systems capable of modeling artificial life, we need to identify which systems can produce complex behavior. We present a novel classification method applicable to any class of deterministic discrete space and time dynamical systems. The method distinguishes between different asymptotic behaviors of a system's average computation time before entering a loop. When applied to el…

    Submitted 31 August, 2020; originally announced August 2020.

    Comments: 9 pages

    Journal ref: Artificial Life Conference Proceedings 32 (2020), 367-375

  13. arXiv:2004.03340

    cs.CL cs.AI cs.LG

    Evaluating Online Continual Learning with CALM

    Authors: Germán Kruszewski, Ionut-Teodor Sorodoc, Tomas Mikolov

    Abstract: Online Continual Learning (OCL) studies learning over a continuous data stream without observing any single example more than once, a setting that is closer to the experience of humans and systems that must learn "in the wild". Yet, commonly available benchmarks are far from these real-world conditions, because they explicitly signal different tasks, lack latent similarity structure or assume temp…

    Submitted 1 February, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

  14. arXiv:2003.07916

    nlin.AO cs.NE q-bio.MN

    Combinatory Chemistry: Towards a Simple Model of Emergent Evolution

    Authors: Germán Kruszewski, Tomas Mikolov

    Abstract: An explanatory model for the emergence of evolvable units must display emerging structures that (1) preserve themselves in time, (2) self-reproduce, and (3) tolerate a certain amount of variation when reproducing. To tackle this challenge, here we introduce Combinatory Chemistry, an Algorithmic Artificial Chemistry based on a minimalistic computational paradigm named Combinatory Logic. The dynamics…

    Submitted 19 June, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  15. Evolving Structures in Complex Systems

    Authors: Hugo Cisneros, Josef Sivic, Tomas Mikolov

    Abstract: In this paper, we propose an approach for measuring the growth of complexity of emerging patterns in complex systems such as cellular automata. We discuss several ways in which a metric for measuring complexity growth can be defined. This includes approaches based on compression algorithms and artificial neural networks. We believe such a metric can be useful for designing systems that could exhibit ope…

    Submitted 18 March, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: IEEE Symposium Series on Computational Intelligence 2019 (IEEE SSCI 2019)

    Journal ref: Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence

  16. arXiv:1910.06241

    cs.CL

    Updating Pre-trained Word Vectors and Text Classifiers using Monolingual Alignment

    Authors: Piotr Bojanowski, Onur Celebi, Tomas Mikolov, Edouard Grave, Armand Joulin

    Abstract: In this paper, we focus on the problem of adapting word vector-based models to new textual data. Given a model pre-trained on large reference data, how can we adapt it to a smaller piece of data with a slightly different language distribution? We frame the adaptation problem as a monolingual word vector alignment problem, and simply average models after alignment. We align vectors using the RCSLS…

    Submitted 15 October, 2019; v1 submitted 14 October, 2019; originally announced October 2019.

  17. arXiv:1910.04861

    cs.CV

    Place Deduplication with Embeddings

    Authors: Carl Yang, Do Huy Hoang, Tomas Mikolov, Jiawei Han

    Abstract: Thanks to advances in mobile location services, people can now post about places to share their visiting experiences on the go. A large place graph not only helps users explore interesting destinations, but also provides opportunities for understanding and modeling the real world. To improve coverage and flexibility of the place graph, many platforms import place data from multiple sources, which…

    Submitted 28 September, 2019; originally announced October 2019.

    Comments: Published at WWW 2019

  18. arXiv:1804.07745

    cs.CL cs.LG

    Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

    Authors: Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave

    Abstract: Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a least-squares regression problem to learn a rotation aligning a small bilingual lexicon, and use a retrieval criterion for inference. In this paper, we propose a unified formulation that directly optimizes a retrieval crit…

    Submitted 5 September, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

  19. arXiv:1802.06893

    cs.CL cs.LG

    Learning Word Vectors for 157 Languages

    Authors: Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov

    Abstract: Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is to train them on very large corpora, and use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high quality word repr…

    Submitted 28 March, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

    Comments: Accepted to LREC

  20. arXiv:1802.02892

    cs.CL cs.AI cs.CV

    Efficient Large-Scale Multi-Modal Classification

    Authors: D. Kiela, E. Grave, A. Joulin, T. Mikolov

    Abstract: While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities…

    Submitted 6 February, 2018; originally announced February 2018.

    Comments: Published at AAAI-18, 7 pages

  21. arXiv:1712.09405

    cs.CL

    Advances in Pre-Training Distributed Word Representations

    Authors: Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin

    Abstract: Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are, however, rarely used together. The main result of our work is the new set of publicly available…

    Submitted 26 December, 2017; originally announced December 2017.

  22. arXiv:1710.10881

    stat.ML cs.LG

    Fast Linear Model for Knowledge Graph Embeddings

    Authors: Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov

    Abstract: This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings. By casting knowledge base completion and question answering as supervised classification problems, we observe that modeling co-occurrences of entities and relations leads to state-of-the-art performance with a training time of a few minutes using the open sourced…

    Submitted 30 October, 2017; originally announced October 2017.

    Comments: Submitted to AKBC 2017

  23. arXiv:1703.08864

    cs.CL

    Learning Simpler Language Models with the Differential State Framework

    Authors: Alexander G. Ororbia II, Tomas Mikolov, David Reitter

    Abstract: Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The Differential State Framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term…

    Submitted 16 July, 2017; v1 submitted 26 March, 2017; originally announced March 2017.

    Comments: Edits/revisions applied throughout document

  24. arXiv:1701.08954

    cs.LG cs.AI cs.CL

    CommAI: Evaluating the first steps towards a useful general AI

    Authors: Marco Baroni, Armand Joulin, Allan Jabri, Germán Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov

    Abstract: With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In or…

    Submitted 27 March, 2017; v1 submitted 31 January, 2017; originally announced January 2017.

    Comments: Published in ICLR 2017 Workshop Track

  25. arXiv:1612.03651

    cs.CL cs.LG

    FastText.zip: Compressing text classification models

    Authors: Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov

    Abstract: We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings. While the original technique leads to a loss in accuracy, we adapt this method to circumvent quantizati…

    Submitted 12 December, 2016; originally announced December 2016.

    Comments: Submitted to ICLR 2017

  26. arXiv:1611.06188

    stat.ML cs.AI cs.CL cs.LG

    Variable Computation in Recurrent Neural Networks

    Authors: Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov

    Abstract: Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data. Much of this progress has been achieved through devising recurrent units and architectures with the flexibility to capture complex statistics in the data, such as long-range dependencies or localized attention phenomena. However, while many sequential data (such as video…

    Submitted 2 March, 2017; v1 submitted 18 November, 2016; originally announced November 2016.

  27. arXiv:1607.04606

    cs.CL cs.LG

    Enriching Word Vectors with Subword Information

    Authors: Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov

    Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgra…

    Submitted 19 June, 2017; v1 submitted 15 July, 2016; originally announced July 2016.

    Comments: Accepted to TACL. The first two authors contributed equally

  28. arXiv:1607.01759

    cs.CL

    Bag of Tricks for Efficient Text Classification

    Authors: Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov

    Abstract: This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a…

    Submitted 9 August, 2016; v1 submitted 6 July, 2016; originally announced July 2016.

  29. arXiv:1511.08130

    cs.AI cs.CL

    A Roadmap towards Machine Intelligence

    Authors: Tomas Mikolov, Armand Joulin, Marco Baroni

    Abstract: The development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more…

    Submitted 26 February, 2016; v1 submitted 25 November, 2015; originally announced November 2015.

  30. arXiv:1511.07275

    cs.AI cs.LG

    Learning Simple Algorithms from Examples

    Authors: Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus

    Abstract: We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their abilit…

    Submitted 23 November, 2015; v1 submitted 23 November, 2015; originally announced November 2015.

  31. arXiv:1511.06303

    cs.LG cs.CL

    Alternative structures for character-level RNNs

    Authors: Piotr Bojanowski, Armand Joulin, Tomas Mikolov

    Abstract: Recurrent neural networks are convenient and efficient models for language modeling. However, when applied at the level of characters instead of words, they suffer from several problems. In order to successfully model long-term dependencies, the hidden representation needs to be large. This in turn implies higher computational costs, which can become prohibitive in practice. We propose two alterna…

    Submitted 24 November, 2015; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: First revision. Updated Table 3, extended Sec. 5.3 and added a paragraph to the conclusion.

  32. arXiv:1503.01007

    cs.NE cs.LG

    Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

    Authors: Armand Joulin, Tomas Mikolov

    Abstract: Despite the recent achievements in machine learning, we are still very far from achieving real artificial intelligence. In this paper, we discuss the limitations of standard deep learning approaches and show that some of these limitations can be overcome by learning how to grow the complexity of a model in a structured way. Specifically, we study the simplest sequence prediction problems that are…

    Submitted 1 June, 2015; v1 submitted 3 March, 2015; originally announced March 2015.

  33. arXiv:1502.05698

    cs.AI cs.CL stat.ML

    Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

    Authors: Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov

    Abstract: One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent. To measure progress towards that goal, we argue for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering. Our tasks measure understanding in several ways: whether a system is a…

    Submitted 31 December, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

  34. arXiv:1412.7753

    cs.NE cs.LG

    Learning Longer Memory in Recurrent Neural Networks

    Authors: Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

    Abstract: A recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly…

    Submitted 16 April, 2015; v1 submitted 24 December, 2014; originally announced December 2014.

  35. arXiv:1412.5335

    cs.CL cs.IR cs.LG cs.NE

    Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

    Authors: Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

    Abstract: Sentiment analysis is a common task in natural language processing that aims to detect the polarity of a text document (typically a consumer review). In the simplest settings, we discriminate only between positive and negative sentiment, turning the task into a standard binary classification problem. We compare several machine learning approaches to this problem, and combine them to achieve the best…

    Submitted 27 May, 2015; v1 submitted 17 December, 2014; originally announced December 2014.

  36. arXiv:1405.4053

    cs.CL cs.AI cs.LG

    Distributed Representations of Sentences and Documents

    Authors: Quoc V. Le, Tomas Mikolov

    Abstract: Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equal…

    Submitted 22 May, 2014; v1 submitted 16 May, 2014; originally announced May 2014.

  37. arXiv:1312.5650

    cs.LG

    Zero-Shot Learning by Convex Combination of Semantic Embeddings

    Authors: Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

    Abstract: Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents of…

    Submitted 21 March, 2014; v1 submitted 19 December, 2013; originally announced December 2013.

  38. arXiv:1312.3005

    cs.CL

    One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

    Authors: Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson

    Abstract: We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the…

    Submitted 4 March, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

    Comments: Accompanied by a code.google.com project allowing anyone to generate the benchmark data, and use it to compare their language model against the ones described in the paper

  39. arXiv:1310.4546

    cs.CL cs.LG stat.ML

    Distributed Representations of Words and Phrases and their Compositionality

    Authors: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean

    Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and…

    Submitted 16 October, 2013; originally announced October 2013.

  40. arXiv:1309.4168

    cs.CL

    Exploiting Similarities among Languages for Machine Translation

    Authors: Tomas Mikolov, Quoc V. Le, Ilya Sutskever

    Abstract: Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It uses…

    Submitted 16 September, 2013; originally announced September 2013.

  41. arXiv:1301.3781

    cs.CL

    Efficient Estimation of Word Representations in Vector Space

    Authors: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean

    Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e.…

    Submitted 6 September, 2013; v1 submitted 16 January, 2013; originally announced January 2013.

  42. arXiv:1211.5063

    cs.LG

    On the difficulty of training Recurrent Neural Networks

    Authors: Razvan Pascanu, Tomas Mikolov, Yoshua Bengio

    Abstract: There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective s…

    Submitted 15 February, 2013; v1 submitted 21 November, 2012; originally announced November 2012.

    Comments: Improved description of the exploding gradient problem and description and analysis of the vanishing gradient problem