Menu Close

Category: Uncategorized

Transforming Images to Feature Vectors

I’m keen to explore some challenges in multimodal learning, such as jointly learning visual and textual semantics. However, I would rather not start by attempting to train an image recognition system from scratch, and prefer to leave this part to researchers who are more experienced in vision and image analysis.

Therefore, the goal is to use an existing image recognition system, in order to extract useful features for a dataset of images, which can then be used as input to a separate machine learning system or neural network. We start with a directory of images, and create a text file containing feature vectors for each image.

1. Install Caffe

Caffe is an open-source neural network library developed in Berkeley, with a focus on image recognition. It can be used to construct and train your own network, or load one of the pretrained models. A web demo is available if you want to test it out.

Follow the installation instructions to compile Caffe. You will need to install quite a few dependencies (Boost, OpenCV, ATLAS, etc), but at least for Ubuntu 14.04 they were all available in public repositories.

Once you’re done, run

make test
make runtest

This will run the tests and make sure the installation is working properly.

Multilingual Semantic Models

In this post I’ll discuss a model for learning word embeddings, such that they end up in the same space in different languages. This means we can find the similarity between some English and German words, or even compare the meaning of two sentences in different languages. It is a summary and analysis of the paper by Karl Moritz Hermann and Phil Blunsom, titled “Multilingual Models for Compositional Distributional Semantics“, published at ACL 2014.

The Task

The goal of this work is to extend the distributional hypothesis to multilingual data and joint-space embeddings. This would give us the ability to compare words and sentences in different languages, and also make use of labelled training data from languages other than the target language. For example, below is an illustration of English words and their Estonian translations in the same semantic space.

vector_space_model_multilingual

 

This actually turns out to be a very difficult task, because the distributional hypothesis stops working across different languages. While “fish” is an important feature of “cat”, because they occur together often, “kass” never occurs with “fish”, because they are in different languages and therefore used in separate sets of documents.

In order to learn these representations in the same space, the authors construct a neural network that learns from parallel sentences (pairs of the same sentence in different languages). The model is then evaluated on the task of topic classification, training on one language and testing on the other.

Political ideology detection

Neural networks have a range of interesting applications, and here I will discuss on one them: recursive neural networks and the detection of political ideology. This post is a summary and analysis of a recent publication by Mohit Iyyer, Peter Anns, Jordan Boyd-Graber and Philip Resnik: “Political Ideology Detection Using Recursive Neural Networks“.

The Task

Given a sentence, we want the model to detect the political ideology expressed in that sentence. In this research, the authors deal with US politics, so the possible options are liberal (democrats) or conservative (republicans). As a practical application we might consider a system that processes a large amount of news articles or public speeches to detect and measure explicit or hidden political bias of the authors.

democrat-republican

A traditional approach to this problem is a simple bag-of-words model, where each word is treated as a separate feature, but this ignores any syntactic structure and even word order. As shown below, political ideology can be compositionally complicated – while certain sections of the sentence are locally conservative, the way they are used in context makes the overall sentence liberal.political_sample_1

Figure 1: Sample sentence from Iyyer et al. (2014). Blue nodes are liberal, red nodes are conservative, grey nodes are neutral.

Don’t count, predict

In the past couple of years, neural networks have nearly taken over the field of NLP, as they are being used in recent state-of-the-art systems for many tasks. One interesting application is distributional semantics, as they can be used to learn intelligent dense vector representations for words. Marco Baroni, Georgiana Dinu and German Kruszewski presented a paper in ACL 2014 called “Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors“, where they compare these new neural-network models with more traditional context vectors on a range of different tasks. Here I will try to give an overview and a summary of their work.

Distributional hypothesis

The goal is to find how similar two words are to each other semantically. The distributional hypothesis states:

Words which are similar in meaning occur in similar contexts
(Rubenstein & Goodenough, 1965).

Therefore, if we want to find a word similar to “magazine”, we can look for words that occur in similar contexts, such as “newspaper”.

I was reading a magazine today I was reading a newspaper today
The magazine published an article The newspaper published an article
He buys this magazine every day He buys this newspaper every day

 

Also, if we want to find how similar “magazine” and “newspaper” are, we can compare how similar are all the contexts in which they appear. For example, to find the similarity between two words, we can represent the contexts as feature vectors and calculate the cosine similarity between their corresponding vectors.