Research

Sequence Labeler : A neural network framework for sequence labeling, which can be used for named entity recognition, POS tagging, chunking, writing error detection and similar tasks.

Parser Lexicalisation : We have developed a framework for integrating bilexical features into an unlexicalised parser without using supervision. The system performs a series of graph modifications to model higher-order dependencies, assigns confidence scores to every dependency relation, smooths these scores using distributional similarity, and combines them to rerank alternative dependency graphs. As our novel self-learning framework requires only a large unannotated corpus, lexical features can easily be tuned to a specific domain or genre by selecting a suitable dataset.

SemGraph : SemGraph is a library for reading, writing and visualising graphs, primarily intended for dependency graphs of sentences. After a large corpus of text has been parsed, it can be used to conveniently iterate over the sentences to collect features, build vector space models or analyse the parser output. It is designed so that the underlying parser can be changed without affecting the rest of the implementation. The visualiser creates a dynamic view of the graphs, and an experimental feature can be enabled to edit the graphs in the visualiser as well (e.g., to correct parses).
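The kind of feature collection described above can be sketched with a minimal in-memory dependency graph. This is only an illustration under stated assumptions: the names `DepFeatures`, `Edge`, `DepGraph` and `collectFeatures` are hypothetical and are not SemGraph's actual API, and the feature format (`label:head` for a dependent, `label-1:dependent` for a head) is one common convention chosen here for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DepFeatures {
    // Hypothetical structure: one labelled dependency edge, head --label--> dependent.
    record Edge(String head, String label, String dependent) {}

    // Hypothetical structure: a parsed sentence as a list of dependency edges.
    record DepGraph(List<Edge> edges) {}

    // Counts, for every word, the dependency contexts it occurs in across the corpus.
    // A dependent receives the feature "label:head"; a head receives the inverse
    // feature "label-1:dependent" (an assumed convention, not SemGraph's).
    static Map<String, Map<String, Integer>> collectFeatures(List<DepGraph> corpus) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (DepGraph sentence : corpus) {
            for (Edge e : sentence.edges()) {
                counts.computeIfAbsent(e.dependent(), k -> new HashMap<>())
                      .merge(e.label() + ":" + e.head(), 1, Integer::sum);
                counts.computeIfAbsent(e.head(), k -> new HashMap<>())
                      .merge(e.label() + "-1:" + e.dependent(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "We play music": play -nsubj-> We, play -dobj-> music
        DepGraph s1 = new DepGraph(List.of(
                new Edge("play", "nsubj", "We"),
                new Edge("play", "dobj", "music")));
        // A second toy sentence fragment: play -dobj-> song
        DepGraph s2 = new DepGraph(List.of(new Edge("play", "dobj", "song")));

        Map<String, Map<String, Integer>> features = collectFeatures(List.of(s1, s2));
        System.out.println(features.get("music")); // {dobj:play=1}
        System.out.println(features.get("play").get("dobj-1:music")); // 1
    }
}
```

The resulting per-word count maps are exactly the kind of sparse context vectors a vector space model is built from; keeping the parser-specific parsing step outside this code mirrors the goal of being able to swap the underlying parser.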

SemSim : SemSim is a Java library for creating semantic distributional models from text and calculating semantic similarity scores between words. A corpus of parsed text can be used to create a distributional model, which in turn can be used to create distributional feature vectors. These vectors are then the basis for computing the distributional similarity between two words. For example, the library can be used to find how semantically similar the words 'music' and 'song' are, or which words are most similar to 'music'.
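The final step, comparing distributional feature vectors, can be sketched as follows. This assumes cosine similarity over sparse count vectors, which is one common choice; SemSim may use different weighting schemes and similarity measures. The class name, the `mostSimilar` helper and the toy counts are hypothetical and invented for illustration, not taken from SemSim.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CosineSimilarity {
    // Cosine similarity between two sparse vectors keyed by feature name.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other; // only shared features contribute
        }
        double normA = norm(a), normB = norm(b);
        if (normA == 0.0 || normB == 0.0) return 0.0; // undefined for empty vectors; treated as 0
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }

    // Hypothetical helper: the k words whose vectors are most similar to the target word.
    static List<String> mostSimilar(String target, Map<String, Map<String, Double>> vectors, int k) {
        Map<String, Double> t = vectors.get(target);
        return vectors.keySet().stream()
                .filter(w -> !w.equals(target))
                .sorted(Comparator.comparingDouble((String w) -> cosine(t, vectors.get(w))).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy feature counts, invented for illustration only.
        Map<String, Map<String, Double>> vectors = new HashMap<>();
        vectors.put("music", Map.of("dobj:play", 4.0, "amod:loud", 2.0));
        vectors.put("song", Map.of("dobj:play", 3.0, "dobj:sing", 5.0));
        vectors.put("apple", Map.of("dobj:eat", 6.0));

        System.out.println(cosine(vectors.get("music"), vectors.get("song")));
        System.out.println(mostSimilar("music", vectors, 2)); // [song, apple]
    }
}
```

Cosine normalises away vector length, so a frequent word and a rare word with the same distribution of contexts still come out as similar; in the toy data, 'music' and 'song' share the `dobj:play` context and score about 0.46, while 'apple' shares nothing and scores 0.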

Lexical Vector Datasets : Publicly available datasets of word vectors, generated using different methods and models on the same background corpus.

Hyponym Generation Dataset : A dataset for the task of hyponym generation, built from WordNet.

Fragment Entailment Dataset : A manually annotated dataset for evaluating entailment detection between dependency graph fragments in the biomedical domain. It contains 100 entailment and 100 non-entailment relations. Fragment sizes range from 1 to 20 words, with an average of 2.86 words.

Tõnu : An application for displaying a 3D talking head with realistic mouth movements, created using C++, Ogre3D, CEGUI and Blender. It was part of my Bachelor's thesis and is designed for people who have hearing disabilities or who work in a noisy environment. The project is set up to integrate with an Estonian speech synthesizer, but the code and model can be configured to work with any language.

Web

Nifty.Events : An event recommendation system. It crawls various sources for upcoming events and provides a way to easily filter and manage them. As the user saves events, the system uses machine learning to reorder the list and recommend other events the user is likely to enjoy. When connected to Facebook, users can choose a friend and search for events that they would both like. The system uses neural networks and a custom algorithm to rank the events and adapt to each individual user.

Encode-Explorer : A simple file browser written in PHP. It displays the list of files in a folder, allows the user to add and remove files and folders, and includes a range of other small features (password protection, e-mail notifications, logging, thumbnails). It is designed to fit into a single file for easy installation and maintenance.

Slink : Slink is short-link management software written in PHP. Users can run a short-link forwarding service in their own web space, create backup mirrors of linked pages and files, upload their own files and manage user accounts.

ToDo List : A to-do list manager built with Ajax. Users can create an account, save items to the list, and group and reorder them.

Pinpoint : An Ajax-based game that tests the user's knowledge of geographical locations. The player is given a score based on the speed and accuracy of their guesses, and the system keeps a list of high scores.

Android

WallSetter : An Android app that allows others to remotely change the background on your phone. Good for surprising people on their birthday or just for brightening their day with a fun photo. You decide who gets access by giving them a key, so choose wisely.