Blog posts

2020

Analyzing data augmentation for image classification

less than 1 minute read

Published: June 29, 2020

PCA of the image augmentation techniques used in state-of-the-art image classification models. See full article here

Customize Classification Model Output Layer

less than 1 minute read

Published: March 11, 2020

Save classification labels and top confidences in a custom layer using Keras. See full article here

Find Linear Transformation Based on Known Points

less than 1 minute read

Published: January 02, 2020

A code snippet for linear algebra and computer vision. See full article here
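As a rough illustration of the idea in the title, here is a minimal sketch that recovers an affine transformation from known point pairs with a least-squares fit. It is not taken from the linked article: the function names `fit_affine` and `apply_affine` are hypothetical, and the choice of an affine model solved with `np.linalg.lstsq` is an assumption.

```python
import numpy as np


def fit_affine(src, dst):
    """Fit T (shape d x (d+1)) so that dst ~= src @ T[:, :-1].T + T[:, -1].

    Hypothetical helper: assumes an affine model and a least-squares fit.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Append a column of ones so the translation is estimated with the linear part.
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve src_h @ M = dst in the least-squares sense.
    M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return M.T


def apply_affine(T, pts):
    """Apply the fitted transformation to an array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ T[:, :-1].T + T[:, -1]


# Three non-collinear 2D points and their images under a
# 90-degree rotation followed by a translation of (2, 3).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[2.0, 3.0], [2.0, 4.0], [1.0, 3.0]])

T = fit_affine(src, dst)
mapped = apply_affine(T, src)
```

In 2D, three non-collinear point pairs determine the affine map exactly; with more pairs, the same call returns the least-squares best fit.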

2019

BERT Visualization in Embedding Projector

less than 1 minute read

Published: December 04, 2019

This story shows how to visualize pre-trained BERT embeddings in TensorFlow’s TensorBoard Embedding Projector. The story uses around 50 unique sentences and their BERT embeddings generated with TensorFlow Hub BERT models. See full article here

Comparing Transformer Tokenizers

less than 1 minute read

Published: November 19, 2019

Comparing the tokenizer vocabularies of state-of-the-art Transformers (BERT, GPT-2, RoBERTa, XLM). See full article here

How to Start Writing on Medium

less than 1 minute read

Published: November 16, 2019

Practical advice drawn from analysing the first month of a Towards Data Science writer. See full article here

Simple BERT using TensorFlow 2.0

less than 1 minute read

Published: October 30, 2019

This story shows a simple use of the BERT [1] embedding with TensorFlow 2.0. As TensorFlow 2.0 has been released recently, the module aims to use easy, ready-to-use models based on the high-level Keras API. The previous usage of BERT was described in a long Notebook implementing a Movie Review prediction. In this story, we will see a simple BERT embedding generator using Keras and the latest TensorFlow and TensorFlow Hub modules. All code is available on Google Colab. See full article here

Machine Translation: Compare to SOTA

less than 1 minute read

Published: October 28, 2019

My previous story describes BLEU as the most used metric for Machine Translation (MT). This one introduces the conferences, datasets and competitions where you can compare your models with the state of the art, learn from others' work and meet researchers from the field. See full article here

Identifying the right meaning of the words using BERT

less than 1 minute read

Published: October 21, 2019

An important reason for using contextualised word embeddings is that standard embeddings assign a single vector to each word, regardless of how many meanings the word has. The hypothesis is that using the context can solve the problem of collapsing multiple-meaning words (homonyms and homographs) into the same embedding vector. In this story, we will analyse whether BERT embeddings can be used to classify different meanings of a word, to show that contextualised word embeddings solve the problem. See full article here

Machine Translation: A Short Overview

less than 1 minute read

Published: October 18, 2019

This story is an overview of the field of Machine Translation. The story introduces several highly cited papers and well-known applications, but I’d like to encourage you to share your opinion in the comments. The aim of this story is to provide a good start for someone new to the field. It covers the three main approaches to machine translation as well as several challenges of the field. Hopefully, the literature mentioned in the story presents the history of the problem as well as the state-of-the-art solutions. See full article here

Visualisation of embedding relations (word2vec, BERT)

less than 1 minute read

Published: October 14, 2019

In this story, we will visualise the word embedding vectors to understand the relations between words described by the embeddings. This story focuses on word2vec [1] and BERT [2]. To understand the embeddings, I suggest reading a different introduction as this story does not aim to describe them. See full article here

BLEU-BERT-y: Comparing sentence scores

less than 1 minute read

Published: October 14, 2019

The goal of this story is to understand BLEU, a widely used metric for MT models, and to investigate its relation to BERT. See full article here

Prim’s algorithm with Numpy arrays

less than 1 minute read

Published: August 27, 2019

Minimum spanning tree using NumPy array operations. See full article here
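A minimal sketch of what Prim's algorithm can look like when the graph is stored as a dense NumPy weight matrix. This is an illustration written for this listing, not the code from the linked article: the function name `prim_mst` and the convention of marking missing edges with `np.inf` are assumptions.

```python
import numpy as np


def prim_mst(weights):
    """Return the MST edges (u, v, weight) of a connected undirected graph.

    `weights` is a symmetric n x n matrix; np.inf marks a missing edge.
    Hypothetical helper, assuming a dense matrix representation.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                  # start growing the tree from vertex 0
    best_cost = w[0].copy()            # cheapest known edge from the tree to each vertex
    best_cost[0] = np.inf              # vertices already in the tree are never picked
    best_from = np.zeros(n, dtype=int) # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        v = int(np.argmin(best_cost))  # closest vertex outside the tree
        if not np.isfinite(best_cost[v]):
            raise ValueError("graph is not connected")
        edges.append((int(best_from[v]), v, float(best_cost[v])))
        in_tree[v] = True
        best_cost[v] = np.inf
        # Vectorised update: edges out of v that beat the current best.
        closer = (w[v] < best_cost) & ~in_tree
        best_cost[closer] = w[v][closer]
        best_from[closer] = v
    return edges


inf = np.inf
graph = np.array([
    [inf, 1.0, 4.0, inf],
    [1.0, inf, 2.0, 6.0],
    [4.0, 2.0, inf, 3.0],
    [inf, 6.0, 3.0, inf],
])
mst = prim_mst(graph)
total = sum(weight for _, _, weight in mst)
```

The per-vertex bookkeeping is done with boolean masks and `np.argmin` rather than a priority queue, which gives O(n²) time and suits dense graphs.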

Gergely Dániel Németh
