About me

Gergely Dániel Németh is a PhD student in Artificial Intelligence at the ELLIS Unit Alicante. His PhD topics are AI Ethics and Federated Learning. His supervisors are Nuria Oliver (ELLIS Alicante), Miguel Angel Lozano (University of Alicante) and Novi Quadrianto (University of Sussex). He holds an MSc degree in Advanced Computer Science from The University of Manchester and a BSc in Computer Science Engineering from Budapest University of Technology and Economics.

Machine Learning Projects

University projects

Hyphenation using deep neural networks

In this proof-of-concept experiment, I showed that neural networks can learn the task of hyphenation. In Hungarian, hyphenation has a well-defined list of rules. The results showed that, with enough examples, the networks (FFNN, CNN, LSTM, seq2seq) could learn these rules. However, Hungarian hyphenation has many exceptions and rules with only a few examples, and these were challenges that the models could not overcome. As an exciting follow-up experiment, I trained a model on bilingual data (English and Hungarian), and the results showed that the model could handle the rules of both languages. This idea can facilitate the development of new hyphenation algorithms, as the ones currently used in modern applications can only work with one defined language at a time. In an age of globalization, the average non-native English speaker often uses both English and their native language at the same time; therefore, I believe there would be demand for this application.
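One common way to pose hyphenation as a learning problem is per-character labelling: each character gets a 1 if a hyphen may follow it and a 0 otherwise. The sketch below illustrates that framing only. It is an assumption, not necessarily the encoding used in the project, and the names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple


def encode(hyphenated: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Split a hyphenated word into its plain form and per-character labels.

    labels[i] == 1 means a hyphenation point follows character i.
    """
    chars: List[str] = []
    labels: List[int] = []
    for ch in hyphenated:
        if ch == sep:
            if labels:  # ignore a separator at the very start
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def decode(word: str, labels: List[int], sep: str = "-") -> str:
    """Rebuild the hyphenated form from a word and its labels."""
    out: List[str] = []
    for ch, label in zip(word, labels):
        out.append(ch)
        if label:
            out.append(sep)
    return "".join(out)


# Hungarian example: "is-ko-la" (school)
word, labels = encode("is-ko-la")
print(word, labels)          # iskola [0, 1, 0, 1, 0, 0]
print(decode(word, labels))  # is-ko-la
```

With this framing, a model only has to predict one binary label per character, and mixing training words from two languages requires no change to the input or output format.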

Argumentation mining with BERT

Language models like BERT opened a new era in NLP. They helped to achieve new state-of-the-art results in many tasks. I showed that argumentation mining is one of them. I compared standard word-embedding-based models with new contextualised embeddings generated by pre-trained BERT models, as well as with BERT transfer learning models. The best-performing model was based on sentence-level BERT embeddings. This suggests that for argumentation, where the focus is on the connection between discourse parts of the text, a token-level description carries too much information.
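A sentence-level embedding can be derived from token-level vectors in several ways. One simple and widely used option is mean pooling over the non-padding tokens, sketched below with toy vectors in plain Python. This is an illustrative assumption: the text does not say which pooling the project used, and the function name `mean_pool` is hypothetical.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], mask: List[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1), ignoring padding."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 2-dimensional "token embeddings"; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Pooling collapses a variable number of token vectors into one fixed-size vector per sentence, which discards token-level detail while keeping a summary that a downstream classifier can compare across sentences.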

NLP with BERT

My main focus in using BERT for NLP is to get the most out of pre-trained models. I showed that the embeddings alone can identify the different meanings of a word. Building on this, I am investigating the possibility of using word2vec-like static BERT embeddings for words with multiple meanings. I believe that the relations between words in context can help to build a WordNet-like structure using only general texts. However, while the separation is clear for a small set of words, scaling it up comes with challenges. Unsupervised clustering algorithms produce word meanings and word groups with considerable noise and require a lot of computational power.

Computer Vision

Working at a startup that uses computer vision in real-world applications, I developed a new awareness of real-world machine learning problems. My main focus here is to create new resources (data augmentation models, datasets, pre-trained models) to help existing models be more robust to real-world challenges (weather conditions, lighting, etc.).

As an artist

Az első expedíció - My first published fiction novel

My first fiction novel takes place in the near future where humanity faces the challenges of contacting a new intelligent species. I wrote the original novel in the 10th grade of high school and revised it when I published it in 2018. The book is available in most major ebook stores thanks to PublishDrive’s services. See the book in Google Books here

Parlement of Foules - Translation

One of my late high school projects is a Hungarian translation of The Parlement of Foules by Geoffrey Chaucer (1382). For this I used the original and the modern English translation of the poem. Translating a 777-line medieval English poem as a young Hungarian student, struggling with English at the time, has its own challenges, but I’m proud of the outcome to this day. See the full translation here