Taking Natural Language Processing to greater heights
Natural Language Processing (NLP) has been around for more than 50 years, but has lately become more widely known, thanks to personal assistant applications such as Apple’s Siri and Amazon’s Alexa.
NLP has also been the driving force behind language translation applications such as Google Translate, but coming this far has been no easy feat. Enabling computers to understand how humans naturally speak or type remains a complex job. Using machine learning algorithms, NLP extracts, analyses and infers useful information from large amounts of text data. This information can then help to make predictions, find hidden relations within data, and detect anomalies.
“With NLP, we can even detect sentiment from social media, for instance how people feel about a certain issue or if they have an affinity towards a particular organisation. This can be very useful for organisations and governments in better understanding communities’ as well as society’s multifaceted needs,” explains Lu Wei, an Associate Professor with the Information Systems Technology and Design (ISTD) Pillar at SUTD, whose area of research expertise centres on NLP.
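As a rough, hypothetical illustration of what sentiment detection involves, the short Python sketch below scores social-media-style posts against tiny hand-picked word lists. The word lists and posts are invented for this example; real systems, including those Lu Wei describes, rely on far richer machine-learned models.

```python
# A minimal, illustrative sketch of lexicon-based sentiment scoring over
# short social-media-style posts. The word lists and posts below are
# hypothetical examples, not part of any production system.

POSITIVE = {"love", "great", "helpful", "excellent", "happy"}
NEGATIVE = {"slow", "broken", "terrible", "unhappy", "waste"}

def sentiment_score(text: str) -> int:
    """Return a crude score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "love the new service, staff were so helpful",
    "terrible wait times, such a waste of an afternoon",
]
for post in posts:
    print(post, "->", sentiment_score(post))
```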
Tackling NLP Research Problems with a Unified Approach
Lu Wei, a two-time champion of the National Mathematical Olympiad Competition in China, has led several research initiatives to develop more efficient, principled mathematical and algorithmic solutions to fundamental NLP problems. Having spent years searching for a unified approach to such tasks, Lu Wei hit upon the idea of hypergraphs rather unexpectedly in a library one afternoon in 2014.
“I believe that an inquisitive mindset, passion and patience help us researchers get through the challenges that come with the unpredictable process of research,” Lu Wei shares, while professing his deep appreciation for maths, machine learning and language.
This unified approach to natural language processing enabled the research team led by Lu Wei to build novel algorithms for a range of applications, including information extraction, sentiment analysis and semantic parsing. Among them is his research on the segmental hypergraph model, published at the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), one of the most highly regarded venues devoted exclusively to the computational and mathematical properties of language and the design and analysis of NLP algorithms and models.
“In building a natural language understanding system, it is very important to identify the basic semantic words and phrases in a body of text, called entities. This is usually done through a fundamental task known as Named Entity Recognition, or NER. However, as entities may overlap with one another, it can be difficult to accurately extract all of them due to the extensive list of possible combinations. So we designed a neural segmental hypergraph model that is able to extract all entities, including those that exhibit non-conventional patterns, providing a much more efficient yet robust solution,” says Lu Wei. He is confident of the model’s potential and believes it could be extended to the biomedical domain, which involves similar sequence modelling tasks.
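To see why overlapping (nested) entities are hard to extract, consider the hypothetical sketch below: it brute-forces every candidate span in a short sentence and checks each against a toy entity list, making the combinatorial growth in candidates explicit. It is only an illustration of the problem; the neural segmental hypergraph model itself is not reproduced here.

```python
# Illustrative only: brute-force enumeration of candidate entity spans,
# showing why nested entities ("Bank of China" containing "China") are
# hard to extract exhaustively. The entity list is a toy example.

TOY_ENTITIES = {"bank of china", "china", "hong kong", "bank of china tower"}

def find_nested_entities(tokens):
    """Return all (start, end, text) spans whose text matches a known entity."""
    matches = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):  # O(n^2) candidate spans
            span = " ".join(tokens[start:end]).lower()
            if span in TOY_ENTITIES:
                matches.append((start, end, span))
    return matches

sentence = "The Bank of China Tower overlooks Hong Kong".split()
for start, end, text in find_nested_entities(sentence):
    print(f"tokens[{start}:{end}] -> {text}")
```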
Collaborating for Greater Impact
Lu Wei led a research team that worked with Alibaba, a Chinese multinational conglomerate specialising in e-commerce, to design several models around the theme of Chinese natural language processing.
In one of the projects, the team focused on the task of Chinese address parsing. Unlike English addresses, Chinese addresses are typically written as a continuous sequence of Chinese characters, possibly intermixed with digits and English letters, so the individual components of an address cannot be reliably identified without further analysis.
The Chinese address parsing model encodes the regular patterns among chunks that appear at the beginning of a Chinese address, while flexibly capturing the irregular patterns and rich dependencies among chunks of different types that appear towards the end of the address. This is achieved through a novel structured representation that integrates a linear structure with a latent-variable tree structure. For this outstanding research, the paper was accepted at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), known as one of the top conferences in the NLP field.
Separately, Lu Wei and his research team collaborated with Alibaba again to tackle the fundamental problem of incomplete annotation for named entity recognition. In the research paper, they identified limitations in the assumptions made by previous approaches and proposed a principled solution to better tackle the problem. This paper was also accepted at NAACL-HLT 2019.
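For readers unfamiliar with the task, the toy sketch below shows what the “chunks” of a Chinese address might look like, using a naive rule that cuts after common administrative suffix characters. The address and rules are hypothetical stand-ins, not the structured model described above.

```python
# Illustrative only: a naive rule-based chunker for Chinese addresses,
# splitting on common suffix characters for province/city/district/road.
# The real model uses a learned structured representation, not rules.

SUFFIXES = {"省": "province", "市": "city", "区": "district", "路": "road", "号": "number"}

def naive_chunk(address: str):
    """Greedily cut the address after each known suffix character."""
    chunks, start = [], 0
    for i, ch in enumerate(address):
        if ch in SUFFIXES:
            chunks.append((address[start:i + 1], SUFFIXES[ch]))
            start = i + 1
    if start < len(address):
        chunks.append((address[start:], "other"))  # trailing text, e.g. a building name
    return chunks

# A hypothetical address: Zhejiang Province / Hangzhou City / Xihu District / Wen'er Road / No. 969
print(naive_chunk("浙江省杭州市西湖区文二路969号"))
```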
Achieving Research Excellence through Teamwork
Another well-received paper, published at EMNLP 2018, saw Lu Wei and his student combine efforts to build a dependency-based hybrid tree model for semantic parsing – one of the fundamental tasks within NLP. The resulting model offers a new approach to mapping a sentence into its semantic representation, with a principled, efficient algorithm shown to be more robust and able to work across different languages.
Additionally, in recent work published in 2019, Lu Wei and his research team proposed a new sequence labelling approach to solving math word problems – a long-standing artificial intelligence task. The research is the first to effectively tackle the problem from a sequence labelling perspective. It was published at the 57th Annual Meeting of the Association for Computational Linguistics (ACL), another top conference in the field of natural language processing.
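To give a simplified, hypothetical flavour of the sequence labelling view (not the published method), the sketch below hand-assigns an operator label to each quantity in a word problem and evaluates the resulting expression; a learned model would predict such labels from the text.

```python
# Illustrative only: treating a math word problem as a labelling task over
# its quantities. The labels here are hand-assigned to show the idea; the
# published approach learns such labels from data.

import re

def evaluate(problem: str, labels: dict) -> float:
    """Apply an operator label ('+' or '-') to each quantity found in the text."""
    total = 0.0
    for idx, match in enumerate(re.finditer(r"\d+(?:\.\d+)?", problem)):
        value = float(match.group())
        total = total + value if labels[idx] == "+" else total - value
    return total

problem = "Tom had 12 apples, gave away 5, then bought 3 more. How many now?"
labels = {0: "+", 1: "-", 2: "+"}  # one hand-assigned label per quantity
print(evaluate(problem, labels))   # 10.0
```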
All these models build on the same hypergraph framework concept that Lu Wei developed.
“I feel very fortunate to work in a conducive and supportive environment. I am surrounded by colleagues, students and industry partners who are closely aligned with our vision to advance the field of NLP in Singapore and beyond. The synergy that such collaborations bring about is nothing short of powerful, allowing us to go beyond the surface and form deeper insights into complex areas of research. I hope to continue cultivating the culture we have here, and to further grow our team of NLP experts at SUTD,” shares Lu Wei.
Article by: Jessica Sasayiah
Contact Associate Professor Lu Wei at wei_lu@sutd.edu.sg, and Jessica Sasayiah at jessica_sasayiah@sutd.edu.sg