19 Opportunities for NLP Research

The Department of Chemistry offers a summer research internship focused on building a high-quality annotated text corpus to support Natural Language Processing (NLP) methods for the automated extraction of polymer property data from the scientific literature. The project forms part of a broader effort to enable data-driven materials discovery by improving access to structured information on polymer chemistry, processing, and performance.

Figure 19.1: This summer research opportunity is based in Paola Carbone’s research group. Paola is a Professor of Computational and Theoretical Chemistry at the University of Manchester. CC BY-SA licensed image of a Buckminsterfullerene (C₆₀) adapted from an original by Itamblyn on Wikimedia Commons w.wiki/LK8s

19.1 Project Overview

The successful candidate will help identify, collect, and curate relevant scientific documents, including journal articles, reports, and technical summaries on polymeric materials. Working under the supervision of a PhD student in Paola Carbone’s group (see Figure 19.1), the student will contribute to the design and implementation of an annotation framework for marking key entities and relationships, such as polymer types, compositional descriptors, mechanical and thermal properties, and experimental conditions.
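To make the idea of marking entities and relationships more concrete, the following is a minimal sketch of how a single annotated sentence might be represented. The label set (POLYMER, PROPERTY, VALUE, CONDITION), the relation types, and the classes `Entity`, `Relation` and `AnnotatedSentence` are all hypothetical illustrations, not the project's actual annotation scheme, which would be defined by its own guidelines.

```python
from dataclasses import dataclass, field

# Hypothetical label and relation sets; the project's real guidelines may differ.
ENTITY_LABELS = {"POLYMER", "PROPERTY", "VALUE", "CONDITION"}
RELATION_TYPES = {"HAS_PROPERTY", "HAS_VALUE", "MEASURED_AT"}


@dataclass
class Entity:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str


@dataclass
class Relation:
    head: int   # index into AnnotatedSentence.entities
    tail: int   # index into AnnotatedSentence.entities
    type: str


@dataclass
class AnnotatedSentence:
    text: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def add_entity(self, surface: str, label: str) -> int:
        """Label the first occurrence of `surface` and return its entity index."""
        if label not in ENTITY_LABELS:
            raise ValueError(f"unknown entity label: {label}")
        start = self.text.find(surface)
        if start < 0:
            raise ValueError(f"{surface!r} not found in sentence")
        self.entities.append(Entity(start, start + len(surface), label))
        return len(self.entities) - 1

    def add_relation(self, head: int, tail: int, rel_type: str) -> None:
        """Link two previously added entities with a typed relation."""
        if rel_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {rel_type}")
        self.relations.append(Relation(head, tail, rel_type))

    def spans(self) -> list[tuple[str, str]]:
        """Return (surface text, label) pairs for quick inspection."""
        return [(self.text[e.start:e.end], e.label) for e in self.entities]


# Illustrative sentence, annotated by hand.
s = AnnotatedSentence("Polystyrene has a glass transition temperature of 100 °C.")
polymer = s.add_entity("Polystyrene", "POLYMER")
prop = s.add_entity("glass transition temperature", "PROPERTY")
value = s.add_entity("100 °C", "VALUE")
s.add_relation(polymer, prop, "HAS_PROPERTY")
s.add_relation(prop, value, "HAS_VALUE")
print(s.spans())
# → [('Polystyrene', 'POLYMER'), ('glass transition temperature', 'PROPERTY'), ('100 °C', 'VALUE')]
```

Storing character offsets rather than copies of the text keeps each annotation tied to an exact position in the source sentence, which is what quality-control checks such as inter-annotator agreement usually compare.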

19.2 Research Activities

  • Systematic collection and organisation of domain‑specific textual sources
  • Development and application of annotation guidelines for entity and property labelling
  • Quality control of annotated data, including inter‑annotator agreement checks
  • Exposure to contemporary NLP workflows for information extraction in materials science
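Inter-annotator agreement checks compare how consistently two people label the same text. One widely used measure is Cohen's kappa, which corrects the raw fraction of matching labels for the agreement expected by chance. The sketch below computes it at the token level from scratch. The function name `cohens_kappa` and the example label sequences are hypothetical, and the project may instead use another measure, such as span-level F1 between annotators.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used the same single label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level labels from two annotators for one sentence;
# "O" marks tokens outside any entity.
annotator_a = ["POLYMER", "O", "O", "PROPERTY", "PROPERTY", "PROPERTY", "O", "VALUE"]
annotator_b = ["POLYMER", "O", "O", "PROPERTY", "PROPERTY", "O", "O", "VALUE"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))  # → 0.818
```

Here the annotators agree on 7 of 8 tokens (0.875), but because some agreement would occur by chance, kappa is lower (about 0.818). Values close to 1 indicate that the annotation guidelines are being applied consistently.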

19.3 Learning Outcomes

This summer internship provides an opportunity to gain experience in corpus development, annotation methodology, and the integration of domain knowledge with computational tools. Students will develop an understanding of how NLP techniques can be applied to accelerate research in polymer science and materials informatics.

This project is well suited to students with prior knowledge of data-centric research, artificial intelligence and large language models (LLMs).

19.4 Interested?

How to apply:

  • Email your CV to paola.carbone@manchester.ac.uk, highlighting any experience in NLP
  • Deadline for applications: June; however, applications will close once a suitable student has been found, so apply as soon as possible
  • Salary: £12.71 per hour for 35 hours per week; expected project duration: 6–8 weeks

This project is funded by the Royal Society (royalsociety.org).