SUTD researchers unravel cell biology through artificial intelligence

16 Aug 2022 Engineering Product Development Biomedical, Biotechnology

SUTD - Jyothsna Vasudevan, Chuanxia Zheng, James G. Wan, Tat-Jen Cham, Lim Chwee Teck and Javier G. Fernandez

Previously limited by the human interpretation of reality, one of the most basic principles of cell biology can now be demonstrated using machine learning.
 
For our cells to proliferate, differentiate or migrate, the nucleus needs the help of its cytoskeleton, the scaffold surrounding the nucleus which provides cells with shape and solid structure. The disruption of this strong coupling, such as the dislocation of the nucleus from its cytoskeleton, is usually a symptom of disease in the body.
 
However, this relationship between the placement of the nucleus and the organisation of the cytoskeleton had never been demonstrated, because the intricate design of the cytoskeleton is so difficult to define mathematically.
 
Using conventional scientific methods, a scientist would need to first determine the parameters needed to define and measure the system that is being studied. This human interpretation of reality allows for the measuring of simple systems using well-known parameters such as size, speed and distance. However, for many complex systems, such as the mesh of fibers forming the cytoskeleton, defining the parameters that are important becomes an impossible task.

Positioning of nuclei generated from arrangements of actin filaments.
a) Results of the identification and matching of real and generated nuclei, by automatic counting of the whole generated dataset and by manual counting of a subset of images. Stained nuclei are those recorded directly using fluorescence microscopy. Generated nuclei are those produced by the neural network from actin filament arrangements. Matched nuclei are those generated less than 4 μm from their real counterparts.

b) Manual (left) and automatic (right) processing of the same image. In manual processing, the profile of each nucleus is drawn to calculate its centroid, and the nuclei are matched by comparison with their real counterparts. In automatic processing, the nuclei are identified and their bounding boxes generated, and generated and real nuclei are matched by maximising the overlapping area of the bounding boxes.

c) Several examples of generated nuclei (red) and their corresponding real nuclei (blue). The first three images (green frame) correspond to nuclei generated within the average nuclear radius (4 μm) from their real position. The last image (red frame) corresponds to a mismatch, where the generated nucleus is too far from its real position. 

d) Example of a cell and the relative distance of 4 μm within the cytoplasm. The probability of randomly positioning the nucleus within the cytoplasm can be identified as the ratio of possible matched positions for the centroid (green area) with respect to all possible positions (orange area). Those possible positions of the centroid located at less than the nuclear radius from the edges of the cell (red area) are discarded under the premise that the nucleus cannot be positioned partially outside the cell. 

e) Distribution of the distances of the generated nuclei with respect to their real positions. 71% of the nuclei are situated less than 4 μm from their real position.

f) Distribution of distances of the generated nuclei considered matched (<4 μm). 40% of the matched nuclei are located less than 1 μm from their real position.
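
The automatic matching described in panel b and the 4 μm criterion from panel a can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the box format (x_min, y_min, x_max, y_max) and the assumption that coordinates are already in micrometres are all hypothetical. It also pairs each generated nucleus independently with the real nucleus it overlaps most, which is a simplification of the overlap maximisation mentioned in the caption.

```python
from math import hypot

# Assumption: 4 um threshold (average nuclear radius) taken from the caption.
MATCH_RADIUS_UM = 4.0


def overlap_area(a, b):
    """Overlap area of two boxes given as (x_min, y_min, x_max, y_max)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def centroid(box):
    """Centre point of a bounding box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def pair_nuclei(generated, real):
    """Pair each generated box with the real box it overlaps most.

    Returns (generated_index, real_index, centroid_distance) tuples.
    Generated boxes that overlap no real box are left unpaired.
    """
    pairs = []
    for gi, g in enumerate(generated):
        best = max(range(len(real)),
                   key=lambda ri: overlap_area(g, real[ri]),
                   default=None)
        if best is None or overlap_area(g, real[best]) == 0.0:
            continue
        gx, gy = centroid(g)
        rx, ry = centroid(real[best])
        pairs.append((gi, best, hypot(gx - rx, gy - ry)))
    return pairs


def matched_fraction(pairs, n_generated, radius=MATCH_RADIUS_UM):
    """Fraction of generated nuclei lying closer than `radius` to their pair."""
    if n_generated == 0:
        return 0.0
    close = sum(1 for _, _, dist in pairs if dist < radius)
    return close / n_generated


# Hypothetical boxes in micrometres, for illustration only.
generated = [(0, 0, 8, 8), (20, 20, 28, 28), (50, 50, 58, 58)]
real = [(1, 1, 9, 9), (25, 25, 33, 33)]

pairs = pair_nuclei(generated, real)
print(pairs)
print(matched_fraction(pairs, len(generated)))
```

In this toy input, the first generated nucleus is paired and falls within the threshold, the second is paired but lies too far from its counterpart, and the third overlaps nothing and stays unpaired. A histogram of the distances in `pairs` would correspond to the kind of distribution shown in panels e and f.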


“Interpreting such complex systems is difficult because we must fit them into our interpretation of reality and its predefined measurables. With the thousands of intermingled spaghetti-like fibers, it would be humanly impossible to tell where one starts and the other ends, let alone figure out the parameters of the study,” explained principal investigator Assoc Prof Fernandez from SUTD.
 
The researchers then decided to approach the issue from a completely new perspective, shifting their focus from the system to the observer.
 
Assoc Prof Javier G. Fernandez and Ph.D. candidate Jyothsna Vasudevan from the Singapore University of Technology and Design (SUTD) collaborated with the National University of Singapore and Nanyang Technological University and successfully demonstrated the correlation between cytoskeleton organisation and nuclear position by turning to artificial intelligence. Their study, ‘From qualitative data to correlation using deep generative networks: Demonstrating the relation of nuclear position with the arrangement of actin filaments’, was published in PLOS ONE.
 
To ensure that the study’s parameters would not be limited by human conceptualisation, they developed a unique generative algorithm to interpret the cytoskeleton of eukaryotic cells using qualitative data, without telling the system what it was observing and how to measure it. 
 
“We separated the information related to the nucleus and the fibers in independent databases of images, ensuring that there wasn’t any information about the nucleus found in the images of the fibers, so that the system couldn’t cheat. Then we trained the system to find the location of the nucleus using only information specific to fibers. To do so, the system had to take the qualitative data and figure out on its own if there was a relation between the organisation of the fibers and the position of the nucleus. This forced the programme to find the parameters defining the system, free from human interpretation and predefined concepts,” Assoc Prof Fernandez added.
 
The algorithm was able to successfully predict the presence and the location of the nuclei in more than 8,000 cells, with almost half of those predictions resulting in a deviation of less than 1 μm from their exact position. This demonstrated, with astounding significance, the hypothesis of a deterministic relation between the arrangements of the actin filaments and the position of the nucleus, one of the most basic relations in cell biology. Assoc Prof Fernandez believes that this has also resulted in an epistemological outcome.
 
“This study has transformed the way we think about adapting our scientific research methods to allow machine learning to not just be used as a tool to analyse data, but to also interpret reality. For the inherently complex systems in biology, this will undoubtedly accelerate the next technological revolution - the ‘biologisation’ of technology. This will enable the complexities and intricacies of biological systems to be truly unravelled and dominated using machine learning,” added Assoc Prof Fernandez.

Reference: 
From qualitative data to correlation using deep generative networks: Demonstrating the relation of nuclear position with the arrangement of actin filaments, PLoS ONE 17(7): e0271056. (DOI: 10.1371/journal.pone.0271056)