Note: This tutorial was originally written as an Jupyter Notebook (ipynb). If you would like to run or edit the notebook, it can be found here

Background

Example Dataset

The example dataset used in this tutorial contains 2D immunofluorescence images of the Drosophila intestine obtained using a spinning-disc confocal microscope. The dataset is describes a "MARCM" experiment from the following paper.

MARCM is a genetic technique whereby mitotic recombination generates GFP-marked homozygous mutant cells from heterozygous precursors. Over time, individually marked cells divide adjacent to each other and create "clones" of mutant cells, surrounded by unmarked wild-type cells:

Left:Genetic labelling by MARCM. Right: Labelled cells grow over time into 'clones'.

In this dataset, intestines were imaged using four channels:

C0 : DAPI (nuclear marker)
C1 : GFP (clone marker)
C2 : PDM1 staining (EnteroCyte marker)
C3 : Prospero staining (EndoEndocrine marker)

composite image

Experimental Aims

We aim to investigate: 1) Does gene of interest regulate cell proliferation?
2) Does our gene of interest control cell differentiation?
3) Does gene of interest effect local tissue cell arrangement and structure?

To address each of these aims we will:

Quantify the number and type of cells present in each image.
Classify whether each cell resides outside or inside a clone.
Calculate the number of neighbours each cell has and what type of cell each of those neighbours are.

These data will help us address of experimental aims since:

We can measure if our gene of interest regulates cell proliferation by comparing the number of cells per mutant clone, to the number of cells per control clone i.e. clones generated in wild-type animals lacking any mutation.
We can determine if our gene of interest regulates cell differentiation by comparing the percentage of each cell type inside vs. outside a clone (since only cells inside a clone are mutant for our gene of interest).
We can determine whether our mutation of interest affects local tissue cell arrangement and structure by comparing the number and type of cell neighbours inside vs outside a clone.

Walkthrough

Initiate Dask dashboard (optional)

Most functions in clonedetective are parallelised using Dask. When using the Dask distributed scheduler, we can access an interactive dashboard that details our computations' status, progress, and history. While especially useful when working on a remote cluster, the interactive dashboard is a great addition even when working locally (e.g. MacBook).

For example, here is what the dashboard looks like when measuring cell properties using the CloneCounter.make_measurements()) call described below. Each line of the 'Task Stream' corresponds to one of 8 logical cores on my MacBook Pro.

SegmentLocal

example Dask dashboard

N.B. Initiating a Dask scheduler is optional when working on your local machine. If you don't launch one, all computations will still run; you just won't have access to the fancy dashboard!

Launch the scheduler as follows on your local machine and click on the dashboard link:

from dask.distributed import Client

c = Client()
c

If you're working on a remote cluster, e.g. a SLURM HPC, please see the relevant Dask guides.

Import and instantiate CloneCounter subclass

The LazyCloneCounter subclass uses Dask to lazy load image series that may be too large to fit in RAM. If your data is small enough to fit into RAM, you can speed things up using the PersistentCloneCounter subclass.

For more details, see the advanced tutorial (coming soon).

from clonedetective.clone_counters import LazyCloneCounter

We intialise a LazyCloneCounter with four required arguments:

exp_name : str -> name of the experiment
img_name_regex : str -> regular expression used to extract unique identifies from image filenames
pixel_size : str -> pixel size in $\mu m^{2}$
tot_seg_ch : str -> image channel used to define the total number of cells e.g. DAPI channel.

In this experiment, images using the following pattern:

'a1' if from a wild-type animal. 'a2' if from a mutant animal.
'g01' for gut 1. 'g02' for gut 2 etc.
'p1' for position 1. 'p2' for position 2 etc.

Image names such as 'a1g01p1' can thus be parsed using the regular expression "a\dg\d\dp\d".

exp = LazyCloneCounter(
    exp_name="Marcm2a_E7F1",
    img_name_regex=r"a\dg\d\dp\d",
    pixel_size=0.275,
    tot_seg_ch="C0",
)

Load images and segmentations

We add images to the LazyCloneCounter instance by passing a filename "glob" for each of the image channel. For example, the directory "data/MARCM_experiment/images/C0/" contains a series of tif images for the C0 channel:

import os

# show the first 5 files in the directory
os.listdir("data/MARCM_experiment/images/C0/")[:5]

['a2g09p2.tif', 'a2g10p3.tif', 'a2g12p1.tif', 'a1g04p1.tif', 'a2g10p2.tif']

In this experiment, tifs for each image channel (C0, C1, C2 and C3) are stored in four separate folders.

exp.add_images(
    C0="data/MARCM_experiment/images/C0/*.tif",
    C1="data/MARCM_experiment/images/C1/*.tif",
    C2="data/MARCM_experiment/images/C2/*.tif",
    C3="data/MARCM_experiment/images/C3/*.tif",
)

Our image data is stored in a Xarray DataSet, which is a collection of Xarray DataArrays, each containing Numpy or dask arrays with explicitly labelled dimensions and coordinates:

exp.image_data

<xarray.Dataset>
Dimensions:       (img_channels: 4, img_name: 67, x: 810, y: 810)
Coordinates:
  * img_channels  (img_channels) <U2 'C0' 'C1' 'C2' 'C3'
  * img_name      (img_name) <U7 'a1g01p1' 'a1g01p2' ... 'a2g13p2' 'a2g13p3'
  * y             (y) float64 0.0 0.275 0.55 0.825 ... 221.7 221.9 222.2 222.5
  * x             (x) float64 0.0 0.275 0.55 0.825 ... 221.7 221.9 222.2 222.5
Data variables:
    images        (img_channels, img_name, y, x) uint16 dask.array<chunksize=(1, 1, 810, 810), meta=np.ndarray>

array(['C0', 'C1', 'C2', 'C3'], dtype='<U2')

array(['a1g01p1', 'a1g01p2', 'a1g01p3', 'a1g02p1', 'a1g02p2', 'a1g02p3',
       'a1g03p1', 'a1g03p2', 'a1g03p3', 'a1g04p1', 'a1g04p2', 'a1g04p3',
       'a1g05p1', 'a1g05p2', 'a1g05p3', 'a1g07p1', 'a1g07p2', 'a1g08p1',
       'a1g08p2', 'a1g08p3', 'a1g09p1', 'a1g09p2', 'a1g09p3', 'a1g10p1',
       'a1g11p1', 'a1g11p2', 'a1g11p3', 'a1g12p1', 'a1g12p2', 'a1g13p1',
       'a1g14p1', 'a1g15p1', 'a1g15p2', 'a1g15p3', 'a2g01p1', 'a2g01p2',
       'a2g01p3', 'a2g02p1', 'a2g02p2', 'a2g03p1', 'a2g03p2', 'a2g03p3',
       'a2g04p1', 'a2g04p2', 'a2g05p1', 'a2g05p2', 'a2g05p3', 'a2g06p1',
       'a2g06p2', 'a2g06p3', 'a2g07p1', 'a2g07p2', 'a2g07p3', 'a2g08p1',
       'a2g08p3', 'a2g09p1', 'a2g09p2', 'a2g09p3', 'a2g10p1', 'a2g10p2',
       'a2g10p3', 'a2g11p1', 'a2g11p3', 'a2g12p1', 'a2g12p2', 'a2g13p2',
       'a2g13p3'], dtype='<U7')

array([  0.   ,   0.275,   0.55 , ..., 221.925, 222.2  , 222.475])

array([  0.   ,   0.275,   0.55 , ..., 221.925, 222.2  , 222.475])

As an example of how these work, the coordinates of the "img_name" dimension are the individual image names, while the coordinates of "y" and "x" dimensions correspond to image size in $\mu m^{2}$:

exp.image_data["img_name"]

<xarray.DataArray 'img_name' (img_name: 67)>
array(['a1g01p1', 'a1g01p2', 'a1g01p3', 'a1g02p1', 'a1g02p2', 'a1g02p3',
       'a1g03p1', 'a1g03p2', 'a1g03p3', 'a1g04p1', 'a1g04p2', 'a1g04p3',
       'a1g05p1', 'a1g05p2', 'a1g05p3', 'a1g07p1', 'a1g07p2', 'a1g08p1',
       'a1g08p2', 'a1g08p3', 'a1g09p1', 'a1g09p2', 'a1g09p3', 'a1g10p1',
       'a1g11p1', 'a1g11p2', 'a1g11p3', 'a1g12p1', 'a1g12p2', 'a1g13p1',
       'a1g14p1', 'a1g15p1', 'a1g15p2', 'a1g15p3', 'a2g01p1', 'a2g01p2',
       'a2g01p3', 'a2g02p1', 'a2g02p2', 'a2g03p1', 'a2g03p2', 'a2g03p3',
       'a2g04p1', 'a2g04p2', 'a2g05p1', 'a2g05p2', 'a2g05p3', 'a2g06p1',
       'a2g06p2', 'a2g06p3', 'a2g07p1', 'a2g07p2', 'a2g07p3', 'a2g08p1',
       'a2g08p3', 'a2g09p1', 'a2g09p2', 'a2g09p3', 'a2g10p1', 'a2g10p2',
       'a2g10p3', 'a2g11p1', 'a2g11p3', 'a2g12p1', 'a2g12p2', 'a2g13p2',
       'a2g13p3'], dtype='<U7')
Coordinates:
  * img_name  (img_name) <U7 'a1g01p1' 'a1g01p2' ... 'a2g13p2' 'a2g13p3'

array(['a1g01p1', 'a1g01p2', 'a1g01p3', 'a1g02p1', 'a1g02p2', 'a1g02p3',
       'a1g03p1', 'a1g03p2', 'a1g03p3', 'a1g04p1', 'a1g04p2', 'a1g04p3',
       'a1g05p1', 'a1g05p2', 'a1g05p3', 'a1g07p1', 'a1g07p2', 'a1g08p1',
       'a1g08p2', 'a1g08p3', 'a1g09p1', 'a1g09p2', 'a1g09p3', 'a1g10p1',
       'a1g11p1', 'a1g11p2', 'a1g11p3', 'a1g12p1', 'a1g12p2', 'a1g13p1',
       'a1g14p1', 'a1g15p1', 'a1g15p2', 'a1g15p3', 'a2g01p1', 'a2g01p2',
       'a2g01p3', 'a2g02p1', 'a2g02p2', 'a2g03p1', 'a2g03p2', 'a2g03p3',
       'a2g04p1', 'a2g04p2', 'a2g05p1', 'a2g05p2', 'a2g05p3', 'a2g06p1',
       'a2g06p2', 'a2g06p3', 'a2g07p1', 'a2g07p2', 'a2g07p3', 'a2g08p1',
       'a2g08p3', 'a2g09p1', 'a2g09p2', 'a2g09p3', 'a2g10p1', 'a2g10p2',
       'a2g10p3', 'a2g11p1', 'a2g11p3', 'a2g12p1', 'a2g12p2', 'a2g13p2',
       'a2g13p3'], dtype='<U7')

array(['a1g01p1', 'a1g01p2', 'a1g01p3', 'a1g02p1', 'a1g02p2', 'a1g02p3',
       'a1g03p1', 'a1g03p2', 'a1g03p3', 'a1g04p1', 'a1g04p2', 'a1g04p3',
       'a1g05p1', 'a1g05p2', 'a1g05p3', 'a1g07p1', 'a1g07p2', 'a1g08p1',
       'a1g08p2', 'a1g08p3', 'a1g09p1', 'a1g09p2', 'a1g09p3', 'a1g10p1',
       'a1g11p1', 'a1g11p2', 'a1g11p3', 'a1g12p1', 'a1g12p2', 'a1g13p1',
       'a1g14p1', 'a1g15p1', 'a1g15p2', 'a1g15p3', 'a2g01p1', 'a2g01p2',
       'a2g01p3', 'a2g02p1', 'a2g02p2', 'a2g03p1', 'a2g03p2', 'a2g03p3',
       'a2g04p1', 'a2g04p2', 'a2g05p1', 'a2g05p2', 'a2g05p3', 'a2g06p1',
       'a2g06p2', 'a2g06p3', 'a2g07p1', 'a2g07p2', 'a2g07p3', 'a2g08p1',
       'a2g08p3', 'a2g09p1', 'a2g09p2', 'a2g09p3', 'a2g10p1', 'a2g10p2',
       'a2g10p3', 'a2g11p1', 'a2g11p3', 'a2g12p1', 'a2g12p2', 'a2g13p2',
       'a2g13p3'], dtype='<U7')

To access specific images, we can use the convenient .sel notation. Here, we index an image, load it into RAM (the .compute() call), and display it using matplotlib.

import matplotlib.pyplot as plt

# display the DAPI channel for the first image
ax = plt.subplots()[1]
ax.imshow(
    exp.image_data["images"].sel(img_channels="C0", img_name="a1g01p1").compute(),
    cmap="gray",
    vmax=12000,
)
ax.set_title("DAPI channel for img_name: a1g01p1")
ax.set_axis_off()

We can add segmentations in the same fashion...

TIP. At a minimum, clonedetective requires a segmentation for the "tot_seg_ch" (see above for definition). If you do not have a suitable segmentation, check out the example tutorial using StarDist.

exp.add_segmentations(C0="data/MARCM_experiment/segmentations/C0_stardist_segs/*.tif")

Segmentations have now been added to the Xarray dataset:

exp.image_data

<xarray.Dataset>
Dimensions:        (img_channels: 4, img_name: 67, seg_channels: 1, x: 810, y: 810)
Coordinates:
  * img_channels   (img_channels) <U2 'C0' 'C1' 'C2' 'C3'
  * img_name       (img_name) <U7 'a1g01p1' 'a1g01p2' ... 'a2g13p2' 'a2g13p3'
  * y              (y) float64 0.0 0.275 0.55 0.825 ... 221.7 221.9 222.2 222.5
  * x              (x) float64 0.0 0.275 0.55 0.825 ... 221.7 221.9 222.2 222.5
  * seg_channels   (seg_channels) <U2 'C0'
Data variables:
    images         (img_channels, img_name, y, x) uint16 dask.array<chunksize=(1, 1, 810, 810), meta=np.ndarray>
    segmentations  (seg_channels, img_name, y, x) uint16 dask.array<chunksize=(1, 1, 810, 810), meta=np.ndarray>

array(['C0', 'C1', 'C2', 'C3'], dtype='<U2')

array(['a1g01p1', 'a1g01p2', 'a1g01p3', 'a1g02p1', 'a1g02p2', 'a1g02p3',
       'a1g03p1', 'a1g03p2', 'a1g03p3', 'a1g04p1', 'a1g04p2', 'a1g04p3',
       'a1g05p1', 'a1g05p2', 'a1g05p3', 'a1g07p1', 'a1g07p2', 'a1g08p1',
       'a1g08p2', 'a1g08p3', 'a1g09p1', 'a1g09p2', 'a1g09p3', 'a1g10p1',
       'a1g11p1', 'a1g11p2', 'a1g11p3', 'a1g12p1', 'a1g12p2', 'a1g13p1',
       'a1g14p1', 'a1g15p1', 'a1g15p2', 'a1g15p3', 'a2g01p1', 'a2g01p2',
       'a2g01p3', 'a2g02p1', 'a2g02p2', 'a2g03p1', 'a2g03p2', 'a2g03p3',
       'a2g04p1', 'a2g04p2', 'a2g05p1', 'a2g05p2', 'a2g05p3', 'a2g06p1',
       'a2g06p2', 'a2g06p3', 'a2g07p1', 'a2g07p2', 'a2g07p3', 'a2g08p1',
       'a2g08p3', 'a2g09p1', 'a2g09p2', 'a2g09p3', 'a2g10p1', 'a2g10p2',
       'a2g10p3', 'a2g11p1', 'a2g11p3', 'a2g12p1', 'a2g12p2', 'a2g13p2',
       'a2g13p3'], dtype='<U7')

array([  0.   ,   0.275,   0.55 , ..., 221.925, 222.2  , 222.475])

array([  0.   ,   0.275,   0.55 , ..., 221.925, 222.2  , 222.475])

array(['C0'], dtype='<U2')

Similarly, we can access specific segmentations using the .sel notation:

from clonedetective.utils import generate_random_cmap

# display total segmentation for the first image
ax = plt.subplots()[1]
ax.imshow(
    exp.image_data["segmentations"]
    .sel(seg_channels="C0", img_name="a1g01p1")
    .compute(),
    cmap=generate_random_cmap(),
    interpolation="none",
)
ax.set_title("Total segmentation for img_name: a1g01p1")
ax.set_axis_off()

Measure cell properties

We next need to measure properties of each segmented cell (label) within our images:

exp.make_measurements()

Without providing an arguments, CloneCounter.make_measurements()) defaults to pairing the 'tot_seg_ch' segmentation channel to each of the image channels.

The segmentation-image channel pairs used for cell measurements can be accessed via the CloneCounter.seg_img_channel_pairs attribute.

exp.seg_img_channel_pairs

TIP: If these segmentation-image channel pairs are unsuitable for your experiment, they can be customised as detailed in the advanced tutorial (coming soon).

TIP: As a default,CloneCounter.make_measurements()) calculates every cell's mean intensity, centroid, and area. If we require additional measurements, we can supply the extra_properties with a list containing any 'regionprops' metrics from the scikit-image measure module. For example, we can additionally calculate 'eccentricity' as follows:

exp.make_measurements(extra_properties=["eccentricity"])

The primary result from the CloneCounter.make_measurements()) call is a CloneCounter.results_measurements attribute. This is a pandas DataFrame containing all per cell measurements.

exp.results_measurements

Test thresholds to define cells as GFP positive or negative

We now aim to determine whether each cell resides inside or outside a "clone". To do this, we first define a 'threshold' that categorises whether a cell is GFP positive or negative. This can be as simple as "mean_intensity > 1000", or involve multiple conditions e.g. "mean_intensity > 1000 & eccentricity > 0.3".

Thresholds are not applied directly to the C1 (GFP) image. Instead, for every label of the total segmentation channel (in this case C0: DAPI), we look at its corresponding intensity in the C1 (GFP) channel. We do this by querying the CloneCounter.results_measurement dataframe and keeping only those labels that meet our desired threshold criteria.

The CloneCounter class provides a helper method testing_possible_thresholds that can plot and compare different thresholds:

exp.testing_possible_thresholds(
    int_img="a1g10p1",
    thresholds=[
        ['int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 0'],
        ['int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 100'],
        ['int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 1000'],
        ['int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 10000'],
    ],
    figure_size=(8, 7),
    interpolation="none",
)

From this test, we can see that a mean intensity threshold of 1000 appears roughly suitable, as its binary image on the left keeps most of the GFP positive cells, while excluding GFP negative cells.

If we were still unhappy with this as a simple threshold, we could increase stringency by adding another threshold on a different cell measurement. Below, we keep a mean intensity threshold constant at 1000, while varying an additional eccentricity threshold:

exp.testing_possible_thresholds(
    int_img="a1g10p1",
    thresholds=[
        [
            'int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 1000 & eccentricity > 0'
        ],
        [
            'int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 1000 & eccentricity > 0.3'
        ],
        [
            'int_img_ch == "C1" & seg_ch == "C0" & mean_intensity > 1000 & eccentricity > 0.6'
        ],
    ],
    figure_size=(8, 8),
    interpolation="none",
)

In this example, most cells are pretty round and so both eccentricity == 0 and eccentricity == 0.3 look suitable.

Define clones and calculate numbers of touching neighbours

Once we have decided on a threshold, we can plug it into the CloneCounter.add_clones_and_neighbouring_labels method and apply it to all images:

exp.add_clones_and_neighbouring_labels(
    thresholds=[
        'int_img_ch == "C1" & seg_ch == "C0" & eccentricity > 0.3 & mean_intensity > 1000'
    ],
    thresh_name="GFP",
    calc_clones=True,
)

This method call does a couple of things:

1) It updates CloneCounter.results_measurements with an additional boolean column describing whether a cell (label) is positive or negative for the desired threshold.

exp.results_measurements.head()

2) It adds new neighbourhood count and clone images to CloneCounter.image_data. (See the additional 'C1' data variable in the Xarray dataset below)

exp.image_data

<xarray.Dataset>
Dimensions:         (GFP_neighbours: 7, img_channels: 4, img_name: 67, seg_channels: 1, x: 810, y: 810)
Coordinates:
  * img_channels    (img_channels) <U2 'C0' 'C1' 'C2' 'C3'
  * img_name        (img_name) <U7 'a1g01p1' 'a1g01p2' ... 'a2g13p2' 'a2g13p3'
  * y               (y) float64 0.0 0.275 0.55 0.825 ... 221.7 221.9 222.2 222.5
  * x               (x) float64 0.0 0.275 0.55 0.825 ... 221.7 221.9 222.2 222.5
  * seg_channels    (seg_channels) <U2 'C0'
  * GFP_neighbours  (GFP_neighbours) <U16 'ext_tot_seg_labs' ... 'GFPpos_nc'
Data variables:
    images          (img_channels, img_name, y, x) uint16 dask.array<chunksize=(1, 1, 810, 810), meta=np.ndarray>
    segmentations   (seg_channels, img_name, y, x) uint16 dask.array<chunksize=(1, 1, 810, 810), meta=np.ndarray>
    GFP             (GFP_neighbours, img_name, y, x) uint16 dask.array<chunksize=(7, 1, 810, 810), meta=np.ndarray>

array(['C0', 'C1', 'C2', 'C3'], dtype='<U2')

array(['a1g01p1', 'a1g01p2', 'a1g01p3', 'a1g02p1', 'a1g02p2', 'a1g02p3',
       'a1g03p1', 'a1g03p2', 'a1g03p3', 'a1g04p1', 'a1g04p2', 'a1g04p3',
       'a1g05p1', 'a1g05p2', 'a1g05p3', 'a1g07p1', 'a1g07p2', 'a1g08p1',
       'a1g08p2', 'a1g08p3', 'a1g09p1', 'a1g09p2', 'a1g09p3', 'a1g10p1',
       'a1g11p1', 'a1g11p2', 'a1g11p3', 'a1g12p1', 'a1g12p2', 'a1g13p1',
       'a1g14p1', 'a1g15p1', 'a1g15p2', 'a1g15p3', 'a2g01p1', 'a2g01p2',
       'a2g01p3', 'a2g02p1', 'a2g02p2', 'a2g03p1', 'a2g03p2', 'a2g03p3',
       'a2g04p1', 'a2g04p2', 'a2g05p1', 'a2g05p2', 'a2g05p3', 'a2g06p1',
       'a2g06p2', 'a2g06p3', 'a2g07p1', 'a2g07p2', 'a2g07p3', 'a2g08p1',
       'a2g08p3', 'a2g09p1', 'a2g09p2', 'a2g09p3', 'a2g10p1', 'a2g10p2',
       'a2g10p3', 'a2g11p1', 'a2g11p3', 'a2g12p1', 'a2g12p2', 'a2g13p2',
       'a2g13p3'], dtype='<U7')

array([  0.   ,   0.275,   0.55 , ..., 221.925, 222.2  , 222.475])

array([  0.   ,   0.275,   0.55 , ..., 221.925, 222.2  , 222.475])

array(['C0'], dtype='<U2')

array(['ext_tot_seg_labs', 'GFP_neg_labs', 'GFP_pos_labs', 'GFP_clone',
       'tot_nc', 'GFPneg_nc', 'GFPpos_nc'], dtype='<U16')

To illustrate what these new images are, lets plot and walkthrough them for one img_name (e.g. "a1g01p1"). As all these new images derive from the original total segmentation image, we will plot this, along with the original fluorescent image (channels C0 and C1).

We first import helper plotting functions from clonedetective.utils:

from clonedetective.utils import RGB_image_from_CYX_img, plot_new_images

import numpy as np

# grab one img_name from dataset
img = exp.image_data.sel(img_name="a1g01p1")
fluo_img = img["images"].data.compute()

# create RGB image of DAPI and GFP channels
RGB_image = RGB_image_from_CYX_img(
    red=None, green=fluo_img[1, ...], blue=fluo_img[0, ...]
)

# create a list of RGB fluorescent, total segmentation and the new images for easy plotting
img_to_plot = np.concatenate(
    [img["segmentations"].data.compute(), img["GFP"].data.compute()]
)
img_to_plot = [RGB_image] + [img for img in img_to_plot]

# create text labels for our plot
text_labels = ["DAPI + GFP", "total_segmentation"] + img["GFP"].coords[
    "GFP_neighbours"
].values.tolist()

plot_new_images(
    img_to_plot[:6],
    text_labels[:6],
    "ABCDEF",
    figure_shape=(2, 3),
    figure_size=(8, 6),
    vmax=np.unique(img_to_plot[1]).shape[0],
    interpolation="none",
)

(A) - two channel fluorescent image
(B) - total segmentation image i.e. from exp.image_data['segmentations'].
(C) - labels of total segmentation image have been extended so they touch each other.
(D) - labels that did not meet our threshold (C1_neg_labels).
(E) - labels that did meet our threshold (C1_pos_labels).
(F) - touching labels that did meet our threshold have been merged into 'clones'

plot_new_images(
    [img_to_plot[0]] + img_to_plot[6:],
    [text_labels[0]] + text_labels[6:],
    "AGHI",
    figure_shape=(1, 4),
    vmax=np.unique(img_to_plot[6]).shape[0],
    colorbar=True,
    label_cmap="magma",
)

(G) - Parametric image where the number of total neighbours is mapped to color.
(H) - Parametric image where the number of C1neg neighbours is mapped to color.
(I) - Parametric image where the number of C1pos neighbours is mapped to color.

Calculate numbers of touching neighbours for non-clone image channels

As well as a GFP channel, our dataset contains fluorescence channels for PDM1 (channel C2) and Prospero (channel C3) immunostains. Since PDM1 marks EnteroCyte (EE) cells, and Prospero marks EnteroEndocrine (EE) cells, we can use thresholds to define these cell types within our images.

Cropped image of intestine stained with DAPI, GFP, PDM1 and Prospero with labelled cell types

First, we define a threshold for EC cells. Since EC are large absorptive cells, we can use total intensity rather than mean intensity for thresholding:

exp.results_measurements = exp.results_measurements.eval(
    "total_intensity = mean_intensity * area"
)

To find suitable PDM1 (C2) threshold, we can again use the CloneCounter.testing_possible_thresholds()) method:

exp.testing_possible_thresholds(
    int_img="a1g01p1",
    thresholds=[
        ['int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e3'],
        ['int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e4'],
        ['int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e5'],
        ['int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e6'],
    ],
    figure_size=(8, 7),
    interpolation="none",
)

So 'total_intensity > 5e5' works well for the PDM1 (C2) channel.

Next, let's find a suitable Prospero threshold to define EE cells:

exp.testing_possible_thresholds(
    int_img="a1g10p1",
    thresholds=[
        ['int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 100',],
        ['int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 1000',],
        ['int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 5000',],
        ['int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 25000',],
    ],
    figure_size=(8, 7),
    interpolation="none",
)

'mean_intensity > 1000' looks to work well for the Prospero (C3 channel).

IMP -> However, it is crucial that threshold definitions for cell types are mutually exclusive i.e. a cell cannot be both EC_pos and EE_pos. If thresholds are not mutually exclusive, the same cell might be counted more than once when we later calculate percentage cell types and the number of neighbours.

At the moment, our thresholds are not mutually exclusive since there might a small percentage of cells that have PDM1 'total_intensity > 5e5' and Prospero 'mean_intensity' > 1000. We can check this using the CloneCounter.mutually_exclusive_cell_types()) method after defining out initial cell type thresholds:

exp.add_clones_and_neighbouring_labels(
    thresholds=['int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e5',],
    thresh_name="EC",
    calc_clones=False,
)

exp.add_clones_and_neighbouring_labels(
    thresholds=['int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 1000',],
    thresh_name="EE",
    calc_clones=False,
)

exp.mutually_exclusive_cell_types()

False

To avoid this problem, we can define an additional double positive cell type (ECEC), and more precisely define our cell types as:

EC cells : positive for PDM1 (channel C2), negative for Prospero (channel C3).
EE cells : negative for PDM1 (channel C2), positive for Prospero (channel C3).
ECEE cells : positive for PDM1 (channel C2), positive for Prospero (channel C3)

exp.add_clones_and_neighbouring_labels(
    thresholds=[
        'int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e5',
        'int_img_ch == "C3" & seg_ch == "C0" & mean_intensity < 1000',
    ],
    thresh_name="EC",
    calc_clones=False,
)

exp.add_clones_and_neighbouring_labels(
    thresholds=[
        'int_img_ch == "C2" & seg_ch == "C0" & total_intensity < 5e5',
        'int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 1000',
    ],
    thresh_name="EE",
    calc_clones=False,
)

exp.add_clones_and_neighbouring_labels(
    thresholds=[
        'int_img_ch == "C2" & seg_ch == "C0" & total_intensity > 5e5',
        'int_img_ch == "C3" & seg_ch == "C0" & mean_intensity > 1000',
    ],
    thresh_name="ECEE",
    calc_clones=False,
)

Now check again:

exp.mutually_exclusive_cell_types()

True

The final major cell type within the fly intestine are progenitor cells including Intestinal Stem Cells (ISCs) and EnteroBlasts (EBs). These can be defined as having small nuclei and not being 'EC_pos', 'EE_pos' nor 'ECEC_pos':

exp.add_clones_and_neighbouring_labels(
    thresholds=["area_um2 < 50 & not EC_pos & not EE_pos & not ECEE_pos"],
    thresh_name="ISCorEB",
    calc_clones=False,
)

IMP -> However, as well as being mutally exclusive, our cell type thresholds also need to completely categorise every cell. At the moment, this is not the case, since large nuclei cells that are not 'EC_pos', 'EE_pos' nor 'ECEC_pos', have not been classified.

We can check using the following method:

exp.complete_set_of_cell_types()

False

We, therefore, define a threshold for a rare preEC cell type, which have large nuclei but are not 'EC_pos', 'EE_pos' nor 'ECEC_pos'.

exp.add_clones_and_neighbouring_labels(
    # could also be defined as ["not EC_pos & not EE_pos & not ECEE_pos & not ISCorEB_pos"]
    thresholds=["area_um2 > 50 & not EC_pos & not EE_pos & not ECEE_pos"],
    thresh_name="preEC",
    calc_clones=False,
)

This completes our cell type thresholding:

exp.mutually_exclusive_cell_types(), exp.complete_set_of_cell_types()

(True, True)

Quantify clone and the touching neighbours

We now quantify numbers of clones and touching neighbours.

exp.measure_all_clones_and_neighbouring_labels()

This operation populates a CloneCounter.results_clones_and_neighbour_counts dictionary attribute with pandas dataframes for each defined cell type or clone threshold:

exp.results_clones_and_neighbour_counts.keys()

dict_keys(['GFP', 'EC', 'EE', 'ECEE', 'ISCorEB', 'preEC'])

Each dataframe contains neighbour count information for that cell type e.g.

exp.results_clones_and_neighbour_counts["GFP"].head()

Next, we combine these dataframes together, along with the original CloneCounter.results_measurements dataframe, to return a final dataframe that contains information on all cell properties (e.g. mean_intensity), whether a cell inside a clone, and the number and type of neighbours a cell has:

df = exp.combine_neighbour_counts_and_measurements()

Data export for downstream analysis

We can now export this DataFrame for future downstream analysis using your favourite data wrangling and plotting tools.

df.to_csv("data/example_results.csv", index=False)

To follow on with these analyses, please see tutorials:

	seg_ch	int_img_ch	seg_img	int_img	label	area	mean_intensity	centroid-0	centroid-1	eccentricity	area_um2
0	C0	C0	a1g01p1	a1g01p1	1	346	1144.604046	5.054913	97.182081	0.917838	26.166250
1	C0	C0	a1g01p1	a1g01p1	2	574	2557.510453	7.952962	223.238676	0.867292	43.408750
2	C0	C0	a1g01p1	a1g01p1	3	608	3160.207237	13.875000	519.351974	0.515510	45.980000
3	C0	C0	a1g01p1	a1g01p1	4	173	2825.052023	7.283237	644.699422	0.635806	13.083125
4	C0	C0	a1g01p1	a1g01p1	5	168	2858.904762	5.690476	385.523810	0.863300	12.705000
...	...	...	...	...	...	...	...	...	...	...	...
289	C0	C3	a2g13p3	a2g13p3	290	613	96.443719	792.845024	551.652529	0.499582	46.358125
290	C0	C3	a2g13p3	a2g13p3	291	410	76.739024	794.009756	459.451220	0.377639	31.006250
291	C0	C3	a2g13p3	a2g13p3	292	675	31.414815	798.114074	786.000000	0.698885	51.046875
292	C0	C3	a2g13p3	a2g13p3	293	599	95.011686	799.739566	500.397329	0.690486	45.299375
293	C0	C3	a2g13p3	a2g13p3	294	150	48.766667	805.913333	633.293333	0.925589	11.343750

	segmentation_channel	image_channel
0	C0	C0
1	C0	C1
2	C0	C2
3	C0	C3

	GFP_neighbours	GFP_clone	GFPneg_nc	GFPpos_nc	tot_nc
int_img	label
a1g01p1	11	0	2	2	4
	12	0	4	0	4
	15	0	2	0	2
	17	2	2	0	2
	18	0	2	0	2

Example Walkthrough