Cellarea analysis

Here we explore descriptive statistics for cell area of embryonic stem cells.

In [1]:

Copied!





import pandas as pd
import numpy as np

# Load the data
data = pd.read_excel("cellarea.xlsx", header=None, index_col=None)
data.head()
import pandas as pd
import numpy as np

# Load the data
data = pd.read_excel("cellarea.xlsx", header=None, index_col=None)
data.head()

Out[1]:

	0
0	48000
1	42000
2	11000
3	26000
4	15000

In [2]:

Copied!

hist = data.hist(bins=100)
hist = data.hist(bins=100)

No description has been provided for this image

We can see an outliers at the very right end of the graph.

In [3]:

Copied!





# Remove the outlier using 99% percentile
q = data.quantile(0.99)
car = data[data < q]
hist2 = car.hist(bins=100)
# Remove the outlier using 99% percentile
q = data.quantile(0.99)
car = data[data < q]
hist2 = car.hist(bins=100)

In [4]:

Copied!





# Rescale the data to moderate values so that the area is expressed in thousands of 
# micro m^2
car_scaled = car / 1000.0

# Compute the locations of the car
print("Median: " + str(car_scaled.median()))
print("Mode: " + str(car_scaled.mode()))

# Compute variability
print("Var: " + str(car_scaled.var()))
print("Std: " + str(car_scaled.std()))
# Rescale the data to moderate values so that the area is expressed in thousands of 
# micro m^2
car_scaled = car / 1000.0

# Compute the locations of the car
print("Median: " + str(car_scaled.median()))
print("Mode: " + str(car_scaled.mode()))

# Compute variability
print("Var: " + str(car_scaled.var()))
print("Std: " + str(car_scaled.std()))

Median: 0    17.0
dtype: float64
Mode:       0
0  10.0
Var: 0    360.630484
dtype: float64
Std: 0    18.990273
dtype: float64

In [5]:

Copied!





# Box-and-Whiskers Plot
# The top and bottom of the "box" are the 25th and 75th percentile of the data, 
# respectively, with the distances between the representing the IQR. The green line inside the box 
# is the median. Since the median is not centered in the box, the sample is skewed. Whiskers 
# extend from the lower and upper sides of the box to the data's most extreme values
# within 1.5 times the IQR. Potential outliers are displayed with black "o" beyond
# the endpoints of the wiskers.
car_scaled.boxplot()
# Box-and-Whiskers Plot
# The top and bottom of the "box" are the 25th and 75th percentile of the data, 
# respectively, with the distances between the representing the IQR. The green line inside the box 
# is the median. Since the median is not centered in the box, the sample is skewed. Whiskers 
# extend from the lower and upper sides of the box to the data's most extreme values
# within 1.5 times the IQR. Potential outliers are displayed with black "o" beyond
# the endpoints of the wiskers.
car_scaled.boxplot()

Out[5]:

<Axes: >

In [6]:

Copied!





# Histogram
# Histogram is a rough approximation of the population distribution based on a sample. 
# Plotted in histogram are frequencies for interval-grouped data. Common way to determine the bin size
# is to use Sturges'r rule:
# k = 1 + log2n, where n is the size of the sample
k = 1 + np.log2(car_scaled.size)
hist3 = car_scaled.hist(bins=int(k))
# Histogram
# Histogram is a rough approximation of the population distribution based on a sample. 
# Plotted in histogram are frequencies for interval-grouped data. Common way to determine the bin size
# is to use Sturges'r rule:
# k = 1 + log2n, where n is the size of the sample
k = 1 + np.log2(car_scaled.size)
hist3 = car_scaled.hist(bins=int(k))

In [7]:

Copied!





# Empirical Cumulative Distribution Function
# The empirical cumulative distribution function (ECDF) F_n(x) for sample, X_1, ..., X_n
# is defined as:
# F_n(x) = (1 / n) \sum^n_{i = 1} \mathbf{1} (X_i \leq x)
# and represents the proportion of sample values smaller than x. 
# Empirical Cumulative Distribution Function
# The empirical cumulative distribution function (ECDF) F_n(x) for sample, X_1, ..., X_n
# is defined as:
# F_n(x) = (1 / n) \sum^n_{i = 1} \mathbf{1} (X_i \leq x)
# and represents the proportion of sample values smaller than x.