Scatter


Scatterplot block.

The scatter plot is perhaps the best-known chart for plotting x and y coordinates. Basic charts remain useful, especially with brushing and zooming capabilities. Scatter plots can be colored per sample and used to detect relationships between (groups of) variables. The input data frame should contain two columns (x and y) with the coordinates, and the index represents the class label.

param x:

1D coordinates for the x-axis.

type x:

numpy array

param y:

1D coordinates for the y-axis.

type y:

numpy array

param x1:

Second set of 1D coordinates for the x-axis.

type x1:

numpy array

param y1:

Second set of 1D coordinates for the y-axis.

type y1:

numpy array

param x2:

Third set of 1D coordinates for the x-axis.

type x2:

numpy array

param y2:

Third set of 1D coordinates for the y-axis.

type y2:

numpy array

param jitter:

Add jitter to the data points as random normal noise. A value of 0.01 usually works well to separate one-hot data.

type jitter:

float, default: None
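The exact noise model behind `jitter` is not spelled out here. A minimal sketch, assuming the value is used as the standard deviation of zero-mean normal noise added independently to x and y; `add_jitter` is a hypothetical helper, not a d3blocks function.

```python
import numpy as np


def add_jitter(x, y, jitter=None, seed=None):
    """Hypothetical helper: add zero-mean normal noise to the coordinates.

    Assumes `jitter` is the standard deviation of the noise; None disables it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if jitter is None:
        return x, y
    rng = np.random.default_rng(seed)
    # Independent noise per axis, same shape as the input coordinates.
    x_j = x + rng.normal(loc=0.0, scale=jitter, size=x.shape)
    y_j = y + rng.normal(loc=0.0, scale=jitter, size=y.shape)
    return x_j, y_j


# One-hot style data: many points collapse onto the same few coordinates.
x = np.array([0, 0, 1, 1, 0, 1])
y = np.array([1, 1, 0, 0, 1, 0])
xj, yj = add_jitter(x, y, jitter=0.01, seed=0)
```

With a small standard deviation the points stay close to their original positions but no longer overlap exactly, which is what makes stacked one-hot samples visible.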

param size:

Size of the samples.

type size:

list/array with the same size as (x,y).

param color:
  • ‘#ffffff’ : All dots get the same hex color.

  • None: The same color as for c is applied.

  • [‘#000000’, ‘#ffffff’,…]: list/array of hex colors with same size as (x,y)

type color:

list/array of hex colors with same size as (x,y)

param stroke:
Edge color of the dots in hex colors.
  • ‘#000000’ : All dots get the same hex color.

  • [‘#000000’, ‘#ffffff’,…]: list/array of hex colors with same size as (x,y)

type stroke:

list/array of hex colors with same size as (x,y)
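Both `color` and `stroke` accept either a single hex string or one hex color per point. A minimal sketch of how such an argument could be normalized before plotting; `broadcast_hex` is a hypothetical helper, not part of d3blocks, and it only handles the six-digit hex forms listed above (not `None` or class labels).

```python
import re

# Matches '#rrggbb' hex colors such as '#000000' or '#ffffff'.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def broadcast_hex(color, n):
    """Hypothetical helper: expand a color argument to one hex color per point.

    A single hex string is repeated n times; a list/array must already have
    length n. Every entry is checked against the '#rrggbb' pattern.
    """
    if isinstance(color, str):
        colors = [color] * n
    else:
        colors = list(color)
        if len(colors) != n:
            raise ValueError(f"expected {n} colors, got {len(colors)}")
    for c in colors:
        if not isinstance(c, str) or not HEX_COLOR.fullmatch(c):
            raise ValueError(f"not a hex color: {c!r}")
    return colors


print(broadcast_hex('#000000', 3))  # ['#000000', '#000000', '#000000']
```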

param c_gradient:
Hex color used to make a linear gradient based on the density.
  • None: Do not use gradient.

  • opaque: Towards the edges, the points become more transparent. This emphasizes the dense areas and keeps the scatter plot tidy.

  • ‘#FFFFFF’: Towards the edges it smooths into this color

type c_gradient:

String, (default: ‘opaque’)
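How d3blocks computes the density for `c_gradient` is not described here. As an illustration only, the sketch below estimates a per-point density by counting neighbors within a fixed radius and maps it linearly to opacity, so points in dense regions stay opaque and isolated points fade, in the spirit of the ‘opaque’ option. The radius, the opacity range, and the name `density_opacity` are all assumptions.

```python
import numpy as np


def density_opacity(x, y, radius=1.0, min_opacity=0.1, max_opacity=1.0):
    """Hypothetical helper: map local point density to opacity.

    Density is the number of other points within `radius`, computed from
    all pairwise distances (O(n^2) memory, fine for a few thousand points).
    """
    pts = np.column_stack([np.asarray(x, float), np.asarray(y, float)])
    # Pairwise Euclidean distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count neighbors within the radius, excluding the point itself.
    counts = (dist <= radius).sum(axis=1) - 1
    span = counts.max() - counts.min()
    if span == 0:
        # Uniform density: every point gets the maximum opacity.
        return np.full(len(pts), max_opacity)
    # Linear rescale: sparsest point -> min_opacity, densest -> max_opacity.
    scaled = (counts - counts.min()) / span
    return min_opacity + scaled * (max_opacity - min_opacity)


# Three clustered points and one outlier.
op = density_opacity([0, 0.1, 0.2, 5], [0, 0.1, 0, 5], radius=0.5)
```

Here the three clustered points each have two neighbors and end up fully opaque, while the outlier has none and drops to the minimum opacity.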

param opacity:

Opacity of the dots. A single float applies to all dots; a list/array should have the same size as (x,y).

type opacity:

float or list/array [0-1]
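Because `opacity` may be a single float or one value per point, a caller may want to normalize it up front. A minimal sketch, assuming values outside [0, 1] should be rejected rather than clipped; `normalize_opacity` is a hypothetical helper.

```python
import numpy as np


def normalize_opacity(opacity, n):
    """Hypothetical helper: return an array of n opacities in [0, 1]."""
    arr = np.asarray(opacity, dtype=float)
    if arr.ndim == 0:
        # A scalar applies to every point.
        arr = np.full(n, float(arr))
    elif arr.shape != (n,):
        raise ValueError(f"opacity must be a float or have length {n}, got shape {arr.shape}")
    if np.any((arr < 0) | (arr > 1)):
        raise ValueError("opacity values must lie in [0, 1]")
    return arr


print(normalize_opacity(0.4, 3))  # [0.4 0.4 0.4]
```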

param tooltip:

Labels of the samples, shown as tooltips.

type tooltip:

list of labels with same size as (x,y)

param cmap:
All colormaps can be reversed by appending ‘_r’, e.g. ‘binary’ to ‘binary_r’.
  • ‘tab20c’, ‘Set1’, ‘Set2’, ‘rainbow’, ‘bwr’, ‘binary’, ‘seismic’, ‘Blues’, ‘Reds’, ‘Pastel1’, ‘Paired’, ‘twilight’, ‘hsv’

type cmap:

String, (default: ‘inferno’)

param scale:

Scale the data points. The default is False.

type scale:

Bool, optional
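The scaling method behind `scale=True` is not specified here. One common choice that puts different embeddings (for example tSNE and PCA coordinates) on comparable axes is min-max scaling of each axis to [0, 1]; the sketch below shows that technique under that assumption, and d3blocks may use a different one. `minmax_scale` is a hypothetical helper.

```python
import numpy as np


def minmax_scale(values):
    """Hypothetical helper: rescale a 1d array linearly to the range [0, 1]."""
    v = np.asarray(values, dtype=float)
    vmin, vmax = v.min(), v.max()
    if vmax == vmin:
        # Constant input: avoid division by zero and map everything to 0.
        return np.zeros_like(v)
    return (v - vmin) / (vmax - vmin)


# Coordinates on very different ranges end up on the same [0, 1] axis.
tsne_x = np.array([-40.0, 0.0, 40.0])
pca_x = np.array([-2.0, 0.0, 2.0])
print(minmax_scale(tsne_x), minmax_scale(pca_x))  # both map to 0, 0.5 and 1
```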

param label_radio:

The labels used for the radiobuttons.

type label_radio:

List [‘(x, y)’, ‘(x1, y1)’, ‘(x2, y2)’]

param set_xlim:

Limits of the x-axis. By default, they are derived from the data with a 10% margin.

type set_xlim:

tuple, (default: [None, None])

param set_ylim:

Limits of the y-axis. By default, they are derived from the data with a 10% margin.

type set_ylim:

tuple, (default: [None, None])
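A sketch of how a default axis range with 10% spacing could be derived from the data, assuming the margin is 10% of the data range added on each side and that `None` entries fall back to the data-derived bound. `axis_limits` is hypothetical; the exact rule d3blocks applies may differ.

```python
def axis_limits(values, lim=(None, None), pad=0.10):
    """Hypothetical helper: return (low, high) limits for one axis.

    Explicit entries in `lim` win; None entries are replaced by the data
    min/max widened by `pad` times the data range.
    """
    vmin, vmax = min(values), max(values)
    margin = (vmax - vmin) * pad
    low = vmin - margin if lim[0] is None else lim[0]
    high = vmax + margin if lim[1] is None else lim[1]
    return low, high


print(axis_limits([0, 5, 10]))             # (-1.0, 11.0)
print(axis_limits([0, 5, 10], (0, None)))  # (0, 11.0)
```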

param title:
Title of the figure.
  • ‘Scatterplot’

type title:

String, (default: None)

param filepath:
File path to save the output.
  • Temporary path: ‘d3blocks.html’

  • Relative path: ‘./d3blocks.html’

  • Absolute path: ‘c://temp//d3blocks.html’

  • None: Return HTML

type filepath:

String, (Default: user temp directory)
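When `filepath` is a bare filename, the output is described as landing in the user temp directory. Assuming this means Python's standard temp directory, the documented cases can be mimicked with the standard library as below; `resolve_output_path` is hypothetical, and whether d3blocks resolves paths exactly this way is an assumption.

```python
import os
import tempfile


def resolve_output_path(filepath='d3blocks.html'):
    """Hypothetical helper: mimic the documented filepath behavior.

    - None: no file is written and HTML is returned instead (signalled by None).
    - bare filename: placed in the user temp directory.
    - relative or absolute path: used as given, made absolute.
    """
    if filepath is None:
        return None
    if os.path.dirname(filepath) == '':
        return os.path.join(tempfile.gettempdir(), filepath)
    return os.path.abspath(filepath)


print(resolve_output_path('d3blocks.html'))
```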

param figsize:
Size of the figure in the browser, [width, height].
  • [900, 600]

type figsize:

tuple

param showfig:
  • True: Open browser-window.

  • False: Do not open browser-window.

type showfig:

bool, (default: True)

param overwrite:
  • True: Overwrite the html in the destination directory.

  • False: Do not overwrite destination file but show warning instead.

type overwrite:

bool, (default: True)

param notebook:
  • True: Use IPython to show chart in notebook.

  • False: Do not use IPython.

type notebook:

bool

param save_button:
  • True: A save button is shown in the HTML to save the image as SVG.

  • False: No save button is shown in the HTML.

type save_button:

bool, (default: True)

param reset_properties:
  • True: Reset the node_properties at each run.

  • False: Use the d3.node_properties()

type reset_properties:

bool, (default: True)

returns:
  • d3.node_properties (DataFrame of dictionary) – Contains properties of the unique input label/nodes/samples.

  • d3.edge_properties (DataFrame of dictionary) – Contains properties of the unique input edges/links.

  • d3.config (dictionary) – Contains configuration properties.

Examples

>>> # Load d3blocks
>>> from d3blocks import D3Blocks
>>> #
>>> # Initialize
>>> d3 = D3Blocks()
>>> #
>>> # Load example data
>>> df = d3.import_example('cancer')
>>> #
>>> # Set size and tooltip
>>> size = df['survival_months'].fillna(1).values / 20
>>> tooltip = df['labx'].values + ' <br /> Survival: ' + df['survival_months'].astype(str).str[0:4].values
>>> #
>>> # Scatter plot
>>> d3.scatter(df['tsneX'].values,
...            df['tsneY'].values,
...            size=size,
...            color=df['labx'].values,
...            stroke='#000000',
...            opacity=0.4,
...            tooltip=tooltip,
...            filepath='scatter_demo.html',
...            cmap='tab20')

Examples

>>> # Scatter plot with transitions. Note that scale is set to True to make the axes comparable to each other.
>>> d3.scatter(df['tsneX'].values,
...            df['tsneY'].values,
...            x1=df['PC1'].values,
...            y1=df['PC2'].values,
...            label_radio=['tSNE', 'PCA'],
...            scale=True,
...            size=size,
...            color=df['labx'].values,
...            stroke='#000000',
...            opacity=0.4,
...            tooltip=tooltip,
...            filepath='scatter_transitions2.html',
...            cmap='tab20')

Examples

>>> # Scatter plot with transitions. Note that scale is set to True to make the axes comparable to each other.
>>> d3.scatter(df['tsneX'].values,
...            df['tsneY'].values,
...            x1=df['PC1'].values,
...            y1=df['PC2'].values,
...            x2=df['PC2'].values,
...            y2=df['PC1'].values,
...            label_radio=['tSNE', 'PCA', 'PCA_reverse'],
...            scale=True,
...            size=size,
...            color=df['labx'].values,
...            stroke='#000000',
...            opacity=0.4,
...            tooltip=tooltip,
...            filepath='scatter_transitions3.html',
...            cmap='tab20')

Examples

>>> # Load d3blocks
>>> from d3blocks import D3Blocks
>>> #
>>> # Initialize
>>> d3 = D3Blocks(chart='Scatter')
>>> #
>>> # Import example
>>> df = d3.import_example('cancer')
>>> #
>>> # Set properties
>>> d3.set_edge_properties(df['tsneX'].values,
...                        df['tsneY'].values,
...                        x1=df['PC1'].values,
...                        y1=df['PC2'].values,
...                        label_radio=['tSNE','PCA'],
...                        size=df['survival_months'].fillna(1).values / 10,
...                        color=df['labx'].values,
...                        tooltip=df['labx'].values + ' <br /> Survival: ' + df['survival_months'].astype(str).str[0:4].values,
...                        scale=True)
>>> #
>>> # Show the chart
>>> d3.show()
>>> #
>>> # Set specific edge properties.
>>> print(d3.edge_properties)
>>> d3.edge_properties.loc[0,'size']=50
>>> d3.edge_properties.loc[0,'color']='#000000'
>>> d3.edge_properties.loc[0,'tooltip']='I am adjusted!'
>>> #
>>> # Configuration can be changed too.
>>> print(d3.config)
>>> #
>>> # Show the chart again with adjustments
>>> d3.show()


Input Data

The input data consists of x-coordinates and y-coordinates, which need to be specified separately.

#                 x          y   age  ... labels
# labels                              ...
# acc     37.204296  24.162813  58.0  ...    acc
# acc     37.093090  23.423557  44.0  ...    acc
# acc     36.806297  23.444910  23.0  ...    acc
# acc     38.067886  24.411770  30.0  ...    acc
# acc     36.791195  21.715324  29.0  ...    acc
#           ...        ...   ...  ...    ...
# brca     0.839383  -8.870781   NaN  ...   brca
# brca    -5.842904   2.877595   NaN  ...   brca
# brca    -9.392038   1.663352  71.0  ...   brca
# brca    -4.016389   6.260741   NaN  ...   brca
# brca     0.229801  -8.227086   NaN  ...   brca

# [4674 rows x 9 columns]
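Because x and y are passed as separate arrays, a DataFrame like the one above is split into its columns before plotting. A minimal sketch using a few rows copied from the printout; the tooltip format mirrors the examples, and the final `d3.scatter` call is left as a comment so the snippet runs without d3blocks installed.

```python
import pandas as pd

# A few rows taken from the printout above; the index holds the class label.
df = pd.DataFrame(
    {
        'x': [37.204296, 37.093090, -9.392038],
        'y': [24.162813, 23.423557, 1.663352],
        'age': [58.0, 44.0, 71.0],
    },
    index=pd.Index(['acc', 'acc', 'brca'], name='labels'),
)

# Coordinates are passed as separate 1d arrays.
x = df['x'].to_numpy()
y = df['y'].to_numpy()
# The index (class label) can drive coloring and tooltips.
labels = df.index.to_numpy(dtype=object)
tooltip = [f"{lab} <br /> Age: {age}" for lab, age in zip(labels, df['age'])]

# With d3blocks installed: d3.scatter(x, y, color=labels, tooltip=tooltip)
```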

Chart

Default scatterplot

Transitions (2 coordinates)

Transitions (3 coordinates)