Pandas has a great object: the Styler. It can do wonderful things. Many times during research, it helps to visualize the data with colors. I'm the first one to use Matplotlib when it is needed, but sometimes there is just no substitute for looking at the data itself. Coloring the data can help a great deal with that. Highlighting null values, understanding the scale of the data, or getting a sense of proportions is a lot easier with styling. In the old days, I used to export my dataframe into an Excel file or a Google Sheet and deploy the reliable conditional formatting. I still love conditional formatting, but exporting tables is something I hate doing, and Excel and Google Sheets lack the programmatic abilities Python has.

So with the Styler object, you research with style!

so fetch

In [12]:

import pandas as pd
import blog
blog.set_blog_style()

Toy Examples with Pandas Testing

Pandas testing has a nice method for working with toy examples. More about it here

In [14]:

import pandas.util.testing as tm  # deprecated in newer pandas releases
tm.N, tm.K = 10, 7  # 10 rows, 7 columns
st = tm.makeTimeDataFrame() * 100
st

Out[14]:

	A	B	C	D	E	F	G
2000-01-03	8.110432	-7.671659	-75.679396	-63.388007	79.634420	10.497984	37.576425
2000-01-04	-18.397552	111.882298	79.171438	53.751517	-107.629410	56.714036	37.317531
2000-01-05	247.767401	-107.709475	-148.387183	58.028000	75.225256	76.857173	72.986559
2000-01-06	-0.674517	42.458056	-33.310198	19.502670	14.905914	-39.948506	-114.569477
2000-01-07	-114.318425	6.137761	-93.587336	-34.042118	-47.988622	-191.936093	2.869611
2000-01-10	164.125587	-53.164714	-117.143462	108.541775	70.644089	133.318074	111.600095
2000-01-11	-29.388656	-242.240230	132.780522	149.456782	70.008844	-88.651034	-156.229086
2000-01-12	110.106442	116.538988	65.999453	38.939958	-6.084684	4.741270	54.417413
2000-01-13	-191.411423	100.878667	-38.618361	9.532799	-229.927711	-2.561390	-13.966168
2000-01-14	-78.295145	-53.193636	63.683134	47.418240	-166.267155	102.587168	-114.575626
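
The `pandas.util.testing` helpers are not available in every pandas release, so a roughly equivalent toy frame can be built by hand. This is only a sketch: it assumes standard normal values on a business-day index starting 2000-01-03, and the seed and the name `toy` are my own choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption) so reruns match

# 10 business days x 7 columns, scaled by 100 like the frame above
toy = pd.DataFrame(
    rng.standard_normal((10, 7)) * 100,
    index=pd.bdate_range("2000-01-03", periods=10),
    columns=list("ABCDEFG"),
)
```

The values will differ from the table above, but the shape, index and column names should line up with it.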

And let's insert some null values:

In [15]:

import numpy as np
stnan = st.copy()
# Replace roughly 5% of the cells, chosen at random, with NaN
stnan[np.random.rand(*stnan.shape) < 0.05] = np.nan

The Styler Object

Pandas exposes a Styler object through the .style accessor on the dataframe. This object implements _repr_html_, the method Jupyter Notebooks call to render dataframes so nicely. You can also export the HTML.

In [16]:

stnan.style # This looks just like the dataframe.

Out[16]:

	A	B	C	D	E	F	G
2000-01-03 00:00:00	8.11043	-7.67166	-75.6794	-63.388	79.6344	10.498	nan
2000-01-04 00:00:00	-18.3976	111.882	79.1714	nan	-107.629	56.714	37.3175
2000-01-05 00:00:00	247.767	-107.709	nan	58.028	75.2253	76.8572	72.9866
2000-01-06 00:00:00	-0.674517	42.4581	-33.3102	19.5027	14.9059	-39.9485	-114.569
2000-01-07 00:00:00	-114.318	6.13776	-93.5873	-34.0421	-47.9886	-191.936	2.86961
2000-01-10 00:00:00	164.126	-53.1647	-117.143	108.542	70.6441	nan	111.6
2000-01-11 00:00:00	-29.3887	-242.24	132.781	149.457	70.0088	-88.651	-156.229
2000-01-12 00:00:00	110.106	116.539	65.9995	38.94	-6.08468	4.74127	54.4174
2000-01-13 00:00:00	-191.411	100.879	-38.6184	9.5328	-229.928	-2.56139	-13.9662
2000-01-14 00:00:00	-78.2951	-53.1936	63.6831	47.4182	-166.267	102.587	-114.576

Basic Built-In Styling

The Styler object has some nice built-in functions. You can highlight nulls, min, max, etc. You can also apply them along an axis, just as you would when applying functions.
In the next example we should expect:

NaN values to be red
each row to have one yellow cell (its minimum)
each column to have one blue cell (its maximum)

In [17]:

(stnan
 .style
 .highlight_null('red')
 .highlight_max(color='steelblue', axis=0)  # max of each column
 .highlight_min(color='gold', axis=1)       # min of each row
)

Out[17]:

	A	B	C	D	E	F	G
2000-01-03 00:00:00	8.11043	-7.67166	-75.6794	-63.388	79.6344	10.498	nan
2000-01-04 00:00:00	-18.3976	111.882	79.1714	nan	-107.629	56.714	37.3175
2000-01-05 00:00:00	247.767	-107.709	nan	58.028	75.2253	76.8572	72.9866
2000-01-06 00:00:00	-0.674517	42.4581	-33.3102	19.5027	14.9059	-39.9485	-114.569
2000-01-07 00:00:00	-114.318	6.13776	-93.5873	-34.0421	-47.9886	-191.936	2.86961
2000-01-10 00:00:00	164.126	-53.1647	-117.143	108.542	70.6441	nan	111.6
2000-01-11 00:00:00	-29.3887	-242.24	132.781	149.457	70.0088	-88.651	-156.229
2000-01-12 00:00:00	110.106	116.539	65.9995	38.94	-6.08468	4.74127	54.4174
2000-01-13 00:00:00	-191.411	100.879	-38.6184	9.5328	-229.928	-2.56139	-13.9662
2000-01-14 00:00:00	-78.2951	-53.1936	63.6831	47.4182	-166.267	102.587	-114.576
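
To make the axis argument concrete, the same lookups can be done with plain pandas. This sketch uses a tiny made-up frame and `idxmax`/`idxmin` as stand-ins for what gets highlighted; it is not how the Styler is implemented, and unlike the Styler it reports only the first cell when there is a tie.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 5.0, 3.0],
     [4.0, 2.0, 6.0]],
    index=["r0", "r1"],
    columns=["A", "B", "C"],
)

# axis=0 works down each column: one maximum per column
col_max_rows = df.idxmax(axis=0)
# axis=1 works across each row: one minimum per row
row_min_cols = df.idxmin(axis=1)
```

Here `col_max_rows` maps each column to the row holding its maximum, and `row_min_cols` maps each row to the column holding its minimum.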

Color Scales

If you want to understand the scale of the data, applying a gradient creates a sort of "heat map" on the table. In the next example, lower values are white while higher values are dark blue. By default the scale is computed separately for each column.

In [18]:

st.style.background_gradient()

Out[18]:

	A	B	C	D	E	F	G
2000-01-03 00:00:00	8.11043	-7.67166	-75.6794	-63.388	79.6344	10.498	37.5764
2000-01-04 00:00:00	-18.3976	111.882	79.1714	53.7515	-107.629	56.714	37.3175
2000-01-05 00:00:00	247.767	-107.709	-148.387	58.028	75.2253	76.8572	72.9866
2000-01-06 00:00:00	-0.674517	42.4581	-33.3102	19.5027	14.9059	-39.9485	-114.569
2000-01-07 00:00:00	-114.318	6.13776	-93.5873	-34.0421	-47.9886	-191.936	2.86961
2000-01-10 00:00:00	164.126	-53.1647	-117.143	108.542	70.6441	133.318	111.6
2000-01-11 00:00:00	-29.3887	-242.24	132.781	149.457	70.0088	-88.651	-156.229
2000-01-12 00:00:00	110.106	116.539	65.9995	38.94	-6.08468	4.74127	54.4174
2000-01-13 00:00:00	-191.411	100.879	-38.6184	9.5328	-229.928	-2.56139	-13.9662
2000-01-14 00:00:00	-78.2951	-53.1936	63.6831	47.4182	-166.267	102.587	-114.576

Custom

For me this is the best part: with a bit of CSS you can do anything to your dataframe. This is where we really differ from Excel or Sheets; doing all of this programmatically makes life so much easier.

In [19]:

def custom_style(val):
    """Return a CSS string for a single cell value."""
    if val < -100:
        return 'background-color: red'  # Low values are red
    elif val > 100:
        return 'background-color: green'  # High values are green
    elif abs(val) < 5:
        return 'background-color: yellow'  # Values close to 0 are yellow
    else:
        return ''

st.style.applymap(custom_style)  # recent pandas releases name this Styler.map

Out[19]:

	A	B	C	D	E	F	G
2000-01-03 00:00:00	8.11043	-7.67166	-75.6794	-63.388	79.6344	10.498	37.5764
2000-01-04 00:00:00	-18.3976	111.882	79.1714	53.7515	-107.629	56.714	37.3175
2000-01-05 00:00:00	247.767	-107.709	-148.387	58.028	75.2253	76.8572	72.9866
2000-01-06 00:00:00	-0.674517	42.4581	-33.3102	19.5027	14.9059	-39.9485	-114.569
2000-01-07 00:00:00	-114.318	6.13776	-93.5873	-34.0421	-47.9886	-191.936	2.86961
2000-01-10 00:00:00	164.126	-53.1647	-117.143	108.542	70.6441	133.318	111.6
2000-01-11 00:00:00	-29.3887	-242.24	132.781	149.457	70.0088	-88.651	-156.229
2000-01-12 00:00:00	110.106	116.539	65.9995	38.94	-6.08468	4.74127	54.4174
2000-01-13 00:00:00	-191.411	100.879	-38.6184	9.5328	-229.928	-2.56139	-13.9662
2000-01-14 00:00:00	-78.2951	-53.1936	63.6831	47.4182	-166.267	102.587	-114.576

Bars

Applying bars to your data is a nice way to see how the values compare with one another. I know economists make great use of it, but anybody can employ this to get a quick grasp of the data.

In [21]:

(st.style
 .bar(subset=['A', 'D'], color='steelblue')
 .bar(subset=['G'], color=['indianred', 'limegreen'], align='mid')  # [negative, positive] colors
)

Out[21]:

	A	B	C	D	E	F	G
2000-01-03 00:00:00	8.11043	-7.67166	-75.6794	-63.388	79.6344	10.498	37.5764
2000-01-04 00:00:00	-18.3976	111.882	79.1714	53.7515	-107.629	56.714	37.3175
2000-01-05 00:00:00	247.767	-107.709	-148.387	58.028	75.2253	76.8572	72.9866
2000-01-06 00:00:00	-0.674517	42.4581	-33.3102	19.5027	14.9059	-39.9485	-114.569
2000-01-07 00:00:00	-114.318	6.13776	-93.5873	-34.0421	-47.9886	-191.936	2.86961
2000-01-10 00:00:00	164.126	-53.1647	-117.143	108.542	70.6441	133.318	111.6
2000-01-11 00:00:00	-29.3887	-242.24	132.781	149.457	70.0088	-88.651	-156.229
2000-01-12 00:00:00	110.106	116.539	65.9995	38.94	-6.08468	4.74127	54.4174
2000-01-13 00:00:00	-191.411	100.879	-38.6184	9.5328	-229.928	-2.56139	-13.9662
2000-01-14 00:00:00	-78.2951	-53.1936	63.6831	47.4182	-166.267	102.587	-114.576

DeanLa

Research with Style