Pandas has a great object: the Styler. It can do wonderful things. When we do research, it often helps to visualize the data with colors. I'm the first one to reach for Matplotlib when it's needed, but sometimes there is just no substitute for looking at the data itself, and coloring it helps a great deal. Styling makes it much easier to highlight null values, understand the scale of the data or get a sense of proportions. In the old days, I used to export my DataFrame to an Excel file or a Google Sheet and turn to trusty conditional formatting. I still love conditional formatting, but I hate exporting tables, and Excel and Google Sheets lack the programmatic power of Python.

So with the Styler, you can research with style!


In [12]:
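import numpy as np  # needed for the random NaN mask below
import pandas as pd
import blog  # appears to be a local helper that sets this blog's display style; skip it when running elsewhere
blog.set_blog_style()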

Toy Examples with Pandas Testing

The pandas testing module has a handy helper for generating toy examples. More about it here.

In [14]:
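import pandas.util.testing as tm  # deprecated in pandas 1.0, removed in 2.0
tm.N, tm.K = 10, 7  # module-level defaults for the make* helpers: 10 rows, 7 columns
st = tm.makeTimeDataFrame() * 100  # random normal values on a business-day index, scaled by 100
st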
Out[14]:
A B C D E F G
2000-01-03 8.110432 -7.671659 -75.679396 -63.388007 79.634420 10.497984 37.576425
2000-01-04 -18.397552 111.882298 79.171438 53.751517 -107.629410 56.714036 37.317531
2000-01-05 247.767401 -107.709475 -148.387183 58.028000 75.225256 76.857173 72.986559
2000-01-06 -0.674517 42.458056 -33.310198 19.502670 14.905914 -39.948506 -114.569477
2000-01-07 -114.318425 6.137761 -93.587336 -34.042118 -47.988622 -191.936093 2.869611
2000-01-10 164.125587 -53.164714 -117.143462 108.541775 70.644089 133.318074 111.600095
2000-01-11 -29.388656 -242.240230 132.780522 149.456782 70.008844 -88.651034 -156.229086
2000-01-12 110.106442 116.538988 65.999453 38.939958 -6.084684 4.741270 54.417413
2000-01-13 -191.411423 100.878667 -38.618361 9.532799 -229.927711 -2.561390 -13.966168
2000-01-14 -78.295145 -53.193636 63.683134 47.418240 -166.267155 102.587168 -114.575626
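pandas.util.testing was deprecated in pandas 1.0 and removed in 2.0, so the cell above fails on current versions. A frame of the same shape can be built directly with NumPy instead. This is a minimal sketch, not the testing helper itself: the seeded generator (np.random.default_rng) is my own choice for reproducibility.

```python
import numpy as np
import pandas as pd

# Seeded generator so the toy data is reproducible (an illustrative choice)
rng = np.random.default_rng(seed=0)

# 10 business days starting Monday 2000-01-03, matching the index above
index = pd.bdate_range("2000-01-03", periods=10)

# 10 x 7 standard-normal values scaled by 100, with columns A..G
st = pd.DataFrame(
    rng.standard_normal((10, 7)) * 100,
    index=index,
    columns=list("ABCDEFG"),
)
print(st.shape)
```

Weekends are skipped by bdate_range, which is why the index jumps from 2000-01-07 to 2000-01-10.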

Next, let's insert some null values at random (each cell has a 5% chance of becoming NaN):

In [15]:
stnan = st.copy()
# Boolean mask with the frame's shape: each cell has a 5% chance of being set to NaN
stnan[np.random.rand(*stnan.shape) < 0.05] = np.nan

The Styler Object

DataFrame.style returns a Styler object for the DataFrame. The Styler implements _repr_html_, the method Jupyter Notebooks call to display an object as HTML (it is also what makes plain DataFrames look so nice in a notebook). You can also export the HTML yourself.

In [16]:
stnan.style  # With no styles applied, this renders just like the DataFrame
Out[16]:
A B C D E F G
2000-01-03 00:00:00 8.11043 -7.67166 -75.6794 -63.388 79.6344 10.498 nan
2000-01-04 00:00:00 -18.3976 111.882 79.1714 nan -107.629 56.714 37.3175
2000-01-05 00:00:00 247.767 -107.709 nan 58.028 75.2253 76.8572 72.9866
2000-01-06 00:00:00 -0.674517 42.4581 -33.3102 19.5027 14.9059 -39.9485 -114.569
2000-01-07 00:00:00 -114.318 6.13776 -93.5873 -34.0421 -47.9886 -191.936 2.86961
2000-01-10 00:00:00 164.126 -53.1647 -117.143 108.542 70.6441 nan 111.6
2000-01-11 00:00:00 -29.3887 -242.24 132.781 149.457 70.0088 -88.651 -156.229
2000-01-12 00:00:00 110.106 116.539 65.9995 38.94 -6.08468 4.74127 54.4174
2000-01-13 00:00:00 -191.411 100.879 -38.6184 9.5328 -229.928 -2.56139 -13.9662
2000-01-14 00:00:00 -78.2951 -53.1936 63.6831 47.4182 -166.267 102.587 -114.576

Basic Built-In Styling

The Styler has some nice built-in methods: you can highlight nulls, minimums, maximums and more. The axis argument works just as it does when applying functions with DataFrame.apply: axis=0 goes column by column, axis=1 row by row.
In the next example we should expect:

  • NaN values to be red
  • each row to have one gold cell (its minimum)
  • each column to have one blue cell (its maximum)
In [17]:
(stnan
 .style
 .highlight_null('red')
 .highlight_max(color='steelblue', axis=0)
 .highlight_min(color='gold', axis=1)
)
Out[17]:
[Styled table: values as in Out[16], with NaN cells in red, each column's maximum in steel blue and each row's minimum in gold]
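To make the axis argument concrete, here is a rough plain-pandas sketch of which cells the two highlight calls above mark. It mirrors the idea (compare every cell with its column maximum or row minimum, ignoring NaN), not the Styler's actual internals; the toy frame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 5.0, np.nan], "B": [7.0, 2.0, 3.0]})

# highlight_max(axis=0): compare each cell with the maximum of its column
col_max = df.eq(df.max(axis=0), axis=1)

# highlight_min(axis=1): compare each cell with the minimum of its row
row_min = df.eq(df.min(axis=1), axis=0)

print(col_max)
print(row_min)
```

NaN never equals anything, so missing cells are never marked, and if a column has two cells tied for the maximum, both match.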

Color Scales

If you want to understand the scale of the data, a color gradient turns the table into a sort of heat map. With the defaults (the PuBu colormap, computed separately for each column), the lowest value in each column is nearly white and the highest is dark blue.

In [18]:
st.style.background_gradient()
Out[18]:
A B C D E F G
2000-01-03 00:00:00 8.11043 -7.67166 -75.6794 -63.388 79.6344 10.498 37.5764
2000-01-04 00:00:00 -18.3976 111.882 79.1714 53.7515 -107.629 56.714 37.3175
2000-01-05 00:00:00 247.767 -107.709 -148.387 58.028 75.2253 76.8572 72.9866
2000-01-06 00:00:00 -0.674517 42.4581 -33.3102 19.5027 14.9059 -39.9485 -114.569
2000-01-07 00:00:00 -114.318 6.13776 -93.5873 -34.0421 -47.9886 -191.936 2.86961
2000-01-10 00:00:00 164.126 -53.1647 -117.143 108.542 70.6441 133.318 111.6
2000-01-11 00:00:00 -29.3887 -242.24 132.781 149.457 70.0088 -88.651 -156.229
2000-01-12 00:00:00 110.106 116.539 65.9995 38.94 -6.08468 4.74127 54.4174
2000-01-13 00:00:00 -191.411 100.879 -38.6184 9.5328 -229.928 -2.56139 -13.9662
2000-01-14 00:00:00 -78.2951 -53.1936 63.6831 47.4182 -166.267 102.587 -114.576
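Because the default axis=0 scales each column on its own, the same shade can stand for very different numbers in different columns. The sketch below shows, roughly, the 0-to-1 rescaling the gradient applies before looking up a color (with the default low=0 and high=0), per column versus across the whole table. The toy frame is made up.

```python
import pandas as pd

df = pd.DataFrame({"A": [0.0, 5.0, 10.0], "B": [100.0, 150.0, 200.0]})

# axis=0 (the default): each column is rescaled by its own min and max
per_column = (df - df.min()) / (df.max() - df.min())

# axis=None: one min and max for the whole table
lo, hi = df.min().min(), df.max().max()
table_wide = (df - lo) / (hi - lo)

print(per_column)
print(table_wide)
```

Per column, A and B both span the full 0 to 1 range and would get identical colors; table-wide, all of A lands near the light end. In the Styler, the table-wide version is st.style.background_gradient(axis=None).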

Custom

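For me this is the best part: with a bit of CSS you can style your DataFrame however you like. This is where Python really pulls ahead of Excel or Sheets; doing all of this programmatically makes life so much easier. The function below receives one cell value at a time and returns a CSS declaration for that cell (or an empty string for no style).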

In [19]:
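def custom_style(val):
    """Return a CSS declaration for a single cell value."""
    if val < -100:
        return 'background-color: red'     # Low values are red
    elif val > 100:
        return 'background-color: green'   # High values are green
    elif abs(val) < 5:
        return 'background-color: yellow'  # Values close to 0 are yellow
    else:
        return ''                          # Everything else (including NaN): no style
st.style.applymap(custom_style)  # in pandas 2.1+, applymap is deprecated in favor of st.style.map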
Out[19]:
[Styled table: values as in Out[18], with cells below -100 in red, cells above 100 in green and cells between -5 and 5 in yellow]

Bars

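Bars are a nice way to see how the values in a column compare with one another. I know economists make great use of them, but anyone can use them to get a quick grasp of the data. In the next example, columns A and D get steel-blue bars. Column G gets two colors, so negative values get red bars and positive values green ones, and align='mid' draws each bar out from zero.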

In [21]:
(st.style
 .bar(subset=['A', 'D'], color='steelblue')
 .bar(subset=['G'], color=['indianred', 'limegreen'], align='mid')
)
Out[21]:
[Styled table: values as in Out[18], with steel-blue bars in columns A and D, and red (negative) and green (positive) bars in column G]

