Pie Chart

A pie chart is a circular statistical graphic that uses slices to show the relative proportions of a set of quantities. The arc length of each slice (and therefore its central angle and area) is directly proportional to the quantity it represents. A basic pie chart would look something like this:

From the pie chart it is easy to see that the item represented by blue has the largest share, followed by the red one, and so on. You can find all about pie charts here and here.

Histogram

Histograms are a data representation widely used for many kinds of numerical data. They are useful in analysis and offer an estimate of the probability distribution of a variable over short intervals. A histogram would look something like this:

Histograms usually represent data that has been grouped into discrete bins. The number of data points in each bin, or the frequency, is then plotted against the range of these bins, forming rectangles of different heights. You can find all about histograms here and here.

The Comparison

From what we have discussed so far, what pie charts and histograms have in common is how they represent data. In a pie chart, the longer the arc of a slice, the larger the quantity of the item; in a histogram, the taller the rectangle, the more items fall in that bin. So a histogram can be converted to a pie chart in which the frequency of elements in each bin determines the size of the corresponding slice. Here is an example of one such representation:

Each slice, starting from the top and moving clockwise, represents one of the rectangles in the histogram. So representing a discrete histogram as a pie chart is straightforward.
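The conversion from bin frequencies to slices can be sketched in a few lines of Python. This is only an illustration: the function name slice_angles and the sample frequencies are made up for the example and are not taken from the figures in this article.

```python
def slice_angles(counts):
    """Return the central angle (in degrees) of one pie slice per histogram bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("the histogram has no data")
    # Each bin gets a share of the 360 degrees proportional to its frequency.
    return [360.0 * count / total for count in counts]


# Hypothetical bin frequencies, chosen only for illustration.
counts = [2, 5, 8, 4, 1]
angles = slice_angles(counts)

for count, angle in zip(counts, angles):
    print(f"frequency {count:2d} -> slice of {angle:.1f} degrees")
```

The slices always add up to a full circle, which is the same normalization that a pie chart applies implicitly.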

But there is one way in which the two differ. Histograms can also approximate a continuous function called a probability distribution function. Joining the midpoints of the tops of the rectangles in a histogram gives a rough probability distribution graph. A probability distribution displays the probabilities associated with all possible outcomes of an event. A function representing a probability distribution graph is called a probability distribution function. Below is one such graphical representation of a probability distribution function of the same data used to plot the histogram:

So can we have a continuous pie chart for such cases? The borders between the colors of a pie chart are the only thing standing in the way, because they make it discrete. One solution is to remove the hard borders of the pie chart and let the colors merge into a gradient, which would look like a circular palette of colors. Now how would we convert the histogram to such a pie chart? That involves a series of calculations. More information on probability distributions and probability density functions can be found here and here.

The Representation

The first step in this approach is to assign a color to each of the intervals, just as a pie chart does. Next, the frequency values are integrated (accumulated) over the available intervals from 0 to n. Integrating the frequency values gives a graph that rises steeply over intervals with high frequency and stays almost flat over intervals with low frequency.
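For a histogram with discrete bins, the integration step can be approximated by a running sum. The sketch below assumes equal-width bins and uses made-up frequencies; it is not the data behind the figures in this article.

```python
from itertools import accumulate

# Hypothetical bin frequencies: two tall bins surrounded by sparse ones.
counts = [1, 1, 10, 12, 1, 1]

# Running sum of the frequencies, a discrete stand-in for the integral.
cumulative = list(accumulate(counts))

# The step between consecutive values equals the bin frequency, so the
# curve climbs steeply over the tall bins and stays nearly flat elsewhere.
steps = [b - a for a, b in zip([0] + cumulative, cumulative)]

print(cumulative)
print(steps)
```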

This data can then be normalized to the range 0 to 1. This can be achieved with the following equation:

xn = xn / max(x)

Here xn is the value in the set to be normalized and max(x) is the largest value in the set. Remember that the data in a pie chart is always normalized, as the whole pie chart adds up to 100%.
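As a minimal sketch of this normalization, the function below divides every value by the largest one. The name normalize and the sample values are assumptions made for the example.

```python
def normalize(values):
    """Scale values so that the largest one becomes 1."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot normalize when the largest value is 0")
    return [value / peak for value in values]


# Hypothetical integrated (cumulative) frequencies.
cumulative = [1, 2, 12, 24, 25, 26]
normalized = normalize(cumulative)

print([round(value, 3) for value in normalized])
```

Because integrated frequencies never decrease, the largest value is the last one, so the normalized curve ends at exactly 1.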

After normalizing the values, the inverse of this function is calculated through interpolation. Because the representation is a circular gradient, colors must be determined at a number of points around the circle to form the gradient. The number of points should be chosen so that the transitions in the gradient look smooth for the given data. The color at each point can be calculated as:

Color(angle) = c( Finv(angle / 360) )

Here F(x) gives the integral of the normalized histogram, and Finv(y) is the inverse of this function with domain [0,1] and codomain as the x-axis of the histogram. c(x) is a function which assigns a color for a given x-coordinate of the histogram. Typically this is achieved by choosing some key break points, assigning colors to them and interpolating between these break points for the other values of x.
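One possible way to put these pieces together is sketched below. It assumes that F is piecewise linear between bin edges and that c interpolates linearly between RGB colors at the break points. The helper names (make_f_inv, make_color_map, gradient) and all sample data are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def make_f_inv(counts, edges):
    """Build Finv for a histogram given bin counts and len(counts) + 1 bin edges."""
    total = sum(counts)
    # F evaluated at each bin edge, running from 0 to 1.
    cdf = [0.0] + [value / total for value in accumulate(counts)]

    def f_inv(y):
        y = min(max(y, 0.0), 1.0)
        # Find the bin i with cdf[i] <= y < cdf[i + 1] (clamped at the end).
        i = min(bisect_right(cdf, y) - 1, len(counts) - 1)
        span = cdf[i + 1] - cdf[i]
        if span == 0:  # empty bin at the end of the histogram
            return edges[i]
        t = (y - cdf[i]) / span
        return edges[i] + t * (edges[i + 1] - edges[i])

    return f_inv


def make_color_map(breaks, colors):
    """Build c(x): linear interpolation between RGB colors at sorted break points."""
    def c(x):
        if x <= breaks[0]:
            return colors[0]
        if x >= breaks[-1]:
            return colors[-1]
        j = bisect_right(breaks, x) - 1
        t = (x - breaks[j]) / (breaks[j + 1] - breaks[j])
        return tuple(a + t * (b - a) for a, b in zip(colors[j], colors[j + 1]))

    return c


def gradient(counts, edges, c, n_points=360):
    """Color(angle) = c(Finv(angle / 360)) sampled at n_points evenly spaced angles."""
    f_inv = make_f_inv(counts, edges)
    return [c(f_inv(k / n_points)) for k in range(n_points)]


# Hypothetical two-bin histogram on [0, 2], colored from red to blue.
counts, edges = [1, 3], [0.0, 1.0, 2.0]
c = make_color_map([0.0, 2.0], [(255, 0, 0), (0, 0, 255)])
palette = gradient(counts, edges, c)
```

In this example the second bin holds three quarters of the data, so three quarters of the sampled angles map to x values inside that bin.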

When we have enough colors to draw the full circle, the output is a circular palette in which the coverage of the color related to a particular x is directly proportional to the frequency of x. The colors can be made more distinguishable by using a sigmoid function, rather than a straight linear blend, to find the color at a particular angle. This way the color leans towards one of the two colors we are interpolating between, and the colors in the representation stay true to the colors assigned to x at the start of the calculations. Below is a histogram along with its continuous pie chart representation.
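The sigmoid-based blending can be sketched as follows. The exact sigmoid used for the figures is not given in this article, so the logistic curve, the steepness parameter, and the function names here are assumptions chosen for illustration.

```python
import math


def sigmoid_weight(t, steepness=10.0):
    """Map a linear blend position t in [0, 1] to an S-shaped weight in [0, 1]."""
    def logistic(u):
        return 1.0 / (1.0 + math.exp(-steepness * (u - 0.5)))

    low, high = logistic(0.0), logistic(1.0)
    # Rescale so that t = 0 gives exactly 0 and t = 1 gives exactly 1.
    return (logistic(t) - low) / (high - low)


def blend(color_a, color_b, t, steepness=10.0):
    """Blend two RGB colors, staying close to the nearer one for most of the way."""
    w = sigmoid_weight(t, steepness)
    return tuple(a + w * (b - a) for a, b in zip(color_a, color_b))


white, green = (255, 255, 255), (0, 128, 0)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, tuple(round(channel) for channel in blend(white, green, t)))
```

A larger steepness keeps each color pure for longer and makes the transition band narrower; a very small steepness approaches ordinary linear interpolation.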

The histogram data shown is for the raaga called Sahāna. The data representation starts from the bottom of the continuous pie chart. So the white color represents the first peak, the green color represents the smaller second peak, and so on. As can be seen, the higher a peak in the histogram, the more of its color (which represents the data) appears in the pie chart. There are also no strict boundaries between colors. So it is essentially a pie chart for continuous data, such as a probability distribution derived from a histogram.

This concept was used to plot the pitch histogram of a melody, which is continuous in form. The representation summarizes the pitch properties associated with the melody. The continuous pie charts of the pitch histograms of songs in the same melody can thus be compared easily with the actual representation. The pitch histograms and other details about melodies (called Raaga in this context) can all be found here.