
If your experiment needs statistics, you ought to have done a better experiment.

Ernest Rutherford

The aim of analytics is to find answers to problems, and we find those answers using data. When we begin to statistically analyse data gathered through an experiment, we come across different types of data, or variables. While many different classifications are possible, in this post we’ll talk about the different types of variables in statistics.

Are All Statistical Data Created Equal?

No! While it’s easy to imagine all statistical data being simply numbers, there are different types of statistical data. The data are classified into types based on four different properties that they possess to varying degrees:

  • Identity Property means that the data is measured in an identifiable way and every single value confers a meaningful identity
  • Magnitude Property means that different values have an ordered relationship and larger values have a higher magnitude than smaller values
  • Property of Equal Intervals means that the distance between any two adjacent points on the scale is equal
  • True Zero Property means that the scale has a true zero value, which indicates the absence of the quantity being measured, and no value is possible below it

Let us take a look at the different types of statistical variables and see how each of them satisfies one or more of the above properties:

  • Nominal data refers to data that follow a naming system. The word nominal comes from the Latin word nomen, meaning name. Nominal data do not convey anything beyond the fact that the recorded data points have something in common. Since such data convey identity and nothing more, they satisfy only the identity property. For example, if a set of 6 people are drawn randomly from an office and their names recorded, the set of names is nominal data.
    Importantly, nominal data need not always be a name (or a word). Assume that the six people selected randomly were assigned the numbers 1, 2, 3,…6 and the numbers were recorded instead of their names. The numbers do not convey any meaning, or order, beyond the fact that they represent the set of people selected at random. Though they are numbers, the meaning they convey is the same as the list of names, so they are nominal data.
    Think about this: the jersey numbers that sportspersons wear do not convey any numerical information, nor do they follow a specific order. The jersey numbers, though numerical, are nominal data. Indian cricketer Virat Kohli wears the number 18 on his jersey. The 18 does not mean that he is 18th in something, or anything else.
  • Ordinal data refers to a set that can be ordered in a specific way. In other words, the data must have a meaningful order, irrespective of whether it is numerical or not. For example, the words “First”, “Second”, “Third”, etc. have a meaningful order. Similarly, “Highest”, “Higher”, “High” have a particular order. Even when ordinal data is numerical, it does not lend itself to arithmetic computations. While 1 and 2 make sense as individual ranks, adding them is meaningless and does not give us rank 3. Ordinal data conveys identity as well as order, and therefore satisfies both the identity and magnitude properties.
  • Interval data refers to data points that lie along a particular measurement scale. The most important property of interval data is that the distance between consecutive points on the scale is equal. For example, the Celsius scale of measuring temperature is an interval scale. The temperature differences from 2 to 3, 10 to 11 and 50 to 51 are all equal. An interval scale can be created arbitrarily too. We are often asked to rate our satisfaction as a customer on a scale of 0 to 10 (where 0 is least satisfied). The data collected from such questions is commonly treated as interval scale data, although strictly speaking it is ordinal unless the gaps between scores can be assumed equal. Values on an interval scale can be added and subtracted, but multiplying or dividing them is not meaningful. For example, 40 Celsius is not twice as hot as 20 Celsius. Further, 0 on an interval scale is only a point along the scale and does not indicate absence. Does 0 Celsius mean that there is no temperature present? Therefore, interval data satisfies identity, magnitude and the property of equal intervals, but not the true zero property.
  • Ratio data is similar to interval data, with an important difference: values on a ratio scale can be meaningfully multiplied and divided. For example, 20 kilometres travelled 20 times would mean a total of 400 kilometres travelled, and 40 kilometres is twice as far as 20 kilometres. Unlike the interval scale, the ratio scale also has a meaningful 0 indicating the absence of the quantity being measured. 0 kilometres travelled means no distance travelled at all. This presence of a true 0 differentiates it from the interval scale. Therefore, ratio data satisfies identity, magnitude, the property of equal intervals, as well as the true zero property.
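To make the distinction concrete, here is a minimal Python sketch of the four scales and the properties each one satisfies. The names SCALE_PROPERTIES, supports_ratios and celsius_to_kelvin are illustrative choices for this post, not part of any library. The second half shows why ratios are not meaningful for Celsius readings but are meaningful for distances.

```python
# Properties satisfied by each measurement scale, as described in the list above.
SCALE_PROPERTIES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal_intervals"},
    "ratio": {"identity", "magnitude", "equal_intervals", "true_zero"},
}


def supports_ratios(scale: str) -> bool:
    """Multiplying and dividing values is meaningful only with a true zero."""
    return "true_zero" in SCALE_PROPERTIES[scale]


def celsius_to_kelvin(celsius: float) -> float:
    """Convert to Kelvin, a temperature scale that does have a true zero."""
    return celsius + 273.15


# Interval scale: the ratio of two Celsius readings depends on where the
# arbitrary zero sits, so "40 is twice 20" says nothing physical.
celsius_ratio = 40 / 20
kelvin_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

# Ratio scale: distance has a true zero, so the ratio survives a change of unit.
KM_TO_MILES = 0.621371
km_ratio = 40 / 20
miles_ratio = (40 * KM_TO_MILES) / (20 * KM_TO_MILES)

print(supports_ratios("interval"), supports_ratios("ratio"))
print(celsius_ratio, round(kelvin_ratio, 3))
print(km_ratio, round(miles_ratio, 3))
```

The Celsius ratio is 2 while the same two temperatures expressed in Kelvin give a ratio close to 1.07, whereas the distance ratio stays at 2 in both kilometres and miles.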

If we only consider numerical data, there can be another way of classifying them, based on the continuity of the data:

  • Discrete data can take only specific, separate values and nothing in between. For example, the age of a person in completed years is discrete data. The dates of a month are discrete data: a date can be 2, and at midnight it changes to 3 with nothing in between. Similarly, the calendar year is discrete. At 23:59 on December 31, 2016, the year was 2016, and it changed to 2017 a minute later. There was no year 2016.5.
  • Continuous data, unlike discrete data, can take any value within a range. For example, a person’s walking speed can be 4.2 kilometres/hour, then change to 4.3 kilometres/hour, 4.5 kilometres/hour, 5 kilometres/hour, or 6 kilometres/hour. The speed can also take any value in between.
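The two examples above can be sketched in a few lines of Python using only the standard library. The variable names are illustrative. The calendar year jumps from one whole number to the next, while between any two walking speeds we can always find another speed.

```python
from datetime import datetime, timedelta

# Discrete: the calendar year jumps straight from 2016 to 2017.
before = datetime(2016, 12, 31, 23, 59)
after = before + timedelta(minutes=1)
print(before.year, after.year)

# Continuous: between two speeds (in kilometres/hour) lies another valid speed.
low, high = 4.2, 4.3
mid = (low + high) / 2
print(low < mid < high)
```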

Before we close this post, here are some fictitious data and their types: