Designing an Orthogonal Survey

Amir Harjo
Oct 8, 2023

Conjoint Series — Scene 3

Conjoint Series — Scene 1 — How to Understand Consumer Preference?

Conjoint Series — Scene 2 — Which Product are Most Important?

Conjoint Series — Scene 4 — Regression Analysis for All Respondent

Conjoint Series — Scene 5 — Market Segmentation using Conjoint Analysis

“Let’s check if the design of the spot remover case is orthogonal. Abe, please do the calculation,” Dess instructs Abe.

“Very well,” Abe responds.

Abe starts by checking whether the data types are correct.

str(spot_remover)
tibble [18 × 6] (S3: tbl_df/tbl/data.frame)
$ package_design : chr [1:18] "A" "A" "A" "B" ...
$ brand_name : chr [1:18] "K2R" "Glory" "Bissel" "K2R" ...
$ price : chr [1:18] "$1.19" "$1.39" "$1.59" "$1.39" ...
$ good_housekeeping_seal_y_n: chr [1:18] "No" "No" "Yes" "Yes" ...
$ money_back_guarantee_y_n : chr [1:18] "No" "Yes" "No" "Yes" ...
$ rank : num [1:18] 13 11 17 2 14 3 12 7 9 18 ...

“Hmmm… the data types are not what I expected. To check whether the design is orthogonal, I need to calculate the correlation between the variables, and the variables need to be numeric,” Abe mumbles.

Abe knows there are two ways to do it. The first is to turn each character column into dummy variables and measure the correlation. The second is to convert each character column into a factor, convert the factor levels to numbers, and measure the correlation. Abe is not sure whether both methods will lead to the same conclusion.

“Why not try both?” Dess suggests.

Abe follows Dess’s suggestion. He starts writing the script for the first approach.

library(fastDummies)

spot_remover_dummy <- dummy_cols(spot_remover)

Upon checking the result, Abe finds that the number of columns has grown: each character variable now has one dummy column per level.

> str(spot_remover_dummy)
tibble [18 × 19] (S3: tbl_df/tbl/data.frame)
$ package_design : chr [1:18] "A" "A" "A" "B" ...
$ brand_name : chr [1:18] "K2R" "Glory" "Bissel" "K2R" ...
$ price : chr [1:18] "$1.19" "$1.39" "$1.59" "$1.39" ...
$ good_housekeeping_seal_y_n : chr [1:18] "No" "No" "Yes" "Yes" ...
$ money_back_guarantee_y_n : chr [1:18] "No" "Yes" "No" "Yes" ...
$ rank : num [1:18] 13 11 17 2 14 3 12 7 9 18 ...
$ package_design_A : int [1:18] 1 1 1 0 0 0 0 0 0 1 ...
$ package_design_B : int [1:18] 0 0 0 1 1 1 0 0 0 0 ...
$ package_design_C : int [1:18] 0 0 0 0 0 0 1 1 1 0 ...
$ brand_name_Bissel : int [1:18] 0 0 1 0 0 1 0 0 1 0 ...
$ brand_name_Glory : int [1:18] 0 1 0 0 1 0 0 1 0 0 ...
$ brand_name_K2R : int [1:18] 1 0 0 1 0 0 1 0 0 1 ...
$ price_$1.19 : int [1:18] 1 0 0 0 0 1 0 1 0 0 ...
$ price_$1.39 : int [1:18] 0 1 0 1 0 0 0 0 1 0 ...
$ price_$1.59 : int [1:18] 0 0 1 0 1 0 1 0 0 1 ...
$ good_housekeeping_seal_y_n_No : int [1:18] 1 1 0 0 1 1 1 0 1 0 ...
$ good_housekeeping_seal_y_n_Yes: int [1:18] 0 0 1 1 0 0 0 1 0 1 ...
$ money_back_guarantee_y_n_No : int [1:18] 1 0 1 0 1 1 0 1 1 1 ...
$ money_back_guarantee_y_n_Yes : int [1:18] 0 1 0 1 0 0 1 0 0 0 ...
- attr(*, ".internal.selfref")=<externalptr>

“Hi Dess, I am ready to calculate the correlation. Let me remove the original columns and calculate the correlation only for the dummy variables.”

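library(tidyverse)
# keep only the dummy columns: drop the original attributes and the rank
spot_remover_dummy <- spot_remover_dummy %>%
  select(-c(package_design, brand_name, price,
            good_housekeeping_seal_y_n, money_back_guarantee_y_n, rank))
# pairwise correlation between all dummy columns
cor(spot_remover_dummy)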

Abe checks a few rows of the result and is aghast. “Dess, the design is not orthogonal. The correlation between the variables is not exactly zero.”

package_design_A package_design_B package_design_C brand_name_Bissel brand_name_Glory
package_design_A 1.000000e+00 -5.000000e-01 -5.000000e-01 0.000000e+00 0.000000e+00
package_design_B -5.000000e-01 1.000000e+00 -5.000000e-01 -3.388132e-21 0.000000e+00
package_design_C -5.000000e-01 -5.000000e-01 1.000000e+00 0.000000e+00 0.000000e+00
brand_name_Bissel 0.000000e+00 -3.388132e-21 0.000000e+00 1.000000e+00 -5.000000e-01
brand_name_Glory 0.000000e+00 0.000000e+00 0.000000e+00 -5.000000e-01 1.000000e+00
brand_name_K2R 0.000000e+00 3.388132e-21 0.000000e+00 -5.000000e-01 -5.000000e-01
price_$1.19 0.000000e+00 -3.388132e-21 0.000000e+00 -1.355253e-20 -6.776264e-21
price_$1.39 0.000000e+00 3.388132e-21 0.000000e+00 -1.694066e-20 -1.016440e-20
price_$1.59 0.000000e+00 -3.388132e-21 0.000000e+00 -2.032879e-20 3.388132e-21
good_housekeeping_seal_y_n_No 1.355253e-20 1.355253e-20 1.355253e-20 1.355253e-20 -6.776264e-21
good_housekeeping_seal_y_n_Yes 0.000000e+00 3.388132e-21 0.000000e+00 -2.710505e-20 -6.776264e-21
money_back_guarantee_y_n_No 1.355253e-20 1.355253e-20 1.355253e-20 -4.065758e-20 0.000000e+00
money_back_guarantee_y_n_Yes 0.000000e+00 3.388132e-21 0.000000e+00 -2.710505e-20 -2.032879e-20
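
The -0.5 entries between dummy columns of the same attribute are expected and do not signal a design problem: a profile has exactly one level per attribute, so a 1 in one dummy column forces a 0 in its siblings. A minimal sketch, assuming a hypothetical balanced three-level attribute in which each level appears 6 times across 18 profiles (this is not the spot remover data itself):

```r
# Hypothetical balanced attribute: levels A, B, C each appear 6 times in 18 profiles
level <- rep(c("A", "B", "C"), times = 6)

# 0/1 dummy columns for two levels of the same attribute
dummy_a <- as.integer(level == "A")
dummy_b <- as.integer(level == "B")

# Correlation between sibling dummies of a balanced three-level attribute
within_attribute_cor <- cor(dummy_a, dummy_b)
print(within_attribute_cor)  # approximately -0.5
```

For the orthogonality check, only the correlations between dummy columns of different attributes matter.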

“Yes, sometimes it is very hard to get a perfectly orthogonal design with the number of profiles we want. In this case, for example, it is hard to pick 18 perfectly orthogonal profiles out of the 108 possible combinations. In research, a nearly orthogonal design is still acceptable,” Dess explains, and continues with a question.

“But you see, Abe, the correlations between the variables are very small, right? Values on the order of 1e-20 are effectively zero. Let’s round the numbers and see what we get.”

Abe rounds the result and writes it to Excel so that he and the team can check all the columns and correlations easily.

# correlation, rounded to whole numbers
# (anything below 0.5 in absolute value shows as 0)
cor_spot_remover <- data.frame(cor(spot_remover_dummy))
cor_spot_remover <- round(cor_spot_remover)

# write to Excel using the openxlsx library
library(openxlsx)
write.xlsx(cor_spot_remover, "cor_spot_remover.xlsx", rowNames = TRUE)

Everyone verifies that the design looks perfectly orthogonal.

“Let me check the correlation with the second method.” Abe continues writing the R script.
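
Rounding to whole numbers is a coarse check, because any correlation below 0.5 in absolute value is shown as 0. A stricter variant compares each correlation against a small tolerance instead. The sketch below uses a hypothetical 3 x 3 correlation matrix (not the spot remover result) and an assumed tolerance of 1e-8:

```r
# Hypothetical correlation matrix: x and y are really correlated (0.4),
# while x and z differ from zero only by floating-point noise
r <- matrix(c(1,     0.4, 1e-20,
              0.4,   1,   0,
              1e-20, 0,   1),
            nrow = 3,
            dimnames = list(c("x", "y", "z"), c("x", "y", "z")))

# Whole-number rounding hides the 0.4 correlation
rounded_off_diag <- round(r)[upper.tri(r)]

# Tolerance check: flag every pair whose correlation is not negligible
tol <- 1e-8  # assumed threshold for "effectively zero"
flagged <- abs(r[upper.tri(r)]) > tol

print(rounded_off_diag)  # 0 0 0
print(flagged)           # TRUE FALSE FALSE
```

With the tolerance check, floating-point noise such as 1e-20 still passes as zero, while a genuine correlation is flagged.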

# change to factor
spot_remover <- as.data.frame(unclass(spot_remover),stringsAsFactors = TRUE)

# change factor to numeric
factors <- sapply(spot_remover, is.factor)
spot_remover[ , factors] <- lapply(spot_remover[ , factors], as.numeric)
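
Converting a factor with as.numeric returns the integer code of each level, not the text itself. A small sketch with hypothetical values in the same format as the price column (by default the levels are sorted alphabetically, so the codes follow that order):

```r
# Hypothetical values in the same format as the price column
price <- factor(c("$1.19", "$1.39", "$1.59", "$1.39"))

print(levels(price))             # "$1.19" "$1.39" "$1.59"
price_code <- as.numeric(price)  # level codes, returned as doubles
print(price_code)                # 1 2 3 2
```

The codes only label the levels; their even spacing (1, 2, 3) is arbitrary, which is one reason it is worth comparing this method with the dummy-variable method.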

“Abe, if you check the correlation, it will not be perfectly orthogonal. So, to keep it short, please round the result,” Dess tells Abe.

> round(cor(spot_remover %>% select(-rank)))

package_design brand_name price good_housekeeping_seal_y_n money_back_guarantee_y_n
package_design 1 0 0 0 0
brand_name 0 1 0 0 0
price 0 0 1 0 0
good_housekeeping_seal_y_n 0 0 0 1 0
money_back_guarantee_y_n 0 0 0 0 1
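
The two methods can also be compared side by side on a design small enough to inspect by eye. This is a sketch in base R only: the toy design and the helper make_dummies are hypothetical, and the full factorial is used because it is orthogonal by construction.

```r
# Hypothetical toy design: full factorial of two attributes
toy <- expand.grid(package = c("A", "B", "C"),
                   seal    = c("No", "Yes"),
                   stringsAsFactors = FALSE)

# Method 1: one 0/1 dummy column per level (hypothetical helper)
make_dummies <- function(x) {
  lv <- sort(unique(x))
  out <- sapply(lv, function(l) as.integer(x == l))
  colnames(out) <- lv
  out
}
dummies <- cbind(make_dummies(toy$package), make_dummies(toy$seal))
cor_dummy <- cor(dummies)

# Method 2: integer level codes
codes <- data.frame(package = as.integer(factor(toy$package)),
                    seal    = as.integer(factor(toy$seal)))
cor_codes <- cor(codes)

# Correlations between the two attributes under each method
cross_dummy <- cor_dummy[c("A", "B", "C"), c("No", "Yes")]
cross_codes <- cor_codes["package", "seal"]
```

In this toy case both methods report zero correlation between the two attributes, so they lead to the same conclusion; the dummy method additionally shows the expected negative correlations among levels of the same attribute.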

Using the conjoint Package in R

Abe, Ray and Ran now understand what an orthogonal design requires.

“The truth is, creating an orthogonal design involves an optimization algorithm. How would you approach it, Ray?”

“First, I want to calculate all the correlations between the variables, and then try to minimize their sum,” Ray responds, and then continues, “So, in the end, it will be a nonlinear optimization model where the objective function is to minimize the sum of the absolute correlations between variables. And because an absolute value is never negative, the objective can never go below zero, so no extra constraint is needed.”

“Exactly, you have captured the requirement of the model perfectly. Now, if we had to build our own optimization in R, it would be ugly and messy. Luckily, there is a library in R that can help us create an orthogonal design.”
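
Ray's objective can be written as a short function. This is a sketch only: the name sum_abs_cor and the two toy designs are hypothetical, and it makes no claim about how any R package searches for a design internally.

```r
# Sum of absolute pairwise correlations between the columns of a numeric design
sum_abs_cor <- function(design) {
  r <- cor(design)
  sum(abs(r[upper.tri(r)]))  # each pair of attributes is counted once
}

# Hypothetical design 1: full factorial of three two-level attributes
full <- expand.grid(a = 1:2, b = 1:2, c = 1:2)

# Hypothetical design 2: four profiles in which a and b always move together
confounded <- data.frame(a = c(1, 1, 2, 2),
                         b = c(1, 1, 2, 2),
                         c = c(1, 2, 1, 2))

obj_full <- sum_abs_cor(full)              # 0: no attribute correlates with another
obj_confounded <- sum_abs_cor(confounded)  # 1: a and b are perfectly correlated
```

An optimizer would search over candidate subsets of profiles and keep the one with the smallest value of this objective; a value of zero means the encoded attributes are pairwise uncorrelated.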

Dess takes the example of a tea product with four attributes, each having 2 to 3 levels, with the specification below:

  • Price = Low, Medium, High
  • Variety = Black, Green, Red
  • Kind = Bags, Granulated, Leafy
  • Aroma = Yes, No

Without an orthogonal design, we would have 3 x 3 x 3 x 2, or 54, product profiles.

# Tea example
experiment<-expand.grid(
price=c("low","medium","high"),
variety=c("black","green","red"),
kind=c("bags","granulated","leafy"),
aroma=c("yes","no"))

Using the conjoint library and specifying an orthogonal design, the number of product profiles can be reduced to only 9.

library(conjoint)

design_or=caFactorialDesign(data=experiment,type="orthogonal")
> print(design_or)
price variety kind aroma
4 low green bags yes
9 high red bags yes
10 low black granulated yes
17 medium red granulated yes
21 high black leafy yes
23 medium green leafy yes
29 medium black bags no
42 high green granulated no
52 low red leafy no

As Abe showed in the correlation calculation for the spot remover case, the attribute labels can be replaced by their numeric level codes.

> caEncodedDesign(design_or)
price variety kind aroma
4 1 2 1 1
9 3 3 1 1
10 1 1 2 1
17 2 3 2 1
21 3 1 3 1
23 2 2 3 1
29 2 1 1 2
42 3 2 2 2
52 1 3 3 2

Then calculate the correlation:

> cor(caEncodedDesign(design_or))
price variety kind aroma
price 1 0 0 0
variety 0 1 0 0
kind 0 0 1 0
aroma 0 0 0 1
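
The same check can be reproduced without the conjoint package by typing in the nine profiles printed above. This is a sketch in base R; the level order passed to factor() is an assumption chosen to match the encoded output (low = 1, medium = 2, high = 3, and so on).

```r
# The nine profiles printed by caFactorialDesign above, typed in by hand
design <- data.frame(
  price   = c("low", "high", "low", "medium", "high", "medium",
              "medium", "high", "low"),
  variety = c("green", "red", "black", "red", "black", "green",
              "black", "green", "red"),
  kind    = c("bags", "bags", "granulated", "granulated", "leafy", "leafy",
              "bags", "granulated", "leafy"),
  aroma   = c("yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no")
)

# Assumed level order, matching the encoded design shown above
level_order <- list(
  price   = c("low", "medium", "high"),
  variety = c("black", "green", "red"),
  kind    = c("bags", "granulated", "leafy"),
  aroma   = c("yes", "no")
)

# Replace each label with its integer level code
encoded <- as.data.frame(
  Map(function(x, lv) as.integer(factor(x, levels = lv)), design, level_order)
)

# Largest absolute correlation between two different attributes
r <- cor(encoded)
max_off_diag <- max(abs(r[upper.tri(r)]))

# Count how often each price/variety combination appears
price_by_variety <- table(design$price, design$variety)
```

Here max_off_diag is zero up to floating-point error, and every price/variety combination appears exactly once, which is the balance that makes the correlations vanish.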

“Does everyone understand?”

Abe, Ray and Ran agree.

“Then next, let’s discuss how our analysis can be used to guide our product marketing strategy.” Dess smiles and closes the session.

Source:

https://cran.r-project.org/web/packages/conjoint/conjoint.pdf
