Conjoint — Regression Analysis for All Respondents

Amir Harjo
7 min read · Oct 22, 2023


Conjoint Series — Scene 4

Conjoint Series — Scene 1 — How to Understand Consumer Preference?

Conjoint Series — Scene 2 — Which Product are Most Important?

Conjoint Series — Scene 3 — Designing Orthogonal Survey

Conjoint Series — Scene 5 — Market Segmentation using Conjoint Analysis

After lunch, Dess and the rest of the team have a session in the collaboration room. They continue the discussion about conjoint analysis. There are still many insights Dess has not covered in the previous discussion.

“Now, let’s continue our discussion,” Dess says, starting the lecture.

In the previous example, we built the regression for only one respondent. Imagine that the survey was conducted publicly on the intended sample, and suppose there are 100 respondents. How should we run the regression? Should we aggregate the ranks and then regress on the aggregated ranks? Or should we run the regression without aggregating the ranks at all?

The answer is: we run the regression without aggregating the ranks. Let’s try it in R with the “conjoint” package. We will go step by step, from getting the utility values of one respondent to the insights we obtain when the regression is run over all respondents.

Regression and Insights

We will use the tea data, similar to the tea data from the previous discussion, with four attributes, each with two or three levels.

  • Price = Low, Medium, High
  • Variety = Black, Green, Red
  • Kind = Bags, Granulated, Leafy
  • Aroma = Yes, No
# load library
library(conjoint)

# load tea data
data(tea)

Loading the tea data brings five different objects into the workspace:

tpref — Vector of preferences (length 1300).

tprefm — Matrix of preferences (100 respondents and 13 profiles).

tprof — Matrix of profiles (4 attributes and 13 profiles).

tlevn — Character vector of names for the attributes’ levels.

tsimp — Matrix of simulation profiles.
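To sanity-check these shapes after loading, a minimal sketch (`NROW` is used so the check works whether `tpref` is stored as a vector or as a one-column data frame):

```r
# load the package and the bundled tea data set
library(conjoint)
data(tea)

NROW(tpref)   # total preferences: 100 respondents x 13 profiles = 1300
dim(tprefm)   # preference matrix: 100 x 13
dim(tprof)    # profile matrix: 13 profiles x 4 attributes
tlevn         # names of all attribute levels
```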

Using this package, we can estimate the conjoint model parameters for one respondent with “caModel” and inspect the result.

> caModel(y=tprefm[1,], x=tprof)

Call:
lm(formula = frml)

Residuals:
1 2 3 4 5 6 7 8 9 10 11 12 13
1.1345 -1.4897 0.3103 -0.2655 0.3103 0.1931 1.5931 -1.4310 -1.4310 1.1207 0.3690 1.1931 -1.6069

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.3937 0.5439 6.240 0.00155 **
factor(x$price)1 -1.5172 0.7944 -1.910 0.11440
factor(x$price)2 -1.1414 0.6889 -1.657 0.15844
factor(x$variety)1 -0.4747 0.6889 -0.689 0.52141
factor(x$variety)2 -0.6747 0.6889 -0.979 0.37234
factor(x$kind)1 0.6586 0.6889 0.956 0.38293
factor(x$kind)2 -1.5172 0.7944 -1.910 0.11440
factor(x$aroma)1 0.6293 0.5093 1.236 0.27150
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.78 on 5 degrees of freedom
Multiple R-squared: 0.8184, Adjusted R-squared: 0.5642
F-statistic: 3.22 on 7 and 5 DF, p-value: 0.1082

“What do you say about the result, Abe?” Dess asks Abe, who is still looking at his monitor.

“I think we can read this result the same way we read the result for the spot-remover data. It is only one response, so it will not tell us much. One key takeaway, though, is that for respondent one, no factor is actually important, because none of the regression p-values is significant.”

“Great observation”

Yes, when we run the regression, we cannot expect that every respondent weighs or cares strongly about every attribute of the product they buy.

Now, let’s run the conjoint analysis for all 100 respondents. Remember: we do not aggregate the ranks. We run one regression over 1,300 rows of data, consisting of 13 product profiles for each of 100 respondents.

> Conjoint(y=tpref, x=tprof, z=tlevn)

Call:
lm(formula = frml)

Residuals:
Min 1Q Median 3Q Max
-5.1888 -2.3761 -0.7512 2.2128 7.5134

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.55336 0.09068 39.184 < 2e-16 ***
factor(x$price)1 0.24023 0.13245 1.814 0.070 .
factor(x$price)2 -0.14311 0.11485 -1.246 0.213
factor(x$variety)1 0.61489 0.11485 5.354 1.02e-07 ***
factor(x$variety)2 0.03489 0.11485 0.304 0.761
factor(x$kind)1 0.13689 0.11485 1.192 0.234
factor(x$kind)2 -0.88977 0.13245 -6.718 2.76e-11 ***
factor(x$aroma)1 0.41078 0.08492 4.837 1.48e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.967 on 1292 degrees of freedom
Multiple R-squared: 0.09003, Adjusted R-squared: 0.0851
F-statistic: 18.26 on 7 and 1292 DF, p-value: < 2.2e-16

[1] "Part worths (utilities) of levels (model parameters for whole sample):"
levnms utls
1 intercept 3.5534
2 low 0.2402
3 medium -0.1431
4 high -0.0971
5 black 0.6149
6 green 0.0349
7 red -0.6498
8 bags 0.1369
9 granulated -0.8898
10 leafy 0.7529
11 yes 0.4108
12 no -0.4108
[1] "Average importance of factors (attributes):"
[1] 24.76 32.22 27.15 15.88
[1] Sum of average importance: 100.01
[1] "Chart of average factors importance"

This gives us the overall parameters of the conjoint analysis. Running it yields three different insights:

Insight One: The part-worth utilities of each level. Part-worth utilities are the regression coefficients, and within each attribute they always sum to zero. For example, the attribute price has three levels: low, medium, and high. Because price is a factor (categorical) variable, only two of the levels enter the regression — low and medium, with coefficients 0.2402 and -0.1431 respectively. To make the sum zero, the utility of high is set to -0.0971. These numbers mean that a low price is preferred over a medium or high price. We can also read these values from the output chart.
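The zero-sum constraint is also how the utility of the omitted level is recovered. A minimal sketch using the price coefficients from the output above:

```r
# Part worths within one attribute sum to zero, so the omitted
# level's utility is minus the sum of the estimated coefficients.
low    <-  0.24023   # factor(x$price)1
medium <- -0.14311   # factor(x$price)2
high   <- -(low + medium)
round(high, 4)       # -0.0971, matching the part-worths table
```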

Insight Two: The importance of each factor. For tea we have four attributes: price, variety, kind, and aroma. The importance of variety is 32.22%, showing that it is the most important attribute when customers choose tea.
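Attribute importance is derived from part-worth ranges: for each respondent, the range (max minus min) of an attribute’s part worths is divided by the sum of ranges over all attributes, and these shares are then averaged across respondents. This per-respondent averaging is why the reported numbers cannot be reproduced from the pooled part worths alone. The package exposes the calculation directly:

```r
# Average importance (in %) of price, variety, kind and aroma
# across all 100 respondents
caImportance(y = tpref, x = tprof)
```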

Insight Three: The chart of average factor importance — the same information as insight two, presented as a chart.

We see that the R-squared of the regression is quite low. That is expected: different people value different attributes differently, which inflates the residual variance and drives R-squared down, even though several level coefficients remain highly significant thanks to the large sample.

Since each person values attributes in a different way, there might be a way to group respondents into distinct clusters.

Market Segmentation

Dess wanted to show how the results of the conjoint analysis can be used to create market segments.

“To create the segments, we first calculate the part-worth utilities for each respondent. Then, using k-means clustering, we group the respondents into several clusters of greatest similarity.”

“Abe, please go ahead and try it in R,” Dess instructs Abe.

Abe opens Posit and runs a few lines of code. First, he gets the part worths for one respondent.

> caPartUtilities(y=tprefm[1,], x=tprof, z=tlevn)
intercept low medium high black green red bags granulated leafy yes no
[1,] 3.394 -1.517 -1.141 2.659 -0.475 -0.675 1.149 0.659 -1.517 0.859 0.629 -0.629

So far so good. Then he checks the run for just the first five respondents, and the result looks as expected.

> caPartUtilities(y=tprefm[1:5,], x=tprof, z=tlevn)
intercept low medium high black green red bags granulated leafy yes no
[1,] 3.394 -1.517 -1.141 2.659 -0.475 -0.675 1.149 0.659 -1.517 0.859 0.629 -0.629
[2,] 5.049 3.391 -0.695 -2.695 -1.029 0.971 0.057 1.105 -0.609 -0.495 -0.681 0.681
[3,] 4.029 2.563 -1.182 -1.382 -0.248 2.352 -2.103 -0.382 -2.437 2.818 0.776 -0.776
[4,] 5.856 -1.149 -0.025 1.175 -0.492 1.308 -0.816 -0.825 -0.149 0.975 0.121 -0.121
[5,] 6.250 -2.333 2.567 -0.233 -0.033 -0.633 0.667 -0.233 -0.333 0.567 -1.250 1.250

He can then proceed to get the part-worth utilities for all 100 respondents.

caPartUtilities(y=tprefm, x=tprof, z=tlevn)
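With the full part-worth matrix in hand, the k-means step Dess described could be sketched like this (the number of clusters and the seed here are assumptions for illustration):

```r
# Part worths per respondent: one row each, columns are
# intercept plus the 11 attribute levels
util <- caPartUtilities(y = tprefm, x = tprof, z = tlevn)

set.seed(42)                           # k-means depends on random starts
km <- kmeans(util[, -1], centers = 3)  # drop the intercept column
km$cluster                             # segment membership per respondent
```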

He then considers creating the segments with k-means clustering himself. But luckily, the package already provides a function for segmentation. For example, to create three segments from the 100 respondents, Abe can simply run this code:

clu = caSegmentation(y=tpref, x=tprof, c=3)
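A quick sanity check of the resulting segment sizes:

```r
# clu$sclu holds the cluster assignment for each respondent
table(clu$sclu)   # number of respondents in each of the 3 segments
```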

He then checks the insights in each segment by running the conjoint analysis three times:

# profile segment 1
tprefm1 <- tprefm[clu$sclu==1,]
tpref1 <- data.frame(Y=matrix(t(tprefm1), ncol=1, nrow=ncol(tprefm1)*nrow(tprefm1), byrow=F))
Conjoint(y=tpref1, x=tprof, z=tlevn)

# profile segment 2
tprefm2 <- tprefm[clu$sclu==2,]
tpref2 <- data.frame(Y=matrix(t(tprefm2), ncol=1, nrow=ncol(tprefm2)*nrow(tprefm2), byrow=F))
Conjoint(y=tpref2, x=tprof, z=tlevn)
# profile segment 3
tprefm3 <- tprefm[clu$sclu==3,]
tpref3 <- data.frame(Y=matrix(t(tprefm3), ncol=1, nrow=ncol(tprefm3)*nrow(tprefm3), byrow=F))
Conjoint(y=tpref3, x=tprof, z=tlevn)
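The three near-identical blocks can be collapsed into a loop — same logic, assuming `clu` from the segmentation step above:

```r
for (k in 1:3) {
  m  <- tprefm[clu$sclu == k, ]           # respondents in segment k
  yk <- data.frame(Y = as.vector(t(m)))   # flatten to one column of ranks
  cat("=== Segment", k, "===\n")
  Conjoint(y = yk, x = tprof, z = tlevn)
}
```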

Importance of attributes for segment 1 — variety is the most important attribute

Importance of attributes for segment 2 — kind is the most important attribute, followed by price

Importance of attributes for segment 3 — variety and price are the most important attributes

“But is three segments the correct answer? There could be four or five different segments in the data,” Ray complains, because Dess seems to be simplifying the problem here.

“Excellent question. We will estimate the number of segments in the next discussion.”

Source:

https://cran.r-project.org/web/packages/conjoint/conjoint.pdf
