The ball is round and the game lasts for 90 minutes.
After the game is before the game.
Sepp Herberger.

$m_n\{1,2,X\}$ is a softmax, I suppose. The expression above is not.

My concern is:

Can we extrapolate? Can we say that, because it rained yesterday, it will rain today?
Can we just say that, because Germany won the last championship, it will win this one too?
Our past data clearly say so, but we know that's not the case. I claim that if we fit a model like this, it overfits.

Thanks Bob!
I already knew both Andrew’s World Cup model and Milad’s model for the Premier League. In fact, I was largely inspired by these models, and I enjoyed reading about them.

Actually, instead of the softmax parametrization, I used the alternative multinomial logistic parametrization (https://en.wikipedia.org/wiki/Multinomial_logistic_regression), modeling K-1 = 2 probabilities and the K-th (the draw, in this case) as:

$$\frac{1}{1 + \sum_{k=1}^{K-1} \exp(\beta_k x)}$$

However, I now realize there is a typo: I did not exponentiate the etas in the denominators, and the sum runs from 1 to K. So, thanks!
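A minimal Python sketch of the K-1 parametrization described above (Python is used only for illustration; the function name `baseline_logit_probs` and the example linear predictors are hypothetical). The draw is the baseline category, so it gets probability 1 / (1 + Σ exp(η_k)), and each of the two win outcomes gets exp(η_k) divided by the same denominator.

```python
import math


def baseline_logit_probs(etas):
    """Multinomial logistic probabilities with the last (K-th) category as baseline.

    `etas` holds the K-1 linear predictors (e.g. home win, away win);
    the baseline category (the draw) has no predictor of its own.
    """
    exp_etas = [math.exp(eta) for eta in etas]
    denom = 1.0 + sum(exp_etas)          # 1 + sum_{k=1}^{K-1} exp(eta_k)
    probs = [e / denom for e in exp_etas]
    probs.append(1.0 / denom)            # baseline (draw) probability
    return probs


# Hypothetical linear predictors for one match: (home win, away win).
p_home, p_away, p_draw = baseline_logit_probs([0.4, -0.2])
```

Note that the exponentials appear both in the numerators and in the shared denominator; dropping the exp in the denominator would no longer give probabilities that sum to 1.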

As I explained in the comments section of Andrew’s blog (http://andrewgelman.com/2018/06/15/stan-goes-world-cup/#comments), this table only shows the estimated probabilities obtained by simulating the World Cup 10,000 times before each game is played. Thus, Germany is favored mainly because of its high FIFA ranking rather than its past historical results.
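To illustrate what estimating probabilities by simulating a tournament many times means, here is a toy Monte Carlo sketch. Everything in it is an assumption for illustration only: a hypothetical four-team knockout bracket, made-up team strengths, and a logistic win probability with no draws (as if shootouts were folded into it). It is not the model behind the table.

```python
import math
import random
from collections import Counter


def win_prob(strength_a, strength_b):
    """Hypothetical: P(a beats b) in a knockout match, logistic in the strength gap."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))


def play(a, b, strengths, rng):
    """Simulate one knockout match (no draws) and return the winner."""
    return a if rng.random() < win_prob(strengths[a], strengths[b]) else b


def simulate_bracket(strengths, rng):
    """Toy bracket: semifinals team0 vs team1 and team2 vs team3, then the final."""
    t = list(strengths)
    finalist_1 = play(t[0], t[1], strengths, rng)
    finalist_2 = play(t[2], t[3], strengths, rng)
    return play(finalist_1, finalist_2, strengths, rng)


def champion_probs(strengths, n_sims=10000, seed=2018):
    """Estimate each team's title probability as its share of simulated titles."""
    rng = random.Random(seed)
    counts = Counter(simulate_bracket(strengths, rng) for _ in range(n_sims))
    return {team: counts[team] / n_sims for team in strengths}


# Made-up strengths for four hypothetical teams.
print(champion_probs({"A": 1.0, "B": 0.2, "C": 0.9, "D": 0.1}))
```

Because the strengths enter every simulated match, whatever drives them (a ranking, past results) is what drives the aggregated title probabilities.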

The nomenclature around all this is very inconsistent and confusing. What you’re calling “multinomial logistic” is just softmax with one of the inputs pinned to 0. The 0 in the version you’re using (1 after exp(0)) identifies the model, but it comes with the disadvantage that the priors become asymmetric. There’s a discussion in the manual of K vs. K - 1 parameterizations of multinomial logistic regression.
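Both claims can be checked numerically: with K free inputs, softmax is not identified (adding the same constant to every input leaves the probabilities unchanged), and pinning one input to 0 reproduces the K-1 form. A minimal Python sketch with hypothetical linear predictors:

```python
import math


def softmax(xs):
    """Softmax of a list of reals; subtracting the max avoids overflow in exp."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


etas = [0.4, -0.2, 0.0]            # hypothetical (home win, away win, draw); draw pinned to 0
shifted = [x + 3.0 for x in etas]  # add the same constant to every input

p = softmax(etas)
p_shifted = softmax(shifted)

# K free inputs are not identified: the shift leaves the probabilities unchanged.
assert all(math.isclose(a, b) for a, b in zip(p, p_shifted))

# With the last input pinned to 0, the draw gets 1 / (1 + sum_k exp(eta_k)).
assert math.isclose(p[2], 1.0 / (1.0 + math.exp(0.4) + math.exp(-0.2)))
```

The asymmetry of priors follows from the same picture: the pinned category has no parameter, so a prior placed on the remaining K-1 inputs treats it differently from the others.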

The $\eta_{n\cdot}$ do not have an intercept or, respectively, a home-advantage parameter. Is there any reason for that?
At the same time, you have $\mu_{att}$ in $att_t$, and the same for defense. This is constant for all $t$, so
it gets added to both $\eta_{n\cdot}$. Is this a case of an identifiability problem?

Mmh, I still have to think about it. Anyway, for the time being, mu_att and mu_def no longer appear in the model.
See my website for updated models and predictions for the quarterfinals, which start today!

I had no idea what sensor fusion was, thanks for the suggestion!

I saw your model a while ago; I was impressed but couldn’t really follow it.
I would like to try it myself now that I have a little more modeling experience, but the links seem to be broken. Could you upload the models again?

One thing Bob pointed out while I was working on this is that the model is a variant of the Bradley-Terry model, which is used to infer team abilities. Here each team’s ability is modeled as the expected number* of goals it will score per game, and the difference between team abilities predicts who will win the match.
(*The number of goals is modeled as a continuous value, which isn’t correct.)
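For reference, a minimal sketch of the standard Bradley-Terry form, in which the probability that team i beats team j is a logistic function of the ability difference. This is the textbook version with hypothetical abilities and draws ignored; the variant described above works with expected goals instead, so treat this as background rather than as that model.

```python
import math


def bradley_terry_win_prob(ability_i, ability_j):
    """Standard Bradley-Terry: P(i beats j) = logistic(ability_i - ability_j)."""
    return 1.0 / (1.0 + math.exp(-(ability_i - ability_j)))


def bradley_terry_log_lik(abilities, results):
    """Log likelihood of a list of (winner, loser) pairs; draws are ignored here."""
    return sum(math.log(bradley_terry_win_prob(abilities[w], abilities[l]))
               for w, l in results)


# Hypothetical abilities and one hypothetical result.
abilities = {"A": 1.0, "B": 0.0}
print(bradley_terry_log_lik(abilities, [("A", "B")]))
```

Only differences between abilities enter the likelihood, so adding a constant to every ability changes nothing; in practice one ability is fixed or the abilities get a centered prior.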

The issue with the soccer models is not discrete parameters; it’s discrete data. Stan has no problem with discrete data. The only difficulty is that then we can’t use a simple normal or t distribution. The simplest way to proceed with a full generative model would be to use a continuous distribution with rounding, but then the likelihood is more complicated and expensive to compute, as it will be based on the normal or t cumulative distribution function. In practice, it makes more sense to just fit the continuous model to the data, do rounding when simulating fake data or posterior predictive checks, and then check that nothing much is lost by the rounding. I did this when playing with the original World Cup model. In any case, if you do want to fit a model to the discrete data, no marginalization is necessary, as there will be no latent discrete parameters.
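A minimal Python sketch of the two options described above, assuming a normal model for a score (or goal difference) with mean `mu` and scale `sigma` (hypothetical names): the rounded-normal likelihood, where each integer outcome costs two CDF evaluations, and the cheaper route of fitting the continuous model and rounding only when simulating fake data.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def rounded_normal_lpmf(y, mu, sigma):
    """log P(round(Z) = y) for Z ~ Normal(mu, sigma).

    The probability of the integer y is the normal mass on [y - 0.5, y + 0.5],
    so every likelihood term needs two CDF evaluations.
    """
    mass = normal_cdf(y + 0.5, mu, sigma) - normal_cdf(y - 0.5, mu, sigma)
    # Far in the tails the difference underflows to 0; a production version
    # would work with log CDFs instead.
    return math.log(mass) if mass > 0.0 else -math.inf


def simulate_rounded(mu, sigma, n, seed=1):
    """Fake data from the continuous model, rounded to integers afterwards."""
    rng = random.Random(seed)
    return [round(rng.gauss(mu, sigma)) for _ in range(n)]
```

Comparing `simulate_rounded` draws with the observed integer scores is one simple way to check that little is lost by the rounding.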

One can also use the Skellam distribution to model the (discrete) goal difference. It’s a bit slower than other approaches, but it worked fairly nicely when I tried it. I don’t think much is gained in terms of predictive power, though. Here’s a pretty straightforward Stan function for the lpmf:

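```stan
functions {
  // Log PMF of the Skellam distribution: the difference k = X1 - X2 of two
  // independent Poisson variables X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).
  real skellam_lpmf(int k, real mu1, real mu2) {
    int abs_k = abs(k);
    // log[ exp(-(mu1 + mu2)) * (mu1 / mu2)^(k / 2) * I_|k|(2 * sqrt(mu1 * mu2)) ]
    real lp = -mu1 - mu2 + 0.5 * k * (log(mu1) - log(mu2))
              + log(modified_bessel_first_kind(abs_k, 2 * sqrt(mu1 * mu2)));
    return lp;
  }
}
```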
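To sanity-check the formula, here is a stdlib-only Python version of the same lpmf, compared with a brute-force convolution of two independent Poisson distributions. The Bessel function is computed from its power series (a stand-in for Stan's `modified_bessel_first_kind`), which is adequate for small goal rates but is not a general-purpose implementation; all function names are hypothetical.

```python
import math


def bessel_i(v, z, terms=40):
    """Modified Bessel function of the first kind I_v(z), integer v >= 0, by power series."""
    return sum((z / 2.0) ** (2 * m + v) / (math.factorial(m) * math.factorial(m + v))
               for m in range(terms))


def skellam_lpmf(k, mu1, mu2):
    """Same formula as the Stan function: log P(X1 - X2 = k)."""
    return (-mu1 - mu2 + 0.5 * k * (math.log(mu1) - math.log(mu2))
            + math.log(bessel_i(abs(k), 2.0 * math.sqrt(mu1 * mu2))))


def poisson_pmf(n, mu):
    return math.exp(-mu) * mu ** n / math.factorial(n)


def diff_pmf_by_convolution(k, mu1, mu2, max_goals=60):
    """P(X1 - X2 = k) for independent Poissons, summing P(X1 = n + k) P(X2 = n) over n."""
    return sum(poisson_pmf(n + k, mu1) * poisson_pmf(n, mu2)
               for n in range(max_goals) if n + k >= 0)


# Hypothetical scoring rates for the two teams.
for k in range(-3, 4):
    print(k, math.exp(skellam_lpmf(k, 1.5, 1.1)), diff_pmf_by_convolution(k, 1.5, 1.1))
```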

and download the zip folder with the HTML files and all the R/Stan code to fit the models.

Yeah, this was my earlier model for the Euro Cup 2016. Good memories! In my opinion, though, the World Cup 2018 models I’ve posted above are better written and clearer.

This is interesting. I am writing an R package to fit many alternative soccer models (Dixon & Coles, Bradley & Terry, Karlis & Ntzoufras, Baio & Blangiardo, Egidi et al., etc.), and I could include your World Cup model as well, if you agree.

As @andrewgelman said, it’s data, so that’s fine. The predicted number of goals will probably be an expectation and hence should be continuous.

I believe @andrewgelman likes these continuous approximations, even normal ones, which allow goals that are not only real-valued but also negative.

The alternative would be to have something like a Poisson or negative binomial or other count-based model of the data rather than a continuous approximation. Then the parameters of the Poisson (or alternative) would be continuous, but it’d be the right shape for the data.
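A minimal sketch of the count-based alternative, assuming independent Poisson scores for the two teams with rates `lam_home` and `lam_away` (hypothetical names; in a full model they would come from the linear predictor, e.g. exp(attack - defense), which is also an assumption here). It computes the log likelihood of an observed score and the implied home-win/draw/away-win probabilities; any dependence between the two scores is ignored.

```python
import math


def poisson_log_pmf(n, lam):
    """log Poisson(n | lam), using lgamma(n + 1) = log(n!)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def score_log_lik(home_goals, away_goals, lam_home, lam_away):
    """Log likelihood of one observed score under independent Poisson goals."""
    return poisson_log_pmf(home_goals, lam_home) + poisson_log_pmf(away_goals, lam_away)


def outcome_probs(lam_home, lam_away, max_goals=30):
    """P(home win), P(draw), P(away win), truncating each score at max_goals."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = math.exp(poisson_log_pmf(h, lam_home) + poisson_log_pmf(a, lam_away))
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical rates for one match.
print(outcome_probs(1.4, 1.1))
```

The parameters (the rates) stay continuous while the likelihood is defined directly on the integer scores, so no rounding step is needed.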

The code is open source, licensed under the new BSD license, so using it doesn’t require our permission.