How to Estimate Wedding Guests with Statistics

“If I invite X guests to my wedding, how many should I expect to come?”
“How many guests can I invite without going over capacity?”

Two of the more stressful and important questions during wedding planning. Invite too many and you’ll find yourself over-budget or out of space. Invite too few and you may have friends or extended family feeling left out. If you’re facing these questions, I have a solution for you. Read on!

A little bit of background: my wife and I found ourselves facing these questions as we planned our wedding about a year ago. A quick Google search turned up many rules of thumb: 70%, 75%, 80% or 85% of invited guests will attend. If guests are coming from out of town or it’s a destination wedding, choose a lower number. Several factors made me uncomfortable trying to cleanly apply these rules of thumb:

All of my family and most of our good friends were out of town but most of my wife’s family was in the Bay Area. How could we apply different rules to different groups?
The wedding was in Tahoe. For guests in the Bay Area it’s a few hours’ drive. How would this affect the attendance rate compared to farther destinations which require a flight?
Most importantly, we had a very hard cap on the number of guests that could attend. If too many accepted, some invites would have to be rescinded (gulp).

Solution

As my contribution to the wedding planning process, I built a Google Sheet (template here) to answer these questions. Let me share a scrubbed version:

Simply, each invitation is given a probability of attending. This offers total flexibility to apply different rules to different groups or even individuals. After making a wish list of everyone we might even consider inviting, we began estimating. We did this in a multi-step process:

Start everyone at 75%.
Set obvious guests to 100% (think bride & groom, parents, close family, groomsmen, bridesmaids, etc.).
Bump up family, Bay Area guests and close friends by varying degrees assuming they’re more likely to attend.
Finally, make individual adjustments. For example, one of my good friends spends a lot of time working overseas so I set him only to 20% (in the end he couldn’t come).
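
The four steps above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet's logic: the guest names, the tags, the bump sizes and the 95% ceiling are all made-up assumptions, since the post only says the bumps were "by varying degrees".

```python
# Hypothetical values: the post does not give bump sizes or a ceiling.
BASE_RATE = 0.75
BUMPS = {"family": 0.10, "bay_area": 0.05, "close_friend": 0.10}
MAX_ESTIMATE = 0.95  # assumed cap for anyone who is not a sure thing


def estimate_probability(tags, override=None):
    """Return the attendance probability for one invitation."""
    if override is not None:      # step 4: an individual adjustment wins
        return override
    if "certain" in tags:         # step 2: wedding party, parents, etc.
        return 1.0
    # Step 3: add a bump for each tag that makes attendance more likely.
    bump = sum(BUMPS.get(tag, 0.0) for tag in tags)
    # Step 1: everyone starts from the base rate.
    return min(BASE_RATE + bump, MAX_ESTIMATE)


# Made-up invitations for illustration only.
invitations = [
    {"name": "Parents of the bride", "tags": ["certain", "family"]},
    {"name": "Cousin in the Bay Area", "tags": ["family", "bay_area"]},
    {"name": "College roommate", "tags": ["close_friend"]},
    {"name": "Friend working overseas", "tags": ["close_friend"], "override": 0.20},
    {"name": "Coworker", "tags": []},
]

for inv in invitations:
    inv["p"] = estimate_probability(inv["tags"], inv.get("override"))
    print(f'{inv["name"]}: {inv["p"]:.0%}')
```

Keeping the override separate from the tags means a one-off judgment call, like the 20% for the friend overseas, never gets overwritten by a group rule.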

Decisions

With the probabilities plugged in, we can turn to the Overview tab and make data-driven decisions! (If you want to geek out about the math, jump to the bottom.) The Overview tab was useful primarily during two periods:

Creating a realistic invite list from the overall wish list.
Inviting additional guests as we received replies and felt confident we wouldn’t exceed our limit.

Part 1: Creating a realistic invite list
Our initial guest list was too large and we unfortunately could not invite everyone we wanted. Prediction intervals showing different probabilities of going over capacity made those conversations easier and gave us clear, objective targets. We then organized guests into different groups to make it easier to narrow the list. I modeled the sample groups roughly after our list in prioritized order:

Family: non-negotiable invites to family as well as the wedding party.
Team A: Close friends, family friends, etc.
Team B: People we would like to invite assuming there’s space.

In the scenario above, let’s assume you have a limit of 100 so some cuts are required. Given the numbers, some of Team A and all of Team B can’t be invited yet. The real-time numbers help determine where to draw the line.

Part 2: Inviting additional guests
We updated probabilities to 0% (not attending) and 100% (attending) as we received replies. Over time, this reduced the chances of going over capacity and gave us the confidence to invite more guests.
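
To see why replies build confidence, here is a small sketch with made-up invitations. It assumes the mean and standard deviation formulas from the math section at the bottom of this post; the names, head counts and probabilities are hypothetical.

```python
import math

# Hypothetical invitations; "guests" is the head count on each invitation.
invitations = {
    "Aunt and uncle": {"guests": 2, "p": 0.90},
    "College roommate": {"guests": 1, "p": 0.50},
    "Neighbors": {"guests": 2, "p": 0.75},
}


def expected_guests(invs):
    """Expected head count: each probability weighted by its guest count."""
    return sum(i["guests"] * i["p"] for i in invs.values())


def stdev_guests(invs):
    """Spread of the head count, using the g_i squared term from the math section."""
    variance = sum(i["guests"] ** 2 * i["p"] * (1 - i["p"]) for i in invs.values())
    return math.sqrt(variance)


def record_reply(invs, name, attending):
    """A reply replaces the estimate with certainty: 100% or 0%."""
    invs[name]["p"] = 1.0 if attending else 0.0


before = (expected_guests(invitations), stdev_guests(invitations))
record_reply(invitations, "Neighbors", True)
record_reply(invitations, "College roommate", False)
after = (expected_guests(invitations), stdev_guests(invitations))

print(f"Before replies: expected {before[0]:.2f}, stdev {before[1]:.2f}")
print(f"After replies:  expected {after[0]:.2f}, stdev {after[1]:.2f}")
```

An invitation at 0% or 100% contributes nothing to the standard deviation, so every reply narrows the range of likely outcomes even when the expected count barely moves.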

Results

So how did our model fare? In the end, 77% of those initially invited attended compared to our predicted 85%. I attribute this mainly to our over-confidence that almost all extended family and close friends could attend.

Below is our expected attendance over time as we made adjustments and received replies. You can see it changes a lot toward the end, once responses came in and we had the data to invite more friends.

The model was immensely valuable because it gave us an objective way to wrap our heads around the problem. While our prediction didn’t perform significantly better than some of the rules of thumb above, it had clear advantages. The model had prediction intervals that helped us understand the probability of going over capacity. It gave us the confidence to invite more guests once we knew we had space. The model also helped us talk objectively about how many people from different groups were expected to attend and ensured invites were equally distributed.

If you’re getting married, feel free to make a copy of the template for your guest list! I hope you find it useful and that it makes wedding planning a little easier.

The Math for Nerds

The key numbers on the Overview tab use modified equations for a Poisson binomial distribution. That’s stats speak for the sum of independent trials where each outcome is binary (attending or not attending in this case) and each trial can have its own probability.

The first modification I made ensured independent trials. If we simply applied a probability to each guest, the model would be inaccurate because there are very strong correlations (nearly 1) for certain guests’ trials. For example, a friend and their significant other are very likely to either both attend or both not. To account for this, each invitation is a trial rather than each person.

This creates a second problem: the equations are now wrong! Putting a guest and their significant other in a single trial breaks the assumption that each trial contributes either 0 or 1 guests, because it can now contribute 0 or 2. Let’s look at each equation:

Mean. The mean is used to calculate the expected guests. This helps answer the main question “If I invite X guests to my wedding, how many should I expect to come?” To get the right number, instead of just summing the probabilities, each probability is multiplied by its corresponding number of guests.
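
Written with the symbols defined under Values, the adjusted mean described above would look like this. It is reconstructed from the description in the text rather than copied from the spreadsheet:

```latex
\mu = \sum_{i=1}^{n} g_i \, p_i
```

For example, an invitation for 2 guests at 75% contributes 2 × 0.75 = 1.5 expected guests.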

Standard Deviation. This powers the prediction intervals below. My stats knowledge was fuzzy, so instead of solving directly with equations and algebra, I brute-forced it with trial-and-error. With Monte Carlo simulations, I found the right term to modify the default equation: g_i². This one wasn’t super intuitive and in fact only this week did I discover my original equation was wrong! Originally I just included g_i. Glad I checked my math.
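
For readers who prefer algebra to simulation, here is a short derivation sketch of why g_i² is the right term. It assumes, as the model does, that invitations are independent and that the guests on one invitation either all attend or all stay home. X_i and B_i are symbols introduced here for illustration only.

```latex
\begin{aligned}
X_i &= g_i B_i, \qquad B_i \sim \mathrm{Bernoulli}(p_i) \\
\mathrm{Var}(X_i) &= g_i^2 \, \mathrm{Var}(B_i) = g_i^2 \, p_i (1 - p_i) \\
\sigma &= \sqrt{\sum_{i=1}^{n} g_i^2 \, p_i (1 - p_i)}
\end{aligned}
```

Multiplying a random variable by g_i multiplies its variance by g_i², which is why using g_i alone understates the spread.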

Playing with Python made it much easier to find the correct result and check my work. It produced the following results in a scenario where all invites have 2 guests:

Computed mean: 150.00
Computed stdev: 8.66
Monte Carlo mean: 149.99
Monte Carlo stdev: 8.67

The code is available on GitHub.
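
If you would rather not leave the page, the following is a minimal sketch of that kind of check. It is not the GitHub code. The scenario of 100 invitations at 75% is an assumption, chosen because it reproduces the computed mean and stdev quoted above; the seed and trial count are arbitrary, so the Monte Carlo figures will be close to, but not identical to, the ones listed.

```python
import math
import random
import statistics


def computed_mean(guests, probs):
    """Sum of each probability multiplied by its number of guests."""
    return sum(g * p for g, p in zip(guests, probs))


def computed_stdev(guests, probs):
    """Poisson binomial stdev with the g_i squared modification."""
    return math.sqrt(sum(g * g * p * (1 - p) for g, p in zip(guests, probs)))


def simulate(guests, probs, trials=20_000, seed=0):
    """Simulate many weddings; return the mean and stdev of the head count."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0
        for g, p in zip(guests, probs):
            if rng.random() < p:   # the whole invitation attends or nobody does
                total += g
        totals.append(total)
    return statistics.fmean(totals), statistics.pstdev(totals)


# Assumed scenario: 100 invitations, 2 guests each, 75% chance of attending.
guests = [2] * 100
probs = [0.75] * 100

mc_mean, mc_stdev = simulate(guests, probs)
print(f"Computed mean: {computed_mean(guests, probs):.2f}")
print(f"Computed stdev: {computed_stdev(guests, probs):.2f}")
print(f"Monte Carlo mean: {mc_mean:.2f}")
print(f"Monte Carlo stdev: {mc_stdev:.2f}")
```

Swapping `g * g` for `g` in `computed_stdev` gives a noticeably smaller number than the simulation does, which is the mismatch described above.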

Prediction Intervals. These help contextualize the standard deviation and answer the second question: “How many guests can I invite without going over capacity?” For example, in the screenshot above, there is a 20% chance of having more than ~118 guests. A Z-table was used to get each offset from the mean, which treats the guest count as roughly normally distributed.
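
In code, the Z-table lookup can be replaced by the normal distribution in Python's standard library. This is a sketch under the same normal approximation, not the spreadsheet's formula. The function names are made up for illustration, and the inputs reuse the 150 / 8.66 example from the simulation results rather than the numbers in the screenshot.

```python
from statistics import NormalDist


def threshold_exceeded_with_chance(mean, stdev, chance_over):
    """Guest count that will be exceeded with probability `chance_over`."""
    z = NormalDist().inv_cdf(1 - chance_over)   # what a Z-table lookup gives
    return mean + z * stdev


def chance_of_exceeding(mean, stdev, capacity):
    """Approximate probability that attendance goes over `capacity`."""
    return 1 - NormalDist(mean, stdev).cdf(capacity)


mean, stdev = 150.0, 8.66   # example values from the all-invites-have-2-guests scenario

for chance in (0.50, 0.20, 0.05):
    limit = threshold_exceeded_with_chance(mean, stdev, chance)
    print(f"{chance:.0%} chance of more than {limit:.0f} guests")

print(f"Chance of exceeding 160: {chance_of_exceeding(mean, stdev, 160):.1%}")
```

The second function runs the question the other way around: given a hard cap, it estimates how risky the current invite list is.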

Values
n = number of invitations
g_i = number of guests on a given invitation i
p_i = probability that the guests on a given invitation i attend (all of them or none)

Parting Thoughts

I would be remiss if I didn’t mention a few things! First, when I originally tackled this problem, my friend Joel got me started on the right track with the Poisson binomial distribution. Thanks buddy! Also while researching for this blog post, I discovered others have looked at this problem too. Damjan Vukcevic wrote about this topic in 2013. Kudos to him for helping me validate my assumptions. The article is available online.

Second, we had an incredible wedding that was truly the best day of my life! All the planning was worth it and it was so special to share that day with so many family and friends that we love. Since then it has been an incredible nine months of marriage. To my wife if she reads this far, I love you!

Lastly, I want to acknowledge how fortunate and privileged we are to have our wedding as we did. In the grand scheme of things, “how many people can I invite to my wedding?” feels like a small problem compared to so many others in the world and one so many could never have. I hope for those that have the same fortune as us, this makes the process a little smoother and makes it easier to focus on what matters: celebrating love.

Correction: One equation was updated to reflect that we want standard deviation, not variance. I also clarified how I got the standard deviation.