Jake Gloudemans

Predictions #3

July 19, 2023 at 11:41 AM

We’ve entered the third week of the Metaculus Quarterly Cup! I had a few questions resolve my way this week, but took a hit when the Black Sea Grain Initiative wasn’t extended, which I initially gave an 87% chance of happening. I made a bunch of new predictions covering the next Starship launch, the Vegas Sphere, the Stockholm summer sun, and more… let’s dive in!

Current Metaculus Stats
Total Points 128 (-55)
Overall Rank 5708 (+1233)
QC Resolved Qs 3
QC Open Qs 9
QC Rank 28 / 223 (-20)
QC Take 1.0%

New Predictions

Deep dive of the week - When will the second Starship test flight occur?

This question resolves as the date and time of the next successful or failed Starship orbital test flight, according to Space Launch Now.

Context:

Starship is a giant rocket currently being developed by SpaceX. It’s the tallest, most powerful rocket ever built, and is designed to be fully re-usable. Starship consists of two main sections, the ‘Super Heavy’ booster stage, and the second stage which is also sometimes called ‘Starship’.

The rocket, along with the ‘Starbase’ launch complex it launches from in Boca Chica, TX, has been very publicly in development over the last several years. SpaceX has progressed from launching a little ‘Starhopper’, to larger test vehicles, eventually launching and landing a full Starship upper-stage.

Fully stacked Starship with Super Heavy Booster at Starbase launch complex in Boca Chica, TX. Photo from SpaceX

SpaceX launched a fully stacked Starship rocket (Super Heavy booster + upper stage) for the first time in April of this year, attempting to reach orbit. The rocket left the pad and reached an altitude of 39 km, but failed to reach orbit and exploded about 4 minutes after launch. Some version of this outcome was expected - new rocket designs often fail to reach orbit on their first attempt.

This question is asking when the next orbital launch attempt will occur - successful or not. Several notable things went wrong during the first attempt, which are relevant to this question:

  • The pad experienced very significant damage, as the engine exhaust carved a massive hole into the ground underneath the vehicle.
  • The rocket’s “flight termination system” didn’t work correctly. This is a safety feature required for all rockets that’s supposed to instantly detonate the vehicle. In this case, it took over 40 seconds from when the FTS was triggered for the vehicle to explode.
  • As a result of the pad damage, the launch created more debris and scattered that debris farther than expected. This led environmental groups to sue the FAA, alleging that it should have required SpaceX to do a more thorough environmental review before approving the launch.

Reasoning:

Given SpaceX’s history of rapid iteration, they will probably make their next attempt as soon as they are technically ready and legally allowed. So what are the main technical and legal barriers?

  • Pad repairs & upgrades: The first launch attempt caused major pad damage. This was largely because SpaceX decided not to use a water deluge system, a system used at other launch facilities that protects the pad from the extreme force and heat of the rocket exhaust by pumping vast quantities of water across the pad during launch. SpaceX has now changed course and is building a novel type of deluge system at the pad.

    Based on video footage and reporting, SpaceX has already repaired the hole in the pad’s concrete base, installed a giant steel plate which both shields the concrete and functions as a “shower head” for the new deluge system, and has started testing the new deluge system.

    They’ve also been working through a bunch of other repairs - rebuilding fuel tanks that were damaged, upgrading the launch mount (the pedestal the rocket sits on), and upgrading the “quick disconnect” which supports the top of the stacked rocket before launch.

    Of all the repairs, the deluge system seems to be the most likely to dictate the timeline of the next launch. This is where the heaviest damage occurred last time and it's a novel system that may require extensive testing. That said, they seem to be progressing quickly and are already testing the system. So I’m estimating about 25% chance of pad readiness in 1 month, 50% chance in 2 months, and 75% chance in 3 months.

    (Practical Engineering made a great video explaining what went wrong with the pad during the first launch attempt - highly recommended!)

  • Rocket readiness: SpaceX is making lots of changes to the rocket - a new electronic thrust vectoring system, faster engine throttling during launch to minimize pad damage, a change to the second stage separation/ignition process, and surely hundreds of other smaller changes. That said, I don’t expect rocket-readiness to delay the next launch significantly. I’m guessing that real-world test data is a big bottleneck right now in terms of vehicle development. They already have several newer versions of the rocket queued up, so provided they aren’t destroying the pad with every launch, they’ll pretty much try to launch as soon as possible with each new rocket.

    However, there is still routine pre-launch testing they’ll need to complete before the next attempt, such as static firing of the Super Heavy booster. This adds some time on top of the pad repair timeline, as they can only do this testing once the pad can handle it. Testing may also be more involved this time around since they’ll likely be testing the new water deluge system along with the rocket. So there’s definitely potential for a delay here if issues come up during testing.

    I estimate a 25% chance of rocket readiness at 2 weeks, 50% at 3 weeks, and 75% at 5 weeks after the pad is ready.

  • Legal issues: Even if the pad and rocket are good to go, legal issues could delay the next launch. I gather there are basically 2 main legal hurdles they’ll need to resolve before the next attempt:

    • Flight termination system recertification: after the first launch attempt, Elon Musk speculated that this could end up being what holds up the next launch. The FTS failure in the last launch was a pretty serious issue (and not something I can remember seeing happen in any previous rocket tests), and the FAA will probably want very strong evidence that SpaceX has fixed the issue. I’m very uncertain about what timeline to put on this, but for now I’m going with a 25% chance of recertification in 1 month, 50% in 2 months, and 75% in 6 months.
    • Environmental lawsuit: A byproduct of the pad damage during launch was that an unexpectedly large amount of particulate matter was scattered around the region, including in the town of Port Isabel, 6 miles from the launch site. Environmental groups have sued the FAA for granting SpaceX a launch license and are attempting to require a more thorough environmental review before further launches can occur. I think this has a low chance of succeeding (environmental groups have been going after SpaceX the whole time it’s been working in Boca Chica), however, if it does succeed it would almost certainly push the launch outside the time window for the question (Oct. 1).

I should also mention that the likelihood of scrubs due to weather or technical issues will become important to the forecast once the approximate launch date is known. However, I don’t think it’s useful to try and precisely model these until we have a better idea of an actual launch date. For now, I’m just lumping this in with “rocket readiness.”

Forecast:

Now, how do we bring this all together into a single forecast? If either of the major legal issues takes longer to resolve than rocket and pad readiness, rocket and pad readiness don’t matter at all. On the other hand, if the legal issues are resolved soon, they don’t matter at all and rocket and pad readiness are all that matter.

My approach is to make 3 parallel forecasts - one for each of pad/rocket readiness, FTS recertification, and the environmental lawsuit - each assuming that the factor in question ends up with the longest timeline. I then create the total forecast by weighting the 3 individual forecasts based on how likely they are to actually be the hold-up. I can then adjust the individual forecasts and the weights over time as we get more information.

Unfortunately, it’s pretty much impossible to be precise using Metaculus’s tool for continuous question forecasts, but I can at least input something vaguely similar to what I’ve described. As of now, my combined forecast gives a

  • 25% chance of next test by August 21
  • 50% chance of next test by September 25
  • 54% chance the test occurs before the question resolves on October 1
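The weighting approach described above can be sketched in a few lines of Python. This is only a rough illustration: it turns each factor’s quartile estimates into a piecewise-linear CDF and blends the CDFs using weights. The FTS quartiles (30, 60, and 180 days) come from the estimates above. The combined pad/rocket quartiles, the three weights, and the function names `quantile_cdf` and `mixture_cdf` are placeholder assumptions for illustration, not actual forecast inputs.

```python
from bisect import bisect_right


def quantile_cdf(points):
    """Return a piecewise-linear CDF through (days_from_now, probability) points."""
    xs = [0.0] + [float(d) for d, _ in points]
    ps = [0.0] + [float(p) for _, p in points]

    def cdf(t):
        if t <= 0:
            return 0.0
        if t >= xs[-1]:
            # Nothing is stated beyond the last quantile, so hold the CDF flat.
            return ps[-1]
        i = bisect_right(xs, t)
        x0, x1, p0, p1 = xs[i - 1], xs[i], ps[i - 1], ps[i]
        return p0 + (p1 - p0) * (t - x0) / (x1 - x0)

    return cdf


def mixture_cdf(t, components):
    """Weighted average of component CDFs; weights are normalized to sum to 1."""
    total = sum(w for w, _ in components)
    return sum(w * cdf(t) for w, cdf in components) / total


def lawsuit(t):
    """If the lawsuit is the hold-up, the launch falls outside the question window."""
    return 0.0


# FTS recertification quartiles from the estimates above (1, 2, and 6 months).
fts = quantile_cdf([(30, 0.25), (60, 0.50), (180, 0.75)])
# ASSUMPTION: placeholder quartiles for combined pad + rocket readiness.
pad_rocket = quantile_cdf([(45, 0.25), (80, 0.50), (125, 0.75)])

# ASSUMPTION: illustrative weights for which factor ends up being the hold-up.
components = [(0.5, pad_rocket), (0.4, fts), (0.1, lawsuit)]

for days in (30, 60, 74):  # 74 days is July 19 to October 1
    print(days, round(mixture_cdf(days, components), 3))
```

Changing the weights as news comes in (for example, shifting weight away from the lawsuit if it gets dismissed) updates the blended forecast without having to redo each component.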

Will the U2 concert at The Sphere on September 29, 2023 take place?

There’s a new music venue in Las Vegas called ‘The Sphere’, which is now the largest spherical building in the world. If you haven’t heard of this thing yet, you absolutely must go look it up right now. The outside is covered in an LED screen, and since it lit up for the first time on July 4th, they’ve displayed all sorts of cool things - the earth, the moon, a basketball, a realistic-looking human eyeball.

The venue is set to open on September 29th with a performance by U2. U2 will be doing a ‘residency’ there, meaning they’ll be performing every few days for an extended period. The venue has all sorts of wild technology, including the largest LED screen in the world, haptic feedback in the seats, and “4d features, including scent and wind.” Because so much unique audio-visual prep work is required to perform there, most performers will probably do some kind of residency like this.

This question basically comes down to:

  • Will the venue be ready?
  • Will U2 be ready?
  • Are there serious technical difficulties before the concert starts that cause it to be postponed?

From the information I could find, it seems very likely the venue will be ready. The exterior is complete and functional. From pictures in early 2022, interior construction was already very far along. The venue operators did a walkthrough earlier this year, apparently to make sure the venue was on track to be ready before they started selling tickets. At this point, with the first several concerts already sold out, it would be a pretty serious blunder to cancel the event. If they were going to postpone, they probably would have done so already.

U2 not being ready is even more unlikely. They have very rarely cancelled events in the past (fewer than 0.5 cancellations per year on average) - most recently in 2018 because Bono was having voice issues. They will have had a very long time to prep for this concert, so even with the added technical challenges, there’s a very low chance of them being the cause of a cancellation.

Technical difficulties the day of the concert are possible, but again, they will have had a long time to prepare, this being the first performance at the venue. Also, the concert just has to start for the question to resolve Yes. Technical issues mid-show don’t matter here.

Prediction: 96% Yes

Will Ohio Issue 1 pass in the August 8, 2023 election?

Issue 1 is a ‘legislatively-referred constitutional amendment’ (as opposed to a ‘voter-initiated’ amendment) on the ballot in an August special election in Ohio. If approved by a majority of voters, it would increase the threshold for future constitutional amendments in the state from a simple majority to 60% (plus some other smaller changes which would make amendments harder to pass).

The very important context for this vote is that there’s another amendment on the ballot in the November general election that would add the right to “carry out one’s own reproductive decisions”, i.e. get an abortion, to the state constitution. Issue 1 was proposed by Republicans in the legislature in an effort to make it harder for the abortion amendment to pass.

To estimate Issue 1’s odds of passing, I’m looking at:

  • Similar ballot initiatives from recent years
  • Abortion-related initiatives from recent years
  • Polling related to Issue 1
  • Funding / advertising for each side

Since 2018, there have been 12 related ballot initiatives that have been voted on in different states:

Year | State | Change | Yes | No
2022 | SD | Require 60% threshold for tax increase ballot measures | 33% | 67%
2022 | AR | Require 60% vote to approve ballot initiatives | 41% | 59%
2022 | AZ | Allow Leg. to repeal voter-approved ballot measures | 36% | 64%
2022 | AZ | Initiatives must be limited to a single subject | 55% | 45%
2022 | AZ | Require 60% threshold for tax-related ballot measures | 51% | 49%
2020 | FL | Voter-approved amendments need to be approved 2x | 48% | 52%
2020 | ND | Voter-initiated amendments also then need legislative approval, or a second vote | 38% | 62%
2020 | AR | Increase distribution reqs., various other changes | 44% | 56%
2018 | SD | Require voter approval if legislature changes voter-approved initiative | 45% | 55%
2018 | SD | Add single-subject rule for constitutional amendments | 62% | 38%
2018 | SD | Require 55% vote to approve amendment | 46% | 54%
2018 | SD | Prohibit out-of-state persons/entities from making contributions to ballot question committees | 56% | 44%

Among these:

  • Initiatives to increase vote thresholds for ballot initiatives are 1/4 since 2018 (25%)
  • Amendments that generally raise the threshold for amendment approval are 3/10 (30%)
  • Excluding “single subject” initiatives (which arguably just simplify things rather than making approval more difficult), that decreases to 1/8 (12.5%)

As a baseline, I’m going with 20% chance of approval for amendments like this one.
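For anyone who wants to check the base rates, here is a small Python sketch that recomputes the first one from the table. Two things in it are assumptions rather than facts from the sources: the `raises_vote_threshold` flag is a hand classification (True for the rows whose description is “Require N%…”), and a measure counts as passed if Yes is above 50%. It doesn’t attempt the 3/10 and 1/8 figures, since those depend on which rows get grouped together.

```python
# (year, state, raises_vote_threshold, yes_pct), transcribed from the table above.
# ASSUMPTION: the raises_vote_threshold flag is a hand classification of each row.
measures = [
    (2022, "SD", True, 33),
    (2022, "AR", True, 41),
    (2022, "AZ", False, 36),
    (2022, "AZ", False, 55),
    (2022, "AZ", True, 51),
    (2020, "FL", False, 48),
    (2020, "ND", False, 38),
    (2020, "AR", False, 44),
    (2018, "SD", False, 45),
    (2018, "SD", False, 62),
    (2018, "SD", True, 46),
    (2018, "SD", False, 56),
]


def pass_rate(rows):
    """Share of measures with more than 50% Yes (ASSUMPTION: simple-majority rule)."""
    passed = sum(1 for _, _, _, yes in rows if yes > 50)
    return passed / len(rows)


threshold_only = [row for row in measures if row[2]]
print(f"vote-threshold measures: {pass_rate(threshold_only):.0%}")
print(f"all listed measures: {pass_rate(measures):.0%}")
```

With only 4 threshold measures in the sample, one extra pass or fail would move the base rate by 25 points, which is part of why the 20% baseline is a blend rather than any single one of these numbers.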

To some extent, this vote is actually about abortion access - the motivation behind it is to make the abortion amendment harder to pass in November. In 2022 (the first election since Roe vs. Wade was overturned), 6/6 abortion-related initiatives went the pro-choice direction, including in red states like Kentucky and Kansas. I think this reduces the probability of Issue 1 passing, though not too much since it is a step removed from the actual abortion amendment and not all voters will necessarily make the connection. (15% Yes)

There’s been 1 poll so far, which showed 38% Yes, 37% No, and 26% undecided on the issue. I’m not weighing this single poll very much due to the small sample size, high level of undecideds, and generally unreliable nature of polling around issue questions like this. That said, it’s definitely worth monitoring the polling as advertising ramps up closer to the election. (17% Yes)

Finally, it sounds like at least so far, the ‘No’ side has bought more advertising and is likely better funded. Finance reports released on 7/27 should shed more light on this. (14% Yes)

Prediction: 14% Yes

When will ‘Browse with Bing’ be reintroduced on ChatGPT?

‘Browse with Bing’ was a feature that ChatGPT premium subscribers could use that allowed ChatGPT to perform internet searches and browse results to better answer questions about information past its training cutoff window (sometime in 2021). OpenAI disabled the feature on July 3rd, providing a brief, cryptic explanation stating that it could “occasionally display content in ways we don't want” and that they were “working to bring the beta back as quickly as possible.”

Most reporting about this speculates that the Browse with Bing web crawler was likely accessing paywalled pages, and that with clever prompting you could get it to read out that content verbatim. Enough companies were probably complaining to OpenAI or threatening legal action that they disabled the feature.

To make a prediction here, I’m considering (a) how easily fixable the issue is, and (b) how much pressure OpenAI feels to put a version of the tool back up quickly. I definitely don’t have strong convictions on either of those questions, so my forecast here has a wide distribution.

In terms of fixability, this seems like it could be rather difficult, although I can imagine some relatively simple “quick fixes” like letting sites add themselves to a blacklist or something that ChatGPT’s crawler would ignore. Also, Bing and Bard don’t seem to have this same problem, which is evidence that it’s fixable.

I doubt that OpenAI feels strong pressure to restore the tool very quickly. I don’t suspect that Browse with Bing was a major driver of ChatGPT use - it was already a premium feature, and I suspect many of ChatGPT’s paying users are programmers who care more about the new Code Interpreter tool and basic GPT-4 access than the web browsing feature. On the other hand, there’s heavy competition right now between OpenAI, Google, Microsoft/Bing, and Anthropic to capture as much of the LLM market as possible, and disabling tools that your users rely on doesn’t help there.

My ballpark estimate is 1-2 months mean time to restoration, with a long tail. It seems like a hard enough problem that it isn’t a 1-2 week patch. OpenAI has shipped features very quickly in the past so I can see them fixing it in a month or two, but could also see it being low priority for them and taking many months to restore.

Prediction:

  • 25% chance by Aug. 4
  • 50% chance by Aug. 27
  • 75% chance by Sept. 27
  • 23% chance after question resolves on Oct. 1

Will Crook Manifesto by Colson Whitehead be on the August 13th New York Times Bestseller List (Hardcover Fiction)?

Crook Manifesto is a new book by Colson Whitehead, author of The Underground Railroad, Nickel Boys, and several other novels. Crook Manifesto has a publish date of July 18th, which I think means it’s first eligible for the August 6th Bestseller list. The August 13th list would correspond to the book’s first full week of sales.

The New York Times Bestseller list is published weekly, and ranks books in different categories based on a “proprietary method” which ostensibly is based on book sales, as well as ‘other factors’. NYT has stated that the list is editorial and not purely mathematical.

My impression is that the list is very heavily influenced by the press coverage that books/authors receive in the run-up to their release. Those factors, in turn, are heavily influenced by the author’s previous track record and coverage. It’s hard for authors to publish their first bestseller, but once they have, their subsequent books are much more likely to be bestsellers, since they’re likely to get more press as a now well-known author.

Whitehead’s 2011 novel, Zone One, made the bestseller list according to Wikipedia, though I haven’t been able to track down when. His next book, The Underground Railroad, was massively successful, made the bestseller list the first week eligible, and stayed on the list for many weeks thereafter. His next two books, Nickel Boys and Harlem Shuffle, followed suit, both making the list the first week eligible and staying there for many weeks.

Based on this track record, I already give Crook Manifesto a high chance (maybe 80%) of making the list. Press coverage of the book moves the odds even higher for me. Crook Manifesto appears on basically every “notable July books” or “notable books of the summer” list I could find, typically with a ranking near the top. If you go to Amazon.com and click “Books”, it’s immediately visible in the “Best books of July” banner. And, at the time of writing this, when you go to the NYT bestseller list, there’s a link to a positive review of the book directly under the list!

Given the combination of Whitehead’s recent track record, the widespread advertising of the book (which should lead to actual sales), and the editorial nature of the bestseller list (the book is already editorially popular), I think the chance of this is very high.

Prediction: 94%

Will Stockholm see more than 500 hours of sun in the summer of 2023?

This question is based on a Swedish data journalism website that’s tracking how many hours of sunlight cities across Sweden get throughout the summer. According to their data, which goes back to 1983, the average hours of sunlight for Stockholm over the full summer is 425.9. In the first week of summer, Stockholm received the 2nd most sunlight for that week since 1983 at 104 hours, far above the historical average of 62.4 (presumably what prompted the question). Stockholm has exceeded 500 hours of summer sunlight 6 times since 1983, with 3 of those occurring recently in 2018, 2020, and 2021.

As of the question opening, 3 of the 7 weeks classified as ‘summer’ have elapsed. I’m taking a purely quantitative approach for this one. I calculate the mean and standard deviation for total sunlight hours in the remaining 4 weeks. Assuming a normal distribution (plotting the historical data, it does appear to be roughly normal), I calculate how many standard deviations above the historical average the remaining weeks need to be in order to exceed 500 hours - this comes out to a roughly 20% chance.

Week 4 of 7 is actually mostly over, and while I couldn't find a precise value, it seems like it’s going to end up above the historical average. On the other hand, the 10-day forecast indicates that week 5 of 7 will likely be at or below the historical average. If I assume weeks 4 and 5 together are about average, this drops the chance of exceeding 500 hours down to 6.7% (since we’ve reduced the variance quite a bit).
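The calculation above can be sketched with Python’s standard library. The inputs below are placeholders, not the values from the Swedish dataset, and the helper name `prob_exceed` is made up for this example. The second call illustrates why pinning some weeks at their average lowers the probability: under the added assumption that weekly totals are independent with equal variance, halving the number of uncertain weeks shrinks the standard deviation by a factor of √2 while the gap to 500 hours stays the same.

```python
from statistics import NormalDist


def prob_exceed(target, hours_so_far, remaining_mean, remaining_sd):
    """P(season total > target), modeling the remaining hours as normal."""
    needed = target - hours_so_far
    return 1.0 - NormalDist(remaining_mean, remaining_sd).cdf(needed)


# ASSUMPTION: placeholder inputs, not the values from the Swedish dataset.
p_now = prob_exceed(target=500, hours_so_far=240, remaining_mean=230, remaining_sd=35)

# Treat half of the remaining weeks as exactly average: their mean moves into
# hours_so_far, and (ASSUMPTION: independent, equal-variance weeks) the
# standard deviation of what is left shrinks by sqrt(2).
p_later = prob_exceed(
    target=500, hours_so_far=240 + 115, remaining_mean=115, remaining_sd=35 / 2**0.5
)

print(f"{p_now:.1%} -> {p_later:.1%}")
```

The same gap measured in fewer, less variable weeks is more standard deviations away, so the tail probability drops even though nothing about the average changed.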

Prediction: 20%, but I probably will lower this soon unless week 4 ends up way above average.

Resolved Predictions

Will the winner of stage 9 of the Tour de France also lead the overall race at the end of the stage?

Initial prediction: 46% Yes

Resolved: No

I was directionally correct on this one, but less confident than the community prediction. Overall I think my reasoning was solid, but there were a few nuances I missed that I only learned about while watching the race. (By the way, watching was more fun than expected - apparently this year’s Tour is one of the best in a long time!)

In cycling, it’s super important to stay with a group of riders in order to benefit from drafting. The bulk of the riders form the ‘peloton’, which sticks together for most of the race. On some days, a “breakaway” group will separate itself from the peloton - if this happens, there’s strategy involved for riders to decide whether to try and join the break, or stay with the peloton.

In this case, a breakaway formed right away and quickly built up a large gap from the peloton. The way things shook out, none of the riders that were contending for the overall lead were in the breakaway (none were within 20 minutes of the overall lead). This meant there was no reason for Vingegaard or Pogačar (the two favorites), or other contenders, to push themselves to try and catch the breakaway. They still raced each other at the end, but they weren’t competing for the stage lead. So almost as soon as the race started, the odds of the question resolving Yes went to near zero.

So knowing whether there’s likely to be a breakaway vs. not, plus a bunch of other team strategy stuff that I didn’t really consider at all, could have led to a more confident prediction.

Will General Sergei Surovikin be stripped of his command by July 11th?

Initial prediction: 6% Yes

Resolved: No

I think my reasoning on this one holds up quite well. There was one actual primary source that all the speculation about Surovikin was based on, and in the weeks since the question, there has been basically no new information. Broadly speaking, the question was asking about a highly specific thing happening in a very short time window, so I think a very low probability was reasonable here.

Will the Black Sea grain deal be extended before July 19, 2023?

Initial prediction: 87% Yes

Resolved: No

Took a hit on this one. I adjusted pretty frequently throughout the question period and was down around 60% at the halfway point and 30-45% on the last day. Of course, 87% yes is still 13% no, but I think there were some issues with my initial prediction. Generally, I think I…

  • under-weighted the possibility of Russia temporarily withdrawing and then eventually rejoining (which still may happen). This is a case where temporary withdrawal doesn’t have drastic immediate effects - the effects will play out slowly. So withdrawing for a few days isn’t that different from extending the deal at the last minute.
  • under-appreciated how much the broader grain market would have adapted since the deal was initially signed. Surely there are various workarounds and contingencies that all parties involved have planned for, knowing that this deal was fragile. So as time passed, the effects of the deal ending have diminished, at least somewhat.

Still, I think this was a tough one to predict. Scenarios where the deal was ultimately extended could have looked really similar to what actually played out - brinkmanship to try and extract whatever concessions you can, then extending the deal at the last minute.