Brexit result Big Data predictions updated

In our last post, my colleague Ben Pritchard referred to some pioneering predictions of the Brexit result made by Dr Xuxin Mao, an Invennt intern. The prediction turned out to be wrong. Not to be disheartened, Dr Mao has reviewed his original work and uncovered the reasons for the poor result. He has also updated his methodology to improve future predictions.

Big Data Brexit Prediction with Updated Information

Figure 1 TRUST Framework

The methodology for the Brexit prediction is based on statistical modelling, behavioural economics, natural language processing and Big Data analytics. Xuxin used the Topic Retrieved, Uncovered and Structurally Tested (TRUST) framework (Figure 1) to generate solid models and robust forecasts by retrieving useful information from Internet Big Data. He uncovered key decision-making factors and tested these factors against other available data in an advanced statistical model. The TRUST framework has been used to successfully predict the 2014 Scottish referendum, the 2015 UK general election and the 2016 Scottish Parliament election. It has also helped to measure construction output and prices at ONS and UCL, and to predict life insurance demand at L’Institut Europlace de Finance for Groupama.

The first part of the TRUST approach relies on text mining a very large database of print newspapers, along with their web-based counterparts, using sophisticated algorithms to identify the topics that will motivate voters. The results are summarised in Table 1 for various periods of the campaign. Xuxin found that EU immigration emerged as a key issue from 22 May to 11 June, and then again from 19 June. These were the same periods when the Leave side was generating momentum in the polls and Remain was trailing. While David Cameron and economy-related topics were key searches in nearly all weeks, Boris Johnson and the Labour Party also attracted voters’ attention frequently.


Table 1: Text Mined Topics on the EU Referendum during the EU Referendum Campaign Period

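| Period        | UK Economy | EU Trade | Single Market | EU Immigration | David Cameron | Boris Johnson | Labour Party |
|---------------|------------|----------|---------------|----------------|---------------|---------------|--------------|
| 15 Apr-14 May | Yes        | Yes      | Yes           | No             | Yes           | No            | No           |
| 15 May-21 May | Yes        | No       | Yes           | No             | Yes           | Yes           | No           |
| 21 May-28 May | Yes        | Yes      | Yes           | Yes            | Yes           | Yes           | No           |
| 29 May-4 Jun  | Yes        | Yes      | Yes           | Yes            | Yes           | No            | Yes          |
| 5 Jun-11 Jun  | Yes        | Yes      | Yes           | Yes            | Yes           | Yes           | Yes          |
| 12 Jun-18 Jun | Yes        | Yes      | Yes           | No             | Yes           | No            | Yes          |
| From 19 Jun   | No         | No       | Yes           | Yes            | Yes           | Yes           | No           |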
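To make Table 1 easier to query, here is a minimal Python sketch that encodes the table as data and looks up the periods in which a topic was flagged. The names `TOPICS`, `TABLE_1`, `periods_with` and `topics_in_every_period` are our own illustrative choices, not part of the TRUST framework, and the sketch only reads the published table. It does not reproduce the text mining itself.

```python
# Topic columns of Table 1, in the order they appear in the table.
TOPICS = ["UK Economy", "EU Trade", "Single Market", "EU Immigration",
          "David Cameron", "Boris Johnson", "Labour Party"]

# Rows of Table 1: (period, flags in TOPICS order); "Y" = Yes, "N" = No.
TABLE_1 = [
    ("15 Apr-14 May", "YYYNYNN"),
    ("15 May-21 May", "YNYNYYN"),
    ("21 May-28 May", "YYYYYYN"),
    ("29 May-4 Jun",  "YYYYYNY"),
    ("5 Jun-11 Jun",  "YYYYYYY"),
    ("12 Jun-18 Jun", "YYYNYNY"),
    ("From 19 Jun",   "NNYYYYN"),
]


def periods_with(topic):
    """Return the periods in which the given topic is marked Yes."""
    col = TOPICS.index(topic)
    return [period for period, flags in TABLE_1 if flags[col] == "Y"]


def topics_in_every_period():
    """Return the topics marked Yes in all periods of the table."""
    return [topic for i, topic in enumerate(TOPICS)
            if all(flags[i] == "Y" for _, flags in TABLE_1)]


print(periods_with("EU Immigration"))
print(topics_in_every_period())
```

Run against the table as published, the first call lists the two spells of interest in EU immigration described above, and the second shows which topics never dropped out of the coverage.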



Figure 2 Web Search Interest[1]: EU Immigration (15 May-20 June 2016)

Figure 3 Web Search Interest: David Cameron (15 May-20 June 2016)

Figure 4 Web Search Interest: UK Economy (15 May-20 June 2016)

From Table 1 and Figure 2, Xuxin found that when voters were very enthusiastic about the immigration issue, web search interest in it increased. There were two periods when voters were interested in immigration. The first ran from 22 May to 14 June 2016. It ended 2 weeks before the referendum, days before the Jo Cox tragedy and the UKIP poster event. After a decrease in interest between 15 June and 18 June, there was renewed interest in EU immigration in the week of the referendum: the web search index for EU immigration in the UK increased from 36 to 81, which caused the Remain side to lose 2.7% and boosted the Leave camp by an impressive 3.5%[2].

From 19 June, the web search index for David Cameron in the UK increased from 10 to 24, which reduced the Remain vote by 1% and increased the Leave vote by 0.7%. Meanwhile, interest in the UK economy in the week of the referendum did not increase as fast as the other important themes (from 67 to 86.4), which boosted the Remain camp by only 0.2%. In sum, Remain lost 3.5% in the last week while the Leave camp gained 3.8%.
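The arithmetic behind the last two paragraphs can be checked with a short Python sketch. It uses only the figures quoted in the text: the start and end values of each web search index and the quoted effects on the Remain vote. The names `INDEX_CHANGE`, `REMAIN_EFFECTS`, `growth_ratio` and `net_remain_effect` are illustrative. The sketch covers the Remain side only, because the text does not state the UK economy topic's effect on Leave.

```python
# Web search index in the week of the referendum: (start, end), as quoted in the text.
INDEX_CHANGE = {
    "EU Immigration": (36, 81),
    "David Cameron": (10, 24),
    "UK Economy": (67, 86.4),
}

# Effect of each topic on the Remain vote, in percentage points, as quoted in the text.
REMAIN_EFFECTS = {
    "EU Immigration": -2.7,
    "David Cameron": -1.0,
    "UK Economy": 0.2,
}


def growth_ratio(topic):
    """Ratio of the end index to the start index for a topic."""
    start, end = INDEX_CHANGE[topic]
    return end / start


def net_remain_effect():
    """Sum of the quoted topic effects on Remain, rounded to one decimal place."""
    return round(sum(REMAIN_EFFECTS.values()), 1)


for topic in INDEX_CHANGE:
    print(topic, round(growth_ratio(topic), 2))
print("Net effect on Remain:", net_remain_effect())
```

The growth ratios show why the economy is described as rising more slowly than the other themes: its index grew by roughly 1.3 times, while the immigration and Cameron indices more than doubled. The three Remain effects sum to the 3.5% loss quoted above.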

Finally, Xuxin used his statistical model to calculate the predicted outcomes for the referendum. Reported in Table 3, they show that Leave would have a clear win in the referendum, with a mean voting intention of 43.3% for Remain against Leave’s 48.6%. By following the data since our first report, we could have predicted the final Brexit result.

Table 3: Projecting Referendum Voting Results

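|                            | Remain      | Leave       |
|----------------------------|-------------|-------------|
| Mean Voting Intention Rate | 43.3%       | 48.6%       |
| Swing Votes Range          | 0-4.2%      | 0-3.6%      |
| Final Rate Range           | 45.3%-50.6% | 49.4%-54.7% |
| Final Mean Rate            | 48%         | 52%         |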

Note: The predictions are based on the data available on 20 June 2016.
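As a quick consistency check on Table 3, the sketch below tests one plausible reading of the figures: that each Final Mean Rate is the midpoint of its Final Rate Range, and that the Remain and Leave ranges mirror each other so that each scenario sums to 100%. This is our assumption about how the rows relate, not a description of Xuxin's statistical model, and the names `FINAL_RANGE` and `midpoint` are illustrative.

```python
# Final Rate Range from Table 3, in percent: (low, high).
FINAL_RANGE = {
    "Remain": (45.3, 50.6),
    "Leave": (49.4, 54.7),
}


def midpoint(side):
    """Midpoint of the final rate range for one side, rounded to a whole percent."""
    low, high = FINAL_RANGE[side]
    return round((low + high) / 2)


# Assumed reading: the lowest Remain share pairs with the highest Leave share,
# and vice versa, so each pairing should add up to 100%.
remain_low, remain_high = FINAL_RANGE["Remain"]
leave_low, leave_high = FINAL_RANGE["Leave"]
pairs_sum_to_100 = (abs(remain_low + leave_high - 100) < 1e-9
                    and abs(remain_high + leave_low - 100) < 1e-9)

print(midpoint("Remain"), midpoint("Leave"), pairs_sum_to_100)
```

Under this reading the midpoints come out at the 48% and 52% reported in the Final Mean Rate row.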

Where did we go wrong with the initial predictions?

In essence, Xuxin has shown that his methodology works but that there was a swing from Remain to Leave in the last days of the campaign. The lesson learned here is that the final prediction should have been made on the most up-to-date data. This is really a resources and process efficiency issue. With further automation, we at Invennt can see that semi-real-time predictions can be made and that swings can be tracked as they happen. This is an exciting prospect for psephologists, and it is also important in many other applications where real-time big data mining and interpretation are valuable.

If you wish to read Xuxin’s original report you can get to his personal website here

[1] The web search interest data is based on Google Trends data between 15 April and 20 June 2016, presented on a [0, 100] interval. The index of a particular term represents its search volume as a percentage of the largest search volume recorded in a single day during the whole period. The larger the index, the higher the demand for information on this term and the more it is searched.
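The scaling described in footnote [1] can be sketched in a few lines of Python. The function `to_index` and the sample daily volumes are hypothetical and exist only to illustrate the footnote's description, in which the busiest day is set to 100 and every other day is expressed relative to it. It is not a call to any Google Trends interface.

```python
def to_index(daily_volumes):
    """Scale daily search volumes to a [0, 100] index relative to the peak day."""
    peak = max(daily_volumes)
    if peak <= 0:
        raise ValueError("at least one day must have a positive search volume")
    return [100 * volume / peak for volume in daily_volumes]


# Hypothetical raw volumes for three days; the second day is the peak.
print(to_index([10, 40, 20]))
```

Because the index is relative to the peak within the chosen period, the same raw volume can map to a different index value if the period is changed.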

[2] The calculations of the effects of Immigration, Cameron and Economy are all based on Table 2 of the blog

Tim Fitch

Tim Fitch has extensive civil engineering leadership experience, gained particularly in the geotechnical and rail sectors, where he has helped niche businesses become market leaders, and quadrupled turnover in Taylor Woodrow’s rail division.

With a strong background in business development, Tim spearheaded growth at Vinci’s civil engineering division, deploying customer relationship and pipeline management techniques to grow the company’s work in the transport and energy sectors.

This Post Has 2 Comments
  1. “The lesson learned here is that the final prediction should have been made on the most up to date data.” Surely it is common sense to do that anyway? Not that great a mind then!

  2. There is an ancient Arab proverb that says “future tellers are wrong even if they are telling the truth”. The problem is that, for a socially charged agenda with vexed emotions, scientific prediction and mathematical formulae don’t work; I would even argue they don’t apply.

    Brexit was a personal as well as a collective feeling, with emotions that can’t be detected easily. People’s motives are difficult to detect, and not many people declare their true intentions until it’s too late. My personal view is that the whole Brexit debate was not really properly conducted or defined in an objective way. There was too much unjustified scaremongering, which put people off on both sides of the argument.

    Many institutions and big corporations, as well as political partisans, did not help the debate. Their intervention was premature and had a negative impact, as there was an undercurrent that these institutions were in cahoots with the proponents of the Remain side. This played on people’s emotions, and the whole issue degenerated into narrow political interests.

    A useful debate may be to discuss the impact of Brexit on engineering development, infrastructure projects and the availability of competent resources.

    The lesson learnt here is that prediction of human motives is not an exact science and can’t be relied upon to formulate successful policies. More research is needed to uncover people’s true intentions beyond a survey questionnaire.
