How reliable are Google Analytics statistics?

Most of the data provided by Google Analytics is highly reliable because the tracking code collects ‘primary data’, such as the number of visitors to a website, how long they stay, and whether they move to another page on the site. However, other data about visitors depend on ‘secondary data’ that Google uses to make assumptions about them.

When signing up for web services such as Gmail, users often give information about themselves, such as their interests, occupation, age and gender. The service provider frequently creates a small file called a ‘cookie’ on the user’s device, which identifies the user so that this information can be linked to their later browsing. Google Analytics accesses these cookies as secondary data, which can be used to build statistics for the population of users.

Google Analytics reports the gender of users, and reports their age in the ranges 18-24, 25-34, 35-44, 45-54, 55-64 and 65+. How accurate are these statistics?

The Google article “About Demographics and Interests: Analyze users by age, gender, and interest categories” [1] explains where Google gets this information.

Google Analytics data sources

Once you update Analytics to support Advertising Reporting Features, Analytics collects Demographics and Interests data from the following sources:

Third-party DoubleClick cookie
Applies to: Web-browser activity only
Condition: Cookie is present
Result: Analytics collects any demographic and interests information available in the cookie

Android Advertising ID
Applies to: App activity only
Condition: You update the Analytics tracking code in an Android app to collect the Advertising ID
Result: Analytics generates an identifier based on the ID that includes demographic and interests information associated with users’ app activity

iOS Identifier for Advertisers (IDFA)
Applies to: App activity only
Condition: You update the Analytics tracking code in an iOS app to collect the IDFA
Result: Analytics generates an identifier based on the IDFA that includes demographic and interests information associated with users’ app activity

A report by Humix [2] asks “How accurate are these demographic estimates?” and makes the following observations:

It’s normal to be sceptical about the demographic data that Google provides. After all, it’s an estimate that is based on a sample of your visitors. There are three reasons why you should be careful when interpreting the demographics and interests reports.

1) A cookie is just a cookie
Ad blockers tend to prevent the DoubleClick cookie from firing. This means that Google will not capture any data from visitors with ad blockers installed. Besides, when you clear your cookies all data is lost and Google will have to restart assembling your profile.

2) Subject to thresholds
Google applies a threshold to protect the privacy of every individual. They say: “thresholds are applied when data might allow the recipient of the report to infer the characteristics of an individual visitor”. When this occurs you will be warned by a yellow notice below the report title. This reinforces data sampling even more.

3) Data Sampling
As you probably already know, Google often uses only a subset of the data to compile reports. Data sampling happens automatically when your report includes more than 500,000 visits. Medium to big websites have to deal with this very often. We noticed that demographics reports in particular are exposed to heavy sampling. Often, the reports you see are based on less than 10% of your total visits.
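To see why a sampled report is an estimate rather than an exact count, the sketch below draws a simple random sample of about 10% of visits and estimates the share of visitors in one age group, with an approximate 95% confidence interval. It is a minimal illustration under stated assumptions: it assumes simple random sampling (Google does not necessarily sample this way), the visit data are invented (a hypothetical site with 600,000 visits, 30% of them in the 25-34 range), and the function name `estimate_share` is hypothetical.

```python
import math
import random


def estimate_share(population, sample_rate, seed=0):
    """Estimate the share of visits with some attribute from a random sample.

    Assumption: simple random sampling, which is not necessarily how
    Google Analytics samples its reports.

    population: list of booleans, True if a visit has the attribute
                (e.g. the visitor is in the 25-34 age range).
    sample_rate: fraction of visits kept, e.g. 0.10 for a 10% sample.
    Returns (estimated share, standard error, sample size).
    """
    rng = random.Random(seed)
    # Keep each visit independently with probability sample_rate
    sample = [visit for visit in population if rng.random() < sample_rate]
    n = len(sample)
    if n == 0:
        raise ValueError("sample is empty; increase sample_rate")
    p_hat = sum(sample) / n
    # Standard error of a sample proportion: sqrt(p * (1 - p) / n)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, n


if __name__ == "__main__":
    # Hypothetical site: 600,000 visits, about 30% in the 25-34 age range
    data_rng = random.Random(42)
    visits = [data_rng.random() < 0.30 for _ in range(600_000)]
    true_share = sum(visits) / len(visits)

    est, se, n = estimate_share(visits, 0.10)
    print(f"true share:  {true_share:.4f}")
    print(f"estimate from {n} sampled visits: {est:.4f} "
          f"+/- {1.96 * se:.4f} (approx. 95% interval)")
```

With tens of thousands of sampled visits the random sampling error is small. The interval does not, however, account for visitors whose demographics Google never sees, such as those with ad blockers, so a narrow interval is no guarantee that the figures describe all of a site's visitors.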

These points suggest that the demographic results in Google Analytics should be interpreted with care.

Other data of interest on this course include the user’s country, city, and language. How accurate are these?

Google Analytics estimates a user’s location from their IP address, which reflects the point where they connect to the internet rather than where they actually are. Generally this makes country data accurate, but city data may be less accurate. For example, when testing sites I have observed that Google Analytics often thinks I am in nearby Bletchley but sometimes thinks I am in Southampton or London, both much further from where I am. Another complication is that some web users disguise their country using services set up for this purpose, such as VPNs and proxy servers.

Google Analytics determines a user’s language from the language setting of their web browser. This setting usually defaults to the language of the operating system, but most browsers let users change it to suit their preferences.

What do you think?

Do you think the statistics provided by Google Analytics are accurate? Do you think that careful interpretation is required? Should we worry that ‘Big Brother is watching you’? Are users outsmarting Google by disguising who they are, or by giving false information about themselves, e.g. giving Gmail incorrect information about their age and gender?

Is giving our personal information to Google a fair trade for the many high-quality free services it provides in return? Let us know what you think and share your ideas and views with others on the course.

References

[1] Google. About Demographics and Interests. Viewed 3 October 2018.

[2] Humix (17 March 2014). How accurate are Google Analytics Demographic Reports. Viewed 3 October 2018.

This article is from the free online course:

Introduction to Data Science with Google Analytics: Bridging Business and Technical Experts

UNESCO UNITWIN Complex Systems Digital Campus