Google Analytics 4 (Beta)

Google Analytics 4 support is currently in beta. Please raise any issues you encounter via the chat bubble in the bottom-right of the site, and we'll be happy to help.


Important!

You must enable both the Google Analytics API and the Google Analytics Data API for your project.

This may take a few minutes to take effect.

Overview

Our Google Analytics 4 integration lets you import your data so that you can clean it, transform it, and ask questions about it via Chat Explore.

Importing Data

To import data, first create a new project.

Click the Google Analytics 4 button.

Click the "Connect Dataset" button.

If prompted, click "Sign In With Google" to log in to Google.

If prompted, select which Google User to use.

Be sure to click "Continue" to grant access.

Connect your Google Analytics 4 account.

Enter your Google Analytics 4 details.

To pull in your data, input the following:

  • Property Id: Your Google Analytics 4 Property Id. You may find it here.

  • Start Date: The first day of your data extract's date range, in YYYY-MM-DD format. Defaults to 2023-01-01.

  • End Date: The last day of your data extract's date range, in YYYY-MM-DD format. Optional; defaults to the current date.

  • Metrics: Which metrics to pull. You may pull multiple, separated by commas. Defaults to totalUsers, userEngagementDuration.

  • Dimensions: Which dimensions to pull. You may pull multiple, separated by commas. Defaults to landingPage.
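As a rough illustration of how these fields fit together, the sketch below maps them onto the request body shape used by the GA4 Data API `runReport` method (`dateRanges`, `metrics`, `dimensions`). Whether the integration sends exactly this request is an assumption; the function name `build_run_report_body` is hypothetical, and the Property Id would go in the request URL rather than the body.

```python
from datetime import date, datetime


def build_run_report_body(start_date="2023-01-01", end_date=None,
                          metrics="totalUsers, userEngagementDuration",
                          dimensions="landingPage"):
    """Hypothetical helper: turn the form fields into a runReport-style body."""

    def parse_day(text):
        # Raises ValueError if the text is not a year-month-day calendar date.
        return datetime.strptime(text, "%Y-%m-%d").date()

    def split_names(text):
        # "a, b ,c" -> ["a", "b", "c"], ignoring empty entries.
        return [name.strip() for name in text.split(",") if name.strip()]

    start = parse_day(start_date)
    # End Date is optional; fall back to the current date when it is omitted.
    end = parse_day(end_date) if end_date else date.today()
    if start > end:
        raise ValueError("Start Date must not be after End Date")

    return {
        "dateRanges": [{"startDate": start.isoformat(),
                        "endDate": end.isoformat()}],
        "metrics": [{"name": name} for name in split_names(metrics)],
        "dimensions": [{"name": name} for name in split_names(dimensions)],
    }


if __name__ == "__main__":
    print(build_run_report_body(end_date="2023-03-31"))
```

Parsing the dates up front catches a mistyped date or a reversed range before any request is made, which is usually easier to diagnose than a failed connection check.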

When complete, hit submit.

We will then validate the connection.

If we're successful, you'll move forward to the data loading screen, where your data is syncing behind the scenes.

The import may take a little while. Small datasets should take only a few minutes; large datasets may take longer.

Once your data is done importing, you'll see it populate onto the screen.

From here, you can do anything you like with the data: clean and transform it on the Prepare tab, ask Chat Explore questions on the Explore tab, you name it!

Discrepancies Between GA4 UI and API/Exported Data

Data from Google Analytics is based on sampling, which can cause discrepancies between the GA4 UI and the platform the data is exported to, such as Akkio. This section covers what causes these discrepancies and some ways to reduce them.

The two most obvious ways data may differ are:

Column Totals - In the Google Analytics UI, the totals shown for columns are based on an estimate of individual users. For example, although the rows add up to 100 views, GA4 may estimate that 5 of those views came from the same user and show 95 as the total.

  • If you manually add up all the values in the column in the Google Analytics UI, the sum (100) will be greater than the total reported at the top of the column (95).

  • When you export this data to a spreadsheet or import it directly into Akkio, you get the raw rows. Aggregating them also gives 100, which is greater than the column total in the GA4 UI (95).

Google does not provide user IDs, so you cannot deduplicate or estimate the number of unique users yourself; be aware of this discrepancy. Aggregations will still be generally correct in direction and magnitude. Web traffic analysis is typically about trends, and the raw, non-deduplicated data works well for that.

To reiterate: data exported from GA4 into Akkio or any other BI tool is not deduplicated. Totals in Akkio and other BI tools are therefore the actual sum of the column, while totals in GA4's reports are the totals after deduplication.
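The gap between a summed column and a deduplicated total can be seen with a toy example. The visit data and user IDs below are made up purely to make the arithmetic visible (as noted above, the real export does not include user IDs), and the exact way GA4 estimates its totals is not reproduced here.

```python
# Hypothetical raw visits as (user_id, page) pairs.
visits = [
    ("u1", "/home"), ("u1", "/pricing"),
    ("u2", "/home"),
    ("u3", "/pricing"), ("u3", "/blog"),
]

# Per-row metric, as it would appear in an export: distinct users per page.
users_per_page = {}
for user_id, page in visits:
    users_per_page.setdefault(page, set()).add(user_id)
exported_rows = {page: len(users) for page, users in users_per_page.items()}

# Summing the exported column counts a user once for every page they visited.
summed_total = sum(exported_rows.values())

# A deduplicated total counts each user once across all pages.
deduplicated_total = len({user_id for user_id, _ in visits})

print(exported_rows)       # {'/home': 2, '/pricing': 2, '/blog': 1}
print(summed_total)        # 5
print(deduplicated_total)  # 3
```

Without the user IDs, only the per-row counts are available downstream, so the summed figure is the one a BI tool will report.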

The one workaround here is to use the BigQuery event export data and HLL++ (HyperLogLog++) to estimate unique counts yourself.

Total Rows - In the Google Analytics UI, the count of total rows includes every URL ever visited, but the exported data may omit some low-traffic, long-tail URLs. This happens because the data is based on sampling, and low-view URLs may not have been included in the sample.

Whenever you export data or pull it through the API, you receive this downsampled set of rows. This should not meaningfully affect your metrics; what gets sampled out is often outlier sessions.

Causes of Discrepancies

  • Sampling: Minor impact but can cause some data variations.

  • Latency: Data updates can take up to 48 hours.

  • Double Counting: Sessions aggregated across multiple dimensions can result in noticeable discrepancies. Sessions spanning over midnight typically have a minor impact.

  • Custom Dimensions: Data may be filtered out if it was collected before the implementation of custom dimensions.
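The double-counting cause is easiest to see with a session that crosses midnight. The following is a minimal sketch using made-up events and session IDs; it is not how GA4 computes sessions internally, only an illustration of why a per-day breakdown can add up to more than the number of sessions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events as (session_id, timestamp) pairs.
events = [
    ("s1", datetime(2023, 1, 1, 23, 50)),
    ("s1", datetime(2023, 1, 2, 0, 10)),  # same session, after midnight
    ("s2", datetime(2023, 1, 2, 9, 0)),
]

# Counting distinct sessions per calendar day places s1 in both days.
sessions_by_day = defaultdict(set)
for session_id, timestamp in events:
    sessions_by_day[timestamp.date().isoformat()].add(session_id)
per_day_counts = {day: len(ids) for day, ids in sessions_by_day.items()}

# Attributing each session to the day it started counts it exactly once.
session_start = {}
for session_id, timestamp in events:
    if session_id not in session_start or timestamp < session_start[session_id]:
        session_start[session_id] = timestamp
by_start_day = defaultdict(int)
for timestamp in session_start.values():
    by_start_day[timestamp.date().isoformat()] += 1

print(per_day_counts)      # {'2023-01-01': 1, '2023-01-02': 2}
print(dict(by_start_day))  # {'2023-01-01': 1, '2023-01-02': 1}
```

The per-day counts sum to 3 even though there are only 2 sessions; the start-day attribution sums to 2.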

General Solutions

  • Filter Data to a Shorter Time Frame: Narrowing the time frame can reduce discrepancies.

  • Allow Time for Data to Update: Wait for up to 48 hours for the most accurate data.

  • Consider Dimensions Used: Be mindful of which dimensions are included in your analysis.

  • Filter Based on Session Start Date: This can help in reducing the discrepancies related to session counts.

  • Custom Dimensions Usage: Be aware of the impact of custom dimensions and adjust your data collection and analysis accordingly.
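One way to act on the 48-hour guidance is to choose an End Date that excludes days that may still be updating. The helper below is a sketch under stated assumptions: the name `settled_end_date` is hypothetical, the 48-hour figure is taken from the list above, and `now` is assumed to be expressed in the property's reporting time zone.

```python
from datetime import datetime, timedelta


def settled_end_date(now, latency_hours=48):
    """Return the last calendar day that ended at least `latency_hours` ago."""
    cutoff = now - timedelta(hours=latency_hours)
    # The day containing the cutoff may still be updating, so step back to the
    # previous day, which finished at or before the cutoff.
    return cutoff.date() - timedelta(days=1)


if __name__ == "__main__":
    end = settled_end_date(datetime(2023, 3, 10, 15, 0))
    print(end.isoformat())  # 2023-03-07
```

The result can be used as the End Date when importing, at the cost of leaving out the most recent days.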
