6 Simple Ways to Clean Up Your Data
Andy Semenihin  /  
November 22, 2017

Implementing and fine-tuning the full stack of Google 360 Suite products to meet the demands of your organization takes time. We strongly believe that this journey must start with cleaning up the data in your Google Analytics account. Accurate and reliable incoming data is a ‘must-have’ if you want cutting-edge analytics solutions to support the best business decisions.

#1 Organize Your Account Structure

 

The way your Google Analytics account is structured largely defines how flexible your analytics can become. Before diving into the data clean-up, make sure you have all essential Google Analytics Properties and Views in place. Google processes data for a View only from the moment that View exists, so all GA assets should be created at the very beginning: a newly created View shows data only from its creation date forward.

 

Carefully plan what you want to see in Google Analytics. Keep different businesses under different properties, and keep multiple subdomains of the same business under one property.

 

Regardless of the type of business, you will need at least these three Views in your GA account:

 

  • Master View -> main view for all your Google Analytics reporting
  • Test View -> here you can safely test your improvements
  • Unfiltered View -> unedited raw data is your backup in case of trouble

 

Think about other optional Views you may also need:

 

  • Separated Site Sections
  • Excluding Internal Traffic
  • Separated Traffic (e.g. only your internal traffic)
  • User ID View

 

TIP: Always create Views in advance: you will NOT see any historical data if you decide to copy an existing View.

 

 

#2 UTM Clean-Up

 

The very first thing most marketers and web analysts should care about is whether incoming web traffic is tagged correctly. In some cases, it may not be tagged at all, which can distort the measured results of your marketing efforts. For this reason, we suggest starting with a thorough UTM audit. In most cases we see that the traffic is tagged but the UTM architecture could be improved.

 

Since multiple teams may be involved in your Email, AdWords, Bing, or other Social and Paid campaigns, you are likely to find a variety of vaguely tagged traffic sources in GA, and you may end up unable to aggregate this traffic properly. Clear and unified naming conventions for all campaigns are a must. This means that all UTM parameters across teams and campaigns should follow one single document.

 

Here are our suggestions for structuring the UTM parameters:

 

Channel             | UTM Source       | UTM Medium
Paid Search         | google, bing     | cpc
Display Advertising | dfa, dbm         | cpm
Email               | newsletter       | email
Facebook            | facebook         | social_cpc, social_organic
LinkedIn            | linkedin         | social_organic
Twitter             | twitter          | social_cpc, social_organic
Organic             | yahoo, bing, aol | organic

 

Note that even if you have a large number of different email campaigns, the Medium should always stay the same for all of them. If you need to differentiate between your campaigns and tactics, use Campaign Name (&utm_campaign) and Campaign Content (&utm_content) instead. Using Medium for this purpose is likely to result in inaccurate email tracking.
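As an illustration, here is a minimal Python sketch of a link builder that enforces such a convention. The function name `tag_url`, the example.com URL and the campaign values are our own assumptions for this example, not part of any Google tool. It lowercases every UTM value and replaces any UTM parameters already on the link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, source, medium, campaign, content=None):
    """Return `url` with lowercase UTM parameters appended (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep existing non-UTM query parameters, drop any stale utm_* ones.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith("utm_")
    ]
    utm = {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    if content:
        utm["utm_content"] = content
    # Lowercase the values so that "Email" and "email" cannot diverge.
    query += [(name, value.strip().lower()) for name, value in utm.items()]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(tag_url("https://example.com/offers?ref=1",
              "Newsletter", "Email", "Spring_Sale", "hero_banner"))
```

The Medium stays `email` for every newsletter link, while the campaign and content values carry the differences between mailings.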

 

To see what has been tracked so far, go to the Acquisition -> All Traffic -> Source/Medium report and export the data for the past six months. You can then filter out all incorrect sources and share them with the corresponding teams.
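If the export is large, the filtering can be scripted. The sketch below assumes the exported values look like `source / medium` (for example `google / cpc`) and checks them against the convention suggested above. The `ALLOWED` set and the function name are illustrative, and you would extend the set with any other pairs you consider valid, such as direct traffic.

```python
# Allowed (source, medium) pairs, taken from the suggested convention.
ALLOWED = {
    ("google", "cpc"), ("bing", "cpc"),
    ("dfa", "cpm"), ("dbm", "cpm"),
    ("newsletter", "email"),
    ("facebook", "social_cpc"), ("facebook", "social_organic"),
    ("linkedin", "social_organic"),
    ("twitter", "social_cpc"), ("twitter", "social_organic"),
    ("yahoo", "organic"), ("bing", "organic"), ("aol", "organic"),
}


def audit_source_medium(values):
    """Return (value, reason) pairs for rows that break the convention."""
    problems = []
    for value in values:
        source, _, medium = value.partition(" / ")
        pair = (source.strip(), medium.strip())
        if pair in ALLOWED:
            continue
        lowered = (pair[0].lower(), pair[1].lower())
        # A pair that is valid once lowercased is only a casing problem.
        reason = "case mismatch" if lowered in ALLOWED else "not in convention"
        problems.append((value, reason))
    return problems


for value, reason in audit_source_medium(
        ["google / cpc", "Newsletter / Email", "fb / paid"]):
    print(value, "->", reason)
```

Separating case mismatches from unknown pairs makes it easier to tell each team what exactly needs fixing.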

 

Unified UTM parameters will allow you to set up accurate Channels (Custom Channel Groupings) and avoid losing data to ‘Other’.

 

TIP: Google Analytics source/medium processing is case-sensitive, so Email and email will be shown on separate rows.

 

 

#3 URL Clean-Up

 

Once your traffic is sorted out correctly, you will need to clean up the page reporting to be able to track the user journey correctly. URLs tend to be messy and can contain so many junk parameters that your Site Content reporting becomes nearly useless. A single page may have hundreds of URL variations because of parameters appended to the URL:

 

/account/login

/account/login?id=3254

/account/login?id=3254&email=1

/account/login?id=3254&email=1&form=onboarding

/account/login?id=3254&email=1&form=onboarding&list=submittedForm

 

TIP: make sure these parameters are not used in your reporting or Goal conditions before removing them.

 

Use this simple spreadsheet to get the list of those parameters and insert them in the query exclusion list, separated by commas. You can also do it manually by pulling all Pages and extracting the parameters, but the spreadsheet will save you a ton of time.
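For readers who prefer a script, the same extraction can be sketched in Python. This is not the spreadsheet mentioned above, just one possible approach: it assumes you have exported the Page values as a plain list of paths, and the function names are our own.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def query_parameter_counts(pages):
    """Count how many page paths contain each query parameter name."""
    counts = Counter()
    for page in pages:
        query = urlsplit(page).query
        for name, _ in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


def exclusion_list(pages, keep=()):
    """Build a comma-separated list of parameter names, skipping those in `keep`."""
    names = sorted(name for name in query_parameter_counts(pages)
                   if name not in keep)
    return ",".join(names)


pages = [
    "/account/login",
    "/account/login?id=3254",
    "/account/login?id=3254&email=1",
    "/account/login?id=3254&email=1&form=onboarding",
    "/account/login?id=3254&email=1&form=onboarding&list=submittedForm",
]
print(exclusion_list(pages))                 # email,form,id,list
print(exclusion_list(pages, keep={"form"}))  # email,id,list
```

The `keep` argument is there for parameters that your reports or Goal conditions still depend on.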

 

To exclude all unnecessary query parameters, go to Admin -> View Settings -> Exclude URL Query Parameters

 

IMPORTANT: Always test-run the changes in your Test View before applying them to the Master View. Once the query exclusion list is enabled, the affected data cannot be reprocessed.
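Before touching any View, you can also preview offline how the page list would collapse. The sketch below only approximates the effect of the setting: it removes the named parameters from exported page paths and assumes exact name matching. The helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_parameters(page, excluded):
    """Return `page` without the query parameters named in `excluded`."""
    parts = urlsplit(page)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


excluded = {"id", "email", "form", "list"}
variants = [
    "/account/login?id=3254",
    "/account/login?id=3254&email=1&form=onboarding&list=submittedForm",
]
# Both variants collapse into the same clean page path.
print({strip_parameters(page, excluded) for page in variants})
```

If two pages that should stay separate collapse into one, that parameter should not be on the exclusion list.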

 

 

#4 Transactional Audit

 

If you have ecommerce reports enabled in Google Analytics, you may need to verify their accuracy. The best way to gauge the reliability of your ecommerce tracking is to compare it with the data stored in the back end of your main ecommerce platform. We suggest exporting a few months of orders from your CRM and comparing them with the GA transactions. The aim is to determine the level of discrepancy between the two systems.

 

We consider a 5-7% difference in orders and revenue to be a good result for this test. Anything above that probably requires investigation to determine whether the tracking is functioning correctly.
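The check itself is simple arithmetic. In the sketch below the back-end figure is treated as the source of truth, and the 7% cut-off is our reading of the upper end of the range above. The function names and numbers are only for illustration.

```python
def discrepancy_pct(backend_value, ga_value):
    """Percentage difference between back-end and GA figures, relative to the back end."""
    if backend_value == 0:
        raise ValueError("back-end value must be non-zero")
    return abs(backend_value - ga_value) * 100 / backend_value


def needs_investigation(backend_value, ga_value, threshold_pct=7.0):
    """True when the discrepancy exceeds the assumed threshold."""
    return discrepancy_pct(backend_value, ga_value) > threshold_pct


# Example: 1,000 orders in the back end, 940 transactions in GA.
print(discrepancy_pct(1000, 940))      # 6.0
print(needs_investigation(1000, 940))  # False
print(needs_investigation(1000, 880))  # True
```

Run the same comparison separately for order counts and for revenue, since the two can diverge by different amounts.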

 

In our investigations, we try to understand if there is anything in common between orders that are missing in your Google Analytics account. Could that be one distinctive parameter or string (e.g. a coupon code, a discount, a price bucket etc.) that breaks the ecommerce tracking on the confirmation page?
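One way to look for such a pattern is to compute the share of missing orders per attribute value. The sketch below assumes the back-end export is a list of dictionaries with an `order_id` key plus whatever attribute you want to test (here a hypothetical `coupon` field), and that GA transaction IDs equal back-end order IDs.

```python
from collections import Counter


def missing_rate_by(backend_orders, ga_transaction_ids, field):
    """Share of orders missing from GA, grouped by the value of `field`."""
    ga_ids = set(ga_transaction_ids)
    totals, missing = Counter(), Counter()
    for order in backend_orders:
        key = order.get(field) or "(none)"
        totals[key] += 1
        if order["order_id"] not in ga_ids:
            missing[key] += 1
    return {key: missing[key] / totals[key] for key in totals}


orders = [
    {"order_id": "1001", "coupon": ""},
    {"order_id": "1002", "coupon": "SPRING10"},
    {"order_id": "1003", "coupon": "SPRING10"},
    {"order_id": "1004", "coupon": ""},
]
ga_ids = ["1001", "1003", "1004"]
print(missing_rate_by(orders, ga_ids, "coupon"))
```

A value whose missing rate is far above the overall discrepancy is a good place to start debugging the confirmation page.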

 

FYI: Most likely, the two systems will never match 100%. There are a few reasons for this ‘natural’ discrepancy:

 

  • cancelled or refunded orders
  • cookies disabled in the browsers
  • AdBlock Chrome extension affecting the tracking
  • JavaScript errors

 

TIP: ecommerce implemented via Google Tag Manager tends to be more accurate than the classic inline code.

 

 

#5 Internal & Dev Traffic Exclusion

 

Now and then we see internal traffic, as well as traffic from the staging environment, mixed with hits from the actual website. You may even see test transactions in the Master View. This can significantly distort your overall performance and KPI reporting.

 

To filter out the internal traffic, implement an IP exclusion filter on your Master View. Leave the Unfiltered and Test Views unedited in this case, since you will need to:

 

  1. still see your internal team traffic in one of the Views for test purposes
  2. have a completely raw View with all unfiltered data

 

You may also want to create a separate view for this purpose in case you still want to see all incoming data in your Master View.
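Before setting up the filter, it helps to have an exact list of internal addresses. Here is a minimal Python sketch for checking IPs against office ranges and for building a regular expression that matches an exact list of IPs, which could then be pasted into a filter pattern. The ranges are reserved documentation addresses used as placeholders, and both helper names are our own.

```python
import ipaddress
import re

# Hypothetical office ranges (reserved documentation addresses).
OFFICE_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_internal(ip, networks=OFFICE_NETWORKS):
    """True if `ip` falls inside one of the internal networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


def exact_ip_pattern(ips):
    """Regular expression that matches exactly the listed IP addresses."""
    escaped = "|".join(re.escape(ip) for ip in ips)
    return f"^({escaped})$"


print(is_internal("203.0.113.42"))   # True
print(is_internal("198.51.100.18"))  # False
print(exact_ip_pattern(["198.51.100.17", "203.0.113.5"]))
```

Escaping the dots matters: an unescaped `.` in a regular expression matches any character, so the pattern could exclude addresses you did not intend.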

 

To keep dev traffic out, we suggest creating another GA property and using it as a sandbox for your QA environment. The UA ID of this property should be placed in the tracking code of your staging site.
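If your pages are rendered server-side, the property ID can be chosen by hostname so that a staging deployment never reports into the live property by accident. The sketch below is only one possible approach. The IDs and hostnames are placeholders, and any host not explicitly listed as live falls back to the sandbox.

```python
# Placeholder property IDs and hostnames, replace them with your own.
LIVE_PROPERTY_ID = "UA-000000-1"
SANDBOX_PROPERTY_ID = "UA-000000-2"
LIVE_HOSTNAMES = {"www.example.com", "example.com"}


def property_id_for(hostname):
    """Pick the GA property ID to render into the tracking code."""
    host = hostname.strip().lower().rstrip(".")
    # Default to the sandbox so unknown hosts cannot pollute live data.
    return LIVE_PROPERTY_ID if host in LIVE_HOSTNAMES else SANDBOX_PROPERTY_ID


print(property_id_for("www.example.com"))      # UA-000000-1
print(property_id_for("staging.example.com"))  # UA-000000-2
```

Defaulting to the sandbox is deliberate: a forgotten hostname then costs you some sandbox noise rather than polluted live data.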

 

TIP: Make sure you have identical settings in both Live and Test properties for the sake of data consistency and comparability.

 

 

#6 Referral Exclusion List and Bot Traffic

 

Two of the simplest ways to make your data more accurate are to ensure that spider traffic and referral traffic exclusions are enabled.

 

If you see your own site in the referral traffic, you may need to add it to the referral exclusion list. To do so, go to Admin -> Tracking Info -> Referral Exclusion List. Self-referrals will not disappear immediately; expect them to decrease gradually over time.

 

TIP: check the referral paths. Some of your pages may be missing the tracking code, which results in self-referral traffic.
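To find those pages quickly, you can count self-referrals by path in an exported referral report. The sketch below assumes each exported value has the form `hostname/path` (for example `example.com/checkout/thanks`) and that `example.com` stands in for your own domain. The function name is ours.

```python
from collections import Counter


def self_referral_paths(full_referrers, own_domain):
    """Count referral paths that come from `own_domain` or its subdomains."""
    counts = Counter()
    for referrer in full_referrers:
        host, _, path = referrer.partition("/")
        host = host.lower()
        if host == own_domain or host.endswith("." + own_domain):
            counts["/" + path] += 1
    return counts


referrers = [
    "example.com/checkout/thanks",
    "shop.example.com/cart",
    "google.com/",
    "example.com/checkout/thanks",
    "notexample.com/x",
]
print(self_referral_paths(referrers, "example.com").most_common())
```

The paths at the top of the list are the first pages to check for a missing tracking snippet.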

 

Make sure you check the Bot Filtering box in Admin -> View Settings to exclude all hits from known bots and spiders.

 

This doesn’t guarantee you 100% results but will make your traffic analysis more reliable.

 

TIP: We suggest leaving this feature disabled in the Unfiltered view to get the maximum data available for analysis.

 

 

The Final Takeaway

 

With these six simple Google Analytics updates you can ensure that your data is properly collected and rigorously interrogated. Since honest data is the foundation of a proper analytics framework, these steps should be taken at an early stage. Nothing can undermine an otherwise successful marketing program like inaccurate reporting. That is why we believe that having reliable and clean incoming data is the first and foremost step to making informed marketing decisions.

 

 
