Akkio Docs

Clean

Settings for auto-cleaning your data

Last updated 1 year ago

Data cleanliness is a common problem for companies with years of historical data, which can take hours to sift through by hand. The prebuilt data cleaning options let you quickly apply basic cleaning steps before doing more specific work on your datasets. The options are described below.

Default Settings

  • Standardize Date Columns - Convert all date columns to the ISO 8601 standard format (YYYY-MM-DD HH:MM).
  • Remove Unexpected Nulls - Remove rows with null values in columns that are at least 99% filled in.
  • Replace Excess Categories with "Other" - In categorical columns, replace values that are not among the 32 most common values with "Other".
  • Remove Constant Columns - Remove columns that have the same value in every row.
  • Remove Mostly Unreadable Numerical Columns - Remove numerical columns with at least 99% unreadable values.
  • Remove Mostly Unreadable Date Columns - Remove date columns with at least 99% unreadable values.
  • Remove Mostly Blank Columns - Remove columns that are at least 99% blank values.
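
To make these rules concrete, here is a minimal Python sketch of a few of them. It is not Akkio's implementation: the function names, the list-of-dictionaries table layout, the definition of "blank" (None or an empty string), and the accepted input date formats are all assumptions made for the example. Only the thresholds (99% and the top 32 values) come from the descriptions above.

```python
from collections import Counter
from datetime import datetime

BLANK = (None, "")  # assumption: what counts as a blank or null cell


def standardize_date(value, input_formats=("%m/%d/%Y %H:%M", "%Y-%m-%d")):
    """Return value as 'YYYY-MM-DD HH:MM', or None if it cannot be parsed."""
    for fmt in input_formats:  # hypothetical list of accepted input formats
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d %H:%M")
        except (ValueError, TypeError):
            continue
    return None


def drop_columns(rows, blank_threshold=0.99):
    """Drop constant columns and columns that are at least 99% blank."""
    keep = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        blank_share = sum(v in BLANK for v in values) / len(values)
        is_constant = len(set(values)) == 1
        if blank_share < blank_threshold and not is_constant:
            keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]


def drop_unexpected_nulls(rows, fill_threshold=0.99):
    """Drop rows that are blank in a column that is at least 99% filled in."""
    strict_cols = []
    for col in rows[0]:
        filled = sum(row[col] not in BLANK for row in rows)
        if filled / len(rows) >= fill_threshold:
            strict_cols.append(col)
    return [r for r in rows if all(r[c] not in BLANK for c in strict_cols)]


def cap_categories(rows, col, top_n=32):
    """Replace values outside the top_n most common ones with 'Other'."""
    counts = Counter(row[col] for row in rows)
    common = {value for value, _ in counts.most_common(top_n)}
    return [{**r, col: r[col] if r[col] in common else "Other"} for r in rows]


# Made-up sample data for illustration only.
rows = [
    {"id": 1, "country": "US", "source": "web", "notes": ""},
    {"id": 2, "country": "US", "source": "ads", "notes": ""},
    {"id": 3, "country": "US", "source": "web", "notes": ""},
    {"id": 4, "country": "US", "source": "email", "notes": ""},
]
cleaned = drop_columns(rows)  # 'country' is constant, 'notes' is blank
capped = cap_categories(cleaned, "source", top_n=1)  # keep only the top value
print(sorted(cleaned[0]))
print([row["source"] for row in capped])
print(standardize_date("01/05/2023 14:30"))
```

The sketch uses top_n=1 only so that the effect is visible on four rows; the default described above keeps the 32 most common values.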

Other Cleaning Options

  • Flag Outliers - For each numerical column, add a column that flags whether or not numerical values in that row are more than three standard deviations from the mean, higher than the 99th percentile, or lower than the 1st percentile.
  • Flag Inliers - For each numerical column, add a column that flags whether or not numerical values in that row are prevalent in the dataset.
  • Clamp Outliers - Replace values in numerical columns that are more than three standard deviations from the mean, higher than the 99th percentile, or lower than the 1st percentile with the nearest value in the range.
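
The outlier rule can be sketched in Python as follows. This is an illustration under stated assumptions, not Akkio's implementation: the function names are hypothetical, the population standard deviation and the "inclusive" percentile method are arbitrary choices, and "nearest value in the range" is read here as clamping to the nearest bound of the allowed range.

```python
import statistics


def outlier_bounds(values, z=3.0):
    """Return (low, high): the range outside which a value is an outlier.

    A value is an outlier if it is more than z standard deviations from
    the mean, above the 99th percentile, or below the 1st percentile, so
    the allowed range is the intersection of the two intervals.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # assumption: population std deviation
    # 99 cut points; cuts[0] is the 1st percentile, cuts[98] the 99th.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low = max(mean - z * stdev, cuts[0])
    high = min(mean + z * stdev, cuts[98])
    return low, high


def flag_outliers(values):
    """Return one boolean per value: True if the value is an outlier."""
    low, high = outlier_bounds(values)
    return [v < low or v > high for v in values]


def clamp_outliers(values):
    """Replace outliers with the nearest bound of the allowed range."""
    low, high = outlier_bounds(values)
    return [min(max(v, low), high) for v in values]


# Made-up sample column: 1..99 plus one extreme value.
values = list(range(1, 100)) + [1000]
flags = flag_outliers(values)
clamped = clamp_outliers(values)
print(flags[-1], clamped[-1])
```

Because the percentile test applies on its own, the smallest and largest values of a column can be flagged even when they are well within three standard deviations of the mean.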

Execute Cleaning

Once you have selected the options you want, click Preview to see the changes to your data, then apply them when you are satisfied.