Selecting Model Data

The first step in the Akkio Flow is to connect data, as data is the fuel for any machine learning model. Akkio is a tabular AI tool, which means you’ll want historical data in a tabular format, such as a CSV, Excel file, Snowflake dataset, or Salesforce dataset.

In machine learning, both quality and quantity matter, so large, high-quality datasets are preferred. “Quality” means things like having few missing values, having properly formatted data, and having data that’s indicative of the problem you’re trying to solve. There’s no minimum dataset size for connecting to Akkio, but ideally your dataset has at least a couple of hundred rows, and preferably thousands (or millions) of rows.
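If you'd like to sanity-check a file before connecting it, the row count and missing-value share described above can be estimated with a few lines of Python. This is a minimal sketch using only the standard library, not part of Akkio itself; the function name `summarize_quality` and the sample columns (`customer_id`, `age`, `churn`) are hypothetical.

```python
import csv
import io


def summarize_quality(csv_text):
    """Return the row count and the share of empty values per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    share = {name: (missing[name] / rows if rows else 0.0) for name in columns}
    return {"rows": rows, "missing_share": share}


# Hypothetical sample data: one missing age and one missing churn label.
sample = "customer_id,age,churn\n1,34,no\n2,,yes\n3,51,\n4,29,no\n"
report = summarize_quality(sample)
print(report)
```

A column with a large share of missing values may be worth cleaning up, or leaving out, before you connect the dataset.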

Crucially, your dataset must be indicative of the problem at hand. If you want to predict churn, you’ll need a historical customer dataset with a churn column. If you want to predict employee attrition, you’ll need a historical employee dataset with an attrition column, and so on.
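A quick way to confirm that a file contains the outcome you want to predict is to check that the column exists and that it holds more than one distinct value. The following is a rough sketch under assumed names (`check_target_column`, a `churn` column with `yes`/`no` labels); it is independent of Akkio and only illustrates the idea.

```python
import csv
import io


def check_target_column(csv_text, target):
    """Report whether `target` exists and how often each label appears."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if target not in (reader.fieldnames or []):
        return {"present": False, "classes": {}}
    counts = {}
    for row in reader:
        value = (row.get(target) or "").strip()
        if value:  # skip rows where the label is blank
            counts[value] = counts.get(value, 0) + 1
    return {"present": True, "classes": counts}


# Hypothetical customer history with a churn label per row.
history = "customer_id,plan,churn\n1,basic,no\n2,pro,yes\n3,basic,no\n"
result = check_target_column(history, "churn")
print(result)
```

If the column is absent, or every row carries the same label, the historical data has nothing for a model to learn about that outcome.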

After connecting a dataset, you’ll see an overview of the data, including the name of the dataset, the latest “updated date,” the number of rows, and a scrollable preview of the dataset. You can also click “Replace” to swap in a different dataset, or “View” to open the dataset in a new tab.

Akkio will automatically recognize the variable types in the dataset, which can be any of the following:

  • Text
  • Number
  • Number (Decimal)
  • ID
  • Date
  • Category
  • Disabled
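To make the differences between these types concrete, here is a simplified heuristic that guesses a type from a column's name and values. It is an illustrative assumption only: Akkio's actual detection logic is not described here, and `guess_type`, its thresholds, and the sample columns are hypothetical. “Disabled” is omitted because it is a choice about whether to use a column rather than something that can be read from the data.

```python
from datetime import date


def guess_type(name, values):
    """Guess a variable type for one column (illustrative heuristic only)."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "Text"

    def all_parse(parser):
        # True if every value can be converted by `parser`.
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    distinct = len(set(values))
    lowered = name.lower()
    # Assumption: unique values in a column named "id" or "*_id" are identifiers.
    if (lowered == "id" or lowered.endswith("_id")) and distinct == len(values):
        return "ID"
    if all_parse(int):
        return "Number"
    if all_parse(float):
        return "Number (Decimal)"
    if all_parse(date.fromisoformat):
        return "Date"
    # Assumption: few distinct values relative to row count means a category.
    if distinct <= max(2, len(values) // 2):
        return "Category"
    return "Text"


print(guess_type("Gender", ["F", "M", "F", "M"]))
print(guess_type("signup_date", ["2023-01-05", "2023-02-10"]))
```

Heuristics like this can misclassify columns, for example a numeric ZIP code that is really a category, which is why being able to override the detected type matters.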

You can change a column’s variable type by clicking its current variable type. In the example below, the “Gender” column is correctly detected as “Category,” but you could click “Category” to change it to another variable type.