Chat Data Prep

Overview of the Prepare interface

The Prepare screen houses all the tools needed to transform and clean your data before interacting with it in Explore or building your model. These features are detailed here.

Chat Data Prep

Watch this video walkthrough.

Functional Overview

Click the Chat Data Prep icon in the upper left of the table view to begin.

As with any ML-based chat tool, the range of possible inputs is wide. If you are familiar with code-based data manipulation tools, you can phrase requests in those terms to get precise outcomes, but plain English also works. In the example above, we might want to compare only houses, since condos may skew our results. There are a few ways to do this: we can check whether a column denotes structure type, we can call out a feature that rules out most condos ('Remove any rows with lot frontage of zero'), or we can ask Chat to figure it out ('Remove all condos'). Here is the result from asking Chat to do it:

Some other things you could try:

  • Remove rows with any empty columns.

  • Remove rows with typos in any column.

  • Remove any rows less than 18 in the age column.

  • Generate a text summary of age, job, and education.

  • Combine month and year columns.
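For readers who think in code, here is a rough sketch of what three of these prompts amount to as explicit row and column operations. It is a plain-Python illustration, not how Akkio implements Chat Data Prep, and the sample rows and column names (`age`, `job`, `month`, `year`) are hypothetical. The typo-removal and text-summary prompts depend on the language model's judgment and are harder to express as a simple rule, so they are left out.

```python
# Hypothetical sample rows; the column names mirror the prompts above but are assumptions.
rows = [
    {"age": 34, "job": "admin", "month": "may", "year": 2008},
    {"age": 17, "job": "student", "month": "jun", "year": 2008},
    {"age": 45, "job": "", "month": "jul", "year": 2009},
    {"age": 29, "job": "technician", "month": "aug", "year": None},
]


def is_empty(value):
    """Treat None and blank strings as empty cells (0 is NOT considered empty)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


# "Remove rows with any empty columns." -> drop any row that has an empty cell.
no_empty = [r for r in rows if not any(is_empty(v) for v in r.values())]

# "Remove any rows less than 18 in the age column." -> keep age >= 18.
adults = [r for r in no_empty if r["age"] >= 18]

# "Combine month and year columns." -> one new column; the two originals are dropped.
combined = [
    {
        **{k: v for k, v in r.items() if k not in ("month", "year")},
        "month_year": f"{r['month']} {r['year']}",
    }
    for r in adults
]

print(combined)  # [{'age': 34, 'job': 'admin', 'month_year': 'may 2008'}]
```

Each step here works on the output of the previous one, so the order in which such transforms are applied can change the result.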

Feel free to test the limits! Once your data is cleaned up, you will have an even easier time training your model.

Chat Data Prep Example

Let's walk through an example of how Chat Data Prep works. First, we select data to transform; for this example, we use the customer churn demo. To follow along, go to the home page of the Akkio app and select it from the list of options.

From the table screen, you can see the Chat Data Prep icon. Let's start with a transform we know will work: focusing on more recent accounts, since we tend to lose customers after a few years. Most internet companies offer introductory deals that expire, so this segment is worth a closer look.

With that in mind, we ask Chat Data Prep to 'Remove anyone with tenure over 36'. Tenure is measured in months, so this keeps only accounts that are three years old or less.

As you can see, the AI understood the request, so we can apply the command safely. Once applied, the transform appears in gray in the upper left, numbered to show its position among the active transforms.

Clicking that entry expands it to show all the relevant information about that transform.
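As a rough illustration, the tenure prompt corresponds to a simple row filter. The sketch below is plain Python with hypothetical sample rows; only the `tenure` column (in months) comes from the walkthrough, and the other field names are assumptions. Note the boundary: "over 36" removes 37 and above, so an account with a tenure of exactly 36 months is kept.

```python
# Hypothetical customer rows; "tenure" is in months, as in the walkthrough.
customers = [
    {"customer_id": "A", "tenure": 12, "churn": "Yes"},
    {"customer_id": "B", "tenure": 36, "churn": "No"},
    {"customer_id": "C", "tenure": 60, "churn": "No"},
]

MAX_TENURE_MONTHS = 36  # 36 months = 3 years

# "Remove anyone with tenure over 36" keeps rows with tenure <= 36.
# A new list is built, so the original rows are left untouched.
recent = [c for c in customers if c["tenure"] <= MAX_TENURE_MONTHS]

print([c["customer_id"] for c in recent])  # ['A', 'B']
```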

Next, let's try a transform that may be less successful. Say we want to anonymize the data and remove gender as a factor, so we type 'remove men and women.' This generates the following error:

The AI has interpreted our request as removing the rows with those characteristics, which in this case would be all of the data. We need to reword the request, for example by asking it to remove the gender column. For fun, though, let's test our friendly neighborhood LLM: instead of asking it to remove the gender column, we ask it to remove demographic data. As you can see, it understands the term and removes all identifying information unrelated to our product. That is not what we want this time, but it is an excellent use for the tool.

Instead, we finish with 'remove gender,' which makes exactly the change we want. After we apply it, there are two active transforms on the data.
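The difference between the two readings of the gender request can be sketched in a few lines of plain Python. The sample rows and the `"Male"`/`"Female"` values are hypothetical stand-ins for the churn demo, not its actual contents. Filtering rows on gender empties the table when every row has one of those values, while dropping the column keeps every row and removes only that field.

```python
# Hypothetical rows; every customer has a gender value of "Male" or "Female".
customers = [
    {"customer_id": "A", "gender": "Female", "tenure": 12},
    {"customer_id": "B", "gender": "Male", "tenure": 30},
    {"customer_id": "C", "gender": "Female", "tenure": 5},
]

# Row interpretation of "remove men and women": drop every row whose gender matches.
rows_removed = [c for c in customers if c["gender"] not in ("Male", "Female")]

# Column interpretation of "remove gender": keep every row, drop one field.
column_removed = [{k: v for k, v in c.items() if k != "gender"} for c in customers]

print(len(rows_removed))   # 0 -> the whole table is gone
print(column_removed[0])   # {'customer_id': 'A', 'tenure': 12}
```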

Finally, we can download the transformed data as a CSV file if we want to take it out of Akkio for other uses or to back up the original.
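If you load a downloaded CSV into your own scripts, a minimal round trip with Python's standard `csv` module might look like this. The rows and column names are hypothetical, and `io.StringIO` stands in for a real file so the example runs anywhere. One thing to watch: `csv.DictReader` returns every value as a string, so numeric columns such as `tenure` need converting after reloading.

```python
import csv
import io


def write_csv(rows, fieldnames, fileobj):
    """Write a header row followed by one line per record."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_csv(fileobj):
    """Read records back as dicts; every value comes back as a string."""
    return list(csv.DictReader(fileobj))


# Hypothetical transformed rows (e.g. after filtering tenure and dropping gender).
transformed = [
    {"customer_id": "A", "tenure": 12},
    {"customer_id": "C", "tenure": 30},
]

buffer = io.StringIO()  # stands in for a file on disk
write_csv(transformed, ["customer_id", "tenure"], buffer)
print(buffer.getvalue(), end="")
# customer_id,tenure
# A,12
# C,30

buffer.seek(0)
raw = read_csv(buffer)
print(raw[0])  # {'customer_id': 'A', 'tenure': '12'}

# Convert the numeric column back to int to recover the original values.
restored = [{**r, "tenure": int(r["tenure"])} for r in raw]
print(restored == transformed)  # True
```

To work with a real file instead, open it with `newline=""` (as the `csv` module documentation recommends) and pass the file object to the same functions; the file name and encoding you use are up to you.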
