Akkio FAQ

Frequently Asked Questions about the platform

This section contains common questions we have come across.

Account and Settings

How are an organization's monthly predictions calculated?

One prediction = one row of data fed through a deployed prediction model. Only deployed models count toward this total; creating a model does not count as a prediction.

Is there a limit to the number of users on my team?

No.

Can I delete my account?

While trial accounts automatically convert to free accounts and will not charge you unless you upgrade, you can always delete your account and data from the Account page. This can be accessed from the cog on the app's home page.

Is there a way to save my data/progress when my subscription ends?

You can cancel your subscription in the organization management settings and re-start it when you are ready to use it again. Your work will not be lost.

What languages does Akkio support?

Any chat function such as Chat Explore and Chat Data Prep can use most languages. However, the current UI is only in English.

Uploading Data

What is the maximum file size for a dataset?

Up to 1 million rows of connected data without data add-on packages. There is no hard file size limit, but there is a practical technical limit of about 5 GB.

Can I append a dataset with new data?

There is not currently an option for appending data in the UI. You can merge datasets horizontally with the merge function. As a workaround, you can append the data in Google Sheets and refresh the data in the project, or append through the API.
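As a hedged sketch of the API route: the endpoint URL, field names, and payload shape below are hypothetical illustrations, not Akkio's documented API, so check the actual API reference before using them. Appending rows programmatically might look like:

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with the real one from the API docs.
API_URL = "https://api.example.com/v1/datasets/{dataset_id}/rows"

def build_append_request(dataset_id, api_key, new_rows):
    """Build a POST request that appends `new_rows` (a list of dicts) to a dataset."""
    payload = json.dumps({"rows": new_rows}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(dataset_id=dataset_id),
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Build (but don't send) a request appending one new row.
req = build_append_request("abc123", "MY_KEY", [{"name": "Ada", "score": 0.9}])
```

Sending the request (e.g., with `urllib.request.urlopen(req)`) would perform the actual append once the real endpoint and credentials are filled in.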

Can I upload multiple sheets of data into one project?

No. You will need to either merge the datasets into one, or upload them to multiple projects.

If I clean data from my integrated source, can I feed it back into that source?

Yes. After making Chat Data Prep changes, you may deploy those changes. This can either go back to a database or a Web App for applying changes to dataset files.

Building a Model

What’s the best way to deal with class imbalance in Akkio?

Akkio primarily handles class imbalance by using non-parametric models such as XGBoost and Random Forests, which are much less sensitive to imbalance. You can apply SMOTE or random oversampling with Chat Data Prep, but we do not recommend it: a model trained on resampled data will likely perform worse in production than the defaults.
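For reference, random oversampling (the technique mentioned above) simply duplicates minority-class rows at random until the classes are balanced. A minimal pure-Python sketch, not an Akkio feature:

```python
import random

def random_oversample(rows, label_key, seed=0):
    """Duplicate minority-class rows at random until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Pad smaller classes with random duplicates of their own rows.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# 2 "churn" rows vs. 8 "stay" rows -> both classes end up with 8 rows.
data = [{"y": "churn"}] * 2 + [{"y": "stay"}] * 8
balanced = random_oversample(data, "y")
```

The caveat in the answer applies: the oversampled training distribution no longer matches the production distribution, which is why the defaults usually do better.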

When working with text data, is there a way to ensure certain words are excluded from modeling?

No, Akkio doesn’t have the ability to exclude prediction outcomes based on input words or allow users to select exclusion words.

What kind of NLP algorithms does Akkio use?

Akkio’s algorithms look at 256 features of text (e.g., words, order, length, etc.). Akkio focuses on learning the user’s business language.

What is the Akkio Baseline model?

The baseline is Akkio guessing (predicting) the most frequent class in a dataset. The comparison to baseline shows how Akkio’s selected model performs relative to always predicting that most frequent class (e.g., "5.6x better than baseline" means the model was 5.6 times better than that naive strategy).
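As a hedged illustration of the arithmetic (one common way to compute such a lift, not necessarily Akkio's exact internal metric):

```python
from collections import Counter

labels = ["no", "no", "no", "yes", "no", "yes", "no", "no", "no", "no"]

# Baseline: always predict the most frequent class.
most_common, count = Counter(labels).most_common(1)[0]
baseline_accuracy = count / len(labels)      # "no" appears 8/10 times -> 0.8

model_accuracy = 0.96                        # hypothetical trained-model score
lift = model_accuracy / baseline_accuracy    # 1.2x better than baseline
```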

Does Akkio use Bayesian models?

We don’t use any Bayes or Naive Bayes models. We use Neural Networks, Random Forests, Linear and Logistic Regression, among other models.

Does Akkio use SMOTE or class weight for dealing with imbalanced classification?

No, Akkio uses model architectures that remove the need for such techniques.

Can I see what models were tested and how they performed during training?

We do not currently support that, but it is a roadmap feature.

When running predictions, not every field used to train will always be filled in; how does Akkio handle these empty fields?

We treat them as null fields; if there are matching null fields in the training set, we look for patterns from there.

Chat Explore

How should I handle missing data during EDA/Data Prep?

Akkio is robust to missing data and will tell a user how accurate the model is with missing data.

Users can improve their model’s performance by providing more data or doing data cleaning/imputations with Chat Explore.

Is Chat Explore case-sensitive?

No, while the tool is evolving and there will be limitations on its understanding, it is not case-sensitive.

Does Chat Explore work on merged data?

As of now, no. After merging the data in Akkio, the best thing to do is download the merged dataset, reupload it, and then run Chat Explore on it.

In the future, merge will be part of data prep, and then Chat Explore will work on merged datasets.

Can I white-label the shareable content generated from Chat Explore?

Yes, all shareable content can be white-labeled on plans that allow white-labeling.

Deploying a Model

How does Akkio address multicollinearity in the data?

Akkio doesn’t remove multicollinearity beforehand but addresses it in the modeling step by trying various models that are sensitive or insensitive to multicollinearity.

How does Akkio avoid overfitting on a model during training?

Akkio uses k-fold cross-validation to avoid model overfitting.
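Conceptually, k-fold cross-validation splits the data into k folds, trains on k-1 of them, and validates on the held-out fold, rotating so every fold is validated exactly once. A minimal sketch of the split logic (illustrative, not Akkio's implementation):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs covering every sample as validation exactly once."""
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

# 10 samples, 5 folds -> five (train, validation) splits of 8 and 2 samples.
folds = list(k_fold_indices(10, 5))
```

Averaging the validation scores across the k folds gives a performance estimate that is far less prone to overfitting than a single train/test split.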

Is Feature Correlations something Akkio does?

This is an upcoming feature but is not currently supported.

Does Akkio automatically scale data (e.g., log) in the modeling phase?

Depending on the data distribution, Akkio might apply a log transform.
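For skewed, non-negative data, a log transform compresses the long tail; `log1p` handles zeros safely. A quick sketch of the effect (Akkio's exact criterion for when it applies the transform isn't documented here):

```python
import math

prices = [100, 1_000, 10_000, 100_000]     # heavily right-skewed values
scaled = [math.log1p(p) for p in prices]   # log(1 + x) keeps zero inputs valid

# The spread between largest and smallest shrinks from 1000x to under 3x,
# which makes gradient-based models much easier to fit.
```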

Does Akkio show which Time Series algorithm was selected for time series modeling?

At present, we don't show this. However, it is on our roadmap.

Can a model be tweaked where a regressor is added? Can the user configure the model?

This is something our engineering team is discussing. They do not expect it would be difficult to implement, but it is not currently supported.

How does Akkio determine Top Fields?

Top Fields are determined by measuring how much the predicted value changes as the values in a field (column) change, similar to permutation importance.
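Permutation importance, which the answer compares Top Fields to, measures how much a model's score drops when one field's values are shuffled: a large drop means the model relied on that field. A generic sketch (illustrative, not Akkio's implementation):

```python
import random

def permutation_importance(model, rows, labels, field, score, seed=0):
    """Score drop when `field` is shuffled across rows; larger drop = more important field."""
    rng = random.Random(seed)
    base = score(model, rows, labels)
    shuffled_vals = [row[field] for row in rows]
    rng.shuffle(shuffled_vals)
    # Rebuild rows with only `field` permuted, everything else intact.
    permuted = [dict(row, **{field: v}) for row, v in zip(rows, shuffled_vals)]
    return base - score(model, permuted, labels)

# Toy "model": echoes the "x" field and ignores "noise" entirely.
rows = [{"x": i, "noise": 7} for i in range(20)]
labels = list(range(20))
score = lambda m, r, y: sum(m(row) == lab for row, lab in zip(r, y)) / len(y)
model = lambda row: row["x"]

drop_x = permutation_importance(model, rows, labels, "x", score)          # large drop
drop_noise = permutation_importance(model, rows, labels, "noise", score)  # no drop
```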

Setting up Integrations

Are there any limitations on using an integration with a free trial?

No. If a user is having trouble connecting with one of the pre-built integrations, they might not have been given sufficient permissions, or there may be an authorization error.

When using an integration, is data moved from the integrated system into Akkio?

Yes, data is moved into Akkio and stored natively.

I have a database that doesn't integrate with Akkio; what are my options?

The API can connect to other systems; we are also always working to expand our native integrations, and we encourage you to reach out to support with requests for new integrations.

Can I use a demo Salesforce account with Akkio?

No, as noted before, you can use a free Akkio trial with integrations, but the Salesforce free demo does not function with Akkio.

When is data updated?

Data automatically gets refreshed in Akkio every day at 12am UTC.

Data Security and Compliance

Does Akkio encrypt data at rest and in flight?

Yes, Akkio encrypts data at rest and in flight.

Does Akkio support having a secure VPN tunnel between Akkio and a data source?

No, however, we are SOC 2 Type II compliant.

What data does Akkio share with OpenAI (GPT)?

Akkio does not send data to OpenAI. We use GPT via a private Azure deployment.

Is my data used independently of my instance, or retained after account deletion?

No data is used to create training sets, update the platform, etc. Data is removed upon account deletion.

Does Akkio support having a TLS tunnel between a data store and an AWS instance of Akkio?

Yes, we do, and we inherit Amazon or Google security.

Is Akkio GDPR or HIPAA Compliant?

Yes, Akkio is GDPR Compliant and HIPAA Compliant. More details on our security page here: https://www.akkio.com/security

What information does Akkio pass to its LLM?

When a user enters a prompt into Chat Explore, Akkio takes the prompt and the metadata associated with the dataset (e.g., column header names, data types, descriptive summary statistics), feeds this into a call to our Azure-hosted GPT-4 instance, parses the code generated by the LLM API call, applies that code to the dataset, and then visualizes the results in the Chat Explore interface.

Here is a summarized example of what we pass into our LLM calls for a housing / real estate dataset:

  • Dataset Size: The dataset contains 4,550 records (rows) and 18 features (columns).

  • Data Types: We pass the data types of all fields in the dataset.

  • Examples of Values in the Dataset (we pass in a max of 5 values):

  • HouseID shows a range of identifiers for houses, such as 1, 3032, 3038, etc.

  • Bedrooms indicate the number of bedrooms per house, ranging from 2 to 6.

  • Bathrooms are represented with a decimal to account for partial bathrooms, with values like 1.0, 2.5, etc.

  • SqftLiving presents the living area square footage, with examples like 1940, 1720 sqft, etc.

  • SqftLot shows the lot size in square feet, with values like 5000, 6000, etc.

  • Floors indicate the number of floors in the house, including half levels (1.0, 2.5, etc.).

  • Waterfront is a binary feature indicating the presence (1) or absence (0) of a waterfront.

  • View is rated from 0 to 4, indicating the quality of the view from the property.

  • Condition rates the overall condition of the house on a scale from 1 to 5.

  • SqftAbove and SqftBasement detail the square footage of the above-ground and basement levels, respectively.

  • YrBuilt and YrRenovated provide the year the house was built and the most recent year of renovation, if any.

  • Street, City, Statezip, and Country provide textual information about the location.

Depending on the data type, we either pass in a random sample of 5 values, or the top 5 values.
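A hedged sketch of assembling that kind of metadata summary; the structure and field names are illustrative, not Akkio's actual payload:

```python
import random

def dataset_metadata(rows, seed=0):
    """Summarize a dataset as size, per-column types, and up to 5 sample values."""
    rng = random.Random(seed)
    columns = list(rows[0])
    meta = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        # Pass at most 5 example values per column, sampled at random.
        sample = values if len(values) <= 5 else rng.sample(values, 5)
        meta["columns"][col] = {"dtype": type(values[0]).__name__, "examples": sample}
    return meta

rows = [{"Bedrooms": 3, "City": "Seattle"}, {"Bedrooms": 4, "City": "Renton"}]
meta = dataset_metadata(rows)
```

The key point, as the answer states, is that only this kind of compact summary reaches the LLM; the generated code runs against the full dataset inside Akkio.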

What LLM does Akkio use?

We currently use a hybrid of gpt-3.5-turbo-1106 and gpt-4. We are, however, LLM-agnostic, so we have the capability to switch models.

About Akkio modeling and predictions

What types of statistical techniques are these models using to make predictions?

We use several modeling methods, including Neural Networks, Random Forests, and Decision Trees. Those are described as such:

Neural networks model complex input-target relationships using linear and non-linear transformations optimized by gradient descent.

Random forests use bagging and feature randomness to combine the outputs of multiple decision trees for higher accuracy and reduced overfitting.

Decision trees recursively split input data based on feature values, aiming for homogeneous target variable subsets determined by techniques like entropy, Gini impurity, or information gain.
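For example, the Gini impurity mentioned above is 1 minus the sum of squared class proportions in a subset; a split is good when it produces low-impurity (more homogeneous) subsets:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure subset, approaching 1.0 as classes mix evenly."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

gini(["a", "a", "a", "a"])   # 0.0 -- pure subset, ideal leaf
gini(["a", "a", "b", "b"])   # 0.5 -- maximally mixed for two classes
```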

How do we know what assumptions the model makes, whether generalizable or even statistically significant?

Different algorithms make varying assumptions about the data distribution. Non-parametric models like decision trees and random forests make fewer assumptions, while neural networks assume differentiability in input-target relationships. Though statistical significance isn't directly evaluated, performance metrics like accuracy, precision, recall, and F1 score can be used to assess a model's effectiveness.

How do we handle multicollinearity and singularity, and outliers?

Multicollinearity is addressed within the platform to help remove redundant features and improve model performance.

Singularity, often caused by a high degree of correlation between features or perfect collinearity, can be resolved by removing one of the collinear features.

We are generally robust to outliers, but if necessary, they can be removed with chat data prep or the soon-to-be-launched data cleaning tool. Some models, like decision trees and random forests, are less sensitive to outliers than others.
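One common outlier recipe you could apply during data prep is Tukey's IQR rule; this is a generic sketch, not Akkio's built-in method, and the quartile indexing below is deliberately simplified:

```python
def iqr_bounds(values, k=1.5):
    """Tukey's rule: keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # rough quartiles, fine for a sketch
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [10, 11, 12, 11, 10, 12, 11, 95]   # 95 is an outlier
lo, hi = iqr_bounds(data)
cleaned = [v for v in data if lo <= v <= hi]   # drops the 95
```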

Are the modeling processes transparent?

We provide insight into the driving factors for all models as part of the model creation process. We call this the insights report, which makes the model decision-making more transparent.

Our platform aims to provide ML capabilities without the need for code, so the transparency comes in these reports in digestible form. More detail can be found by drilling into the advanced sections of the report.

API

What is the API response time/volumes it can handle?

The API accepts five requests per second. However, each request can be a bulk call, so effective throughput is case-by-case. Please feel free to contact support about your specific use case.
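To stay under that limit client-side, you can space requests at least 200 ms apart. A minimal throttle sketch (illustrative, not an official Akkio client; the injectable clock/sleep parameters exist only to make it testable):

```python
import time

class Throttle:
    """Delay calls so they never exceed `rate` per second."""

    def __init__(self, rate=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate   # 0.2 s for 5 requests/second
        self.clock, self.sleep = clock, sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval

# Usage: call throttle.wait() immediately before each API request.
throttle = Throttle(rate=5)
```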
