Frequently Asked Questions

Straight answers on architecture, rollout speed, connectors, governance, and operations.

01 · Getting Started

What is DataByte and who is it for?
DataByte is a governed, unified data platform for ingestion, transformation, ML, APIs, and operations. It is built on a single Apache Spark + Kubernetes foundation rather than stitched together from separate tools. It is aimed at enterprise data engineering teams, data architects, and platform owners who spend more time integrating tools than delivering outcomes.
How do I get started?
Book a 30-minute walkthrough at databyte.tech/book-demo. We will walk through the modules using your actual sources and connectors. If you prefer to explore first, see the product page.
How long does a typical implementation take?
Implementation is materially faster than a stitched stack because there is no integration layer to build. A single-module pilot (for example, CDC ingestion to a warehouse) usually runs in days. Full rollout depends on source count and deployment model.
Does DataByte require a dedicated data engineering team to operate?
No. The platform supports no-code and pro-code users on the same surface. Analysts can build with visual tooling and the agent layer, while engineers can use Jupyter, custom Spark code, and APIs.
What does "one contract" actually mean? Do I still need to license connectors separately?
One contract means one commercial agreement covering ingestion, transformation, ML, governance, API delivery, and operations. Connectors are not licensed separately; 2000+ connectors in Advance ETL are part of the platform.

02 · Platform & Architecture

How is DataByte different from other platforms?
DataByte replaces stitched toolchains with a single architecture: one data model, one RBAC layer, one catalog, and one operational surface. ML, forecasting, anomaly detection, and API delivery are native modules, not integrations you maintain.
Does DataByte replace my data warehouse, or sit on top of it?
DataByte is a data engineering and operations platform. It handles ingestion, transformation, ML, governance, and delivery while connecting to existing warehouses such as Snowflake, BigQuery, and Redshift as source or destination.
What is the underlying compute engine?
Apache Spark on Kubernetes with platform-managed autoscaling.
Can I run DataByte on-premises, or only in the cloud?
Both. DataByte runs on AWS, Azure, and GCP as well as on on-premises infrastructure.
How does the Data Catalog work across modules?
The catalog is native to the platform. Transformations, pipelines, and models contribute lineage automatically, enabling cross-module traceability from BI output back to source records.
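To illustrate what tracing from a BI output back to source records involves, here is a minimal sketch that walks a lineage graph upstream. The graph layout, the asset names, and the `upstream_sources` function are hypothetical and chosen for illustration; they are not DataByte's catalog API.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it was derived from.
LINEAGE = {
    "bi.revenue_dashboard": ["api.revenue_v1"],
    "api.revenue_v1": ["model.revenue_forecast", "table.orders_clean"],
    "model.revenue_forecast": ["table.orders_clean"],
    "table.orders_clean": ["source.orders_raw", "source.customers_raw"],
}


def upstream_sources(asset, lineage):
    """Return the sorted root assets (those with no recorded parents) feeding `asset`."""
    seen = {asset}
    queue = deque([asset])
    roots = set()
    while queue:
        node = queue.popleft()
        parents = lineage.get(node, [])
        if not parents and node != asset:
            roots.add(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(roots)


print(upstream_sources("bi.revenue_dashboard", LINEAGE))
```

A breadth-first walk with a `seen` set visits each asset once, even when several downstream assets share the same parent table.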

03 · Ingestion, ETL & Connectors

Can I use DataByte for real-time and streaming pipelines, or only batch?
Batch, CDC, and streaming are all supported in one engine, under the same governance layer.
How does CDC work in DataByte?
Data Ingester supports log-based, query-based, and trigger-based CDC, all governed by the same catalog and RBAC.
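For readers new to the terminology, the sketch below shows the general idea behind query-based CDC: poll a table for rows whose modification marker is newer than the last one seen. It uses Python's built-in `sqlite3` module purely for illustration. The `orders` table, its `updated_at` column, and the `fetch_changes` helper are assumptions made for this example and do not describe how Data Ingester is implemented.

```python
import sqlite3


def fetch_changes(conn, last_seen):
    """Query-based CDC: return rows modified after the `last_seen` watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the watermark to the newest change observed, if any.
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", 100), (2, "new", 105)],
)

first_batch, watermark = fetch_changes(conn, last_seen=0)

# A later update bumps updated_at on row 1, so only that row appears in the next poll.
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 110 WHERE id = 1")
second_batch, watermark = fetch_changes(conn, watermark)

print(first_batch)
print(second_batch)
```

Polling of this kind generally cannot see hard deletes or intermediate states between polls, which is a common reason to choose log-based CDC when the source database exposes its transaction log.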
How many connectors does DataByte include, and are they maintained by the platform?
DataByte ships with 2000+ connectors across databases, warehouses, cloud storage, streaming, SaaS, BI tools, and file formats. Connectors are maintained as part of the platform.
Can non-technical users build pipelines, or is this only for engineers?
Both can build on the same platform. Non-technical users can use visual pipelines and natural-language agents; engineers can use custom Spark and Jupyter.
How does the platform handle schema changes in the source?
Additive schema evolution is handled automatically, with configurable strict/permissive/custom policies and schema history captured in the catalog.
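As a rough illustration of the difference between a strict and a permissive policy, the sketch below accepts new columns (additive evolution) under "permissive" and rejects any change under "strict". The `reconcile_schema` function, the dict-based schema representation, and the exact policy semantics are assumptions for this example; the platform's own policy definitions may differ.

```python
def reconcile_schema(current, incoming, policy="permissive"):
    """Merge an incoming source schema into the current one.

    Schemas are dicts of column name -> type name.
    - "permissive": accept added columns; reject dropped or retyped columns.
    - "strict": reject any difference at all.
    """
    if policy not in ("strict", "permissive"):
        raise ValueError(f"unknown policy: {policy!r}")

    added = {col: typ for col, typ in incoming.items() if col not in current}
    removed = sorted(col for col in current if col not in incoming)
    retyped = sorted(
        col for col in current if col in incoming and incoming[col] != current[col]
    )

    breaking = bool(removed or retyped)
    if breaking or (policy == "strict" and added):
        raise ValueError(
            f"schema change rejected under {policy!r} policy: "
            f"added={sorted(added)}, removed={removed}, retyped={retyped}"
        )
    # Additive change: keep existing columns and append the new ones.
    return {**current, **added}


current = {"id": "bigint", "email": "string"}
incoming = {"id": "bigint", "email": "string", "signup_channel": "string"}
print(reconcile_schema(current, incoming))
```

Calling the same function with `policy="strict"` raises a `ValueError` for this input, which is the behavior a team would want when downstream consumers depend on a fixed column set.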

04 · ML, Forecasting & AI Agents

How do I deploy a trained ML model as an API?
In ML Studio, once a model is trained and validated, one-click REST deployment publishes it as a versioned API endpoint. The API is governed by the same RBAC model used across the platform, including row- and column-level controls, rate limiting, and authentication, and it comes with auto-generated Swagger docs.
What time-series algorithms does the Forecaster support?
Forecaster supports 25+ algorithms including ARIMA, SARIMA, Prophet, LightGBM, XGBoost, N-BEATS, and others, with side-by-side backtests and accuracy dashboards.
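To show what a side-by-side backtest compares, here is a minimal rolling-origin backtest in plain Python. It scores two deliberately simple baselines (last value and moving average) by one-step-ahead mean absolute error. The function names and the sample series are made up for this sketch; the baselines stand in for the algorithms listed above and nothing here reflects Forecaster's internals.

```python
def naive_forecast(history):
    """Predict the next value as the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest(series, forecaster, min_train=4):
    """Rolling-origin backtest: mean absolute error of one-step-ahead forecasts."""
    errors = []
    for t in range(min_train, len(series)):
        # Train only on data strictly before time t, then score against series[t].
        prediction = forecaster(series[:t])
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


series = [10, 12, 11, 13, 12, 14, 13, 15]
results = {
    "naive": backtest(series, naive_forecast),
    "moving_average": backtest(series, moving_average_forecast),
}
for name, mae in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: MAE={mae:.3f}")
```

The key property of a rolling-origin backtest is that each forecast sees only data from before the point it predicts, so the error estimate is not inflated by look-ahead.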
What is Sherlock, and how does autonomous RCA work?
Sherlock is DataByte's operations intelligence module. During failures or SLA breaches, it runs no-code decision trees on telemetry and historical patterns, executes remediation, and verifies closure with a health check. See the platform operations page.
How does AutoML work in ML Studio?
AutoML handles feature engineering, model selection, and hyperparameter tuning for the specified target. Teams can override stages with visual pipelines or Jupyter, while experiment tracking and drift detection continue after deployment.
What AI agents ship with the platform?
The current agent library includes Talk to Your Data, ETL/ELT Designer, Spark Summarizer, ProcBot Designer, Sherlock, DataOps AI, AI Governance & Intelligence, and Data Exploration AI. New agents are shipped as part of the platform.
Does drift detection require manual setup per model?
No. Drift detection is enabled by default for models deployed through ML Studio, monitoring input and output distribution changes against baseline thresholds and supporting automated retraining via workflow triggers.
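As one example of how a distribution change can be scored against a baseline threshold, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a current sample. PSI is a swapped-in, widely used metric chosen for illustration; the source does not state which metric ML Studio uses. The `psi` function, the bin count, and the `DRIFT_THRESHOLD` value are assumptions.

```python
import math

# Hypothetical alert threshold; teams commonly tune this per feature.
DRIFT_THRESHOLD = 0.25


def psi(baseline, current, bins=4):
    """Population Stability Index of `current` relative to `baseline`."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate baseline: all values identical

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((x - lo) // width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


baseline = [1, 2, 3, 4, 5, 6, 7, 8]
shifted = [5, 6, 7, 8, 7, 8, 8, 8]

score = psi(baseline, shifted)
print(f"PSI={score:.2f}, drift={'yes' if score > DRIFT_THRESHOLD else 'no'}")
```

In a setup like the one described above, a score over the threshold is the kind of signal that could fire a retraining workflow trigger.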

05 · Security, Governance & Compliance

How does the SMART framework enforce compliance with GDPR, HIPAA, and SOX?
SMART (SLA, Monitoring, Actions, Rules, Traceability) is embedded in every module, not added after the fact. Rules enforce policy, Traceability provides cross-module lineage, and Actions automate responses to violations. Learn more on the platform operations page.
What does row- and column-level security look like in practice?
RBAC is enforced at platform, module, row, and column levels so teams can restrict access by business unit or sensitive fields without duplicating datasets.
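The idea of restricting by business unit and by sensitive field without duplicating datasets can be sketched as a filter applied at read time. The policy structure, the role names, and the `apply_policy` function below are hypothetical and only illustrate the concept; they are not DataByte's RBAC configuration format.

```python
# Hypothetical policy: which rows and columns each role may read.
POLICY = {
    "analyst": {
        "all_units": False,
        "columns": {"order_id", "business_unit", "amount"},
    },
    "auditor": {
        "all_units": True,
        "columns": {"order_id", "business_unit", "amount", "customer_email"},
    },
}

ROWS = [
    {"order_id": 1, "business_unit": "emea", "amount": 120, "customer_email": "a@example.com"},
    {"order_id": 2, "business_unit": "apac", "amount": 80, "customer_email": "b@example.com"},
]


def apply_policy(rows, policy, user):
    """Return only the rows and columns the user's role is allowed to see."""
    rule = policy[user["role"]]
    visible = []
    for row in rows:
        # Row-level check: keep the row if the role sees all units or the unit matches.
        if rule["all_units"] or row["business_unit"] == user["business_unit"]:
            # Column-level check: drop any column the role is not granted.
            visible.append({k: v for k, v in row.items() if k in rule["columns"]})
    return visible


analyst = {"role": "analyst", "business_unit": "emea"}
print(apply_policy(ROWS, POLICY, analyst))
```

Both roles read the same underlying `ROWS`; only the view differs, which is what avoids maintaining a separate redacted copy per audience.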
What does the audit trail cover?
The traceability layer captures pipeline runs, transformations, model deployments, API activity, and configuration changes across modules, enabling full lineage from BI output to source record.

06 · Pricing, Deployment & Support

How is DataByte priced?
Pricing depends on deployment model, scale, and modules. For a scoped estimate against your current stack, book a session at databyte.tech/book-demo.
What deployment options are available?
DataByte deploys on AWS, Azure, GCP, or on-premises infrastructure. The platform is Kubernetes-based and container-native.
Where can I find documentation?
Documentation is available at databyte.tech/docs. The resource centre at databyte.tech/resources includes architecture guides, use cases, and blog content.

Still have a question that is not covered here?

Book a short session and we will give you a direct, architecture-level answer.