Glossary

DataByte terminology, defined clearly.

A structured reference for platform modules, AI agents, governance, ML, operations, connectors, and industry concepts used across the DataByte ecosystem.

105 DataByte terms across architecture, governance, ML, operations, and industry concepts.

9 terms

AI Agents

AI Governance & Intelligence (Agent): An AI agent that continuously enriches metadata, classifies sensitive data, monitors compliance, and generates governance insights across enterprise data assets.
Agentic AI Layer: A growing library of 10+ AI agents embedded directly into the DataByte platform (not a separate product), capable of building pipelines, diagnosing failures, and answering questions over live data in plain English.
Data Exploration AI (Agent): An AI agent that generates transformation code behind the scenes based on natural-language requirements, producing data outputs without manual coding.
DataOps AI (Agent): An AI agent that answers questions about pipeline health and SLAs by querying live telemetry data.
ETL/ELT Designer (Agent): An AI agent that turns plain-English pipeline requirements into fully deployment-ready ETL/ELT pipeline configurations.
ProcBot Designer (Agent): An AI agent that generates working operational scripts (Bash, Python, Terraform, Ansible) from plain-English process descriptions.
Sherlock (as Agent): An autonomous AI agent that assists with problem discovery, detection, auto-remediation, and closure, operating across all DataByte modules.
Spark Summarizer (Agent): An AI agent that distills verbose Apache Spark execution logs into a concise summary of what ran, what failed, what was slow, and recommended fixes.
Talk to Your Data (Agent): An AI agent that answers plain-English questions over SQL, NoSQL, S3, Cassandra, and API data sources, with no query code required.

7 terms

Architecture & Infrastructure

Apache Spark: The core distributed compute engine underlying all DataByte modules. Pipelines, transformations, and decision-tree RCA all execute on Spark, orchestrated via Kubernetes.
Cluster (Kubernetes): A group of compute nodes managed by Kubernetes on which DataByte Spark workloads auto-scale. DataByte supports multi-cluster, cloud-agnostic deployments.
Control Plane / Data Plane Split: A DataByte architectural pattern where the management/orchestration layer (control plane) is separated from the data processing layer (data plane), supporting cloud-agnostic deployments.
GPU (in ML Studio): Graphics Processing Unit - a configurable compute option in ML Studio notebooks for accelerating the training of machine learning models.
Kubernetes: An open-source container orchestration system. DataByte runs all modules on a Spark + Kubernetes foundation, with auto-scaling, multi-cluster, and cloud-agnostic deployment.
Multi-cluster: DataByte's ability to deploy and manage Spark workloads across multiple Kubernetes clusters, supporting distributed or geo-separated enterprise architectures.
Unified Data Operations: A DataByte architectural concept meaning all ingestion, processing, intelligence, and delivery share one governance layer, one catalog, and one RBAC rather than operating as siloed tools.

3 terms

Company & Product Concepts

CI/CD Pipeline: Continuous Integration/Continuous Delivery - a software delivery mechanism for building, testing, and deploying application code, distinguished from ProcBot's operational governance of live infrastructure scripts.
Integration Tax: A DataByte concept describing the cumulative cost (procurement + ongoing maintenance) of stitching together multiple point tools, which a unified platform eliminates.
VisionWaves: The parent engineering firm that built DataByte, originating from production data and AI implementation work across industries. Product and engineering led from Indore, India; customer success from Singapore and the US.

4 terms

Compliance & Regulations

GDPR: General Data Protection Regulation - an EU data privacy regulation. DataByte's SMART framework and Data Catalog support GDPR reporting through automated PII classification and cross-module audit trails.
HIPAA: Health Insurance Portability and Accountability Act - a US healthcare data regulation. DataByte's SMART framework supports HIPAA compliance reporting.
PII (Personally Identifiable Information): Data that can identify an individual. DataByte's Data Catalog performs automated PII tagging and classification across all assets, supporting GDPR/HIPAA compliance.
SOX: Sarbanes-Oxley Act - a US financial reporting regulation. DataByte's SMART framework and cross-module audit trail support SOX compliance reporting.

8 terms

Connectors & Integration Protocols

Apache Camel: An open-source integration framework referenced in several DataByte connectors (e.g., Olingo4 for OData, Workday component) used to connect to enterprise systems like Oracle ERP, MS Dynamics, and Workday.
Apache Kafka: A distributed event-streaming platform used as a data source/sink in DataByte; supported by the Anomaly Detector, IngesterOps, and various connectors for real-time messaging.
OAuth 2.0: An open standard for access authorization. Used in multiple DataByte connectors (e.g., Salesforce CRM uses OAuth 2.0 Username-Password flow; MS Dynamics 365 uses Azure AD OAuth 2.0 Client Credentials).
OData (v2 / v4): Open Data Protocol - a REST-based data access standard used in DataByte connectors for SAP S/4HANA (OData v2) and Oracle ERP Cloud / MS Dynamics 365 (OData v4 via Apache Camel Olingo4).
REST API: Representational State Transfer Application Programming Interface - the standard web interface DataByte uses for ML model deployment (ML Studio), data delivery (Data Insider), and many connectors.
SOQL / SOSL: Salesforce Object Query Language (SOQL) and Salesforce Object Search Language (SOSL) - query languages supported by DataByte's Salesforce CRM connector for data retrieval.
Swagger: A toolset for describing and documenting REST APIs, built around the OpenAPI Specification. DataByte's Data Insider auto-generates Swagger documentation for every exposed REST API endpoint.
Webhooks: HTTP callbacks that push data in real-time when an event occurs. Supported as a data source in the Anomaly Detector module and in the Streaming & Messaging connector category.
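
To illustrate the OData entry above, here is a minimal sketch of how an OData query URL can be assembled from the standard system query options $select, $filter, and $top. The service root, entity set, and field names are hypothetical placeholders and do not reflect actual DataByte connector settings.

```python
from urllib.parse import quote


def build_odata_url(service_root, entity_set, select=None, filter_expr=None, top=None):
    """Assemble an OData query URL from standard system query options."""
    options = []
    if select:
        options.append("$select=" + ",".join(select))
    if filter_expr:
        # Percent-encode spaces and other reserved characters in the filter.
        options.append("$filter=" + quote(filter_expr, safe="()"))
    if top is not None:
        options.append("$top=" + str(int(top)))
    url = service_root.rstrip("/") + "/" + entity_set
    return url + ("?" + "&".join(options) if options else "")


# Hypothetical service root and entity set, for illustration only.
url = build_odata_url(
    "https://example.invalid/odata/v4",
    "Invoices",
    select=["InvoiceId", "Amount"],
    filter_expr="Amount gt 1000",
    top=50,
)
print(url)
```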

9 terms

Data Ingestion

Advance ETL: An enhanced extract-transform-load mode in the Data Ingester that applies transformation logic during ingestion itself, operating across 2,000+ connectors with drag-and-drop or custom-code options.
Batch (Ingestion Mode): One of three ingestion modes in the Data Ingester - moves data from source to destination in scheduled bulk runs (X→Y pattern), as opposed to streaming or CDC.
CDC (Change Data Capture): A DataByte ingestion mode that replicates database changes in near-real-time using log-based, query-based, or trigger-based methods; one of three modes in the Data Ingester.
Data Ingester: DataByte's core ingestion module - a container-native pipeline engine supporting three modes (X→Y batch/on-demand, CDC, and Advance ETL) across 2,000+ connectors.
Data Streaming: Continuous, real-time movement of data; supported in the Transformer Module and Data Ingester alongside batch and on-demand modes.
ELT (Extract, Load, Transform): A data integration pattern where data is loaded into a destination first and then transformed. The ETL/ELT Designer agent in DataByte converts requirements into deployment-ready pipelines.
ETL (Extract, Transform, Load): A data integration pattern where data is extracted from sources, transformed, and then loaded into a destination. Supported via the Advance ETL mode and the ETL/ELT Designer agent.
On-demand (Ingestion / Scheduling): A trigger-based execution mode in DataByte - pipelines or deployments run immediately when triggered manually, rather than on a batch schedule or a recurring cron schedule.
X to Y Pipeline: DataByte's term for a standard batch or on-demand data movement pipeline - data flows from source (X) to destination (Y) in a single directed flow.
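
The query-based CDC method mentioned above can be sketched as a polling loop that reads rows changed since a stored watermark. This is a generic illustration using an in-memory SQLite database, not the Data Ingester's implementation; the orders table and its updated_at column are assumptions made for the example.

```python
import sqlite3


def fetch_changes(conn, last_seen):
    """Return rows modified after the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the watermark to the latest change seen, or keep it if none.
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark


# Hypothetical source table with a monotonically increasing updated_at value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, "new", 100), (2, "new", 105)]
)

# First poll picks up both rows and moves the watermark forward.
first_batch, watermark = fetch_changes(conn, 0)

# A later update to row 1 is the only change returned by the second poll.
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 110 WHERE id = 1")
second_batch, watermark = fetch_changes(conn, watermark)
```

A polling query like this cannot see hard deletes, which is one reason log-based CDC reads the database's change log instead.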

7 terms

Data Processing & Transformation

Feature Engineering: The process of deriving new predictive signals from raw data (e.g., seasonality indicators, rolling averages). Applied in Cash Forecaster's normalization pipeline and in ML Studio pipelines.
Jupyter Notebook: An interactive Python coding environment available in both the Transformer Module (for Spark transformations) and ML Studio (for custom data science work with configurable compute).
No-code: A design principle applied across ProcBot, Sherlock, and the Transformer Module, allowing users to build workflows, decision trees, and pipelines without writing code.
Pipeline Builder: A visual, drag-and-drop canvas in the Transformer Module and ML Studio for constructing data transformation or ML pipelines without writing code. Also called the Visual Canvas.
Pro-code: The ability to write custom code (Python, SQL, Spark, etc.) within DataByte modules such as ML Studio notebooks and the Transformer Module, alongside no-code options.
Transformer Module: DataByte's Processing module - a visual no-code canvas for building Spark transformation pipelines, with a Jupyter notebook surface. Supports batch, streaming, and on-demand execution on auto-scaled Kubernetes.
Visual Canvas: The drag-and-drop pipeline builder surface in the Transformer Module and ML Studio for constructing workflows without code. Also called Pipeline Builder.
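
The two feature-engineering examples named above, rolling averages and seasonality indicators, can be sketched in plain Python as follows. The function names and the choice of a "last seven days of the month" flag are illustrative assumptions, not features documented for Cash Forecaster or ML Studio.

```python
from datetime import date


def rolling_average(values, window):
    """Trailing mean over `window` points; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def is_month_end_week(day):
    """Seasonality flag: 1 if the date falls in the last 7 days of its month."""
    # First day of the following month (rolls over to January after December).
    next_month = date(day.year + (day.month == 12), day.month % 12 + 1, 1)
    return int((next_month - day).days <= 7)


daily_balance = [10, 20, 30, 40]
smoothed = rolling_average(daily_balance, window=3)
flag = is_month_end_week(date(2024, 1, 31))
```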

8 terms

Governance & Security

Business Glossary: A component of the DataByte Data Catalog that stores business-friendly definitions for data terms, making datasets understandable to non-technical stakeholders across the enterprise.
Data Catalog: A DataByte Governance module providing a centralized, searchable inventory of all data assets, enriched with metadata on ownership, lineage, quality, access rights, automated PII tagging, and cross-module lineage.
Data Lineage: The end-to-end traceable map of where a dataset originated, what transformations it passed through, and what downstream reports or models depend on it. Maintained automatically across all DataByte modules in the Data Catalog.
Platform Admin: A DataByte governance module providing unified RBAC, user and role management, data-source administration, and BI enablement - one security model across every module.
RBAC (Role-Based Access Control): A security model that grants system access based on assigned user roles. DataByte implements one unified RBAC model spanning platform, module, row, and column levels across every module.
SLA (Service Level Agreement): Within SMART, a per-pipeline threshold defining expected completion times and performance targets. DataByte alerts before SLA breaches and fires automated corrective actions when thresholds are violated.
SMART Framework: DataByte's embedded governance framework with five primitives: SLA, Monitoring, Actions, Rules, Traceability - built into every module so that compliance reporting becomes a routine report rather than a separate project.
Traceability: The T in SMART - a cross-module audit trail recording every transformation, decision, rule evaluation, and access event from source to consumer, enabling compliance and debugging.
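
Row-level and column-level access control, as described in the RBAC entry above, can be sketched as a policy lookup that first filters rows and then projects the permitted columns. The roles, columns, and policy structure below are hypothetical and only illustrate the general technique, not DataByte's RBAC model.

```python
# Hypothetical role policies: permitted columns plus an optional row predicate.
POLICIES = {
    "analyst": {
        "columns": {"region", "amount"},
        "row_filter": lambda row: row["region"] == "EU",
    },
    "admin": {
        "columns": {"region", "amount", "customer_email"},
        "row_filter": None,  # no row restriction
    },
}


def apply_rbac(role, rows):
    """Return only the rows and columns the given role may see."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"unknown role: {role}")
    row_filter = policy["row_filter"]
    visible = [r for r in rows if row_filter is None or row_filter(r)]
    return [{k: v for k, v in r.items() if k in policy["columns"]} for r in visible]


rows = [
    {"region": "EU", "amount": 120, "customer_email": "a@example.invalid"},
    {"region": "US", "amount": 300, "customer_email": "b@example.invalid"},
]
analyst_view = apply_rbac("analyst", rows)
admin_view = apply_rbac("admin", rows)
```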

8 terms

ML & Model Evaluation Metrics

AUC: Area Under the Curve (typically the ROC curve) - a model evaluation metric available in ML Studio's Model Comparison tab, summarizing classifier performance across all classification thresholds.
Drift Detection: A monitoring capability in ML Studio and Cash Forecaster that detects when input data distributions shift relative to the training baseline, triggering retraining or alerts before model accuracy degrades.
Experiment Tracking: A capability in ML Studio that logs every dataset, configuration, hyperparameter, and evaluation metric for all training runs, enabling comparison, reproducibility, and rollback across teams.
Feature Importance: A diagnostic in ML Studio showing which input variables most influence a model's predictions, surfaced via SHAP values and a feature importance chart.
MAE (Mean Absolute Error): A forecast accuracy metric in Cash Forecaster: the average absolute difference between predicted and actual values, expressed in the same currency units as the forecast.
MAPE (Mean Absolute Percentage Error): A forecast accuracy metric in Cash Forecaster showing the average percentage difference between predicted and actual values. Used to monitor model health and trigger retraining.
Population Stability Index (PSI): A statistical metric used in Cash Forecaster's drift detection to measure how much the distribution of an input feature has shifted since the model was trained.
SHAP Values: SHapley Additive exPlanations - a model interpretability technique used in ML Studio's feature importance view and in Cash Forecaster to rank feature importance and show which variables drive predictions at each time horizon.
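
For reference, MAE, MAPE, and PSI follow standard textbook formulas, sketched below. The function names and the small epsilon used to guard against empty bins are choices made for this example; how Cash Forecaster bins features or sets alert thresholds is not specified here.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; undefined for zero actuals."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index over pre-binned fractions.

    PSI = sum over bins of (actual - expected) * ln(actual / expected).
    """
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        # Clamp to eps so empty bins do not cause division by zero or log(0).
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


actual = [100.0, 200.0]
predicted = [110.0, 180.0]
error_abs = mae(actual, predicted)
error_pct = mape(actual, predicted)
drift = psi([0.25, 0.50, 0.25], [0.10, 0.50, 0.40])
```

A PSI of zero means the two binned distributions are identical; larger values indicate a larger shift.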

12 terms

ML Algorithms & Models

ARIMA: Autoregressive Integrated Moving Average - one of 25+ time-series forecasting algorithms available in the Forecaster module.
AutoGluon: An advanced AutoML engine option in ML Studio for multi-model ensembling and complex time-series tasks.
AutoML: Automated Machine Learning - a path in ML Studio that automatically evaluates algorithms, tunes hyperparameters, and surfaces the best model without requiring custom code.
Hyperparameter Tuning: The automated optimization of model configuration parameters (e.g., learning rate, tree depth) performed by AutoML in ML Studio using Grid Search or Random Search strategies.
LightGBM: Light Gradient Boosting Machine - a high-performance ML algorithm available in the Forecaster module and used in Cash Forecaster's short-term horizon model.
Logistic Regression: A classification algorithm included in ML Studio's AutoML engine for binary classification tasks.
N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting - a deep-learning time-series algorithm available in the DataByte Forecaster module.
Prophet: A time-series forecasting algorithm developed by Meta, available in the Forecaster module and used in Cash Forecaster's long-term horizon model enhanced with macroeconomic factors.
PyCaret: An open-source AutoML library used as a rapid-prototyping engine option in ML Studio for quick baseline experiments.
SARIMA: Seasonal ARIMA - a time-series forecasting algorithm available in the DataByte Forecaster module that extends ARIMA to model seasonal patterns.
SVM (Support Vector Machine): A classification algorithm included in ML Studio's AutoML evaluation suite.
XGBoost: Extreme Gradient Boosting - a high-performance ML algorithm available in the Forecaster module and used in Cash Forecaster's medium-term horizon model (paired with ARIMA).
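
The two hyperparameter tuning strategies named above differ in how they pick candidates: Grid Search tries every combination, while Random Search samples a fixed number of them. The sketch below shows both over a toy search space. The objective function is a stand-in that a real run would replace with model training and validation, and none of the names reflect ML Studio's API.

```python
import itertools
import random

# Hypothetical search space for a gradient-boosting model.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 6, 9],
}


def validation_error(params):
    """Stand-in objective; a real run would train and score a model here."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6) ** 2 / 100


def grid_search(space, objective):
    """Evaluate every combination in the space and return the best one."""
    keys = list(space)
    candidates = (
        dict(zip(keys, combo)) for combo in itertools.product(*space.values())
    )
    return min(candidates, key=objective)


def random_search(space, objective, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {key: rng.choice(values) for key, values in space.items()}
        for _ in range(n_trials)
    ]
    return min(candidates, key=objective)


best_grid = grid_search(SEARCH_SPACE, validation_error)
best_random = random_search(SEARCH_SPACE, validation_error, n_trials=5)
```

Grid Search cost grows with the product of all option counts, so Random Search is often the cheaper choice when the space is large.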

15 terms

Operations & Monitoring

Blast Radius: A ProcBot analytics metric showing the average (and P95/max) number of hosts impacted per deployment failure, used to assess incident severity.
DataOps: DataByte's operational command center module providing live pipeline health, SLA tracking, failure-pattern analysis, resource trends, and AI-generated recommendations. Encompasses SparkOps and IngesterOps sub-engines.
Deployment (ProcBot): In ProcBot, a deployment is a procedure combined with execution context: target machines, scheduling, priority level, pre/post stages, and rollback configuration.
Fleet (ProcBot): The collective set of target servers/hosts managed under ProcBot. Fleet-level analysis shows how many hosts are running, queued, or failed at any time.
IngesterOps: The ingestion-focused sub-engine within DataByte's DataOps module, providing a control plane for X→Y, CDC, and Advance ETL pipelines - covering queue utilization, delayed streaming, throughput ranking, and inline resource recommendations.
MTBF (Mean Time Between Failures): A reliability metric in ProcBot analytics tracking the average time between operational failures, showing whether an environment is becoming more stable over time.
MTTR (Mean Time to Recovery): A resilience metric in ProcBot analytics tracking how quickly the team recovers from failures. Pre-configured rollback and real-time visibility are the primary levers for improving MTTR.
NOC Dashboard: Network Operations Center dashboard - a real-time visualization of network KPIs delivered via DataByte Analytics, referenced in the RAN/Core/Transport monitoring use case.
OpenTelemetry: An observability framework for collecting metrics, logs, and traces from distributed systems. Referenced in a DataByte use case for EMS and OpenTelemetry fault monitoring with autonomous RCA.
ProcBot: DataByte's no-code workflow and deployment automation module for IT Operations - centralizing script execution (Bash/Python), approval workflows, scheduling, fleet monitoring, rollback, and SMART governance.
RCA (Root Cause Analysis): The process of identifying the underlying cause of a failure. Sherlock performs autonomous RCA using no-code decision trees on Spark, correlating alarms, change events, and historical failures.
Rollback (ProcBot): A first-class ProcBot capability that reverts infrastructure changes to a previous state if a deployment causes unexpected problems, triggered directly from the execution interface.
Sherlock: DataByte's Operations module for autonomous root-cause analysis. It runs no-code decision trees on Spark, isolates failure points, and verifies remediation with a closed-loop health check.
SparkOps: The Spark-focused sub-engine within DataByte's DataOps module, providing a control plane for transformation workloads - AI health analysis, resource topology, pod health, infrastructure forecasting, and one-click optimization recommendations.
cron / Cron-based Scheduling: A time-based job scheduling method used in ProcBot's Recurring scheduling mode for tasks like nightly backups, weekly compliance checks, and daily health scans.
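
The reliability metrics above (MTTR, MTBF, and Blast Radius) can be computed from a simple incident log, as sketched below. The incident records, the observation window, and the nearest-rank percentile method are assumptions made for illustration; ProcBot's exact calculation is not specified here.

```python
import math
from statistics import mean

# Hypothetical incident log: (failure_start_hour, recovery_hour, hosts_impacted).
INCIDENTS = [(10, 12, 4), (50, 51, 1), (120, 125, 20), (200, 201, 3)]
OBSERVATION_HOURS = 240


def mttr(incidents):
    """Mean Time to Recovery: average duration of an outage."""
    return mean(end - start for start, end, _ in incidents)


def mtbf(incidents, observation_hours):
    """Mean Time Between Failures: operating time divided by failure count."""
    downtime = sum(end - start for start, end, _ in incidents)
    return (observation_hours - downtime) / len(incidents)


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def blast_radius(incidents):
    """Average, P95, and maximum hosts impacted per failure."""
    hosts = [h for _, _, h in incidents]
    return {"avg": mean(hosts), "p95": percentile(hosts, 95), "max": max(hosts)}


recovery_hours = mttr(INCIDENTS)
hours_between_failures = mtbf(INCIDENTS, OBSERVATION_HOURS)
radius = blast_radius(INCIDENTS)
```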

6 terms

Platform Modules

Anomaly Detector: A DataByte Intelligence module that performs near-real-time anomaly detection on SQL, Kafka, webhooks, and APIs, with continuous re-learning and separate workspaces for monitoring and model configuration.
Cash Forecaster: A DataByte Intelligence module providing AI-powered treasury cash-flow forecasting across three time horizons (short, medium, long-term) using LightGBM, XGBoost+ARIMA, and Prophet models with confidence bands.
Data Insider: DataByte's Delivery module that exposes any SQL or NoSQL source as a versioned, secured REST API with row/column-level security, rate limiting, authentication, and auto-generated Swagger documentation.
Forecaster: A DataByte Intelligence module with 25+ time-series algorithms (ARIMA, SARIMA, Prophet, LightGBM, XGBoost, N-BEATS, etc.) for demand, capacity, or financial forecasting with scheduled runs and accuracy dashboards.
ML Studio: DataByte's end-to-end machine learning platform module covering data preparation, AutoML, visual pipelines, Jupyter notebooks, experiment tracking, one-click REST deployment, and drift monitoring.
MLOps: Machine Learning Operations - the practice of deploying, monitoring, and governing ML models in production. ML Studio is DataByte's MLOps implementation.

3 terms

Telecom & Industry Concepts

EMS (Element Management System): Network management software used in telecom to monitor network elements; referenced in a DataByte use case for fault monitoring with autonomous RCA.
RAN (Radio Access Network): The part of a mobile network connecting end devices to the core network. Referenced in a DataByte telecom use case for near-real-time KPI monitoring.
gNB / eNB: 5G Next Generation NodeB (gNB) and 4G Evolved NodeB (eNB) - types of telecom base station hardware referenced in a DataByte use case for Nokia & Samsung vendor-procedure automation.

6 terms

Treasury & Finance Concepts

BAI2: A standard bank statement file format supported by Cash Forecaster for ingesting bank balance and transaction data.
Confidence Band (P10–P90): In Cash Forecaster, the probabilistic range of predicted cash positions from the 10th to 90th percentile, allowing treasury teams to plan conservatively when bands are wide and act decisively when bands are tight.
Coverage Score: A composite data quality metric in Cash Forecaster that aggregates bank statement coverage, ERP ledger completeness, payroll calendar coverage, feature derivation success, and backfill depth. If below 80%, forecast inference is paused.
DSO (Days Sales Outstanding): A treasury/finance KPI measuring how long it takes to collect receivables; used in Cash Forecaster scenarios to model the impact of customers paying later or earlier.
MT940: A standard SWIFT bank statement format supported by Cash Forecaster for ingesting bank transaction data.
Scenario Analysis (Cash Forecaster): A structured module in Cash Forecaster allowing treasury teams to model stress, base-case, and upside scenarios by adjusting DSO, collection acceleration, AP stretch, and revenue growth parameters.
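
Three of the treasury concepts above lend themselves to a short worked sketch: the standard DSO formula, a P10 to P90 band taken from a set of simulated outcomes, and the 80% Coverage Score gate. The function names and sample figures are hypothetical, and the sketch takes the Coverage Score as an already computed input because the way its components are weighted is not described here.

```python
from statistics import quantiles


def days_sales_outstanding(receivables, credit_sales, period_days):
    """Standard DSO: receivables relative to credit sales, scaled to the period."""
    return receivables * period_days / credit_sales


def confidence_band(simulated_positions):
    """Return the (P10, P90) band from a list of simulated cash positions."""
    # n=10 yields the nine decile cut points; the first is P10, the last is P90.
    deciles = quantiles(simulated_positions, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def should_pause_inference(coverage_score, threshold=0.80):
    """Pause forecasting when the Coverage Score falls below the threshold."""
    return coverage_score < threshold


# Hypothetical figures for illustration only.
dso = days_sales_outstanding(receivables=500_000, credit_sales=3_000_000, period_days=90)
p10, p90 = confidence_band([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
paused = should_pause_inference(0.72)
```

A wide gap between the P10 and P90 values signals high uncertainty in the forecast, which is the cue for more conservative planning.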
