Modular AI Feature Suite for a Data Analytics Platform
Our partner, a data analytics software provider, aimed to enhance its product with AI-powered features that help users make faster, more accurate, and more scalable decisions.
We implemented a full suite of intelligent tools covering clustering, sentiment analysis, time series analytics, and copilot functionality.
Solution
Unidatalab developed a comprehensive AI feature set integrated directly into the analytics platform via modular services and APIs. Each component was designed to be independently usable and production-ready.
How it works
The Advanced Clustering Module includes automated pipelines for missing-value handling, categorical encoding, PCA-based dimensionality reduction, and six clustering algorithms (K-Means, DBSCAN, HDBSCAN, MiniBatchKMeans, Gaussian Mixture, Agglomerative). Optimal parameters are selected automatically using silhouette scores.
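The silhouette-based selection described above can be sketched as follows. This is a minimal illustration, not the production pipeline: it assumes scikit-learn (the write-up does not name the library), covers only K-Means with the number of clusters as the tuned parameter, and skips categorical encoding. The function name `select_k_by_silhouette` and the synthetic data are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_k_by_silhouette(X, candidate_ks=range(2, 7), random_state=0):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    # Preparation: fill missing values, scale, then reduce to 2 components.
    prep = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        PCA(n_components=2, random_state=random_state),
    )
    X_reduced = prep.fit_transform(X)

    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X_reduced)
        scores[k] = silhouette_score(X_reduced, labels)

    # A higher silhouette score means tighter, better separated clusters.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic demo data (assumption): three well separated groups in 5 dimensions.
centers = np.array(
    [[0, 0, 0, 0, 0], [10, 10, 10, 10, 10], [-10, 10, -10, 10, -10]], dtype=float
)
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=42)
X[::50, 0] = np.nan  # a few missing values for the imputer to handle

best_k, scores = select_k_by_silhouette(X)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

The same loop could be extended to other algorithms by scoring each candidate configuration with the same metric and keeping the best one.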
For sentiment analysis, multilingual transformer models (RoBERTa, DistilBERT) were benchmarked and optimized for speed and accuracy using ONNX quantization. The models handle sarcasm, emojis, and informal language across 50+ languages.
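One way to compare candidate models on accuracy and latency is a small harness like the one below. It is a sketch under stated assumptions: the labelled samples are invented, and `keyword_model` is a deliberately naive stand-in, not a transformer. In a real benchmark each `predict` callable would wrap one of the candidate models, for example before and after quantization.

```python
import time

# Hypothetical labelled samples; a real benchmark would use a multilingual test set.
SAMPLES = [
    ("I love this product 😍", "positive"),
    ("Great, it crashed again. Just great.", "negative"),  # sarcasm
    ("Das ist wirklich schlecht", "negative"),
    ("Muy buen servicio", "positive"),
]


def keyword_model(text):
    """Stand-in baseline (NOT a transformer): naive keyword lookup."""
    positive_words = ("love", "great", "buen")
    lowered = text.lower()
    return "positive" if any(w in lowered for w in positive_words) else "negative"


def benchmark(predict, samples, repeats=3):
    """Measure accuracy and average latency of any text -> label callable."""
    correct = sum(predict(text) == label for text, label in samples)

    start = time.perf_counter()
    for _ in range(repeats):
        for text, _label in samples:
            predict(text)
    elapsed = time.perf_counter() - start

    return {
        "accuracy": correct / len(samples),
        "ms_per_text": 1000 * elapsed / (repeats * len(samples)),
    }


report = benchmark(keyword_model, SAMPLES)
print(report)
```

The keyword baseline misclassifies the sarcastic sample, which is the kind of case that motivates context-aware transformer models.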
For time series analytics, the pipeline is built on Facebook’s Prophet model and automatically decomposes each time series into trend and seasonality components. These outputs are ready for visualization and decision-making.
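To show what trend and seasonality outputs look like, here is a small standard-library sketch. It deliberately swaps in a classical moving-average decomposition instead of Prophet, so it illustrates the shape of the result rather than the model the platform uses. The function name `decompose_additive` and the synthetic series are assumptions made for the example.

```python
from statistics import mean


def decompose_additive(values, period):
    """Split a series into a moving-average trend and a repeating seasonal pattern."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full cycles")

    # Trend: centred moving average over one full cycle (undefined at the edges).
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(values[t - half : t + half + 1])

    # Seasonality: average detrended value at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(half, n - half):
        buckets[t % period].append(values[t] - trend[t])
    raw = [mean(bucket) for bucket in buckets]
    offset = mean(raw)  # centre the pattern so it sums to zero
    seasonal = [r - offset for r in raw]
    return trend, seasonal


# Synthetic daily data (assumption): steady growth plus a weekly pattern.
weekly_pattern = [3, 1, 0, -1, -2, -3, 2]
series = [0.5 * t + weekly_pattern[t % 7] for t in range(28)]

trend, seasonal = decompose_additive(series, period=7)
print(trend[3:6], seasonal)
```

On this synthetic input the recovered trend follows the underlying growth line and the seasonal component matches the weekly pattern, which is the kind of output a chart or report can consume directly.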
The copilot is LLM-based, built on OpenAI/Azure with Qdrant vector search and retrieval-augmented generation (RAG). It helps users understand how the platform works and suggests the commands needed to launch specific analytics workflows, all inside an integrated widget with links, summaries, and context-aware answers.
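The retrieve-then-prompt flow behind such a copilot can be sketched with the standard library alone. Everything here is a stand-in: the bag-of-words `embed` function replaces a real embedding model, the in-memory list replaces a Qdrant collection, and the help snippets and URLs are invented. The final prompt is what would be sent to the LLM.

```python
import math
import re
from collections import Counter

# Hypothetical help snippets; in production these would live in a vector store.
DOCS = [
    {"title": "Clustering", "url": "/docs/clustering",
     "text": "Run clustering on a dataset to group similar rows automatically."},
    {"title": "Sentiment", "url": "/docs/sentiment",
     "text": "Run sentiment analysis on text columns in many languages."},
    {"title": "Time series", "url": "/docs/time-series",
     "text": "Decompose a time series into trend and seasonality."},
]


def embed(text):
    """Toy embedding: word counts (a real system would use a neural embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, top_k=2):
    """Rank documents by similarity to the question and keep the best top_k."""
    query = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query, embed(d["text"])), reverse=True)
    return ranked[:top_k]


def build_prompt(question, hits):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {d['title']} ({d['url']}): {d['text']}" for d in hits)
    return (
        "Answer using only the context below and cite the links.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How do I find trend and seasonality in my time series?"
hits = retrieve(question, DOCS, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping the source link next to each retrieved snippet is what lets the widget show links alongside the generated answer.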
Profile report analysis was automated using structured prompts to an LLM, generating field-level severity scores, summary insights, and data cleaning recommendations.
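A structured prompt of this kind typically fixes the output schema up front and validates the reply before it reaches the user. The sketch below assumes a JSON reply and a 1 to 5 severity scale; neither is stated in the project description, and the field names, statistics, and `fake_reply` are invented. The LLM call itself is left out.

```python
import json

SEVERITY_RANGE = (1, 5)  # assumed scale: 1 = cosmetic, 5 = blocks analysis


def build_profile_prompt(profile):
    """Turn per-field profiling stats into a prompt that demands JSON output."""
    low, high = SEVERITY_RANGE
    return (
        "You are a data quality assistant. Review the field statistics below.\n"
        "Reply with JSON only, shaped as "
        '{"summary": str, "fields": [{"name": str, "severity": int, '
        '"recommendation": str}]}.\n'
        f"Severity is an integer from {low} to {high}.\n\n"
        f"Field statistics:\n{json.dumps(profile, indent=2)}"
    )


def parse_profile_response(raw, profile):
    """Parse the model reply and reject out-of-range scores or missing fields."""
    data = json.loads(raw)
    low, high = SEVERITY_RANGE
    seen = set()
    for item in data["fields"]:
        if not low <= int(item["severity"]) <= high:
            raise ValueError(f"severity out of range for {item['name']}")
        seen.add(item["name"])
    missing = set(profile) - seen
    if missing:
        raise ValueError(f"no assessment for fields: {sorted(missing)}")
    return data


# Hypothetical profile report excerpt.
profile = {
    "age": {"missing_pct": 0.0, "min": 18, "max": 240},
    "email": {"missing_pct": 37.5, "distinct_pct": 99.0},
}
prompt = build_profile_prompt(profile)

# Stand-in for the model reply; a real system would send `prompt` to the LLM.
fake_reply = json.dumps({
    "summary": "One field has implausible values and one has many gaps.",
    "fields": [
        {"name": "age", "severity": 4, "recommendation": "Cap or remove ages above 120."},
        {"name": "email", "severity": 3, "recommendation": "Flag rows with missing email."},
    ],
})
result = parse_profile_response(fake_reply, profile)
print(result["summary"])
```

Validating the reply against the known field list guards against the model silently skipping a column.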
Our challenges:
Manual data profiling and clustering lacked scalability
Users had to manually explore large datasets and try different clustering algorithms, limiting analysis speed and flexibility.
Sentiment models underperformed on non-English texts
Built-in tools could not accurately classify sentiment across languages or detect nuanced emotional tone like sarcasm or slang.
Time series insights required technical expertise
Non-technical users lacked tools to extract trends and seasonality from time-based data without coding.
Project stages
Mapped the functional areas (profiling, clustering, NLP, forecasting) and selected technical frameworks including OpenAI, ONNX, Prophet, and Qdrant for integration.
Each capability was built as an independent module:
Clustering pipeline (data prep → encoding → dimensionality reduction → clustering);
Sentiment analysis engine with multilingual support;
Time series decomposition and trend extraction;
LLM-based Copilot architecture and integration;
Prompt-driven insight generation engine for profiling data.
Benchmarked LLMs for sentiment analysis across languages.
Validated clustering accuracy and runtime.
Evaluated model inference time and memory consumption.
Tested time series outputs across multiple granularities (hourly, weekly, yearly).