4. The Rise of Interoperable Data Lakes
The era of restrictive data silos is coming to an end. Open data and table formats, such as Apache Iceberg, are breaking down barriers and fostering a new era of collaboration. Data teams can now access the same data with different processing engines, whether Databricks, Snowflake, Microsoft Fabric, or others, without creating redundant copies. This means you can use the best tool for each job, be it exploratory analytics, BI dashboards, or AI model training, all while working against a single source of truth.
This interoperability is powered by a clever approach to metadata management: these formats abstract the underlying data layout, allowing different engines to understand and query the data in a consistent way. For instance, Databricks' UniForm feature lets Delta Lake tables be read as Iceberg or Hudi, while Apache XTable provides bidirectional conversions between the formats. Even Snowflake is embracing this trend, with its external tables functionality and its commitment to open standards like Iceberg, further improving interoperability between Snowflake and platforms such as Microsoft Fabric.
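To make this concrete, here is a minimal sketch of what enabling cross-format metadata looks like in practice with UniForm. It assumes a Databricks runtime (or open-source Delta Lake 3.x) with UniForm support; the table name and schema are hypothetical.

```python
# Sketch: enabling Delta Lake UniForm so Iceberg-compatible engines
# can read this table. Assumes a runtime with UniForm support;
# table name and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_events (
        event_id BIGINT,
        amount   DOUBLE,
        event_ts TIMESTAMP
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2'          = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```

With these table properties set, Delta generates Iceberg metadata alongside its own transaction log, so Iceberg-compatible readers can discover the same underlying Parquet files without any data being copied.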
This approach means organizations can consolidate data in a central repository, such as a data lake on S3 or Azure ADLS, while allowing different teams to use the most suitable processing tool for a given task, irrespective of the initial table format. It can also be a powerful way to save costs: not only on cloud storage itself (since multiple copies of the data are no longer necessary), but also on the engineering effort of migrating data and keeping it consistent across silos.
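As an illustration of the "one copy, many engines" idea, the sketch below queries the same Iceberg table first from Spark and then from a lightweight Python client. The catalog endpoint, warehouse path, and table identifier are all hypothetical, and it assumes the Iceberg Spark runtime and the pyiceberg package are installed.

```python
# Sketch: two different consumers querying one Iceberg table in object storage.
# Catalog endpoint, bucket, and table names are hypothetical.
from pyspark.sql import SparkSession
from pyiceberg.catalog import load_catalog

spark = (
    SparkSession.builder.appName("shared-lake")
    # Register an Iceberg catalog backed by a REST catalog service.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "https://catalog.example.com")
    .getOrCreate()
)

# Heavy-duty engine: Spark for large-scale aggregation.
spark.table("lake.analytics.sales_events").groupBy("region").count().show()

# Lightweight consumer: PyIceberg reads the very same files, no copy made.
catalog = load_catalog("lake", type="rest", uri="https://catalog.example.com")
events = catalog.load_table("analytics.sales_events")
pdf = events.scan(row_filter="region = 'EMEA'").to_pandas()
```

The design point is that both consumers resolve the table through shared metadata and read the same files directly, which is what makes the "no redundant copies" claim hold in practice.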
However, while interoperability solutions are bridging the gap, the choice of your primary table format still matters. Write behavior can vary significantly between formats, and some format-specific optimizations may be lost during metadata conversion. It is therefore crucial to select a format that aligns with your primary use case, whether that is high-volume batch processing, real-time streaming, or large-scale analytics.
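One way to see why the primary format matters is to look at the write path. The sketch below writes the same DataFrame natively as Delta and as Iceberg; each native write can exploit format-specific features (for example, Delta's deletion vectors or Iceberg's hidden partitioning) that a post-hoc metadata conversion cannot retrofit. It assumes both the Delta Lake and Iceberg Spark packages are on the classpath; catalog and table names are hypothetical.

```python
# Sketch: the same data written natively in two table formats.
# Assumes the Delta Lake and Iceberg Spark extensions are installed;
# catalog and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-writes").getOrCreate()
df = spark.range(1_000_000).withColumnRenamed("id", "event_id")

# Native Delta write: the Delta transaction log and Delta-specific
# optimizations apply from the start.
df.write.format("delta").mode("overwrite").saveAsTable("bronze.events_delta")

# Native Iceberg write: Iceberg's snapshot model and maintenance
# procedures apply from the start.
df.writeTo("lake.bronze.events_iceberg").using("iceberg").createOrReplace()
```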
In conclusion, interoperable data lakes are transforming the way organizations manage and access their data. By embracing open standards and leveraging the right tools, businesses can unlock new levels of efficiency, collaboration, and insight.