Is Trust in Data Declining?
In just twelve months, the share of organisations reporting complete distrust in their decision-making data has jumped from 55% to 67%, a 12-percentage-point increase that should have every data engineer questioning whether their carefully architected pipelines are actually serving the business. The pattern is mirrored at a global level, with Edelman reporting a significant decline in the public's trust in both data and institutions. Check out the full report here: https://www.edelman.com/trust/2025/trust-barometer
🤔 Data in Motion: Deciding on a Real-Time DB for your business
Selecting the appropriate database system requires careful consideration of your specific use case and requirements. Here's the framework we used to evaluate the top 5 OLAP & OLTP DBs in this week's edition of datapro.news:
1. Assess Your Workload: Is your application read-heavy or write-heavy? Transactional databases like DynamoDB or MongoDB are optimised for high-volume write operations, making them suitable for scenarios like e-commerce checkouts or user activity logging. Analytical systems like ClickHouse or Snowflake excel at read-heavy workloads, such as generating real-time dashboards or running complex analytical queries.
2. Consider ACID Requirements: If your application requires strict ACID (Atomicity, Consistency, Isolation, Durability) compliance, transactional systems like PostgreSQL, or DynamoDB with its transactions feature, would be more appropriate. Many analytical systems prioritise performance and scalability over strict ACID guarantees.
3. Evaluate Data Velocity: For applications dealing with high-speed data streams, such as IoT sensor networks or real-time bidding platforms, consider analytical databases like ClickHouse or Apache Druid. These systems are designed to ingest and query high-velocity data streams efficiently.
4. Balance Budget and Control: Managed services like Snowflake or Tinybird can significantly reduce operational overhead, letting teams focus on data analysis rather than infrastructure management. That convenience often comes at a higher cost. Open-source solutions like Apache Pinot or Apache Druid offer more control and potential cost savings but require more expertise to deploy and maintain.
5. SQL vs. NoSQL: Many analytical systems now offer robust SQL support, with platforms like Snowflake and ClickHouse providing familiar interfaces for data analysts. In the transactional space, NoSQL databases like DynamoDB offer flexibility for handling unstructured or semi-structured data. Consider your team's expertise and the nature of your data when making this choice. (A minimal code sketch of this decision logic follows the list.)
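To make the framework concrete, here's a minimal Python sketch of how the five checks could be encoded as a decision helper. The flags, branching order, and candidate lists are illustrative assumptions for this post, not benchmarks or vendor recommendations.

def suggest_db(write_heavy: bool, needs_acid: bool,
               high_velocity: bool, prefer_managed: bool) -> list[str]:
    """Map a rough workload profile to candidate systems (illustrative only)."""
    if needs_acid:
        # Check 2: strict ACID points to transactional systems
        return ["PostgreSQL", "DynamoDB (with transactions)"]
    if high_velocity:
        # Check 3: fast streams favour real-time analytical stores
        return ["Tinybird"] if prefer_managed else ["ClickHouse", "Apache Druid"]
    if write_heavy:
        # Check 1: write-heavy app backends suit OLTP stores
        return ["DynamoDB", "MongoDB"]
    # Read-heavy analytics; check 4 trades convenience against control
    return ["Snowflake"] if prefer_managed else ["Apache Pinot", "ClickHouse"]

# Example: a read-heavy dashboard workload with no dedicated ops team
print(suggest_db(write_heavy=False, needs_acid=False,
                 high_velocity=False, prefer_managed=True))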
🤔 Data in Motion: OLAP vs OLTP Poll
Understanding the key differences between transactional and analytical real-time databases. In this week's www.datapro.news we evaluate the top 5 real-time DB applications in each category. Here are the key criteria we used to evaluate them:
1. Primary Use Case: Transactional (OLTP) databases excel at handling CRUD (Create, Read, Update, Delete) operations and serve as the backbone for application backends. They're designed to process many small, discrete transactions quickly. Analytical (OLAP) databases, on the other hand, are built for aggregation, reporting, and business intelligence tasks. They're optimised for complex queries that scan large portions of the dataset.
2. Data Structure: Transactional databases typically use a row-based storage model with normalised data structures. This approach optimises for quick updates to individual records. Analytical databases often employ columnar storage with denormalised data models. This structure allows for faster scans and aggregations across large datasets (see the toy illustration after this post).
3. Scalability: Transactional systems often scale vertically (by adding more resources to a single machine) and use sharding for horizontal scaling. This approach maintains low latency for individual transactions. Analytical databases are designed for horizontal scaling across distributed systems, which lets them handle massive datasets and parallel query processing.
4. Latency: Transactional databases aim for microsecond to millisecond response times, crucial for real-time application responsiveness. Analytical databases typically operate in the millisecond to second range, focusing on processing complex queries over large datasets quickly.
5. Typical Workloads: A typical transactional workload might involve processing 10,000 concurrent order placements in an e-commerce system. An analytical workload could involve analysing 1 billion rows of sales data to identify trends and generate reports.
Which have you mainly worked with?
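As a toy illustration of point 2 (our own sketch with made-up data, not from the newsletter): aggregating one field in a columnar layout touches a single contiguous list, while a row layout has to walk every whole record.

# Build a small fake orders table
rows = [{"order_id": i, "amount": float(i % 100), "region": "EU"}
        for i in range(1_000_000)]

# Row-oriented (OLTP-style): ideal for point reads and in-place updates
order = rows[42]
order["amount"] += 10.0

# ...but an aggregate has to touch every whole row:
total_from_rows = sum(r["amount"] for r in rows)

# Column-oriented (OLAP-style): one contiguous column per field,
# so a scan reads only the data the query actually needs
amount_column = [r["amount"] for r in rows]
total_from_column = sum(amount_column)

assert total_from_rows == total_from_column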
An LLM from China crashing the US stock market?
Well, not in a back-door, espionage-dodgy way, but the Chinese company DeepSeek has shocked US markets by producing an LLM called R1 that's on par with ChatGPT but runs at a fraction of the cost of models from OpenAI, Google, Meta and other popular AI labs. Here's the CNN report on it: https://edition.cnn.com/2025/01/27/tech/deepseek-stocks-ai-china/index.html
Is hashing good enough for anonymising data?
https://www.rnz.co.nz/news/business/527419/inland-revenue-giving-thousands-of-taxpayers-details-to-social-media-platforms-for-ad-campaigns
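Short answer from the crypto-engineering side: usually not on its own. A plain hash of a low-entropy identifier like an email is pseudonymisation at best, because anyone holding a candidate list can recompute the hashes and match them. A keyed hash (HMAC) with a secret you never share at least blocks that dictionary attack. A minimal sketch with made-up values (nothing here reflects Inland Revenue's actual process):

import hashlib
import hmac

email = "jane.doe@example.com"  # illustrative identifier

# Plain SHA-256: deterministic, so a candidate list can reverse it
plain = hashlib.sha256(email.lower().encode()).hexdigest()

# HMAC-SHA-256 with a private key: useless to anyone without the key
secret_key = b"keep-this-out-of-the-ad-platform"  # assumed secret
keyed = hmac.new(secret_key, email.lower().encode(), hashlib.sha256).hexdigest()

print("plain :", plain)
print("keyed :", keyed)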
Data Innovators Exchange
skool.com/data-innovators-exchange
Your source for Data Management Professionals in the age of AI and Big Data. Comprehensive Data Engineering reviews, resources, frameworks & news.