Documentation
Data Pipeline
Bloom.ai processes global data through a three-layer pipeline: raw data is ingested and classified with AI (Layer 1), correlated across sources (Layer 2), and transformed into actionable intelligence (Layer 3).
Four data sources, one unified pipeline
News / RSS
Global news articles from RSS feeds, classified and enriched in real time.
Stock Market
Live market data from Yahoo Finance — indices, equities, and commodities.
Crypto
Cryptocurrency prices and market caps via CoinGecko.
Geopolitical
Conflict and crisis data from ACLED and the Uppsala Conflict Data Program.
Layer 1
Ingestion & Classification
Specialized processors normalize each source into a common raw_event schema. Each event is then enriched with AI-generated metadata including categories, severity scores, and 768-dimensional embedding vectors via Google Gemini.
- Classification
- Severity scoring
- Region assignment
- Keyword extraction
- Entity extraction
- Embedding generation
768 embedding dimensions · Gemini embedding model
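The normalization step in Layer 1 can be sketched as a mapping from each source's native payload into the shared raw_event shape, with enrichment fields filled in later by the AI pass. The field and function names below are illustrative assumptions, not Bloom.ai's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical raw_event schema; field names are illustrative,
# not Bloom.ai's actual data model.
@dataclass
class RawEvent:
    source_type: str                # "news", "stock", "crypto", "geopolitical"
    source_id: str                  # stable identifier within the source
    title: str
    body: str
    # AI-generated enrichment, populated after ingestion
    categories: list = field(default_factory=list)
    severity: float = 0.0           # e.g. 0.0 (minor) to 1.0 (critical)
    region: str = ""
    keywords: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    embedding: list = field(default_factory=list)  # 768 floats

def normalize_rss_item(item: dict) -> RawEvent:
    """Map one RSS entry into the shared raw_event shape."""
    return RawEvent(
        source_type="news",
        source_id=item["guid"],
        title=item["title"],
        body=item.get("summary", ""),
    )

event = normalize_rss_item(
    {"guid": "abc-123", "title": "Sanctions announced", "summary": "..."}
)
print(event.source_type, event.source_id)  # news abc-123
```

Each source gets its own `normalize_*` processor, but all of them emit the same dataclass, so the enrichment and correlation layers never need source-specific logic.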
Layer 2
Correlation & Event Aggregation
Clusters related raw events from different sources into unified event entities using cosine similarity on embedding vectors. A sanctions article, a crypto crash, and a stock sell-off become one coherent geopolitical event.
- Title & description synthesis
- Category & severity assignment
- Latitude / longitude geocoding
- Source event linking with traceability
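The correlation step above can be sketched as a greedy clustering pass over embedding vectors. The similarity threshold and the first-member comparison strategy are assumptions for illustration; toy 3-dimensional vectors stand in for the real 768-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster(events, threshold=0.8):
    """Greedy single-pass clustering: attach each event to the first
    cluster whose seed member is similar enough, else start a new one."""
    clusters = []
    for ev in events:
        for c in clusters:
            if cosine_similarity(ev["embedding"], c[0]["embedding"]) >= threshold:
                c.append(ev)
                break
        else:
            clusters.append([ev])
    return clusters

# A news article and a crypto move with near-identical embeddings
# collapse into one cluster; the unrelated stock event stays separate.
events = [
    {"id": "news:1",   "embedding": [1.0, 0.0, 0.0]},
    {"id": "crypto:7", "embedding": [0.9, 0.1, 0.0]},
    {"id": "stock:4",  "embedding": [0.0, 1.0, 0.0]},
]
groups = cluster(events)
print([[e["id"] for e in c] for c in groups])
# [['news:1', 'crypto:7'], ['stock:4']]
```

A production system would typically delegate the nearest-neighbor search to pgvector rather than comparing vectors in application code.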
Layer 3
Insights & Predictions
Operates on scored events to detect trends, generate predictions, and power the user-facing chat agent. All AI calls route through the user's selected model — supporting Anthropic, OpenAI, xAI, DeepSeek, and Google.
Personalization
Suggestions based on user preferences and engagement patterns using vector similarity.
Trend Detection
Patterns across events over time — escalations, de-escalations, and emerging narratives.
Chat Agent
Q&A with source references via retrieval-augmented generation (RAG): retrieves events, raw sources, and citations.
Predictions
Market and geopolitical forecasts derived from event severity and historical patterns.
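The chat agent's retrieval step can be sketched as: embed the user's question, rank stored events by cosine similarity, and assemble a prompt whose context lines carry citation IDs. Everything here (function names, the prompt wording, the two-event corpus) is an illustrative assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_embedding, events, k=2):
    """Return the top-k events most similar to the query embedding."""
    ranked = sorted(events, key=lambda e: cosine(query_embedding, e["embedding"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, retrieved):
    """Ground the prompt in retrieved events, tagging each line with its ID
    so the model's answer can cite sources."""
    context = "\n".join(f"[{e['id']}] {e['title']}" for e in retrieved)
    return (f"Answer using only the sources below, citing their IDs.\n"
            f"{context}\n\nQ: {question}")

events = [
    {"id": "evt-1", "title": "Sanctions escalate", "embedding": [1.0, 0.0]},
    {"id": "evt-2", "title": "Crop report",        "embedding": [0.0, 1.0]},
]
prompt = build_prompt("What happened with sanctions?",
                      retrieve([0.9, 0.1], events, k=1))
print(prompt)
```

The assembled prompt is then sent through the provider-agnostic client, so the same retrieval logic serves whichever model the user has selected.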
Technical details
Key implementation details behind the pipeline.
Embedding model
Google Gemini text-embedding-004 producing 768-dimensional vectors stored via pgvector in PostgreSQL.
Deduplication
Enforced via unique index on (source_type, source_id) to prevent duplicate ingestion.
Multi-provider AI
All LLM calls route through a provider-agnostic client supporting Anthropic, OpenAI, xAI, DeepSeek, and Google.
Cosine similarity
Event correlation uses cosine similarity (equivalently, cosine distance) on 768-dimensional embedding vectors to cluster related raw events.
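The deduplication guarantee can be demonstrated with a unique index on `(source_type, source_id)`. SQLite stands in for PostgreSQL here purely so the sketch is self-contained; in PostgreSQL the insert would typically use `ON CONFLICT DO NOTHING` against the same unique index:

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL; the unique-index
# mechanism is the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_events (
        source_type TEXT NOT NULL,
        source_id   TEXT NOT NULL,
        title       TEXT,
        UNIQUE (source_type, source_id)
    )
""")

def ingest(source_type, source_id, title):
    """Insert once; re-ingesting the same (source_type, source_id) is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO raw_events VALUES (?, ?, ?)",
        (source_type, source_id, title),
    )

ingest("news", "abc-123", "Sanctions announced")
ingest("news", "abc-123", "Sanctions announced")  # duplicate, silently ignored
count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(count)  # 1
```

Keying on the pair rather than `source_id` alone matters because IDs from different feeds (an RSS GUID, a ticker symbol, a CoinGecko slug) are only unique within their own source.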
Explore the full platform
See how the pipeline powers the dashboard, API, and AI chat — or dive into the user engagement system.