Documentation

Data Pipeline

Bloom.ai processes global data through a three-layer pipeline: raw data is ingested, classified and enriched with AI, correlated across sources, and transformed into actionable intelligence.

Four data sources, one unified pipeline

News / RSS

Global news articles from RSS feeds, classified and enriched in real time.

Stock Market

Live market data from Yahoo Finance — indices, equities, and commodities.

Crypto

Cryptocurrency prices and market caps via CoinGecko.

Geopolitical

Conflict and crisis data from ACLED and the Uppsala Conflict Data Program.

Layer 1

Ingestion & Classification

Specialized processors normalize each source into a common raw_event schema. Each event is then enriched with AI-generated metadata including categories, severity scores, and 768-dimensional embedding vectors via Google Gemini.

  • Classification
  • Severity scoring
  • Region assignment
  • Keyword extraction
  • Entity extraction
  • Embedding generation
Processing Flow

Raw Data → Source Processor → raw_event → LLM Agent → Enriched Event

Embedding dims: 768

Embedding model: Gemini

Example Correlation (via embedding similarity)

Sanctions article (NEWS) + Crypto market crash (CRYPTO) + Stock market sell-off (STOCKS) → Unified geopolitical event

Layer 2

Correlation & Event Aggregation

Clusters related raw events from different sources into unified event entities using cosine similarity on embedding vectors. A sanctions article, a crypto crash, and a stock sell-off become one coherent geopolitical event.

  • Title & description synthesis
  • Category & severity assignment
  • Latitude / longitude geocoding
  • Source event linking with traceability
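The clustering step can be sketched with a simple greedy pass: each event joins the first cluster whose seed embedding is within a similarity threshold, otherwise it starts a new cluster. The threshold value and the greedy strategy are assumptions for illustration; the real correlation layer is likely more sophisticated.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_events(embeddings: list[list[float]],
                   threshold: float = 0.8) -> list[list[int]]:
    """Greedy single-pass clustering over embedding vectors.

    Returns lists of indices into `embeddings`; each inner list is one
    unified event. Threshold 0.8 is an illustrative assumption.
    """
    clusters: list[list[int]] = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            # Compare against the cluster's seed (first member)
            if cosine_similarity(embeddings[cluster[0]], emb) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no match: start a new cluster
    return clusters

# Two near-parallel vectors cluster together; the orthogonal one stands alone.
groups = cluster_events([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
# → [[0, 1], [2]]
```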

Layer 3

Insights & Predictions

Operates on scored events to detect trends, generate predictions, and power the user-facing chat agent. All AI calls route through the user's selected model — supporting Anthropic, OpenAI, xAI, DeepSeek, and Google.
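Routing every AI call through the user's selected model can be sketched as a provider registry behind one call site. The registry shape and function signatures here are hypothetical; provider names follow the list above.

```python
from typing import Callable

# Registry mapping provider name -> completion function (a hypothetical
# stand-in for the real provider-agnostic client).
PROVIDERS: dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that registers a provider's completion function."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        PROVIDERS[name] = fn
        return fn
    return wrap

def complete(provider: str, prompt: str) -> str:
    """Route a completion call to the user's selected provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDERS[provider](prompt)

@register("anthropic")
def _anthropic(prompt: str) -> str:
    # A real implementation would call the Anthropic API here;
    # stubs for openai, xai, deepseek, and google would follow the same shape.
    return f"[anthropic] {prompt}"
```

Because the insights layer only calls `complete`, swapping providers is a configuration change rather than a code change.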

Personalization

Suggestions based on user preferences and engagement patterns using vector similarity.

Trend Detection

Patterns across events over time — escalations, de-escalations, and emerging narratives.

Chat Agent

Q&A with source references via RAG. Retrieves events, raw sources, and citations.

Predictions

Market and geopolitical forecasts derived from event severity and historical patterns.

RAG Retrieval Chain

User query → Events → Raw events → Original sources → Answer + citations

Surfaced in: Dashboard, Map Pins, Chat, Alerts
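The chain from query to cited answer can be sketched as one function over pluggable retrievers. Everything here is an illustrative stand-in: `retrieve_events` for vector search over unified events, `retrieve_raw` for following source-event links, and `llm` for the user's selected model.

```python
def answer_with_citations(query: str, retrieve_events, retrieve_raw, llm) -> dict:
    """Sketch of the retrieval chain:
    query -> events -> raw events -> original sources -> answer + citations.
    """
    events = retrieve_events(query)                      # vector search over unified events
    raw = [r for e in events for r in retrieve_raw(e)]   # follow source-event links
    context = "\n".join(r["text"] for r in raw)          # ground the model in sources
    answer = llm(f"Context:\n{context}\n\nQuestion: {query}")
    return {"answer": answer, "citations": [r["source_url"] for r in raw]}
```

Keeping the raw-event links intact is what makes the citations traceable back to original sources rather than to the synthesized event.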

Technical details

Key implementation details behind the pipeline.

Embedding model

Google Gemini text-embedding-004 producing 768-dimensional vectors stored via pgvector in PostgreSQL.
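A pgvector similarity lookup over such a table might look like the query built below. The table and column names are assumptions; the `<=>` operator is pgvector's cosine-distance operator, so ordering by it ascending returns the most similar embeddings first.

```python
def nearest_events_sql(table: str = "raw_events", k: int = 5) -> str:
    """Build a pgvector nearest-neighbor query (cosine distance).

    `raw_events` and the `embedding` column are illustrative names; the
    query vector is passed as a bound parameter at execution time.
    """
    return (
        f"SELECT id, embedding <=> %(query_vec)s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )
```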

Deduplication

Enforced via unique index on (source_type, source_id) to prevent duplicate ingestion.

Multi-provider AI

All LLM calls route through a provider-agnostic client supporting Anthropic, OpenAI, xAI, DeepSeek, and Google.

Cosine similarity

Event correlation clusters related raw events by cosine similarity on their 768-dim embedding vectors (pgvector's cosine-distance operator returns 1 − similarity).

Explore the full platform

See how the pipeline powers the dashboard, API, and AI chat — or dive into the user engagement system.