Documentation
Data Pipeline
Bloom.ai processes global data through a three-layer pipeline: raw data is ingested and classified with AI (Layer 1), correlated across sources (Layer 2), and transformed into actionable intelligence (Layer 3).
Four data sources, one unified pipeline
News / RSS
Global news articles from RSS feeds, classified and enriched in real time.
Stock Market
Live market data from Yahoo Finance — indices, equities, and commodities.
Crypto
Cryptocurrency prices and market caps via CoinGecko.
Geopolitical
Conflict and crisis data from ACLED and the Uppsala Conflict Data Program.
Layer 1
Ingestion & Classification
Specialized processors normalize each source into a common raw_event schema. Each event is then enriched with AI-generated metadata including categories, severity scores, and 768-dimensional embedding vectors via Google Gemini.
- Classification
- Severity scoring
- Region assignment
- Keyword extraction
- Entity extraction
- Embedding generation
768 embedding dimensions · Gemini embedding model
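The normalization step in Layer 1 can be sketched as a mapping from each source's native payload into the shared raw_event shape, with enrichment fields filled in later by the AI pass. The field and function names below are illustrative assumptions, not Bloom.ai's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical raw_event schema; field names are illustrative,
# not Bloom.ai's actual data model.
@dataclass
class RawEvent:
    source_type: str                # "news", "stock", "crypto", "geopolitical"
    source_id: str                  # stable identifier within the source
    title: str
    body: str
    # AI-generated enrichment, populated after ingestion
    categories: list = field(default_factory=list)
    severity: float = 0.0           # e.g. 0.0 (minor) to 1.0 (critical)
    region: str = ""
    keywords: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    embedding: list = field(default_factory=list)  # 768 floats

def normalize_rss_item(item: dict) -> RawEvent:
    """Map one RSS entry into the shared raw_event shape."""
    return RawEvent(
        source_type="news",
        source_id=item["guid"],
        title=item["title"],
        body=item.get("summary", ""),
    )

event = normalize_rss_item(
    {"guid": "abc-123", "title": "Sanctions announced", "summary": "..."}
)
print(event.source_type, event.source_id)  # news abc-123
```

Each source gets its own `normalize_*` processor, but all of them emit the same dataclass, so the enrichment and correlation layers never need source-specific logic.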
Layer 2
Correlation & Event Aggregation
Clusters related raw events from different sources into unified event entities using cosine similarity on embedding vectors. A sanctions article, a crypto crash, and a stock sell-off become one coherent geopolitical event.
- Title & description synthesis
- Category & severity assignment
- Latitude / longitude geocoding
- Source event linking with traceability
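The correlation step above can be sketched as a greedy clustering pass over embedding vectors. The similarity threshold and the first-member comparison strategy are assumptions for illustration; toy 3-dimensional vectors stand in for the real 768-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster(events, threshold=0.8):
    """Greedy single-pass clustering: attach each event to the first
    cluster whose seed member is similar enough, else start a new one."""
    clusters = []
    for ev in events:
        for c in clusters:
            if cosine_similarity(ev["embedding"], c[0]["embedding"]) >= threshold:
                c.append(ev)
                break
        else:
            clusters.append([ev])
    return clusters

# A news article and a crypto move with near-identical embeddings
# collapse into one cluster; the unrelated stock event stays separate.
events = [
    {"id": "news:1",   "embedding": [1.0, 0.0, 0.0]},
    {"id": "crypto:7", "embedding": [0.9, 0.1, 0.0]},
    {"id": "stock:4",  "embedding": [0.0, 1.0, 0.0]},
]
groups = cluster(events)
print([[e["id"] for e in c] for c in groups])
# [['news:1', 'crypto:7'], ['stock:4']]
```

A production system would typically delegate the nearest-neighbor search to pgvector rather than comparing vectors in application code.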
Layer 3
Insights & Predictions
Operates on scored events to detect trends, generate predictions, and power the user-facing chat agent. All AI calls route through the user's selected model — supporting Anthropic, OpenAI, xAI, DeepSeek, and Google.
Personalization
Suggestions based on user preferences and engagement patterns using vector similarity.
Trend Detection
Patterns across events over time — escalations, de-escalations, and emerging narratives.
Chat Agent
Q&A with source references via retrieval-augmented generation (RAG): retrieves events, raw sources, and citations.
Predictions
Market and geopolitical forecasts derived from event severity and historical patterns.
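The chat agent's retrieval step can be sketched as: embed the user's question, rank stored events by cosine similarity, and assemble a prompt whose context lines carry citation IDs. Everything here (function names, the prompt wording, the two-event corpus) is an illustrative assumption:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_embedding, events, k=2):
    """Return the top-k events most similar to the query embedding."""
    ranked = sorted(events, key=lambda e: cosine(query_embedding, e["embedding"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, retrieved):
    """Ground the prompt in retrieved events, tagging each line with its ID
    so the model's answer can cite sources."""
    context = "\n".join(f"[{e['id']}] {e['title']}" for e in retrieved)
    return (f"Answer using only the sources below, citing their IDs.\n"
            f"{context}\n\nQ: {question}")

events = [
    {"id": "evt-1", "title": "Sanctions escalate", "embedding": [1.0, 0.0]},
    {"id": "evt-2", "title": "Crop report",        "embedding": [0.0, 1.0]},
]
prompt = build_prompt("What happened with sanctions?",
                      retrieve([0.9, 0.1], events, k=1))
print(prompt)
```

The assembled prompt is then sent through the provider-agnostic client, so the same retrieval logic serves whichever model the user has selected.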
Technical details
Key implementation details behind the pipeline.
Embedding model
Google Gemini text-embedding-004 producing 768-dimensional vectors stored via pgvector in PostgreSQL.
Deduplication
Enforced via unique index on (source_type, source_id) to prevent duplicate ingestion.
Multi-provider AI
All LLM calls route through a provider-agnostic client supporting Anthropic, OpenAI, xAI, DeepSeek, and Google.
Cosine similarity
Event correlation uses cosine similarity (equivalently, cosine distance) on 768-dimensional embedding vectors to cluster related raw events.
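The deduplication guarantee can be demonstrated with a unique index on `(source_type, source_id)`. SQLite stands in for PostgreSQL here purely so the sketch is self-contained; in PostgreSQL the insert would typically use `ON CONFLICT DO NOTHING` against the same unique index:

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL; the unique-index
# mechanism is the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_events (
        source_type TEXT NOT NULL,
        source_id   TEXT NOT NULL,
        title       TEXT,
        UNIQUE (source_type, source_id)
    )
""")

def ingest(source_type, source_id, title):
    """Insert once; re-ingesting the same (source_type, source_id) is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO raw_events VALUES (?, ?, ?)",
        (source_type, source_id, title),
    )

ingest("news", "abc-123", "Sanctions announced")
ingest("news", "abc-123", "Sanctions announced")  # duplicate, silently ignored
count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(count)  # 1
```

Keying on the pair rather than `source_id` alone matters because IDs from different feeds (an RSS GUID, a ticker symbol, a CoinGecko slug) are only unique within their own source.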
Explore the full platform
See how the pipeline powers the dashboard, API, and AI chat — or dive into the user engagement system.