Abstract
Organizations today grapple with information overload—a deluge of messages, meetings, and documents that contain critical decisions and knowledge, but in unstructured, noisy forms. Conventional approaches, such as retrieval-augmented generation (RAG), enhance large language models (LLMs) with document recall, yet fall short on temporal awareness, ownership tracking, and actionability. We introduce Signal–Context Architecture (SCA), a model-agnostic AI framework that separates live information "signals" from static knowledge "context," then fuses them with a validation step to produce executive-grade answers with provenance. In SCA, an Insights Agent first processes high-velocity signals (e.g. meeting transcripts, chats, emails) into a structured Decision Ledger of accomplishments, action items, and decisions. Meanwhile, a Knowledge Layer curates context from institutional memory (documents, wikis, policies), weighting evidence by recency and source authority. A Deep Research Agent then composes answers by combining the Decision Ledger and Knowledge Layer, ensuring that every answer is grounded in both recent events and trusted documentation. SCA directly addresses key gaps of standard RAG pipelines—time sensitivity, accountability, follow-through, and provenance—by binding events to evidence before generation. We detail the SCA design principles, system pipeline, and data model. Finally, we discuss an initial implementation and measurements in an enterprise setting, illustrating how SCA delivers precise recall (e.g. "Who decided what, when, and why?") and board-level synthesis (e.g. drafting quarterly plans) with citations, timestamps, and deep links to source data. The result is an AI assistant that turns conversation chaos into actionable, audit-ready intelligence, future-proofed by a model-agnostic approach and human-in-the-loop governance.
1. Introduction
Modern enterprises run on continuous flows of information—from rapid-fire Slack threads and meeting discussions to vast archives of documents and knowledge bases. Extracting reliable, decision-ready insight from this noisy, ever-changing data is a grand challenge. Generative AI offers potential solutions, but off-the-shelf large language models struggle with factuality and recency, especially in high-stakes business contexts. Retrieval-Augmented Generation (RAG) techniques improve knowledge recall by feeding LLMs with relevant documents, allowing models to cite sources and reduce hallucinations. However, simply augmenting models with document chunks has proven insufficient for organizational intelligence.
Several critical gaps remain when deploying AI assistants for executives and teams:
- Temporal Awareness: Business decisions are time-sensitive; what changed yesterday often matters more than a static document from last year. Traditional RAG typically relies on a fixed index (e.g. a snapshot of Wikipedia or a company drive) and thus may miss recent updates. Even with periodic index updates, answers about evolving situations (e.g. "Who is the current president?" or "What was decided in last week's meeting?") are challenging. Recent research has begun incorporating temporality into retrieval models, confirming that time-aware processing is needed for up-to-date answers.
- Ownership and Accountability: Executives ask not only what happened, but who decided it, when, and why. Standard LLM or RAG approaches do not inherently track decision provenance (e.g. which person or meeting produced a decision). In multi-party work streams, critical context lies in dialogue acts like commitments or decisions. Identifying such acts is non-trivial—decisions can be phrased ambiguously ("Let's go with Option B" vs. an explicit "Decision: Choose Option B"). Prior work in NLP shows that with appropriate models and training, dialogues can be parsed for decisions and action items. Yet, few production AI systems attempt to capture decision genealogy — linking a decision to its origin and stakeholders.
- Follow-through (Actionability): Answers that merely summarize information have limited value if they don't facilitate action. In an organizational setting, a helpful answer should not only state facts but also highlight action items (what needs to be done, by whom) or surface next steps. For instance, an executive asking "What's blocking Project X this week?" expects a response that identifies blockers and owners, not just a generic summary. Purely retrieval-based systems return relevant text passages but don't synthesize to-do lists or status updates. Even advanced meeting summarization systems often miss translating decisions into actionable tasks. This gap between insight and action means work doesn't move forward.
- Provenance and Trust: In high-stakes environments, uncited claims don't get adopted. A CEO or board will rightly distrust an AI-generated recommendation if it's not backed by evidence. RAG methods introduced the ability for models to cite sources (analogous to footnotes) to improve user trust. However, not all implementations enforce strict citation of both recent events and long-term knowledge. Ensuring that every answer is traceable to a transcript line or document paragraph is essential for credibility. Moreover, the quality of sources matters: information pulled from an outdated policy or a random wiki page can mislead. Thus, provenance must be coupled with source validation (e.g. preferring authoritative or recent sources).
In summary, existing AI assistants often operate as monolithic black-box models trying to "understand everything at once." This fails to handle the dynamic, noisy reality of organizational data. Signal–Context Architecture (SCA) tackles these challenges by separating the problem into two streams—"hot" signals and "cold" context—and then unifying them with a verification step. By first structuring the live signal stream (to capture timelines, owners, and actions) and separately curating a trusted context store (to provide evidence), SCA can generate answers that are timely, accountable, actionable, and verifiable. We hypothesize that this dual-stream approach will significantly outperform vanilla RAG on executive decision support queries.
This paper presents SCA's design and an initial implementation. In Section 2, we situate SCA relative to related work in retrieval-augmented LLMs, meeting understanding, and knowledge management. Section 3 details the architecture of SCA, including the Signal pipeline with the Insights Agent, the Context pipeline with the Knowledge Layer, and the Deep Research Agent that fuses information. We describe how decisions are captured in a Decision Ledger and linked to supporting evidence. Section 4 outlines key design principles and implementation notes that make SCA durable and model-agnostic. We also provide a qualitative comparison of SCA vs. a traditional RAG system to highlight capability differences. Section 5 discusses our evaluation approach and early results, including metrics like decision recall time and action item extraction coverage. Finally, Section 6 concludes with limitations and future directions, such as integrating recommendation engines and multi-step workflow orchestration on top of SCA's structured knowledge.
2. Background and Related Work
Retrieval-Augmented Generation (RAG)
RAG is a family of techniques that combine LLMs with information retrieval from external data sources. Instead of relying solely on a model's parametric knowledge, a RAG system fetches relevant documents (usually via vector similarity search or dense retrievers) and provides them as context to the LLM during generation. Lewis et al. (2020) introduced the term "RAG" in their seminal work, and showed that augmenting a generative model with retrieved Wikipedia passages improved factual question-answering. The approach has since been widely adopted in hundreds of papers and many commercial services. The appeal of RAG lies in its modularity (one can update the external knowledge source without retraining the model) and its propensity to produce answers with traceable sources. By giving models "footnotes" to cite, RAG helps mitigate hallucinations and builds user trust. For instance, Nvidia's RAG reference architecture emphasizes that citing sources makes the LLM's responses more reliable for users.
Despite these strengths, basic RAG implementations have limitations that motivate SCA. Traditional RAG typically treats the knowledge base as a static corpus (e.g. a fixed snapshot of company documents). As discussed, this makes it slow to capture temporal changes. Researchers have proposed extensions like TempRAG or Time-aware RALM to incorporate temporality by indexing multiple versions of documents or adding time metadata to queries. These approaches confirm that simply swapping in a new corpus is not always sufficient for domains with frequent updates. SCA addresses temporality not just by keeping context up-to-date, but by maintaining an event log of what recently happened (the Decision Ledger). Another limitation of RAG is that it usually retrieves text passages that loosely match the query, but it does not know about structured events (like "decision made in Meeting X"). Our approach can be seen as injecting a knowledge graph of decisions and actions into the retrieval loop, rather than retrieving raw text alone. Lastly, while RAG can provide citations to documents, it doesn't inherently capture why that document is relevant (e.g. was it the spec that a decision was based on, or a policy that constrains it?). SCA's validation step explicitly links events to evidence, aiming to ensure that the evidence is supportive and contextual (analogous to efforts in verifiable QA where the system checks that retrieved docs truly support the answer).
Meeting Summarization and Action Item Extraction
There is a growing body of work in applying NLP and LLMs to meeting transcripts and workplace conversations. Recent LLM-based systems can generate meeting recaps, often focusing on summaries of discussion points or highlighting key decisions and action items. For example, Wu et al. (2023) design an LLM-powered meeting recap system that produces two kinds of outputs: important highlights and structured minutes (with sections for decisions, tasks, etc.). Their user study found the approach promising for efficiency, but noted limitations: the automated summaries sometimes missed important details, mis-attributed information, or failed to gauge what was important to participants. These shortcomings underline the need for improved accuracy in identifying truly salient information like decisions and commitments. Traditional NLP approaches have treated decision detection and action item extraction as classification problems on dialogue acts. As one example, Bhattacharya (2020) explored fine-tuning BERT to identify dialogue acts such as decisions and action items in multi-party meetings, achieving state-of-the-art results on benchmark datasets. Such research shows it is feasible to parse raw transcripts into structured records, although it requires careful handling of ambiguity and context. SCA's Insights Agent builds on these ideas by using advanced language models to structure the raw conversations into a machine-interpretable ledger. Unlike a generic meeting summarizer, the Insights Agent is specialized to extract a Decision Ledger comprising clearly labeled Decisions (with who/what/when/why), Action Items (with owner and due date), and Accomplishments (deliverables or milestones achieved). This structured approach echoes the emerging practice in some teams of maintaining "decision logs" or "decision registers" for meetings and projects, whether manually or semi-automatically.
Organizational Memory and Knowledge Management
SCA's Context (Cold) layer relates to long-standing concepts in knowledge management, where organizations strive to build a "single source of truth" for policies, specs, and historical decisions. Corporate wikis, intranets, or document repositories (SharePoint, Confluence, Notion, etc.) serve as institutional memory, but they often grow disorganized and outdated. Search engines and enterprise search appliances have been the traditional tool for retrieving information from these repositories. Modern semantic search using vector embeddings has greatly improved findability, even for unstructured text. Yet, relevance in enterprises is multifaceted: beyond keyword or semantic similarity, factors like document recency, authoritativeness, and usage frequency are important. SCA's Knowledge Layer incorporates these factors by design. It performs retrieval with semantic chunking and authority weighting, meaning that when searching context documents, it boosts content that is recent or authored by domain experts or frequently referenced. This approach is aligned with common practices in enterprise search ranking (e.g., boosting pages with recent edits or certain access patterns) and ensures that, say, the official "Security Policy" authored by the security team ranks above a random engineer's notes on security. By weighting for recency, SCA prioritizes fresh information when relevant, consistent with the notion that updated indexes yield better answers to time-sensitive queries. By weighting author credibility, SCA implements a trust model: not all sources are equal, and an executive answer should draw on validated, high-quality documents whenever possible. Finally, by tracking usage signals (how often a document is linked or viewed), the system infers which documents are considered important by the organization. All retrieved context fragments in SCA carry a citation (link to the source and metadata), so when the Deep Research Agent composes an answer, every claim can be traced back to its origin.
In summary, SCA synthesizes ideas from these areas: it is inspired by RAG's combination of retrieval and generation (but extends it with structured intermediates and temporal handling), it leverages advances in meeting AI to structure conversations into decisions/actions, and it employs enterprise search best practices to maintain a high-quality knowledge base. We next describe the architecture of SCA in detail.
3. SCA Architecture and Components
At a high level, Signal–Context Architecture (SCA) implements a dual-stream pipeline with a subsequent fusion step (Figure 1). This approach is reminiscent of the Lambda architecture in big data systems, which separates a batch layer (cold path) for comprehensive but slow processing and a speed layer (hot path) for low-latency updates. In SCA, the "cold path" corresponds to the Context stream (institutional knowledge base), and the "hot path" corresponds to the Signal stream (live inputs). Both streams feed into a fusion layer where responses are generated. By separating to conquer and then unifying to understand, SCA ensures each part of the problem is handled by specialized mechanisms.
3.1 Signals (Hot Stream) → Insights Agent → Decision Ledger
The Signals pipeline ingests high-velocity, high-volume data from live communication channels. Typical inputs include:
- Meetings: real-time audio/video meeting transcripts (from platforms like Zoom, Google Meet, Teams), potentially with speaker identification and timestamps.
- Chat and Email: messages from Slack/Teams channels, email threads, etc., often noisy with colloquialisms, reactions, or tangential comments.
- Project trackers: updates from issue trackers like Linear or Jira (tickets created/closed), which reflect decisions in project scope and bug triage.
These sources are rich in content about what just happened, but are unstructured and intermingled with noise. The Insights Agent is a specialized component (powered by one or multiple NLP/LLM models) that continuously or periodically parses the signal stream and extracts structured nuggets of information. The output of this agent is the Decision Ledger, a living structured record that organizes events into three key categories:
- Decisions: Records of decisions made, capturing who decided what, when, and optionally why (the rationale). A decision entry may point to the exact moment in a meeting transcript or the specific message where the decision was made (for traceability, e.g. a deep link to the video timestamp or Slack permalink). It also links to any available rationale or discussion context. Each decision has metadata: timestamp, decider (person or group), and links to related items (like tasks spawned or documents updated).
- Action Items: Records of tasks or follow-ups identified, including what needs to be done, who owns it, and by when (deadline) if stated. This essentially forms an automatically generated to-do list extracted from discussions ("Alice will do X by next week"). The agent may infer action items even when not explicitly stated ("We should review the design" implies an action item "Review the design"). Each action item is linked back to its source event for context and can be updated when completed.
- Accomplishments: Records of completed work or significant milestones achieved, with what was accomplished and when. For example, if in a meeting someone announces "Project Y was deployed to production last night," the agent can log this as an accomplishment (artifact deployed, timestamp). Accomplishments provide a historical ledger of deliverables and outcomes, which can later be used to answer questions like "What did the team deliver last month?"
The Insights Agent uses a mixture of techniques to populate the Decision Ledger. It employs temporal understanding (identifying references to time, deadlines, ordering of events) to place events correctly. It also builds a decision genealogy – linking related events across time. For instance, a decision made in a planning meeting may be confirmed later in a Slack thread, then implemented via a Pull Request. SCA explicitly links these into a chain: decided-by → confirmed-by → implemented-by relations, forming a graph of how an idea progresses from decision to execution. This provides context for follow-through; if someone asks later "Did we really implement feature X that was approved?", the system can traverse the graph and find whether an accomplishment (e.g. PR merged) is linked to that decision. Such cross-references ensure accountability and prevent issues from falling through the cracks or being re-decided due to lost context.
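As a concrete illustration, a simple traversal of these genealogy links can answer the "did we implement it?" question directly. The sketch below assumes links are stored as plain records with `from_id`, `to_id`, and `relation` fields; these field names are hypothetical.

```python
from typing import Dict, List

# Minimal sketch, assuming hypothetical field names: walk genealogy links
# outward from a decision and collect anything reached via "implemented-by".
def find_implementations(decision_id: str, links: List[Dict]) -> List[str]:
    frontier, seen, implemented = [decision_id], {decision_id}, []
    while frontier:
        node = frontier.pop()
        for link in links:
            if link["from_id"] == node and link["to_id"] not in seen:
                seen.add(link["to_id"])
                frontier.append(link["to_id"])
                if link["relation"] == "implemented-by":
                    implemented.append(link["to_id"])
    return implemented
```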
Example (Signal stream structuring)
To illustrate, imagine an engineering sync meeting where the team debates and decides to migrate infrastructure from AWS to GCP for cost reasons. The Insights Agent transcribes this and creates a Decision entry: "Decision: Migrate infra to GCP – Decider: Luis Ortega; Date: Aug 12, 2025 (Engineering Sync); Rationale: better cost & performance per benchmarks; see transcript link @00:14:23." The next day, in a Slack #infra channel, someone asks "Are we really moving to GCP?" and another replies "Yes, confirmed – we decided yesterday in the sync." The agent picks this up as a confirmation linked to the same decision (perhaps adding "Confirmed in Slack #infra, Aug 13 by @devops-lead"). The team also had an existing RFC document ("Infra Migration RFC v3") which was updated to final after the decision, and a Jira ticket "Migrate to GCP" was created. The agent links these artifacts as well, so the decision entry now has pointers to the Notion doc and Jira issue. This enriched Decision Ledger entry now connects people, time, rationale, and evidence about the "infra to GCP" decision. Later, if someone queries the AI "Which meeting decided to migrate infra to GCP?", the Deep Research Agent can confidently answer with specifics: "It was decided in the Aug 12 Engineering Sync, by Luis Ortega, rationale was cost and performance. Here's a link to the exact transcript moment, and the Slack confirmation on Aug 13. The related RFC document is 'Infra Migration RFC v3'." – complete with citations and links.
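For reference, the enriched ledger entry from this example might be represented roughly as follows; the field names, relation labels, and link targets are illustrative rather than a fixed schema.

```python
# Illustrative shape of the enriched Decision Ledger entry described above;
# field names, relation labels, and link targets are examples, not a schema.
decision_entry = {
    "type": "Decision",
    "title": "Migrate infra to GCP",
    "decider": "Luis Ortega",
    "timestamp": "2025-08-12",
    "source": {"meeting": "Engineering Sync", "transcript_offset": "00:14:23"},
    "rationale": "Better cost & performance per benchmarks",
    "links": [
        {"relation": "confirmed-by", "target": "Slack #infra, Aug 13, @devops-lead"},
        {"relation": "documented-in", "target": "Infra Migration RFC v3"},
        {"relation": "tracked-in", "target": "Jira ticket: Migrate to GCP"},
    ],
}
```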
By structuring signals before trying to answer user questions, SCA reduces the reliance on the generative model to "remember" or infer these details. The knowledge of decisions and tasks is explicitly stored and can be retrieved with high precision (often via simple queries or filters on the ledger, rather than semantic search). This "structure before generation" principle ensures that, for pinpoint questions (e.g. who decided X?), the system can respond based on a database query to the ledger, which is far more precise than prompting an LLM to scan raw transcripts. Early anecdotal evidence suggests this leads to near-instant recall of decisions (we measure "decision recall time" in Section 5). Indeed, turning raw meetings into a queryable decision log has been shown to let AI assistants answer questions like "What did we commit to last Monday?" by referencing past meeting records, which is aligned with SCA's goals.
3.2 Context (Cold Stream) → Knowledge Layer → Evidence Store
The Context pipeline handles the organization's institutional memory — documents and knowledge that already exist, which we treat as relatively static (low velocity) but high value. This includes: internal wikis, documented policies, product specs, design docs, engineering runbooks, OKR spreadsheets, past strategy memos, etc. These artifacts are typically stored in systems like Notion, Confluence, Google Drive, SharePoint, GitHub (for code or markdown docs), and so on. The challenge here is not real-time parsing (as with signals) but rather retrieval and validation: given a query or a piece of information from the Decision Ledger, find the most relevant supporting content from this vast repository.
SCA's Knowledge Layer uses a combination of vector-based semantic search and symbolic filtering to retrieve candidate evidence fragments. We break documents into semantically coherent chunks (e.g. paragraphs or sections) and index them with embeddings. When searching, we incorporate not just the query text, but also metadata cues. For example, if the query is related to a known Decision (from the ledger) which has tags or links (like it's a security decision, or it involves Project Atlas), the retriever can constrain or boost results from the relevant project folder or with certain keywords. This ensures we don't retrieve irrelevant context.
Critically, the Knowledge Layer's scoring algorithm emphasizes three factors (a minimal scoring sketch follows the list):
- Recency: Recent documents or recent edits are scored higher, under the assumption that for many questions (especially operational or tactical ones), the latest information is more pertinent than stale data. For instance, if a user asks "What is the current pricing for our Pro plan?", a spec updated last week is more relevant than a deck from 2022. This doesn't mean old data is dropped – just down-weighted unless specifically relevant (sometimes older archives matter for historical questions).
- Authority: We incorporate an authority weight based on the source or author. If the query is about a security policy, a page written by the Security team or the CISO is considered more reliable than an unofficial note. Authority can be defined by directory (e.g., official policies folder), by role (executives' docs might carry more weight on strategy questions), or even by crowd signals (documents that many people have labeled as canonical). This approach aligns with how humans trust information – provenance matters. It's akin to ranking official documentation higher in search results, a practice in enterprise search relevance tuning.
- Popularity/Usage: If a document or snippet has been referenced or viewed frequently (especially in contexts related to the query), it's likely useful. For example, if a design decision was discussed in Slack and a particular spec link was shared in that discussion, that spec is very likely relevant to queries about that decision. The Knowledge Layer can use such signals (e.g., the Decision Ledger's link graph itself) to boost content that's "connected" to the current context in the organization's discourse. This dynamic is unique to SCA: by having the Signals and Context pipelines inform each other (the Decision Ledger contains links to documents; the document retrieval can consider those links), we achieve a contextual retrieval that standard vector search would miss.
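The sketch below combines these three factors with semantic similarity into a single score; the weights and recency half-life are assumptions chosen for illustration, not tuned values from our deployment.

```python
import math

# Illustrative evidence ranking: blend semantic similarity with recency,
# authority, and usage boosts. Weights and half-life are assumed, not tuned.
def score_evidence(similarity: float, age_days: float, authority: float,
                   usage_links: int, half_life_days: float = 90.0) -> float:
    recency = math.exp(-math.log(2) * age_days / half_life_days)  # 1.0 when fresh, 0.5 at the half-life
    usage = math.log1p(usage_links)                               # diminishing returns on popularity
    return 0.6 * similarity + 0.2 * recency + 0.15 * authority + 0.05 * usage

# A week-old official policy chunk should outrank an old unofficial note,
# even when their raw similarity scores are close.
official = score_evidence(similarity=0.78, age_days=7, authority=1.0, usage_links=12)
unofficial = score_evidence(similarity=0.80, age_days=700, authority=0.3, usage_links=1)
assert official > unofficial
```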
All retrieved context fragments are stored or represented as Evidence objects, which include the fragment text and the source identifier (document name, URL or storage path, author, date). These Evidence objects are what the final agent will cite. They are kept small (e.g., a paragraph each) to ensure the LLM can absorb multiple pieces as needed and to allow fine-grained citation (pointing to the exact section of a doc, not the whole doc). The Knowledge Layer continuously refreshes indexes (for example, if someone creates a new Google Doc or updates a Confluence page, those changes are ingested so that recency ranking remains effective). We describe storage details in Section 4, but note here that we use a hybrid of keyword and vector indices: keywords for precise filters (like "limit to files in the 'Q1 OKR' folder"), vectors for semantic similarity.
By maintaining this curated, trusted evidence base, SCA ensures that when the generative model composes an answer, it has access to not just the raw organizational knowledge, but the validated and relevant subset of it. This mitigates the "garbage-in" problem of retrieval: if the wrong or low-quality context is retrieved, even a good LLM will produce a faulty answer. Our approach, analogous to multi-step retrieval or LLM-verified retrieval, could allow iterative refinement: the Deep Research Agent (next section) might detect if the evidence is insufficient and trigger a secondary query. For now, the Knowledge Layer's initial ranking tries to get it right on the first pass by using the structured context we have (timestamps, links, etc., from the ledger).
3.3 Fusion (Validation) → Deep Research Agent → Answer Generation
The final stage of SCA is the fusion of signals and context in order to fulfill a user's query or task. The Deep Research Agent is an orchestrator (which can be implemented as an LLM chain or an agentic loop) that spans both the Decision Ledger and the Knowledge Layer. When a question comes in – whether it's a natural language query from an executive or an automated trigger (like a daily briefing request) – the Deep Research Agent orchestrates a two-part strategy:
- Structured Query to Decision Ledger: It first checks if the query pertains to recent events or decisions. For example, if the query asks, "Which meeting decided XYZ?" or "What's the status of Project ABC this week?", these clearly relate to the Signals domain (decisions, actions, or accomplishments). The agent will query the structured Decision Ledger (using SQL-like queries or graph traversals) to fetch the relevant entries. This yields pinpoint data: e.g., a specific Decision entry with all its fields.
- Contextual Retrieval to Knowledge Base: In parallel or subsequent to the above, the agent formulates a retrieval query for the Knowledge Layer to get supporting evidence. If the question is narrow (like the meeting decision example), the supporting evidence might be the transcript snippet of that decision (which is effectively stored as part of the Decision Ledger, or could be fetched from a transcript store) and any document that was referenced in making that decision (e.g. an RFC). If the question is broader (like "Draft a Q1 plan for Enterprise Deals"), the agent will break it down into aspects and retrieve multiple pieces: historical objectives, recent sales commits, pricing decisions, etc., from the ledger and docs. Here the fusion aspect is crucial: the agent uses both the structured results and the unstructured evidence in composing its final answer.
The Validation in this stage refers to ensuring consistency between the signals and context results. If, for instance, the Decision Ledger says Alice decided something on date D, and there's a document claiming something else, the agent flags or resolves the discrepancy (possibly by presenting both or asking for clarification). A core tenet is: "Events + Evidence or it doesn't ship." The answer should be grounded in an event from the ledger and supported by evidence from a document or artifact. If the pieces don't align, the agent should indicate uncertainty or ask for human input (instead of fabricating an answer). This design prevents the model from hallucinating a confident answer that isn't backed by recorded decisions or authoritative context.
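The following sketch summarizes this two-part strategy and the "events + evidence" gate; the three callables stand in for the actual ledger, retrieval, and generation services and are placeholders rather than real APIs.

```python
from typing import Callable, Dict, List

# Hedged sketch of the fusion step: query the ledger for events, the Knowledge
# Layer for evidence, and only generate when the two streams can be bound.
def answer_query(query: str,
                 query_ledger: Callable[[str], List[Dict]],
                 retrieve_evidence: Callable[[str, List[Dict]], List[Dict]],
                 generate_answer: Callable[[str, List[Dict], List[Dict]], str]) -> Dict:
    events = query_ledger(query)                 # structured lookup (SQL/graph) on the Decision Ledger
    evidence = retrieve_evidence(query, events)  # retrieval biased by links found in the ledger entries
    if not events and not evidence:
        return {"status": "no_answer", "note": "No recorded event or supporting document found."}
    if events and not evidence:
        return {"status": "needs_review", "events": events,
                "note": "Event recorded but unsupported by evidence; flagged for human review."}
    return {"status": "ok", "answer": generate_answer(query, events, evidence)}  # cited Answer Package
```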
The output of the Deep Research Agent is an Answer Package containing: (a) a natural language answer or narrative that addresses the query, (b) citations in-line linking to both the Decision Ledger entries and the context documents used, and (c) if applicable, a list of recommended action items or next steps (with owners and due dates) relevant to the query. The inclusion of action items makes the answer immediately operational. For example, a question "What's blocking the Atlas migration?" might yield an answer: "Three blockers were identified: (1) SSO/SAML test suite failing — owner: Maya, due Friday; (2) IAM policy review pending Security — owner: Aria; (3) Terraform state drift — owner: DevOps. See linked Jira issues for details." Such an answer not only informs the executive of status but also lays out responsibility and encourages follow-up on each item (each blocker is linked perhaps to the tracking ticket or Slack thread where it was discussed). This moves the business forward rather than a generic "Atlas migration has some delays" summary.
Two broad classes of AI assistance enabled by SCA's Deep Research Agent:
- Pinpoint Recall: The user asks a specific factual question about a decision, event, or commitment. SCA can retrieve the exact record. E.g., "Which meeting decided to offer the Pro plan at $40/month and who approved it?" The answer might be: "Decided in Pricing Review on Sep 10 by CFO (Jane Doe); rationale was to hit mid-market demand. Approved by CEO next day via email. See transcript at 00:32:10 and email thread link." This level of detail (with deep links to the evidence) far exceeds what a normal QA system with a vector search could deliver, because the latter might find a pricing doc but not who/when the decision was made. SCA can do this because the info was structured at the time of the meeting and indexed with the context (the pricing doc).
- Synthesis and Strategy: The user requests a more comprehensive analysis or draft that requires pulling together many pieces. E.g., "Draft our Q1 plan for Enterprise Deals". This is not a factual question with a single answer, but a task requiring the AI to gather goals, initiatives, owner assignments, and past learnings. The Deep Research Agent can query the Decision Ledger for all decisions in the last quarter related to Enterprise strategy, look at sales commit documents in the Knowledge Base, compile outcomes from last quarter (accomplishments ledger), and then generate a structured plan: including cited objectives ("as per OKR document, our goal is X"), initiatives ("based on sales feedback we will focus on Y"), owners (from decision assignments), and milestones. Crucially, because SCA's agent has access to the network of decisions and their outcomes, it can avoid fabricating implausible timelines or owners. It will know who leads the Enterprise sales initiative (because that person was listed as an owner in an action item), and it will know that a certain strategy was decided 3 months ago (so likely carries into Q1 plan). Thus the draft it produces is grounded in the company's actual context, not a generic template.
Through these examples, we see that SCA's fusion of structured and unstructured data yields answers that are not only correct, but come with "receipts" – every claim is tied to a meeting record or a document snippet that the user can inspect. This level of transparency is indispensable for executive adoption: a busy decision-maker might skim the answer then click the deep link to verify a key point in the original source (transcript or file). If any detail is off, they can flag it, which brings us to a final piece: human-in-the-loop validation. In our implementation, the answers (especially the complex synthesized ones or any that enact changes) can be routed to an Agent Inbox for a person to review and approve. This ensures that automation doesn't run unchecked; humans provide oversight and can correct any mistakes in the Decision Ledger or Knowledge Layer as well, improving the system over time.
4. Design Principles and Implementation
Several design principles are baked into SCA to ensure the system is robust, adaptable, and enterprise-ready:
- Structure Before Generation: As emphasized, SCA performs structuring of events prior to final answer generation. This was a deliberate choice to curb hallucination and enforce consistency. By having an intermediate structured representation of reality (the Decision Ledger), the generative model is not free to make up facts about who decided what or when – it must rely on the ledger, and if the ledger has no record, the system knows the information might truly be absent. This principle draws from the success of knowledge graph-enhanced QA and the importance of symbolic memory for LLMs.
- Events + Evidence Binding: Any answer produced must be grounded in both an event (signal) and evidence (context). If a purported decision has no corresponding document or artifact, or vice versa, the agent highlights the gap rather than filling it with guesswork. Only when both streams agree (the decision exists and supporting info exists) does the answer get high confidence. This cross-validation greatly reduces the chance of false or unsupported claims.
- Receipts by Default: Every response comes with citations and often deep links into the source. This is not optional or only on-demand, but the default behavior. In high-stakes use (like making a case in a board meeting or deciding a strategy), having these receipts builds trust and also facilitates later review. It effectively creates an audit trail of AI-generated insights, since anyone can follow the citations to verify. This philosophy aligns with calls for verifiable generation in recent research.
- Minimal Surfaces, Maximum Integration: For user adoption, SCA is designed to fit into existing workflows with minimal friction. We implement a "Command Palette" UI (inspired by developer toolkits) that can be brought up with a keyboard shortcut on desktop, giving instant access to ask the AI or run a command, no matter what application is in use. The meeting intelligence is delivered via shareable pages so that meeting outcomes can be reviewed without needing all participants to use the AI tool. There is also an Agent Inbox for approvals and notifications, consolidating where humans interact with the AI's outputs. By keeping the interface surface minimal and context-aware, we aim to encourage frequent use (forming a habit loop) while ensuring everything the AI does is traceable and shareable. Essentially, rather than a dozen separate AI helpers, SCA provides one integrated experience across daily activities.
- Governance and Security Ready: From the ground up, we built SCA to meet enterprise IT and compliance requirements. That means supporting Single Sign-On (SSO) and SAML for user authentication, respecting role-based access controls (RBAC) so that the AI only accesses data a given user is permitted to see, and providing audit logs of AI actions. Data is scoped carefully – e.g., a user's private emails don't get exposed to others, and cross-org data mixing is prevented. This is crucial because in large organizations, data governance is often the barrier to adopting AI tools. By design, SCA can be deployed within a company's secure environment (allowing Bring-Your-Own-Key for models and data). All these measures ensure that however powerful the AI, it operates within the bounds set by human owners and regulators.
Architecture Implementation Notes
We have implemented SCA's prototype as a combination of cloud services and a desktop client:
Data Model
In our system, the following logical entities are defined:
- Event: A raw event from a signal source (e.g., one utterance in a meeting, or a message in Slack, or a new email). Events carry metadata (timestamp, source channel, speaker/sender).
- Decision: A normalized decision record derived from one or more events. It has fields (decision text, owner/decider, time, rationale, status) and links to the source event(s) and any follow-up events (confirmations, implementations).
- ActionItem: A task record with fields (task description, owner, due date, status) and link to the originating event or decision. Status can be open/closed; if closed, it may link to an Accomplishment event.
- Accomplishment: A record of a completed deliverable or milestone, with fields (description, timestamp, link to artifact if any, e.g., a URL to the deployed feature or merged PR).
- Evidence: A fragment of a document or knowledge base content, with text and a reference (doc id, section, author, date). We also store an "authority score" or source type with it.
- Link: A relationship between two of the above (e.g., Decision decided-by Event, Decision confirmed-by Event, Decision documented-in Evidence, ActionItem implemented-by Accomplishment, etc.). These links form the graph that is our Decision Ledger in a broader sense.
These entities are stored in a document-oriented database that allows flexible querying (we used a combination of a graph database for links and a document store for the content). We also maintain vector indices for relevant fields (like embedding of decision text, of evidence text) to assist retrieval.
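A minimal rendering of these entities as Python dataclasses is shown below; the exact fields and defaults are illustrative, not the production schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    id: str
    timestamp: str
    source: str                 # e.g. "meeting", "slack", "email"
    speaker: str
    text: str

@dataclass
class Decision:
    id: str
    text: str
    decider: str
    timestamp: str
    rationale: Optional[str] = None
    status: str = "active"
    source_events: List[str] = field(default_factory=list)

@dataclass
class ActionItem:
    id: str
    description: str
    owner: str
    due_date: Optional[str] = None
    status: str = "open"        # closed items may link to an Accomplishment
    origin_id: Optional[str] = None

@dataclass
class Accomplishment:
    id: str
    description: str
    timestamp: str
    artifact_url: Optional[str] = None   # e.g. merged PR or deployed feature

@dataclass
class Evidence:
    id: str
    text: str
    doc_id: str
    section: Optional[str] = None
    author: str = ""
    date: str = ""
    authority: float = 0.5      # source-type / authority score

@dataclass
class Link:
    from_id: str
    to_id: str
    relation: str               # "decided-by", "confirmed-by", "documented-in", ...
```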
Pipelines
The Signals pipeline is implemented with a streaming ETL (extract-transform-load) process. We integrate with meeting platforms via APIs or webhooks to get live transcripts, which the Insights Agent (implemented as a service calling an LLM with custom prompts) processes at the end of each meeting or in chunks during the meeting. For chat and emails, we use event listeners (e.g., Slack API, Gmail API) to get new messages, and batch them for analysis periodically. The transform step applies the LLM to identify any new decisions or tasks. We fine-tuned a model for this extraction, and also use heuristics (e.g., keywords like "decided", "FYI" for accomplishments, action verbs for tasks). The output is loaded into the Decision Ledger store.
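A simplified version of this transform step might look like the following; `call_llm` is a placeholder for whichever model endpoint is configured, and the prompt wording is illustrative.

```python
import json
from typing import Callable, Dict, List

EXTRACTION_PROMPT = """You are an extraction assistant. From the transcript chunk below,
return JSON with three lists: "decisions", "action_items", "accomplishments".
Each decision needs: text, decider, rationale (if stated).
Each action item needs: description, owner, due_date (if stated).
Cite the utterance index each record came from.

Transcript:
{chunk}
"""

def extract_ledger_records(chunk: str, call_llm: Callable[[str], str]) -> Dict[str, List[Dict]]:
    """One transform step of the Signals pipeline: turn a transcript chunk into ledger candidates.
    `call_llm` is a stand-in for the configured model service."""
    raw = call_llm(EXTRACTION_PROMPT.format(chunk=chunk))
    try:
        records = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to empty candidates; a retry or heuristic pass handles this elsewhere.
        records = {"decisions": [], "action_items": [], "accomplishments": []}
    return records
```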
The Context pipeline involves a combination of a crawler (for file repositories) and a retriever. We use an open-source vector database (like FAISS or Milvus) to index content. A scheduling job updates the index for new or modified documents (with a change feed or polling). Retrieval is implemented as a service that accepts a complex query (including optional filters for recency or source) and returns top-N evidence chunks with scores.
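A bare-bones version of the index and retrieval services using FAISS is sketched below; `embed` is assumed to return a normalized vector and stands in for whatever embedding model the deployment uses, and metadata filters (recency, folder constraints) would wrap around this core.

```python
import numpy as np
import faiss  # open-source vector index, one of the options mentioned above

def build_index(chunks: list[str], embed) -> faiss.IndexFlatIP:
    vectors = np.vstack([embed(c) for c in chunks]).astype("float32")
    index = faiss.IndexFlatIP(vectors.shape[1])   # inner product on normalized vectors ≈ cosine similarity
    index.add(vectors)
    return index

def retrieve(query: str, index: faiss.IndexFlatIP, chunks: list[str], embed, k: int = 5):
    q = np.asarray(embed(query), dtype="float32").reshape(1, -1)
    scores, ids = index.search(q, k)
    return [(chunks[i], float(s)) for i, s in zip(ids[0], scores[0]) if i != -1]
```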
The Fusion/Answering stage uses a multi-step agent implemented in a framework (we experimented with LangChain-style agent loops). It first queries the ledger (via direct DB query) if the question seems to match certain patterns (we built a classifier to route queries: e.g., contains "which meeting" → likely ledger; contains "plan" or "summary" → needs synthesis). It then calls the retriever for additional context. Finally, it constructs a prompt for a large language model that includes: a) a preamble with instructions to cite sources and include actions if relevant, b) the relevant Decision Ledger entries (formatted as structured text or a summary thereof), c) the evidence snippets (with source tags), and d) the user's question. We use a "chain-of-thought" style prompt where the agent is encouraged to first list relevant facts (with their sources) and then formulate the answer. This resembles few-shot prompting for QA with citations. The LLM (GPT-4 in our prototype) generates an answer which we post-process to ensure citation formats are correct and that every claim has at least one citation. If confidence issues arise (e.g., the LLM gives an answer but our system detects no citation for a sentence), we flag the answer for human review. The answer is then delivered to the user via the desktop app's UI. If it's an action-type command (like an instruction to "email this to the team"), we route it to the Agent Inbox for the user to approve the actual email send.
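The prompt-construction step (parts a–d above) can be approximated as follows; the section markers and source-tag format are examples of the described structure, not the exact production prompt.

```python
# Illustrative prompt assembly for the answer step; formatting is an example
# of the structure described above, not the production prompt.
def build_answer_prompt(question: str, ledger_entries: list[str], evidence: list[dict]) -> str:
    preamble = ("Answer the question using only the facts below. Cite every claim with its "
                "[source] tag, and list relevant action items with owners where applicable. "
                "First list the relevant facts with their sources, then write the answer.")
    ledger_block = "\n".join(f"- {entry}" for entry in ledger_entries) or "- (no ledger entries found)"
    evidence_block = "\n".join(f"[{e['source']}] {e['text']}" for e in evidence) or "(no evidence found)"
    return (f"{preamble}\n\nDecision Ledger entries:\n{ledger_block}"
            f"\n\nEvidence:\n{evidence_block}\n\nQuestion: {question}")
```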
Storage & Search
The Decision Ledger (events, decisions, etc.) is stored in a hybrid manner: we use a document store (NoSQL) for flexible querying of records by various fields, and we maintain both keyword indices (for fast exact match, e.g., find Decision where title contains "GCP migration") and vector indices (for semantic search of similar decisions). This hybrid search proved useful – for example, if a user asks something not directly recorded, like "Have we considered migrating to Azure?", even though no decision explicitly says that, a semantic search might pull the "migrate to GCP" decision as related (since Azure appears in a similar context) and then the system can say "We didn't decide on Azure; we chose GCP because…". The context documents are indexed in a vector DB as mentioned, and we also use a lightweight SQL store for metadata (to do filtering by date or author).
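The hybrid lookup can be sketched as a keyword pre-filter followed by semantic ranking; `embed` and the record fields are again illustrative placeholders.

```python
from typing import Optional
import numpy as np

# Sketch of hybrid ledger search: an exact keyword filter narrows candidates,
# then embedding similarity ranks them. Field names and `embed` are assumed.
def hybrid_search(query: str, records: list[dict], embed,
                  keyword: Optional[str] = None, k: int = 3) -> list[dict]:
    pool = [r for r in records if keyword is None or keyword.lower() in r["title"].lower()]
    if not pool:
        pool = records  # fall back to semantic-only search when the filter is too strict
    q = embed(query)
    return sorted(pool, key=lambda r: float(np.dot(q, embed(r["title"]))), reverse=True)[:k]
```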
To keep derived data fresh (the Decision Ledger is derived from raw events), we implemented a change data capture stream. If a message is edited or if a meeting transcript is corrected, we propagate updates to the ledger (with versioning for decisions if needed). This way the ledger isn't static; it evolves, and we can even trace how a decision record changed (audit trail).
Quality Control Loop
We have a continuous evaluation harness: sample questions are periodically run against the system (some handcrafted, some from real user queries with permission), and the results are checked for correctness. We measure recall and precision on decision queries (does it find the right meeting and decision?), citation accuracy (does each citation actually support the sentence it's attached to?), and action item validity (are suggested tasks actually relevant and not duplicates?). We also measure latency and cost. Based on these metrics, we adjust the system: e.g., if a certain type of question is often answered incorrectly, we might change the prompt or use a larger model for that case. The multi-model routing is an interesting aspect: because SCA is model-agnostic, we can choose cheaper or faster models for straightforward tasks. For instance, extracting action items might be done with a smaller fine-tuned model, whereas a board-level strategy answer might use GPT-4 for quality. We route by quality/cost/latency considerations, an approach that keeps the system efficient without sacrificing quality on important queries.
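Routing itself can be as simple as a task-to-model table plus a default; the task labels and model tiers below are placeholders for whatever a deployment configures (including BYOK endpoints).

```python
# Hypothetical routing table: task labels and model tiers are placeholders.
ROUTES = {
    "extract_action_items":  {"model": "small-finetuned", "max_latency_s": 2},
    "pinpoint_recall":       {"model": "mid-tier",        "max_latency_s": 3},
    "board_level_synthesis": {"model": "frontier-llm",    "max_latency_s": 15},
}

def route(task_type: str) -> dict:
    # Unknown task types default to the most capable (and most expensive) tier.
    return ROUTES.get(task_type, ROUTES["board_level_synthesis"])
```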
SCA vs. Vector Database + RAG Baseline
To concretely highlight the differences, the following table presents a comparison across key capabilities:
| Capability | Vector/RAG Baseline | SCA (Our Approach) |
|---|---|---|
| Temporal understanding (handles time & change) | ❌ Limited. Static snapshots; needs manual re-indexing for new data. Cannot track when info was updated. | ✅ Built-in. Temporal data is first-class: decisions have timestamps, recent info prioritized. Can answer time-scoped queries (e.g. last week's changes) directly. |
| Ownership & accountability (who decided what) | ❌ Not captured. Documents retrieved may note a decision, but linking to who/when is ad-hoc or lost. | ✅ Explicit. Every decision tied to an owner and time in the Decision Ledger; deep links to original discussion for accountability. |
| Trusted sources (provenance, source quality) | ⚠️ Partial. Can provide citations, but no guarantee on source authority (may cite any relevant text, good or not). | ✅ Enforced. Citations required for all facts; retrieval ranks by source credibility and recency. Answers come with a bibliography of vetted sources. |
| Action orientation (follow-up actions, tool integration) | ⚠️ Sometimes. Some RAG systems integrate with calendars or email, but typically answer content only. | ✅ Full. Identifies and outputs action items with owners. Integrates with email/Slack to draft messages or tickets (with human approval) for follow-through. |
| Model flexibility (vendor lock-in risk) | ⚠️ High. Often built on a specific LLM API or ecosystem, making adaptation to new models hard. | ✅ Agnostic. Modular design allows plugging in different LLMs or tools per task. BYOK (Bring Your Own Key) support lets organizations use their preferred models (OpenAI, Anthropic, etc.) without system redesign. |
Table 1: Qualitative comparison of a typical vector database + RAG pipeline vs. the proposed Signal–Context Architecture. SCA addresses many gaps by design, providing temporal tracking, structured ownership, stronger provenance, action integration, and model-agnostic extensibility.
Durability and Evolution
SCA is not tied to any single model or even a single technology stack. As better transcription models or dialogue understanding models emerge, the Insights Agent can be upgraded. If a new, more powerful LLM appears, it can be integrated into the Deep Research Agent's toolkit. This future-proofing is intentional: the AI field moves fast, and enterprise systems must be able to incorporate advances without a complete overhaul. Similarly, SCA's separation of concerns (signals vs. context) means it can survive changes in data sources. For instance, if tomorrow Slack is replaced by Microsoft Teams, only the signal ingestion connector changes; the core idea of a Decision Ledger remains the same. Over time, the Decision Ledger becomes a compounding asset – it accumulates corporate knowledge that even new employees or new AI models can leverage. What's notable is that the ledger's value appreciates with usage: every week more decisions and outcomes are logged, making the AI's answers richer and more contextually grounded. In contrast, a naive AI assistant that doesn't log or learn from interactions remains static, or even forgets as its context window slides.
Finally, by keeping a human-in-the-loop for important outputs (the Agent Inbox for approvals, the ability for users to give feedback on answers), SCA ensures that it augments rather than replaces human decision-making. All AI-proposed actions are draft-first, meaning the AI might draft an email or task, but a human sends or assigns it. This prevents errors or overreach from propagating without oversight, a critical safety feature for autonomous agents in the workplace. It's in line with recommendations for responsible AI deployment, where human oversight and auditability are paramount.
5. Evaluation and Preliminary Results
Evaluating a system like SCA requires measuring both its technical performance on information queries and its impact on human workflows. We outline our evaluation approach and early results below:
5.1 Technical Evaluation
We measure classic information retrieval and extraction metrics on tasks derived from real use cases (a small metric helper is sketched after the list):
- Decision Recall and Precision: We created a benchmark set of 50 questions asking for specific decisions (e.g. "When and who decided X?") where the ground truth is known from meeting notes. SCA's Decision Ledger lookup was able to answer a large majority directly. We measure the recall time (how quickly the correct answer, with its link, is produced) – in our tests, it was typically under 2 seconds for queries hitting the ledger, whereas a vector search baseline took several seconds to retrieve and often required the LLM to read through irrelevant text. We also measure precision/recall of the Insights Agent in correctly extracting decisions in the first place by manually annotating transcripts. Initial results show high precision (few false decisions logged) but some misses (recall ~0.8) for very implicit decisions; we are refining prompts and fine-tuning to catch those.
- Action Item Extraction Coverage: To evaluate how well the system captures tasks, we compared the Action Items logged by SCA for a set of meetings to a human-generated list of action items for those meetings. The precision of logged items was around 90% (almost all AI-logged tasks were real tasks), and recall was around 75% (the AI missed some tasks that were phrased vaguely). This is on par with state-of-the-art dialogue act extraction results reported in literature. The missing tasks often had ambiguous language like "we should probably…", which we plan to handle by more sophisticated context understanding.
- Answer Accuracy and Support: For complex questions (synthesis queries), we performed a qualitative evaluation. We looked at 10 "board-level" questions (e.g. "Summarize Q3 outcomes and what we learned") and had domain experts rate the answers. They checked if all claims were supported by citations (supportiveness) and if any important point was missing (completeness). In all cases, the answers contained only claims that could be traced to a source (by design, since the system includes the citation), but occasionally the wording could be misleading (e.g. conflating two related decisions). Experts rated 8/10 answers as useful and accurate, and 2 as needing minor corrections. We consider this promising, though a larger formal user study is ongoing.
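For transparency, the extraction metrics above reduce to standard set-based precision and recall; the exact-match comparison in the sketch below is a simplification of the fuzzier matching used in practice.

```python
# Set-based precision/recall over extracted vs. human-annotated items;
# exact string matching is a simplification of the matching used in practice.
def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    true_pos = len(extracted & gold)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# e.g. 9 of 10 logged items being real tasks gives precision 0.9, while finding
# 9 of 12 human-listed tasks gives recall 0.75, matching the figures reported above.
```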
We also tested SCA versus a baseline RAG system on a set of temporal and ownership-related queries. For example, query: "What did the marketing team decide in July about the launch plan?" The RAG baseline (with vector search over all transcripts and docs) often returned a generic launch plan document (which didn't have the date context) or a hallucinated summary, whereas SCA was able to list the specific decisions from July meetings with dates and names. This showcases the value of temporal partitioning and the ledger.
5.2 User Adoption and Efficacy
Beyond accuracy, a key measure is whether SCA actually helps users (execs, managers, teams) make decisions faster or with more confidence. While a full longitudinal study is ongoing, we track several proxy metrics in our pilot deployments:
- Decision Recall Usage: How often do users use the system to look up past decisions? We log instances of queries that hit the Decision Ledger. In a pilot with 10 users over 3 weeks, there were ~5 lookups per user per week on average, with queries like "when did we agree on the new pricing?" being common. The click-through rate (CTR) on the provided deep links (e.g., user clicking the transcript link the AI gave) was 60%, indicating users do utilize the citations to verify or get more detail.
- Action Completion Rate: For action items the AI surfaces (e.g., in a weekly "what's blocking" report), we monitor whether those items get completed by the owners by the due date (from task system data). The idea is that if AI highlighting blockers leads to them being resolved, that's a positive outcome. Early anecdotal evidence: one team lead credited the AI's reminder of a forgotten task for getting it done before a deadline. We plan to quantify this more rigorously (perhaps tasks completed vs. not when mentioned by AI vs. not mentioned).
- Narrative Reuse: We examine if the narratives or draft plans the AI produces get used in real documents. For instance, if the AI drafts a Q1 plan, does the user end up copy-pasting large portions of it into the official plan doc? In one case, about 70% of the text from an AI-generated strategy draft (with citations) was incorporated into the final version, with some edits. This suggests that SCA's content can serve as a strong starting point for executive communications. We also see citations from the AI answer being kept in the final doc, which is interesting as it means even final human-edited docs now carry the references the AI provided (improving traceability of those docs too).
- Feedback and Learning: We gather user feedback through the Agent Inbox interface, where users can thumbs-up or down an answer or send corrections. This feedback loop is invaluable to identify errors (e.g., "AI misattributed this decision to Alice, but it was Bob"). We found that explicit errors are rare, but we did get feedback like "the rationale provided for decision X is not the main reason we did it" – which indicates the AI chose one justification from the transcript that the user felt wasn't the key one. We are exploring letting users edit the rationale field in the ledger in such cases, which will propagate to better answers next time.
Overall, these metrics and observations suggest that SCA can significantly improve the efficiency of information retrieval in an organizational setting, while maintaining trust through verification. Users particularly appreciated the "one-click to source" aspect, aligning with research that shows users value the ability to drill down into sources for AI-provided answers. Performance-wise, the system is fast enough for interactive use (most answers returned in 5–8 seconds, which includes multiple retrievals and LLM calls; pure lookup queries return in under 2 seconds). We note that the quality of the output is heavily dependent on the quality of the ingested data: if meetings aren't transcribed correctly (ASR errors) or if people have side conversations off-record, the ledger can have gaps. Thus, our evaluation also looks at how robust the system is to imperfect input. Techniques like prompting the model to ignore garbled text, or falling back to asking the user when something is unclear, are being tested.
6. Limitations and Future Work
While SCA shows promise, there are limitations and open challenges:
- Recommendation and Proactivity: Thus far, SCA focuses on answering questions and generating content on demand. It does not yet proactively recommend decisions or flag issues unless asked. A logical next step is to build a Recommendation Agent on top of the Decision Ledger – for example, suggesting risk mitigations if many action items are overdue, or recommending a decision based on similar past decisions (collaborative filtering of decisions). We plan to explore recommendation features, but carefully: any recommendations will leverage the structured base (so they can cite why the recommendation is made, based on past patterns) and will likely go through human approval.
- Multi-step Workflow Automation: SCA currently can execute single actions in a draft manner (like drafting an email or creating a ticket from an action item), especially via its Command Palette or Agent Inbox. However, more complex multi-step procedures (e.g., "for all decisions made last week, create a Confluence page summary and email it to the team") might require chaining several commands or integrating with workflow automation tools. We are investigating a visual or natural language workflow builder that would allow users to script multi-step routines for the AI agent (some early prototypes use a prompt-based "if-this-then-that" style configuration). Ensuring the agent can follow these reliably and safely is future work.
- Generality vs. Specificity: SCA is tailored to executive intelligence in a single organization. One might ask: can it generalize patterns across organizations, or is each deployment learning only its environment? Cross-organization learning (like fine-tuning the Insights Agent on many companies' data) could improve robustness (e.g., learning general patterns of how decisions are stated). However, data privacy concerns mean we cannot simply mix data from different orgs. A possible future direction is federated learning or privacy-preserving meta-learning, where the model learns common structures of meetings without exposing any raw data externally. This would require careful design to meet enterprise privacy bars, so it's on the roadmap once we have sufficient deployments and a way to abstract patterns.
- Accuracy of Transcription and Extraction: SCA inherits any errors from upstream processes. If speech-to-text transcription has a high error rate (say due to heavy accents or technical jargon), the Insights Agent might log incorrect decisions or miss them. Our current approach relies on having relatively good transcripts (we use top-tier ASR and allow custom vocabulary). In noisy environments or for non-English meetings, performance may degrade. Future work could involve confidence scoring on transcript segments and flagging low-confidence areas for manual correction (perhaps by a meeting assistant who double-checks key points). Additionally, the NLP extraction of decisions/actions is not perfect; we aim to continuously fine-tune it with new data and possibly incorporate user validation (e.g., a meeting facilitator could quickly verify the AI-captured action items at the end of a meeting, which would greatly improve quality).
- Scalability and Latency: As the Decision Ledger grows (potentially tens of thousands of entries over years) and the document corpus is huge, we need to ensure queries remain fast. We already use indices to good effect; further, we might need to archive or summarize older data (though keeping it is valuable for historical questions). The architecture supports sharding by time or project to scale out. Also, using larger models for generation can be slow; we are exploring distilling some capabilities into smaller models to use for quick responses, only falling back to big LLMs for very complex queries. This multi-tier agent approach is an area of active development.
Despite these limitations, we believe the core idea of SCA is durable. It aligns with fundamental needs in organizational decision-making: having a reliable record of what was decided, ensuring everyone has context at their fingertips, and bridging the gap between knowledge and action. The architecture's model-agnostic nature means it should be able to incorporate future advances in AI (e.g., if new multimodal models can analyze video recordings for decisions, or if improved reasoning models can validate plans even better). In implementing SCA, we aimed to provide a blueprint for enterprise AI systems that treat structure and provenance as first-class citizens, rather than an afterthought.
7. Conclusion
In this work, we presented Signal–Context Architecture (SCA), a novel AI architecture that turns the chaos of daily organizational communications into executive clarity. SCA achieves this by splitting intelligence gathering into two streams: Signals, capturing the live "what just happened" moments and structuring them into a Decision Ledger; and Context, distilling the organization's knowledge into a cited evidence store. A Deep Research Agent then unifies these layers to answer questions or draft narratives with unprecedented specificity and trustworthiness – providing not just answers, but also the origin of those answers (the event) and the justification (the evidence). This design directly addresses the shortcomings of naive LLM applications in the enterprise: it handles temporality, tracks ownership, ensures actions are not lost, and outputs verifiable information.
In one sentence, SCA turns conversation chaos into executive-grade answers by structuring live signals, validating them against trusted context, and delivering responses with full "receipts." It is model-agnostic and future-proof, enabling organizations to plug in the best AI models of today or tomorrow. We demonstrated the SCA concept, architecture, and an initial implementation, and showed through examples and early evaluation how it can answer both pinpoint and broad strategic queries that are difficult for existing methods. As enterprises increasingly seek to leverage AI for decision support, we hope SCA provides a path forward that is practical, auditable, and effective.
Moving ahead, we plan to refine SCA with more automation (recommendations, workflows) while maintaining its core principles of structure and validation. We will also report more comprehensive evaluation results as we gather them. We invite the community to explore similar dual-stream designs in other domains where context and real-time data must be combined (such as intelligence analysis, legal case preparation, or healthcare management). By sharing this work, we aim to spark further research into AI architectures that respect the complexity of real-world data and provide human-centric, trustworthy assistance in organizational settings.