Code-switched NLP, multi-tier memory retrieval, and on-demand social graph extraction built for the 500M people who think in more than one language at once
Hard engineering problems at the intersection of code-switched NLP, episodic memory, and social graph inference
A single tokenization pipeline handles intra-sentence language switches across three languages and their scripts: English (Latin), Hindi (Devanagari), and Punjabi (Gurmukhi), plus two code-switched modes, Hinglish and PunjabiEnglish. Script detection routes each segment to the appropriate sub-model with no separate preprocessing step. Emotion labels are unified across all five surface forms.
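As a rough illustration of the script-detection step, the sketch below groups whitespace-separated tokens by Unicode block. The function names and the rule of attaching punctuation to the preceding run are assumptions made for this example, not the production pipeline.

```python
from itertools import groupby

# Unicode block ranges for the two Indic scripts named above.
DEVANAGARI = range(0x0900, 0x0980)
GURMUKHI = range(0x0A00, 0x0A80)


def script_of(token: str) -> str:
    """Label a token by the first script-bearing character it contains."""
    for ch in token:
        cp = ord(ch)
        if cp in DEVANAGARI:
            return "devanagari"
        if cp in GURMUKHI:
            return "gurmukhi"
        if ch.isascii() and ch.isalpha():
            return "latin"
    return "neutral"  # digits, punctuation, emoji


def segment_by_script(text: str) -> list[tuple[str, str]]:
    """Group consecutive tokens that share a script into routable segments."""
    labelled = []
    last = "latin"  # assumed default when a message opens with neutral tokens
    for token in text.split():
        script = script_of(token)
        if script == "neutral":
            script = last  # attach neutral tokens to the preceding run
        labelled.append((script, token))
        last = script
    return [
        (script, " ".join(tok for _, tok in run))
        for script, run in groupby(labelled, key=lambda pair: pair[0])
    ]


segments = segment_by_script("I am बहुत tired yaar")
```

Script ranges alone cannot separate romanized Hinglish from English, since both use Latin letters; a marker-matching pass over the Latin segments would still be needed for that.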
Three retrieval tiers assembled on demand per request: a hot cache (last 7 days, O(1) key lookup), a warm tier (emotional episode extraction with temporal triggers on phrases like "last week" or "kal"), and a cold tier (dense vector search over older episodes using cosine similarity against a 768-dim embedding index). On a hot-cache hit, memory injection adds no retrieval latency beyond the key lookup.
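A minimal sketch of how the three tiers could be resolved in order. The `MemoryStore` class, the trigger list beyond "last week" and "kal", and the choice to gate both warm and cold lookups on a temporal trigger are assumptions for illustration. Short vectors stand in for the 768-dim embeddings.

```python
import math
import re
from dataclasses import dataclass, field

# Assumed trigger list; the text names only "last week" and "kal".
TEMPORAL_TRIGGERS = re.compile(r"\b(last week|yesterday|kal)\b", re.IGNORECASE)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MemoryStore:
    hot: dict = field(default_factory=dict)   # user_id -> recent context
    warm: dict = field(default_factory=dict)  # user_id -> list of episodes
    cold: dict = field(default_factory=dict)  # user_id -> list of (vector, episode)

    def retrieve(self, user_id, message, query_vec, top_k=2):
        context = []
        if user_id in self.hot:  # tier 1: O(1) key lookup
            context.append(("hot", self.hot[user_id]))
        if not TEMPORAL_TRIGGERS.search(message):
            return context  # no temporal signal: skip the slower tiers
        context += [("warm", ep) for ep in self.warm.get(user_id, [])]
        ranked = sorted(
            self.cold.get(user_id, []),
            key=lambda item: cosine(query_vec, item[0]),
            reverse=True,
        )
        context += [("cold", ep) for _, ep in ranked[:top_k]]
        return context
```

A production cold tier would query an approximate-nearest-neighbour index rather than sort every stored vector; the linear scan here only keeps the example self-contained.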
After each conversation turn, a background extraction pipeline identifies named entities, infers relationship types (colleague, partner, family), and writes structured records to a partitioned vector collection — one partition per community context. This builds a persistent social knowledge graph from unstructured natural conversation with no explicit user input.
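One possible shape for the structured records, sketched under assumptions: the field names, the `PartitionedGraph` class, and the in-memory dictionary standing in for the partitioned vector collection are all hypothetical. Entity recognition and embedding are left out.

```python
from collections import defaultdict
from dataclasses import dataclass

RELATIONSHIP_TYPES = {"colleague", "partner", "family"}  # types named above


@dataclass(frozen=True)
class RelationshipRecord:
    user_id: str
    entity: str        # named entity as mentioned in conversation
    relationship: str  # one of RELATIONSHIP_TYPES
    community: str     # partition key (hypothetical field name)
    source_turn: int   # turn index the record was extracted from


class PartitionedGraph:
    """In-memory stand-in for a collection with one partition per community."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def write(self, record: RelationshipRecord) -> None:
        if record.relationship not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship: {record.relationship}")
        self._partitions[record.community].append(record)

    def neighbours(self, user_id: str, community: str) -> dict:
        """Return entity -> relationship for one user within one partition."""
        return {
            r.entity: r.relationship
            for r in self._partitions.get(community, [])
            if r.user_id == user_id
        }


graph = PartitionedGraph()
graph.write(RelationshipRecord("u1", "Simran", "colleague", "work", 4))
```

Keeping the community as the partition key means a lookup for one context never scans records from another.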
All conversation messages are AES-256 encrypted at the application layer before write — keys are derived per-user, never stored alongside ciphertext. The storage layer sees only opaque blobs. Memory retrieval, embedding, and summarization all operate on decrypted payloads in-process with no plaintext persistence. Encryption adds no user-facing latency.
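The text does not say how per-user keys are derived. One common approach, shown here purely as an assumption, is HKDF (RFC 5869) over a master secret with the user ID as the context string. The sketch uses only the standard library and stops at the 32-byte key; the AES-256 encryption itself would come from a vetted cryptography library.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256: extract a pseudorandom key, then expand to `length` bytes."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def user_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a 256-bit key for one user; the salt label is a hypothetical constant."""
    return hkdf_sha256(master_key, salt=b"per-user-key-v1", info=user_id.encode("utf-8"))


key = user_key(b"\x00" * 32, "user-123")  # placeholder master key for the example only
```

Because the key is recomputed from the master secret on demand, nothing key-related has to sit next to the ciphertext in storage.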
Unsolved: reliable emotion classification on intra-sentence switches where the sentiment word and the subject are in different scripts. Coreference resolution across sessions without a persistent entity store. Measuring response quality for emotional support without ground-truth labels. These are the real frontiers.
Four sequential stages. Every message, every time.
Script analysis, Hinglish marker matching, language routing — single tokenization pass before the LLM call
Hot cache lookup → warm episode extraction → cold vector retrieval, resolved in priority order and token-budgeted
Language pack, memory context, and persona signals compose a structured prompt: culturally grounded, zero hallucinated facts
Post-response: entity extraction, session stats, memory indexing, graph writes — all non-blocking, zero impact on response latency
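The "priority order and token-budgeted" step in the memory stage could look roughly like the sketch below. The greedy skip-if-too-large policy and the whitespace token counter are assumptions; a real system would count tokens with the model's own tokenizer.

```python
def count_tokens(text: str) -> int:
    """Crude whitespace count; a stand-in for the model's real tokenizer."""
    return len(text.split())


def assemble_context(candidates, budget):
    """Pick snippets greedily from a list already sorted by tier priority.

    candidates: list of (tier, snippet) pairs, highest priority first.
    Returns the selected pairs and the number of tokens used.
    """
    selected, used = [], 0
    for tier, snippet in candidates:
        cost = count_tokens(snippet)
        if used + cost > budget:
            continue  # too large for the remaining budget; try later snippets
        selected.append((tier, snippet))
        used += cost
    return selected, used


chosen, used = assemble_context(
    [("hot", "a b c"), ("warm", "d e f g h"), ("cold", "i j")],
    budget=6,
)
```

With a budget of 6, the hot snippet (3 tokens) fits, the warm snippet (5) is skipped, and the cold snippet (2) still fits, for 5 tokens in total.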
AES-256 at the application layer before any DB write. The storage tier is treated as untrusted. Decryption happens in-process, never cached to disk.
Hot-tier memory is a keyed cache lookup. Cold-tier vector search only triggers when temporal signals appear in the message. No semantic search on every request.
Script and language identity are detected per-message, not per-session. Response style, vocabulary, and slang mirroring adapt within a single conversation turn.
Entity extraction, graph writes, session stats, and embedding indexing are all fire-and-forget after the stream completes. Response latency is bounded by generation, not bookkeeping.
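The fire-and-forget pattern can be sketched with `asyncio.create_task`. The handler and function names are hypothetical, and the short sleep stands in for entity extraction, graph writes, stats, and indexing.

```python
import asyncio

# Hold references so pending tasks are not garbage-collected mid-flight.
background_tasks = set()


async def post_process(message: str, log: list) -> None:
    """Stand-in for entity extraction, graph writes, stats, and indexing."""
    await asyncio.sleep(0.01)
    log.append(f"indexed:{message}")


async def handle_message(message: str, log: list) -> str:
    reply = f"reply to {message}"  # stand-in for streamed generation
    task = asyncio.create_task(post_process(message, log))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return reply  # returned before post_process has finished
```

The caller gets the reply as soon as generation completes; bookkeeping finishes later on the same event loop. Failures in background tasks are silent unless a done-callback or supervisor logs them, so some error reporting would be needed in practice.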
Concrete technical objectives, not aspirational metrics
Resolve entity mentions across sessions without a full entity store — link "he" in session 12 to the named person from session 3
Assign an accurate emotion label when the sentiment word and the subject are in different scripts within the same clause
Learn per-user drama intensity and slang mirroring coefficients from conversation history rather than static defaults
Define proxy metrics for emotional support quality (return rate, session depth, sentiment shift) that do not require human annotation
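To make the three proxy metrics concrete, here is one way they could be computed from session logs. The definitions are assumptions for illustration: return rate as the share of users with more than one session, session depth as mean turns per session, and sentiment shift as mean end-minus-start sentiment score.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    user_id: str
    turns: int
    sentiment_start: float  # score at the first user turn (hypothetical scale -1..1)
    sentiment_end: float    # score at the last user turn


def proxy_metrics(sessions):
    """Compute the three proxy metrics over a non-empty list of sessions."""
    by_user = {}
    for s in sessions:
        by_user.setdefault(s.user_id, []).append(s)
    returning = sum(1 for user_sessions in by_user.values() if len(user_sessions) > 1)
    return {
        "return_rate": returning / len(by_user),
        "session_depth": mean(s.turns for s in sessions),
        "sentiment_shift": mean(s.sentiment_end - s.sentiment_start for s in sessions),
    }


metrics = proxy_metrics([
    Session("a", 4, -0.5, 0.0),
    Session("a", 6, -0.25, 0.25),
    Session("b", 2, -0.5, 0.0),
])
```

None of these numbers measures support quality directly; a user may return because the first session did not help, which is part of why the problem remains open.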
Interested in our research or want to collaborate?
Experience the Innovation
Learn more about our mission or explore how it works