Capabilities

The full map of what POLYROB is. Everything below is in the codebase today. The self-evolving and goal-seeking loops are on by default in personal-agent mode (POLYROB_LOCAL=true) and off on a shared server until you opt in; some integrations are opt-in behind a flag or an install extra.

Autonomy & self-evolution

Background loops that turn experience into durable capability. On in personal-agent mode; every self-modification is quarantined and reviewed before it takes effect.

Durable goal board

A cross-session backlog (SQLite, atomic claims, circuit breakers) the agent works through on its own — and that survives process restarts, unlike a normal prompt→response loop.

Scheduled runs (cron)

Natural-language or 5-field schedules that run the agent unattended and deliver results out-of-band to a chat surface.

Self-wake

Re-enters idle sessions to continue or follow up, with depth + backoff guards so it never loops.

Writes its own skills

A background reviewer distills reusable procedures from what just happened and saves them — quarantined in .pending, scanned, and reviewed before they can act.

Curates skills over time

Unused authored skills are retired automatically and revived the moment they're useful again. System and user skills are never touched.

Evolving identity

A two-layer self: SOUL (operator-authored, frozen) and SELF (agent-writable, scanned + quarantined) — the agent refines how it works with you as it learns.

H-MEM — hierarchical memory

Not a flat log. POLYROB implements H-MEM, a hierarchical memory architecture (arXiv:2507.22925), so it keeps useful cross-session context across long tasks instead of drowning in history.

Organized in phases

Findings are grouped into semantic phases (discovery, collection, …) — a session summary, phase memories, and a rolling window of recent steps — that can be revisited without fragmenting.

Forgets by importance

Memories are pruned by importance — a weighted blend of recency, relevance and frequency — not just age, so recall stays sharp on long runs.

Reflective consolidation

On phase completion an auxiliary LLM synthesizes a tight summary that preserves concrete facts, names and numbers — fail-open to a plain concat.

Cross-session recall, zero-dep by default

Default recall is SQLite keyword search (FTS5) — no external service, tenant-scoped. Opt into local hybrid keyword+vector recall (MEMORY_BACKEND=local_vector) for semantic matching; it fails open to keyword if the embeddings model isn't present.

Knowledge base & @-context

Ingest folders/files into a per-tenant KB (auto-recalled alongside memory), and drop @file / @folder / @url / @diff into a message to expand it inline — with secret-scanning on the way in.

Adaptive context & thinking

Context scales to the model's window with prompt caching for stable prefixes; long sessions are compacted and synthesized; optional extended thinking turns up reasoning budget per model.

Tools & integrations

What the agent can actually reach. Marquee integrations are named; transport libraries stay out of your way.

Web & browser

A fast stateless reader (web_fetch: URL → markdown, no Chromium) for the common case, and full browser automation — log in, navigate, fill forms, extract, screenshot — when a page needs it.

AnySite

Structured data from 40+ real-world sources — LinkedIn, X, Reddit, GitHub, SEC filings, news, jobs, reviews — plus a universal scraper for any URL, all through one tool.

Perplexity research

Real-time web search and synthesis for fact-finding and research, with citations.

MCP — bring your own tools

A full Model Context Protocol client: connect any MCP server (filesystem, search, GitHub, or your own) and its tools and resources appear to the agent dynamically, including live resource subscriptions.

Coding & execution

Edit a codebase (str_replace, grep, run tests) and run code in a timeout-and-output-capped subprocess whose environment never inherits your API keys.

Files, data & documents

Read/write/transform files and structured data (text / JSON / CSV / markdown / PDF / docx) in a confined workspace.

Outreach

Post to Twitter/X (threads, replies, DMs, media) and send email — outbound tools the agent calls to reach the world. Voice transcription turns audio into text locally.

Parallel sub-agents

Delegate a focused goal — or fan out 2–5 subtasks in parallel — to least-privilege child agents, synchronously or in the background.

One agent, every channel

POLYROB reaches you across chat surfaces through a single inbound/outbound contract — the same agent powers every channel, no per-platform rewrite.

Pluggable surfaces

Telegram, WhatsApp, email and the terminal are interchangeable adapters on one Surface contract. Adding a channel means implementing the contract, not rebuilding the agent.

Run them all at once

polyrob gateway launches every enabled surface in one process, sharing one agent, one router, and one set of session bindings — so cross-channel routing just works.

Streaming & delivery

Surfaces declare their own capabilities (streaming, message edits, service windows); a durable outbound bus with a circuit breaker handles delivery and restart recovery.

Security — hard to hijack

Autonomy is only safe if untrusted input can't seize the wheel. POLYROB treats the outside world as hostile by construction.

Untrusted-input wrapping

Web pages, emails, tweets and tool output are structurally framed as data, never instructions, so injected text can't pose as a command. On by default.

Three-tier access

When you expose the agent to other people, every sender resolves to OWNER (steers), CORRESPONDENT (reply is data, never a command), or DENIED. Fail-closed.

Least-privilege delegation

Sub-agents get a narrowed toolset (no money, comms or code-exec) and can't re-delegate — enforced by role + depth, not convention.

Schema sanitization

Hostile or malformed tool-schema constructs are hardened before they ever reach a provider. On by default.

Skills scanned + quarantined

Every authored or installed skill lands in .pending, is scanned (fail-closed on error), and is reviewed before it can act. Forged turns can never auto-activate one.

Approval & threat-scan

Named tools can require explicit approval (fail-closed on denial/timeout); an optional memory threat-scan rejects injected jailbreak / persona-rewrite patterns at write time.

Crypto, payments & on-chain trust

On-chain identity & reputation (ERC-8004)

Implements the ERC-8004 Trustless Agents standard: register the agent on-chain as an ERC-721 identity, accumulate verifiable reputation, and have its work validated — across reputation, crypto-economic and TEE-attestation trust models, with EIP-712-signed feedback.

Agent wallet

A built-in wallet with a policy layer (rolling daily spend cap), a signer, and an append-only audit log — the agent transacts within hard limits.

x402 pay-per-request

Pay for services per call in USDC on Base (Coinbase facilitator; free testnet on base-sepolia) — and expose your own agent endpoints as paid, on-chain services.

Markets & token-gating

Read Polymarket prediction markets and Hyperliquid perp/spot data; verify token / NFT ownership (CollabLand, Alchemy) to gate access.

ERC-8004 identity + A2A discovery + x402 payments make a POLYROB agent a full participant in an open, trustless agent economy: discoverable, payable, and reputation-backed on-chain.

Interfaces — and building on it

Self-hosted, not a SaaS — there's no hosted version. But the primitives to build your own agent product on your own instance are in the core.

A2A protocol

Google's Agent-to-Agent spec: a discovery agent card, JSON-RPC, and SSE streaming, with API-key, x402, or wallet-SIWE auth — other agents can find and use yours.

OpenAI-compatible API

A drop-in /v1/chat/completions + /v1/models surface so existing OpenAI SDK clients talk to POLYROB unchanged.

REST API + dashboard

A full REST API and an optional single-user web dashboard (real-time, Socket.IO) for session monitoring and live guidance.

Multi-tenant + metering

Per-tenant isolation, usage metering and credit balances are in the core, so you can serve and meter multiple users on your own deployment.

Any model, native

OpenAI, Anthropic, Gemini, DeepSeek, OpenRouter and NVIDIA NIM through a native multi-provider layer (no third-party agent framework) — switch model mid-session; vision on vision-capable models.

Self-hosted, Apache-2.0

The whole engine is free and runs on your machine. Your keys, your data, your infrastructure — no feature behind a paywall, no hosted dependency.

Honest notes: autonomy loops are scoped to personal-agent mode (POLYROB_LOCAL=true) and stay off on a shared server until you opt in. Polymarket / Hyperliquid are surfaced for market data today — trade execution is still being hardened. Local code execution is convenience-sandboxed (timeouts, output caps, secret-free env), not a hard multi-tenant sandbox — keep it off in shared deployments. Twitter/X is an outbound tool, not an inbound chat surface.