Privacy-first, safety evaluation API for LLM agents.
A lightweight guard rail you drop in front of any LLM pipeline. Evaluates messages before they reach the model and before output reaches users, blocking prompt injection, PII leakage, credential exposure, dangerous tool calls, and more.
Call `POST /v1/evaluate` before executing tool calls; if the response is `safe=false`, halt that step and surface the `reasons`. Integration is not all-or-nothing. If you want to minimize data sharing with a third-party service, you can use yapguard exclusively for tool-call evaluation, so that web search and fetch tools can't download content from malicious or blacklisted domains, reach internal network destinations, or follow redirect chains to unsafe locations. No end-user prompts or assistant replies leave your pipeline.
Start with the lowest-friction integration: evaluate only message_type=tool_call
before execution. This gives you the highest-impact protections against host takeover patterns
while minimizing data sharing, since you do not send end-user prompts or assistant replies.
Evaluate at up to four points in the pipeline:

- `message_type=user` before model input
- `message_type=tool_call` before tool execution
- `message_type=tool_result` before returning tool output to the model
- `message_type=assistant` before returning final output to the caller

On `safe=true`, continue the pipeline; on `safe=false`, stop the pipeline and return `reasons`.
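These evaluation points can be wired up with a small client. A minimal sketch in Python, assuming the request/response JSON shapes shown in the curl examples; the `GuardError` exception and `checkpoint` helper are illustrative names, not part of the API:

```python
import json
import urllib.request

YAPGUARD_URL = "https://yapguard.com/v1/evaluate"

def evaluate(api_key: str, message_type: str, message: str) -> dict:
    """POST one message to /v1/evaluate and return the parsed verdict."""
    req = urllib.request.Request(
        YAPGUARD_URL,
        data=json.dumps({"message_type": message_type,
                         "message": message}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

class GuardError(RuntimeError):
    """Raised when a checkpoint returns safe=false."""
    def __init__(self, verdict: dict):
        super().__init__(f"blocked: {verdict.get('reasons', [])}")
        self.verdict = verdict

def checkpoint(verdict: dict) -> dict:
    """Halt the pipeline on safe=false, surfacing the reasons array."""
    if not verdict.get("safe", False):
        raise GuardError(verdict)
    return verdict
```

A full pipeline would call `evaluate()` at each of the four points: once on the user prompt, once per tool call, once per tool result, and once on the final assistant output, passing each verdict through `checkpoint()`.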
| Endpoint | Description |
| --- | --- |
| POST /v1/evaluate | Evaluate a message. Returns safe, risk_score, reasons. |
| POST /v1/register/challenge | Get a proof-of-work challenge for self-registration. |
| POST /v1/register/solve | Submit solution, receive API key (shown once). |
| POST /v1/messages/challenge | Get a proof-of-work challenge (required before posting a message to the operator). |
| POST /v1/messages | Send a message to the service operator (auth + solved challenge required). |
| GET /v1/messages | List messages you have sent to the operator. |
| GET /v1/messages/inbox | Read messages sent to you by the operator. |
| GET /healthz | Health check. |
Tool-call-only evaluation with one allowed request and one blocked request:
```shell
# allowed: external URL, no dangerous patterns
curl -s -X POST https://yapguard.com/v1/evaluate \
  -H "Authorization: Bearer $YAPGUARD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message_type": "tool_call",
    "message": "{\"tool\":\"webfetch\",\"args\":{\"url\":\"https://example.com/docs\"}}"
  }' | jq .
```

Allowed response:

```json
{ "safe": true, "risk_score": 0.0, "reasons": [] }
```

```shell
# blocked: localhost destination (internal network)
curl -s -X POST https://yapguard.com/v1/evaluate \
  -H "Authorization: Bearer $YAPGUARD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message_type": "tool_call",
    "message": "{\"tool\":\"webfetch\",\"args\":{\"url\":\"http://127.0.0.1:8080/admin\"}}"
  }' | jq .
```

Blocked response:

```json
{
  "safe": false,
  "risk_score": 1.0,
  "reasons": [
    {
      "rule_id": "tool_call.internal_network_access",
      "severity": "high",
      "detail": "tool call targets internal/local destination(s): 127.0.0.1"
    }
  ]
}
```
If blocked, do not execute the tool call. Return reasons to the caller or policy handler.
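In code, "do not execute" means gating the executor on the verdict. A hedged sketch of that gate; `evaluate_fn` and `run_tool` are placeholders for your own `/v1/evaluate` client and tool executor, not part of the API:

```python
import json
from typing import Any, Callable

def guarded_tool_call(
    evaluate_fn: Callable[[str], dict],    # posts message_type=tool_call to /v1/evaluate
    run_tool: Callable[[str, dict], Any],  # your actual tool executor
    tool: str,
    args: dict,
) -> Any:
    """Serialize the call, evaluate it, and execute only on safe=true."""
    payload = json.dumps({"tool": tool, "args": args})
    verdict = evaluate_fn(payload)
    if not verdict.get("safe", False):
        # Hand the reasons array back to the caller / policy handler
        # instead of running the tool.
        return {"blocked": True, "reasons": verdict.get("reasons", [])}
    return run_tool(tool, args)
```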
If you're a developer integrating via an AI coding assistant, point it at the spec and it will wire up the guard automatically. Example prompt:
# paste into Claude Code
Fetch https://yapguard.com/openapi.json and integrate yapguard into
this codebase as a safety guard rail. Register for an API key using
the proof-of-work flow, store it in .env as YAPGUARD_API_KEY, then
wrap every LLM call so that:
1. user input is evaluated before being sent to the model
2. tool calls are evaluated before execution
3. tool results and assistant output are evaluated before returning
to the caller
Halt the pipeline and surface the reasons array if safe=false.
No account, no email. Agents register autonomously via proof-of-work: a short compute challenge that takes a few seconds to solve and serves as a lightweight spam deterrent. The API key is returned exactly once — store it.
```shell
# 1. get a challenge
curl -s -X POST https://yapguard.com/v1/register/challenge \
  -H "Content-Type: application/json" | jq .

# 2. solve the proof-of-work challenge, then submit
curl -s -X POST https://yapguard.com/v1/register/solve \
  -H "Content-Type: application/json" \
  -d '{"challenge_id":"<id>","nonce":"<nonce>"}' | jq .
```
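The exact challenge format isn't specified here, so the solver below is only a sketch of one common hashcash-style scheme: brute-force a nonce until SHA-256(challenge_id + nonce) starts with a given number of zero hex digits. Check the challenge response for the scheme the service actually uses before adapting this:

```python
import hashlib
from itertools import count

def solve_pow(challenge_id: str, difficulty: int) -> str:
    """Find a nonce whose SHA-256(challenge_id + nonce) hex digest
    starts with `difficulty` zero characters (hashcash-style PoW)."""
    target = "0" * difficulty
    for n in count():
        nonce = str(n)
        digest = hashlib.sha256((challenge_id + nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
```

The returned nonce would then go into the `/v1/register/solve` body alongside the `challenge_id`.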
Message content is never logged. Each request produces one structured log line containing
only HTTP metadata, message category, tool name, and the safety outcome
(safe, risk_score, reason_ids).
Tool arguments and API keys are never written to logs.
Client IP is logged for rate-limit enforcement — place a proxy in front if that is a concern.
The service does not phone home, collect telemetry, or store request data beyond the SQLite
key/quota store and messages exchanged between agents and the operator (i.e., support requests and responses).
Sample log lines:
```
level=info method=POST path=/v1/evaluate status=200 duration_ms=23 remote_addr=203.0.113.42 country=US message_type=tool_call tool_name="bash" input_bytes=1024 safe=true risk_score=0.0000 reason_ids=none
level=info method=POST path=/v1/evaluate status=400 duration_ms=18 remote_addr=203.0.113.42 country=US message_type=user tool_name="na" input_bytes=312 safe=false risk_score=0.9400 reason_ids=prompt_injection,credential_exposure
```
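These lines are logfmt-style key=value pairs. If you want to feed them into your own monitoring, a small illustrative parser (using `shlex` so the quoted `tool_name` value is handled) might look like:

```python
import shlex

def parse_log_line(line: str) -> dict:
    """Split a logfmt-style line into a dict, honoring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields
```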
I built this for my own agents and figured it might be broadly useful, so I stood it up as a service. If there's enough demand, I'm open to releasing the source. The service does require ongoing upkeep — keeping the domain blacklists, classifiers, and detection rules current is real work — which is part of why a maintained version is worth having.