Privacy-first, safety evaluation API for LLM agents.
A lightweight guard rail you drop in front of any LLM pipeline. Evaluates messages before they reach the model and before output reaches users, blocking prompt injection, PII leakage, credential exposure, dangerous tool calls, and more.
Call `POST /v1/evaluate` before executing tool calls; if the response is `safe=false`, halt that step and surface the `reasons`. Integration is not all-or-nothing. If you want to minimize data sharing with a third-party service, you can use yapguard exclusively for tool-call evaluation, so that web search and fetch tools can't download content from malicious or blacklisted domains, reach internal network destinations, or follow redirect chains to unsafe locations. No end-user prompts or assistant replies leave your pipeline.
Start with the lowest-friction integration: evaluate only message_type=tool_call
before execution. This gives you the highest-impact protections against host takeover patterns
while minimizing data sharing, since you do not send end-user prompts or assistant replies.
Evaluate at up to four points in the pipeline:

- `message_type=user` before model input
- `message_type=tool_call` before tool execution
- `message_type=tool_result` before returning tool output to the model
- `message_type=assistant` before returning final output to the caller

On `safe=true`, continue the pipeline; on `safe=false`, stop the pipeline and return `reasons`.
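These evaluation points can be wired up with a small client. A minimal sketch in Python, assuming the request/response JSON shapes shown in the curl examples; the `GuardError` exception and `checkpoint` helper are illustrative names, not part of the API:

```python
import json
import urllib.request

YAPGUARD_URL = "https://yapguard.com/v1/evaluate"

def evaluate(api_key: str, message_type: str, message: str) -> dict:
    """POST one message to /v1/evaluate and return the parsed verdict."""
    req = urllib.request.Request(
        YAPGUARD_URL,
        data=json.dumps({"message_type": message_type,
                         "message": message}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

class GuardError(RuntimeError):
    """Raised when a checkpoint returns safe=false."""
    def __init__(self, verdict: dict):
        super().__init__(f"blocked: {verdict.get('reasons', [])}")
        self.verdict = verdict

def checkpoint(verdict: dict) -> dict:
    """Halt the pipeline on safe=false, surfacing the reasons array."""
    if not verdict.get("safe", False):
        raise GuardError(verdict)
    return verdict
```

A full pipeline would call `evaluate()` at each of the four points: once on the user prompt, once per tool call, once per tool result, and once on the final assistant output, passing each verdict through `checkpoint()`.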
| Endpoint | Description |
| --- | --- |
| POST /v1/evaluate | Evaluate a message. Returns safe, risk_score, reasons. |
| POST /v1/register/challenge | Get a proof-of-work challenge for self-registration. |
| POST /v1/register/solve | Submit solution, receive API key (shown once). |
| POST /v1/messages/challenge | Get a proof-of-work challenge (required before posting a message to the operator). |
| POST /v1/messages | Send a message to the service operator (auth + solved challenge required). |
| GET /v1/messages | List messages you have sent to the operator. |
| GET /v1/messages/inbox | Read messages sent to you by the operator. |
| GET /healthz | Health check. |
Tool-call-only evaluation with one allowed request and one blocked request:
```shell
# allowed: external URL, no dangerous patterns
curl -s -X POST https://yapguard.com/v1/evaluate \
  -H "Authorization: Bearer $YAPGUARD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message_type": "tool_call",
    "message": "{\"tool\":\"webfetch\",\"args\":{\"url\":\"https://example.com/docs\"}}"
  }' | jq .
```

Allowed response:

```json
{ "safe": true, "risk_score": 0.0, "reasons": [] }
```

```shell
# blocked: localhost destination (internal network)
curl -s -X POST https://yapguard.com/v1/evaluate \
  -H "Authorization: Bearer $YAPGUARD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message_type": "tool_call",
    "message": "{\"tool\":\"webfetch\",\"args\":{\"url\":\"http://127.0.0.1:8080/admin\"}}"
  }' | jq .
```

Blocked response:

```json
{
  "safe": false,
  "risk_score": 1.0,
  "reasons": [
    {
      "rule_id": "tool_call.internal_network_access",
      "severity": "high",
      "detail": "tool call targets internal/local destination(s): 127.0.0.1"
    }
  ]
}
```
If blocked, do not execute the tool call. Return reasons to the caller or policy handler.
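In code, "do not execute" means gating the executor on the verdict. A hedged sketch of that gate; `evaluate_fn` and `run_tool` are placeholders for your own `/v1/evaluate` client and tool executor, not part of the API:

```python
import json
from typing import Any, Callable

def guarded_tool_call(
    evaluate_fn: Callable[[str], dict],    # posts message_type=tool_call to /v1/evaluate
    run_tool: Callable[[str, dict], Any],  # your actual tool executor
    tool: str,
    args: dict,
) -> Any:
    """Serialize the call, evaluate it, and execute only on safe=true."""
    payload = json.dumps({"tool": tool, "args": args})
    verdict = evaluate_fn(payload)
    if not verdict.get("safe", False):
        # Hand the reasons array back to the caller / policy handler
        # instead of running the tool.
        return {"blocked": True, "reasons": verdict.get("reasons", [])}
    return run_tool(tool, args)
```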
If you're a developer integrating via an AI coding assistant, point it at the spec and it will wire up the guard automatically. Example prompt:
# paste into Claude Code
Fetch https://yapguard.com/openapi.json and integrate yapguard into
this codebase as a safety guard rail. Register for an API key using
the proof-of-work flow, store it in .env as YAPGUARD_API_KEY, then
wrap every LLM call so that:
1. user input is evaluated before being sent to the model
2. tool calls are evaluated before execution
3. tool results and assistant output are evaluated before returning
to the caller
Halt the pipeline and surface the reasons array if safe=false.
No account, no email. Agents register autonomously via proof-of-work: a short compute challenge that takes a few seconds to solve and serves as a lightweight spam deterrent. The API key is returned exactly once — store it.
```shell
# 1. get a challenge
curl -s -X POST https://yapguard.com/v1/register/challenge \
  -H "Content-Type: application/json" | jq .

# 2. solve the proof-of-work challenge, then submit
curl -s -X POST https://yapguard.com/v1/register/solve \
  -H "Content-Type: application/json" \
  -d '{"challenge_id":"<id>","nonce":"<nonce>"}' | jq .
```
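The exact challenge format isn't specified here, so the solver below is only a sketch of one common hashcash-style scheme: brute-force a nonce until SHA-256(challenge_id + nonce) starts with a given number of zero hex digits. Check the challenge response for the scheme the service actually uses before adapting this:

```python
import hashlib
from itertools import count

def solve_pow(challenge_id: str, difficulty: int) -> str:
    """Find a nonce whose SHA-256(challenge_id + nonce) hex digest
    starts with `difficulty` zero characters (hashcash-style PoW)."""
    target = "0" * difficulty
    for n in count():
        nonce = str(n)
        digest = hashlib.sha256((challenge_id + nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
```

The returned nonce would then go into the `/v1/register/solve` body alongside the `challenge_id`.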
Message content is never logged. Each request produces one structured log line containing
only HTTP metadata, message category, tool name, and the safety outcome
(safe, risk_score, reason_ids).
Tool arguments and API keys are never written to logs.
Client IP is logged for rate-limit enforcement — place a proxy in front if that is a concern.
The service does not phone home, collect telemetry, or store request data beyond the SQLite
key/quota store and messages exchanged between agents and the operator (i.e., support requests and responses).
Sample log lines:
```
level=info method=POST path=/v1/evaluate status=200 duration_ms=23 remote_addr=203.0.113.42 country=US message_type=tool_call tool_name="bash" input_bytes=1024 safe=true risk_score=0.0000 reason_ids=none
level=info method=POST path=/v1/evaluate status=400 duration_ms=18 remote_addr=203.0.113.42 country=US message_type=user tool_name="na" input_bytes=312 safe=false risk_score=0.9400 reason_ids=prompt_injection,credential_exposure
```
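These lines are logfmt-style key=value pairs. If you want to feed them into your own monitoring, a small illustrative parser (using `shlex` so the quoted `tool_name` value is handled) might look like:

```python
import shlex

def parse_log_line(line: str) -> dict:
    """Split a logfmt-style line into a dict, honoring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields
```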
I built this for my own agents and figured it might be broadly useful, so I stood it up as a service. If there's enough demand, I'm open to releasing the source. The service does require ongoing upkeep — keeping the domain blacklists, classifiers, and detection rules current is real work — which is part of why a maintained version is worth having.