yapguard

Privacy-first safety evaluation API for LLM agents.

A lightweight guard rail you drop in front of any LLM pipeline. Evaluates messages before they reach the model and before output reaches users, blocking prompt injection, PII leakage, credential exposure, dangerous tool calls, and more.


Minimal Data Sharing

Integration is not all-or-nothing. If you want to minimize data sharing with a third-party service, you can use yapguard exclusively for tool-call evaluation — so that web search and fetch tools can't download content from malicious or blacklisted domains, reach internal network destinations, or follow redirect chains to unsafe locations. No end-user prompts or assistant replies leave your pipeline.

Start with the lowest-friction integration: evaluate only message_type=tool_call before execution. This gives you the highest-impact protections against host takeover patterns while minimizing data sharing, since you do not send end-user prompts or assistant replies.
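As a sketch of this lowest-friction integration (the wrapper names, the error type, and the injectable `post` hook are illustrative, not part of the API; the endpoint and payload shape follow the concrete example later on this page):

```python
import json
import os
import urllib.request

YAPGUARD_URL = "https://yapguard.com/v1/evaluate"

def evaluate_tool_call(tool: str, args: dict, post=None) -> dict:
    """Send a message_type=tool_call evaluation request and return the decision.

    `post` is injectable for testing; by default the payload is POSTed to the
    live API using YAPGUARD_API_KEY from the environment.
    """
    payload = {
        "message_type": "tool_call",
        # The message field carries the tool call as a JSON string,
        # matching the curl example on this page.
        "message": json.dumps({"tool": tool, "args": args}),
    }
    if post is not None:
        return post(payload)
    req = urllib.request.Request(
        YAPGUARD_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['YAPGUARD_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def guarded_execute(tool: str, args: dict, execute, post=None):
    """Run `execute(tool, args)` only if yapguard judges the call safe."""
    decision = evaluate_tool_call(tool, args, post=post)
    if not decision["safe"]:
        # Do not execute; surface the reasons to the caller/policy handler.
        raise PermissionError(decision["reasons"])
    return execute(tool, args)
```

Only the serialized tool call leaves your pipeline; user prompts and model output stay local.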

Then expand to full-loop coverage: evaluate user input before it reaches the model, and tool results and assistant output before they reach the caller.

Decision Contract

safe=true → continue the pipeline · safe=false → stop the pipeline and return reasons.
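A minimal handler for this contract (the error type is a local choice, not prescribed by the API):

```python
def apply_decision(decision: dict) -> bool:
    """Enforce the yapguard decision contract:
    safe=true -> continue; safe=false -> stop and surface reasons."""
    if decision.get("safe"):
        return True  # continue the pipeline
    # Stop the pipeline; the reasons array explains what was flagged.
    raise RuntimeError(f"blocked by yapguard: {decision.get('reasons', [])}")
```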

What it checks

Prompt injection, PII leakage, credential exposure, dangerous tool calls, internal-network destinations, malicious or blacklisted domains, and unsafe redirect chains.

API

Evaluate
POST /v1/evaluate · Evaluate a message. Returns safe, risk_score, reasons.
Register
POST /v1/register/challenge · Get a proof-of-work challenge for self-registration.
POST /v1/register/solve · Submit the solution, receive an API key (shown once).
Operator Messaging
POST /v1/messages/challenge · Get a proof-of-work challenge (required before posting a message to the operator).
POST /v1/messages · Send a message to the service operator (auth + solved challenge required).
GET /v1/messages · List messages you have sent to the operator.
GET /v1/messages/inbox · Read messages sent to you by the operator.
Service
GET /healthz · Health check.

Full OpenAPI spec →

Concrete example (/v1/evaluate)

Tool-call-only evaluation with one allowed request and one blocked request:

# allowed: external URL, no dangerous patterns
curl -s -X POST https://yapguard.com/v1/evaluate \
  -H "Authorization: Bearer $YAPGUARD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message_type": "tool_call",
    "message": "{\"tool\":\"webfetch\",\"args\":{\"url\":\"https://example.com/docs\"}}"
  }' | jq .

# allowed response
{
  "safe": true,
  "risk_score": 0.0,
  "reasons": []
}

# blocked: localhost destination (internal network)
curl -s -X POST https://yapguard.com/v1/evaluate \
  -H "Authorization: Bearer $YAPGUARD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message_type": "tool_call",
    "message": "{\"tool\":\"webfetch\",\"args\":{\"url\":\"http://127.0.0.1:8080/admin\"}}"
  }' | jq .

# blocked response
{
  "safe": false,
  "risk_score": 1.0,
  "reasons": [
    {
      "rule_id": "tool_call.internal_network_access",
      "severity": "high",
      "detail": "tool call targets internal/local destination(s): 127.0.0.1"
    }
  ]
}

If blocked, do not execute the tool call. Return reasons to the caller or policy handler.

Integrate using Claude Code

If you're a developer integrating via an AI coding assistant, point it at the spec and it will wire up the guard automatically. Example prompt:

# paste into Claude Code

Fetch https://yapguard.com/openapi.json and integrate yapguard into
this codebase as a safety guard rail. Register for an API key using
the proof-of-work flow, store it in .env as YAPGUARD_API_KEY, then
wrap every LLM call so that:
  1. user input is evaluated before being sent to the model
  2. tool calls are evaluated before execution
  3. tool results and assistant output are evaluated before returning
     to the caller
Halt the pipeline and surface the reasons array if safe=false.

Self-registration

No account, no email. Agents register autonomously via proof-of-work: a short compute challenge that takes a few seconds to solve and serves as a lightweight spam deterrent. The API key is returned exactly once — store it.
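The challenge fields and hash condition are defined by the /v1/register/challenge response, not shown here; purely as an illustration, a typical hashcash-style solver (assuming a SHA-256 leading-zero-hex target, which is an assumption) looks like:

```python
import hashlib
from itertools import count

def solve_pow(prefix: str, difficulty: int) -> str:
    """Brute-force a nonce such that sha256(prefix + nonce) starts with
    `difficulty` zero hex digits.

    NOTE: the real challenge format is whatever /v1/register/challenge
    returns; this scheme is an assumed, generic variant for illustration.
    """
    target = "0" * difficulty
    for n in count():
        nonce = str(n)
        digest = hashlib.sha256((prefix + nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
```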

# 1. get a challenge
curl -s -X POST https://yapguard.com/v1/register/challenge \
  -H "Content-Type: application/json" | jq .

# 2. solve the proof-of-work challenge, then submit
curl -s -X POST https://yapguard.com/v1/register/solve \
  -H "Content-Type: application/json" \
  -d '{"challenge_id":"<id>","nonce":"<nonce>"}' | jq .

Privacy

Message content is never logged. Each request produces one structured log line containing only HTTP metadata, message category, tool name, and the safety outcome (safe, risk_score, reason_ids). Tool arguments and API keys are never written to logs. Client IP is logged for rate-limit enforcement — place a proxy in front if that is a concern. The service does not phone home, collect telemetry, or store request data beyond the SQLite key/quota store and the messages exchanged between agents and the operator (i.e., support requests and responses).

Sample log lines:

level=info method=POST path=/v1/evaluate status=200 duration_ms=23 remote_addr=203.0.113.42 country=US message_type=tool_call tool_name="bash" input_bytes=1024 safe=true risk_score=0.0000 reason_ids=none
level=info method=POST path=/v1/evaluate status=200 duration_ms=18 remote_addr=203.0.113.42 country=US message_type=user tool_name="na" input_bytes=312 safe=false risk_score=0.9400 reason_ids=prompt_injection,credential_exposure
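These logfmt-style lines parse cleanly with the standard library, which is handy when monitoring the guard's decisions (a sketch; it assumes the key=value format shown above, with optionally quoted values):

```python
import shlex

def parse_log_line(line: str) -> dict:
    """Parse a logfmt-style line (key=value pairs; values may be quoted)."""
    fields = {}
    for token in shlex.split(line):  # shlex strips the quotes for us
        key, _, value = token.partition("=")
        fields[key] = value
    return fields
```

Note that a parsed line contains only metadata fields — there is no message-content key to extract.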

From the creator

I built this for my own agents and figured it might be broadly useful, so I stood it up as a service. If there's enough demand, I'm open to releasing the source. The service does require ongoing upkeep — keeping the domain blacklists, classifiers, and detection rules current is real work — which is part of why a maintained version is worth having.


openapi.json  ·  llms.txt