I Mass-Rejected 847 GenAI Pull Requests. Line 673 Would Have Leaked Credit Card Numbers.
Last Month, 2:47 AM.
I'm reviewing code for a Fortune 500's "revolutionary AI assistant." 847 lines. Looks clean. Ships tomorrow.
Line 673: print(f"Debug: {user_input}")
That "user_input" contained credit card numbers. In production logs. Unencrypted.
One junior dev. One debug statement. $50M lawsuit waiting to happen.
I mass-rejected the entire PR.
The Math That Should Terrify You
$40+ billion invested in enterprise GenAI in 2024. (IDC)
80% of AI projects fail. Thatβs double the failure rate of traditional IT projects. (RAND Corporation)
Only 48% of AI projects make it to production. (Gartner)
Only 25% of AI initiatives delivered expected ROI. (2025 CEO Survey)
Every. Single. Week. I watch another GenAI project die.
Not because GPT-4 isn't smart enough.
Because nobody built the boring parts.
The Dirty Secret Nobody Tells You
Every GenAI tutorial teaches you this:
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

Congratulations. You can call a function.
That's not a GenAI application. That's a hello world with a $0.03 price tag.
The 90% that fail? They stopped here.
What Production GenAI Actually Looks Like
I've deployed GenAI systems that process thousands of requests daily. Here's what the folder structure looks like for the ones that survive:
genai_project/
├── config/
│   ├── __init__.py
│   ├── model_config.yaml      # Model settings, not hardcoded
│   ├── prompt_templates.yaml  # Prompts as config, not code
│   └── logging_config.yaml    # You WILL need logs
│
├── src/
│   ├── llm/
│   │   ├── __init__.py
│   │   ├── base.py            # Abstract base (swap providers)
│   │   ├── claude_client.py   # Anthropic
│   │   ├── openai_client.py   # OpenAI
│   │   └── fallback.py        # When primary fails
│   │
│   ├── prompt_engineering/
│   │   ├── __init__.py
│   │   ├── templates.py       # Jinja2 or similar
│   │   ├── few_shot.py        # Example management
│   │   └── chainer.py         # Multi-step prompts
│   │
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── rate_limiter.py    # Don't get rate limited
│   │   ├── token_counter.py   # Know your costs
│   │   ├── cache.py           # Don't pay twice
│   │   └── logger.py          # Debug at 2 AM
│   │
│   └── handlers/
│       ├── __init__.py
│       └── error_handler.py   # When (not if) it breaks
│
├── data/
│   ├── cache/                 # API response cache
│   ├── prompts/               # Version-controlled prompts
│   ├── outputs/               # What the model returned
│   └── embeddings/            # If using RAG
│
├── examples/
│   ├── basic_completion.py
│   ├── chat_session.py
│   └── chain_prompts.py
│
├── notebooks/
│   ├── prompt_testing.ipynb   # Iterate here first
│   ├── response_analysis.ipynb
│   └── cost_analysis.ipynb    # Track your spend
│
├── tests/
│   ├── test_prompts.py        # Yes, test your prompts
│   ├── test_handlers.py
│   └── test_integration.py
│
├── requirements.txt
├── setup.py
├── README.md
├── Dockerfile
└── .env.example               # Never commit .env
The 7 Folders That Separate Survivors From Casualties
1. config/ – Configuration as Code
What dies: Hardcoded API keys, model names in every file, prompts buried in code.
What survives: YAML configs that can change without code deploys.
# model_config.yaml
models:
  primary:
    provider: anthropic
    model: claude-3-5-sonnet
    max_tokens: 4096
    temperature: 0.7
  fallback:
    provider: openai
    model: gpt-4-turbo

Career tip: The engineer who can swap models in 5 minutes instead of 5 days gets promoted.
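Loading that config at startup takes a few lines. A minimal sketch, assuming PyYAML and the model_config.yaml layout above (the path and key names are illustrative):

# config_loader.py - minimal sketch, assuming PyYAML is installed
import yaml

def load_model_config(path: str = "config/model_config.yaml") -> dict:
    """Read model settings from YAML so they can change without a code deploy."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_model_config()
primary = config["models"]["primary"]    # provider, model, max_tokens, temperature
fallback = config["models"]["fallback"]  # used when the primary provider fails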
2. src/llm/ – Provider Abstraction
What dies: Direct API calls scattered everywhere.
What survives: Abstract base class with swappable implementations.
# base.py
from abc import ABC, abstractmethod

class LLMClient(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        pass

    @abstractmethod
    def stream(self, prompt: str):
        pass

Why this matters: When OpenAI has an outage (they will), you switch to Claude in one line.
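Here's what one concrete client might look like, a sketch assuming the official anthropic SDK and the base class above (the model name and max_tokens are illustrative defaults):

# claude_client.py - a sketch of one concrete implementation
import anthropic
from .base import LLMClient  # base.py from the snippet above, same src/llm package

class ClaudeClient(LLMClient):
    def __init__(self, model: str = "claude-3-5-sonnet-latest", max_tokens: int = 1024):
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.model = model
        self.max_tokens = max_tokens

    def complete(self, prompt: str) -> str:
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    def stream(self, prompt: str):
        # Yield text chunks as they arrive instead of waiting for the full response
        with self.client.messages.stream(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for chunk in stream.text_stream:
                yield chunk

An openai_client.py follows the same shape, which is the whole point: callers only ever see LLMClient.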
3. src/utils/rate_limiter.py – Don't Get Rate Limited
What dies: Raw API calls with no throttling.
What survives: Token bucket or sliding window rate limiting.
# Simplified example
import time

class RateLimiter:
    def __init__(self, calls_per_minute: int):
        self.calls_per_minute = calls_per_minute
        self.calls = []

    def wait_if_needed(self):
        now = time.time()
        # Keep only the calls made in the last 60 seconds
        self.calls = [c for c in self.calls if now - c < 60]
        if len(self.calls) >= self.calls_per_minute:
            sleep_time = 60 - (now - self.calls[0])
            time.sleep(sleep_time)
        self.calls.append(now)

The cost of skipping this: One bad loop can burn through your monthly budget in an hour.
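Wiring it in is one call before every request. A hypothetical usage, assuming the RateLimiter above and any client with a complete() method:

# One shared limiter for every outbound call; 50/minute is an illustrative quota
limiter = RateLimiter(calls_per_minute=50)

def safe_complete(client, prompt: str) -> str:
    limiter.wait_if_needed()  # blocks until a call slot is free
    return client.complete(prompt)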
4. src/utils/cache.py – Don't Pay Twice
What dies: Same prompt, same response, charged twice.
What survives: Content-addressed caching.
import hashlib

def cache_key(prompt: str, model: str) -> str:
    content = f"{model}:{prompt}"
    return hashlib.sha256(content.encode()).hexdigest()

Real savings: I've seen teams cut API costs by 40% just by caching repeated queries.
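The key function is half the story; you still need somewhere to put the responses. A minimal in-memory sketch built on cache_key (a real deployment might back this with Redis or files under data/cache/, both of which are assumptions here):

# In-memory cache keyed by cache_key; same prompt + model pays nothing the second time
_cache: dict[str, str] = {}

def cached_complete(client, prompt: str, model: str) -> str:
    key = cache_key(prompt, model)
    if key in _cache:
        return _cache[key]             # cache hit: no API call, no cost
    response = client.complete(prompt)
    _cache[key] = response
    return response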
5. data/prompts/ – Version-Controlled Prompts
What dies: Prompts edited in production, no history, no rollback.
What survives: Prompts in git with semantic versioning.
data/prompts/
├── summarization/
│   ├── v1.0.txt
│   ├── v1.1.txt
│   └── v2.0.txt      # Breaking change
└── classification/
    ├── v1.0.txt
    └── v1.1.txt
Why this matters: When the CEO asks "why did the model say that last Tuesday?" you can answer.
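Answering that question is easier when code can only load a pinned version. A hypothetical loader for the layout above:

# Callers pin a prompt version explicitly, so "what ran last Tuesday"
# is answerable straight from git history
from pathlib import Path

PROMPT_DIR = Path("data/prompts")

def load_prompt(task: str, version: str) -> str:
    """Load e.g. data/prompts/summarization/v1.1.txt."""
    return (PROMPT_DIR / task / f"{version}.txt").read_text(encoding="utf-8")

template = load_prompt("summarization", "v1.1")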
6. handlers/error_handler.py – When, Not If
What dies: Try/except that swallows errors.
What survives: Structured error handling with fallbacks.
import functools

class LLMError(Exception):
    pass

class RateLimitError(LLMError):
    pass

class ContentFilterError(LLMError):
    pass

class ModelOverloadError(LLMError):
    pass

def handle_llm_call(func):
    # fallback_call, log_content_filter, and SAFE_RESPONSE are project-level
    # helpers assumed to live elsewhere in the handlers package
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except RateLimitError:
            # Switch to fallback model
            return fallback_call(*args, **kwargs)
        except ContentFilterError:
            # Log and return safe response
            log_content_filter(args)
            return SAFE_RESPONSE
    return wrapper

Production reality: Every LLM will reject content, hit rate limits, or time out. Plan for it.
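Note that ModelOverloadError is defined but never caught above. A common pattern for it is retry with exponential backoff; here's a sketch (the retry count and delays are illustrative):

# Retry ModelOverloadError with exponential backoff: 1s, 2s, 4s, ...
import time

def with_backoff(func, retries: int = 3, base_delay: float = 1.0):
    def wrapper(*args, **kwargs):
        for attempt in range(retries):
            try:
                return func(*args, **kwargs)
            except ModelOverloadError:
                if attempt == retries - 1:
                    raise                              # out of retries: surface the error
                time.sleep(base_delay * 2 ** attempt)  # back off before the next attempt
    return wrapper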
7. tests/test_prompts.py – Yes, Test Your Prompts
What dies: "It worked in the playground."
What survives: Automated prompt regression tests.
def test_summarization_maintains_key_facts():
    """Ensure summary contains critical information."""
    input_text = SAMPLE_DOCUMENT
    result = summarize(input_text)
    assert "quarterly revenue" in result.lower()
    assert "CEO" in result
    assert len(result) < len(input_text) * 0.3

def test_classification_known_inputs():
    """Test classification against labeled examples."""
    for example in LABELED_EXAMPLES:
        result = classify(example.text)
        assert result == example.expected_label

The uncomfortable truth: Most teams don't test prompts because they don't know how. Now you do.
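The tests above lean on a LABELED_EXAMPLES fixture that isn't defined here. It doesn't need to be fancy; a minimal illustrative shape (the texts and labels below are made up) might be:

# Illustrative fixture shape for LABELED_EXAMPLES
from dataclasses import dataclass

@dataclass
class LabeledExample:
    text: str
    expected_label: str

LABELED_EXAMPLES = [
    LabeledExample(text="My refund still hasn't arrived after 10 days.", expected_label="billing"),
    LabeledExample(text="The app crashes when I open settings.", expected_label="bug_report"),
]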
The Pattern Nobody Talks About: Graceful Degradation
Here's the architecture that survives production:
Request
  ↓
[Validation Layer] → Reject bad inputs early
  ↓
[Cache Check] → Return cached if available
  ↓
[Rate Limiter] → Throttle if needed
  ↓
[Primary Model] → Try Claude/GPT-4
  ↓ (on failure)
[Fallback Model] → Try alternative
  ↓ (on failure)
[Cached Response] → Return stale if available
  ↓ (on failure)
[Graceful Error] → User-friendly message
Every layer has a fallback. Thatβs what production looks like.
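Compressed into code, the whole chain fits in about twenty lines. A sketch that reuses the pieces from earlier sections (is_valid, limiter, cache_key, _cache, and the primary/fallback clients are assumed names, not a fixed API):

# The degradation chain from the diagram, end to end
def handle_request(prompt: str, model: str = "claude-3-5-sonnet") -> str:
    if not is_valid(prompt):                             # Validation Layer
        raise ValueError("Rejected: bad input")
    key = cache_key(prompt, model)
    if key in _cache:                                    # Cache Check
        return _cache[key]
    limiter.wait_if_needed()                             # Rate Limiter
    try:
        response = primary_client.complete(prompt)       # Primary Model
    except LLMError:
        try:
            response = fallback_client.complete(prompt)  # Fallback Model
        except LLMError:
            stale = _cache.get(key)                      # Cached (stale) Response
            if stale is not None:
                return stale
            return "Sorry, something went wrong on our end. Please try again shortly."  # Graceful Error
    _cache[key] = response
    return response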
What This Means for Your Career
Here's the uncomfortable truth nobody on tech Twitter wants to admit:
80% of AI projects fail. (RAND)
Only 25% of AI initiatives deliver expected ROI. (IBM CEO Study)
Companies are DESPERATE for people who can ship GenAI to production. Not demo it. Ship it.
I've seen senior engineers with 15 years of experience get passed over for juniors who understood this stuff.
The career play: Be the person who builds systems, not demos.
That person is rare.
That person names their salary.
The Minimum Viable Structure
If you take nothing else, start here:
my_genai_project/
├── config/
│   └── settings.yaml
├── src/
│   ├── llm_client.py    # Abstracted provider
│   ├── rate_limiter.py  # Don't get banned
│   └── cache.py         # Don't pay twice
├── prompts/
│   └── v1/              # Version your prompts
├── tests/
│   └── test_prompts.py  # Test your prompts
└── requirements.txt
That's 6 files. If your GenAI project doesn't have at least this, you're building a demo, not a product.
Next Steps
Audit your current project. How many of these 7 folders do you have?
Add rate limiting today. Itβs the fastest way to prevent a $10,000 mistake.
Version your prompts. Future you will thank present you.
Write one prompt test. Just one. The habit matters more than coverage.
The Bottom Line
90% of GenAI projects fail.
Not because the AI isnβt smart enough.
Because the humans building it skipped the boring parts.
Seven folders. Thatβs it.
The difference between a $1.9M write-off and a product that actually ships.
Build the boring parts.
Thatβs where the money is.
P.S. The engineers who understand this will be unhireable in 2 years. Not because they're bad. Because everyone will want them.
Structure diagram inspiration: Brij Kishore Pandey
The line 673 anecdote is brutally effective framing. Rate limiting and cache layers aren't optional extras when you're burning $0.03 per call; they're survival mechanisms. I've debugged enough 3 AM incidents where a recursive loop ate through $15k in API credits to know graceful degradation isn't about elegance, it's about staying employed when things break. The version-controlled prompts folder feels almost obvious in hindsight, but I rarely see it implemented correctly.