I Built a Vibe-Check Tool — AI Code Scoring Gone Wrong

What is the Vibe-Check Tool and Why Does It Matter?

Over the past few weeks, I built a command-line interface (CLI) tool called vibe-check. It estimates how much of a codebase is AI-generated by scoring each file from 0 (human) to 100 (AI-generated). To do that, it looks for patterns commonly associated with AI-generated code: over-commenting, generic naming conventions, hallucinated imports, repetitive structures, and placeholder code. After testing it on several projects and getting plausible scores, I felt ready to share my findings in this blog post and on GitHub.
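To make two of those signals concrete, here is a minimal sketch of how over-commenting and placeholder code could be turned into a 0 to 100 score. It is not vibe-check's actual implementation: the function names, the placeholder markers, and the weights are all assumptions chosen for illustration.

```python
import re

# Hypothetical placeholder markers; vibe-check's real list is not shown here.
PLACEHOLDER_RE = re.compile(r"\b(TODO|FIXME)\b|NotImplementedError|^\s*pass\s*$")


def comment_ratio(source: str) -> float:
    """Fraction of non-blank lines that are full-line comments."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(ln.startswith("#") for ln in lines) / len(lines)


def placeholder_count(source: str) -> int:
    """Number of lines that look like placeholder code."""
    return sum(bool(PLACEHOLDER_RE.search(ln)) for ln in source.splitlines())


def toy_score(source: str) -> int:
    """Combine both signals into a 0-100 score using made-up weights."""
    raw = 100 * comment_ratio(source) + 10 * placeholder_count(source)
    return min(100, round(raw))


sample = "# set x\nx = 1\n# return x\npass\n"
print(toy_score(sample))  # prints 60
```

Half of the sample's lines are comments and one line is a bare `pass`, so the toy score lands at 60. A real detector would need weights tuned against labeled code rather than guessed.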

Then I ran vibe-check on an AI-built full-stack web application with a React frontend, Express middleware, and a FastAPI ML backend. The result surprised me: a score of 0/100, "MOSTLY HUMAN." This outcome raised questions about how well the tool assesses modern AI-generated code.

How Does Vibe-Check Analyze Your Codebase?

The project I selected for testing had a diverse codebase:

Approximately 10,000 lines of Python for the machine learning backend
Around 7,000 lines of TypeScript
Roughly 14,000 lines of JavaScript

Since vibe-check only analyzes Python files, it never saw the roughly 21,000 lines of TypeScript and JavaScript, about two-thirds of the application. The React frontend, with its complex structure and components, contributed nothing to the score, so the result describes the Python backend rather than the codebase as a whole.
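The size of that blind spot follows directly from the approximate line counts listed above. A small sketch of the arithmetic, where the dictionary and function names are illustrative only:

```python
# Approximate line counts quoted in this post.
LINES_BY_LANGUAGE = {"Python": 10_000, "TypeScript": 7_000, "JavaScript": 14_000}

# Languages the current version of vibe-check can analyze.
SUPPORTED = {"Python"}


def analyzed_fraction(lines_by_language, supported):
    """Share of all lines written in a language the tool can read."""
    total = sum(lines_by_language.values())
    covered = sum(n for lang, n in lines_by_language.items() if lang in supported)
    return covered / total if total else 0.0


fraction = analyzed_fraction(LINES_BY_LANGUAGE, SUPPORTED)
print(f"Analyzed: {fraction:.0%}, unseen: {1 - fraction:.0%}")
# prints "Analyzed: 32%, unseen: 68%"
```

Reporting this fraction next to the score would at least signal when a verdict covers only a minority of the code.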

What Did Vibe-Check Discover?

When analyzing the project, vibe-check provided the following summary:

Scan path: /path/to/project
Files analyzed: 244   Skipped: 11   Errors: 0

╭───────────────────────────── Repository Summary ─────────────────────────────╮
│   Repo Vibe Score              0/100 — MOSTLY HUMAN                          │
│   Average Score                2                                             │
│   Highest Score                13                                            │
│   Lowest Score                 0                                             │
│   High Risk Files (>=60)       0                                             │
│   Medium Risk Files (40-59)    0                                             │
╰──────────────────────────────────────────────────────────────────────────────╯
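For readers wondering how a summary like this can be derived from per-file scores, here is a rough sketch. The risk thresholds come from the table above; the function name and the sample scores are made up, and the sketch leaves out the Repo Vibe Score because its formula is not shown in this post.

```python
from statistics import mean


def summarize(scores):
    """Aggregate per-file scores (0-100) into summary fields."""
    if not scores:
        raise ValueError("no scores to summarize")
    return {
        "average": round(mean(scores)),
        "highest": max(scores),
        "lowest": min(scores),
        "high_risk": sum(s >= 60 for s in scores),          # >= 60
        "medium_risk": sum(40 <= s <= 59 for s in scores),  # 40-59
    }


# Made-up scores for illustration, not the scores from my scan.
example = summarize([0, 0, 2, 13, 45, 72])
print(example)
```

With these invented inputs, one file lands in each risk bucket. In my actual scan the highest file score was 13, so both buckets stayed empty.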

I also ran vibe-check on another tool, commit-prophet, which was built entirely by an AI agent. It scored 2/100, also "MOSTLY HUMAN." This result was perplexing: a fully AI-generated tool passed as human-written.

Why Did Vibe-Check Misfire?

The primary issue with vibe-check is its limited scope. It can detect generic patterns in Python code, but it has no view of the rest of a polyglot codebase. The React frontend, with its consistent naming conventions, was never scored at all because vibe-check can't analyze TypeScript or JavaScript files, so the "human" verdict rests on the Python backend alone.

Key Patterns Identified by Vibe-Check

Generic AI Naming: Vibe-check searches for common AI naming patterns, such as "helper" or "manager." However, many names in the ML backend were domain-specific, which the tool read as a sign of human expertise.
Docstring Coverage: The tool checked whether docstrings were present but not how uniform they were across functions. Human-written code often shows inconsistent docstring coverage.
Error Handling: Comprehensive error handling across API endpoints counted as human-like, but it could just as well reflect how capable current AI models are.
Style Guide Adherence: Style was absolutely consistent across the different languages. That should have raised a red flag rather than counting toward human authorship, because human codebases usually reveal subtle inconsistencies.
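The naming and docstring checks described above can be approximated with Python's standard `ast` module. This is a sketch under assumptions, not vibe-check's real code: the fragment list contains only the two examples mentioned in this post, and the `analyze` function and its return shape are hypothetical.

```python
import ast

# Only the two fragments named in this post; a real list would be longer.
GENERIC_FRAGMENTS = ("helper", "manager")


def analyze(source: str) -> dict:
    """Report generic function names and docstring coverage for one file."""
    tree = ast.parse(source)
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    generic = [
        f.name for f in funcs
        if any(fragment in f.name.lower() for fragment in GENERIC_FRAGMENTS)
    ]
    documented = sum(ast.get_docstring(f) is not None for f in funcs)
    return {
        "functions": len(funcs),
        "generic_names": sorted(generic),
        "docstring_coverage": documented / len(funcs) if funcs else 0.0,
    }


sample = '''
def data_helper(x):
    """Pass the value through."""
    return x

def compute_iou(box_a, box_b):
    return box_a
'''
print(analyze(sample))
```

A domain-specific name like `compute_iou` passes this check regardless of who wrote it, which is exactly how a fluent AI-written backend can slip through.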

How Can Detection Tools Evolve?

The misclassification of AI-generated code highlights a significant gap in tools like vibe-check. To enhance accuracy, the following improvements are essential:

Cross-Language Support: Implement language-agnostic detectors and abstract syntax tree (AST) analysis for TypeScript and JavaScript.
Consistency Scoring: Measure variance in naming, comments, and style to identify uniformity across files.
Vocabulary Specificity Index: Create a corpus of domain-specific terminology for better detection of AI-generated vocabulary.
Commit-Level Analysis: Examine commit history for patterns indicative of AI-generated code.
Test-to-Source Ratio Analysis: Investigate coverage distribution across modules to identify discrepancies.
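Of these, consistency scoring is the easiest to sketch. The idea is to take a per-file metric that ranges from 0 to 1, such as docstring coverage, and measure how little it varies across files. The function name and the scaling below are assumptions for illustration, not a tested detector.

```python
from statistics import pstdev


def uniformity(values, max_spread=0.5):
    """Map the spread of a per-file metric in [0, 1] to a 0-1 uniformity score.

    1.0 means every file has the same value. 0.0 means the spread reached
    max_spread; 0.5 is the largest population standard deviation that
    values confined to [0, 1] can have.
    """
    if len(values) < 2:
        return 0.0  # too little data to call anything uniform
    return max(0.0, 1.0 - pstdev(values) / max_spread)


# Made-up docstring coverage values per file.
suspiciously_even = [0.95, 1.0, 0.9, 1.0]
typically_uneven = [0.1, 1.0, 0.5, 0.0]
print(uniformity(suspiciously_even) > uniformity(typically_uneven))  # prints True
```

High uniformity alone proves nothing, since a strict linter or code review culture produces the same effect, so it would have to be one signal among several.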

What Does the Future Hold for AI Code Detection?

As AI technology evolves, so must the tools designed to detect it. The key insight from this experience is that modern AI-generated code often exhibits a level of expertise that can rival human-written code. Consequently, detection methods must shift from identifying obvious flaws to recognizing sophisticated patterns.

Frequently Asked Questions

What is vibe-check?

Vibe-check is a CLI tool that scores each file in a codebase from 0 (human) to 100 (AI-generated). It currently analyzes only Python files.

Why did vibe-check score 0/100 on an AI-built codebase?

The tool only analyzes Python files, missing significant portions of the codebase written in TypeScript and JavaScript, leading to an inaccurate assessment.

How can detection tools improve?

Detection tools can improve by implementing cross-language support, consistency scoring, and analyzing commit histories to recognize sophisticated patterns in AI-generated code.

Conclusion

Building vibe-check has revealed harsh realities about AI detection tools. While early AI-generated code displayed clear anomalies, modern AI models create code that often resembles expert-level work. This shift means detection tools must adapt rapidly to keep pace. As developers, we must remain vigilant in recognizing AI capabilities and adjust our approaches accordingly.

If you're interested in following my progress as I rebuild vibe-check, check it out on GitHub: github.com/LakshmiSravyaVedantham/vibe-check.
