How This Catalog Works

Sources, methods, and why you should trust this catalog — or not.


The Problem

The information you need to choose an AI model is out there, but it's scattered across a dozen developer-facing sites and written by engineers for other engineers.

Hugging Face alone hosts around 2.8 million models, and most of them are fine-tunes, merges, and quantizations of a few hundred real originals. You wouldn't know that from looking at the page. What you'd see is a wall of names like OBLITERATUS/gemma-4-E4B-it-OBLITERATED, parameter counts in the billions, and acronyms like GGUF and MoE with no explanation. Model cards assume you already know what the terms mean. (MoE, or mixture of experts, is a model architecture that splits the model into many smaller specialized "expert" networks, only activating a handful per input rather than running the whole model every time. The practical effect: you get the knowledge capacity of a big model with the compute cost of a much smaller one. Mistral Large 3 and Mistral Small 4 are both MoE models.)

That's just one site. The rest of the landscape is no friendlier. Technical papers live on arXiv. Source code and release notes live on GitHub. Benchmark comparisons live on LMSYS Chatbot Arena, Open LLM Leaderboard, and Artificial Analysis. (Chatbot Arena is a public leaderboard where users chat with two anonymous models side by side and pick which response they prefer. Aggregated votes produce an Elo-style ranking. Unlike static benchmarks, it reflects what humans actually prefer in real interactions, which makes it one of the most reliable signals of real-world chat quality, though it favors style and friendliness alongside raw capability. Artificial Analysis is an independent benchmarking site that runs standardized tests across commercial and open-weight models and publishes comparable results on capability, speed, and cost. It is widely cited for API provider comparisons: if you want to know whether Llama 3.3 70B is faster on Groq or Together, Artificial Analysis is the reference.) Official announcements are scattered across the individual creator blogs: Meta AI, Mistral, Anthropic, OpenAI, Google DeepMind, Alibaba's QwenLM, DeepSeek, and dozens more. License files live on project sites, in GitHub repositories, or buried in Hugging Face folders. API pricing lives on each provider's own page (OpenAI, Anthropic, Together, Groq, Fireworks, Replicate, OpenRouter), and each one prices differently. Kaggle hosts datasets and competition-tuned models under its own conventions. Community discussion happens on Reddit's r/LocalLLaMA, Discord servers, and Twitter.

None of those sites are designed to help a business owner compare their options. They're designed to help developers find what they need, once those developers already know what they're looking for. If you're trying to figure out which model fits your business and you're not a developer, every one of those sites is a wall.

The hard AIs Model Catalog exists to bridge that gap. It takes the scattered, developer-facing reality of the AI model landscape and translates it into something a non-developer can actually navigate: browsable, comparable, written in plain English, and honest about what each model does well, where it falls short, and whether you can legally use it for your business.


What This Catalog Promises

Before you use anything you find here, read this:

The hard AIs Model Catalog is designed to save you time by consolidating thousands of AI models from across the web into one browsable, plain-English resource. All entries are researched and drafted with AI tools using primary sources wherever possible; claims that can't be verified don't make it in. That said, AI-assisted research is not infallible and model details — particularly pricing and licensing — change quickly. Before committing to a model for a business use case, verify the specifics with the model's creator or official documentation.

I want to unpack a few pieces of that, because they're the load-bearing commitments.

"Consolidating thousands of AI models." I don't try to catalog all 2.8 million models on Hugging Face. I catalog the ones worth knowing about — the foundation models that everything else is built on, plus the fine-tunes, merges, and quantizations that offer meaningful differentiation. Everything else is findable through the catalog's search function, but it doesn't get a curated entry. Signal over volume.

"Primary sources wherever possible." A primary source is the horse's mouth: the model creator's own announcement, the official model card, the original research paper, the actual license file. Not a summary. Not a paraphrase. Not a secondhand article recapping what someone else said. When I cite a fact in an entry, I'm citing what the creator themselves published, and the link in the entry's Sources block will take you there. (A model card is the documentation that ships with a model, typically on Hugging Face or the creator's site. A good model card lists the architecture, training data summary, intended uses, limitations, evaluation scores, and license. Model cards are the catalog's primary source for listing entries; each catalog entry links back to the canonical model card.)

"Claims that can't be verified don't make it in." This one matters. If I can't trace a claim back to a primary source, I don't guess. I leave the field unknown, flag it as unverified, or pull the whole entry to the Watchlist until I can verify it. A catalog full of confident-sounding guesses is worse than a catalog that admits what it doesn't know.

"AI-assisted research is not infallible." It isn't. I've caught errors in my own drafts. AI tools can misread a license, miss a version release, or confidently repeat something that was true six months ago and isn't now. That's why the next line exists.

"Verify the specifics." Before you commit to a model for a real use case — especially around license, cost, or regulatory fit — go to the source and confirm. The catalog tells you what to look at and what questions to ask. The model's creator tells you what's currently true.


Where the Information Comes From

When I build a catalog entry, I work through sources in a specific order of trust.

Primary sources come first and are required. That means the creator's own announcements and documentation — Meta AI's blog, mistral.ai, anthropic.com, openai.com, Google's DeepMind and AI Studio pages, Alibaba's QwenLM GitHub, DeepSeek's technical reports — plus the model's official card on Hugging Face when one exists, the original research paper on arXiv, and the actual license file fetched directly from where the creator publishes it. Every substantive claim in an entry traces back to a primary source that's linked in the entry's Sources block. If I can't link you to it, it doesn't go in.

Secondary sources provide reality-checking and context. Independent benchmark leaderboards — LMSYS Chatbot Arena for head-to-head capability comparisons, Open LLM Leaderboard for aggregated benchmarks, Artificial Analysis for API latency and cost data. Named, traceable reviewers whose reputations I can verify — people like Simon Willison and Nathan Lambert who do real, attributable work in this space. Tech publications with real reporting behind them, not summaries of press releases.

Tertiary sources provide signal, not truth. Hugging Face download counts tell me what's being adopted. Forum discussions on r/LocalLLaMA tell me what people are running into in practice. GitHub star counts and release activity tell me whether a project is alive. None of these go in the catalog as facts. They shape my sense of what's worth paying attention to, and that's where they stop.

What I don't use: vendor marketing pages that aren't the model's official card, aggregator sites scraping without attribution, anonymous posts without primary-source citations, and benchmark numbers I can't trace back to their methodology. A claim you can't verify isn't useful. A source you can't cite isn't a source.
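For readers who think in code, the trust order above can be sketched roughly as follows. This is only an illustration of the rule "if I can't link you to a primary source, it doesn't go in"; the names `Tier`, `Source`, `Claim`, and `is_publishable`, and the example URLs, are hypothetical and are not the catalog's actual tooling.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """Trust tiers described in the text (names are illustrative)."""
    PRIMARY = 1    # creator's announcement, model card, paper, license file
    SECONDARY = 2  # independent leaderboards, named reviewers, real reporting
    TERTIARY = 3   # download counts, forum threads, GitHub stars


@dataclass
class Source:
    url: str
    tier: Tier


@dataclass
class Claim:
    text: str
    sources: list[Source] = field(default_factory=list)


def is_publishable(claim: Claim) -> bool:
    """A claim qualifies only if at least one linked source is primary.

    Secondary and tertiary sources can add context or signal, but in this
    sketch they never make a claim publishable on their own.
    """
    return any(source.tier is Tier.PRIMARY for source in claim.sources)


# Hypothetical example: one claim backed by a license file, one by a forum post.
backed = Claim(
    "Commercial use is permitted",
    [Source("https://example.com/model/LICENSE", Tier.PRIMARY)],
)
rumor = Claim(
    "Runs well on a laptop",
    [Source("https://example.com/forum/thread", Tier.TERTIARY)],
)
print(is_publishable(backed), is_publishable(rumor))
```

The design choice worth noting is that the check is a gate, not a score: adding more tertiary sources never adds up to a primary one.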


How Entries Are Built

Here's how every catalog entry actually gets made: I use AI tools — Claude, from Anthropic, as the primary tool — in combination with other tools and methods to do the heavy lifting on research and drafting. That combination can do in minutes what would take a single person hours: pull up source documents, compare versions across a family of models, draft sections, surface meaningful comparisons. That's the only way a catalog at this scale can exist, let alone stay current.

I'm not going to walk through the specifics of which other tools I use or how the pieces fit together. That's the proprietary part of what hard AIs, LLC does as a consulting practice, and frankly, how I build is part of what my clients pay me for. What I will tell you is the outcome: AI tools handle the research and drafting at speed and scale; I review, edit, and sign off on every entry before it publishes. I spot-check license claims against the actual license files. I verify pricing against provider pages. I read the Quick Take and the Best For and Not For sections and ask whether they'd actually help a real client I've worked with.

My name goes on the content. My reputation takes the hit if the content is wrong. You're reading human-verified content produced at a speed and scale no human-only editorial team could match — that combination is the entire point, and the accountability for what publishes is mine.


Who Stands Behind This

I'm Caleb Bryant, and I founded hard AIs, LLC to help non-developer businesses actually use AI.

My background isn't in AI research or software engineering. It's in business — specifically in businesses that run on data. I spent 27 years in mortgage data analysis, including 8 years owning my own mortgage company. What that means practically is that I've spent my career translating between the people who build tools and the people who use those tools to run businesses. I know firsthand how often the gap between them is the thing that keeps good tools from being useful.

hard AIs, LLC is the continuation of that work. My clients are business owners, operators, and professionals who need to use AI but don't have the time — or interest — to decode developer-speak to do it. I understand that side of the table, because it's the side I spent almost three decades on, including as a business owner myself. The Model Catalog is the public-facing reference layer of my practice: the kind of research I do for clients, translated into plain English, so any business owner can make an informed model choice without needing a developer to interpret it for them.

I care more about getting this right than about making it look polished. That's why the catalog shows visible uncertainty when I'm uncertain, plain-English license summaries where most sites would paraphrase the legal document into something equally opaque, stale-date warnings when information might have changed since I last checked, and entry-level ratings I've actually justified in writing. It's also why my name is on every entry's footer, and why this page exists in the first place.


How Entries Stay Current

Information in this field moves fast. A provider's pricing page changes without notice. A license gets amended. A model ships a point release that quietly changes its capabilities or context window. (The context window is the maximum amount of text the model can "see" at once: prompt plus prior conversation plus any documents you give it. It is measured in tokens, which are roughly three-quarters of a word each. A 128K context window is about 96,000 words of input, roughly a 400-page book. Larger context windows let the model work with bigger documents but cost more to run.) A catalog that doesn't keep up is worse than no catalog at all, because out-of-date information delivered in a confident tone is exactly how business decisions get made on bad assumptions.

Three mechanisms keep the catalog honest about freshness.

Every entry carries a visible "last reviewed" date. If I haven't looked at it in a while, you can see that on the page. After ninety days without a review, the page shows a "last reviewed" note in the header; after six months, that note becomes a stronger warning; after a full year, the entry auto-archives pending review and stops showing up in browse results.

Every entry with pricing carries a separate "last verified" date on the pricing specifically, because pricing is the field that moves fastest. If it's been more than sixty days, the page will show a warning on the cost section with a link to the provider's pricing page so you can check for yourself.

Every entry has a maintenance status — whether the creator is still actively updating the model, or whether it's been feature-frozen or abandoned. That shows up on the entry so you don't build on something the original team has walked away from.

If you land on an entry and see freshness warnings, trust the warnings over the content. That's what they're for.
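The freshness thresholds described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the catalog's real implementation: the function and constant names are hypothetical, "six months" and "a full year" are approximated as 182 and 365 days, and each threshold is treated as "more than N days", which the text does not spell out for the review dates.

```python
from datetime import date, timedelta

# Thresholds from the text. The day counts for "six months" and "a full year"
# are assumptions made for this sketch.
REVIEW_NOTE_DAYS = 90
REVIEW_WARNING_DAYS = 182
REVIEW_ARCHIVE_DAYS = 365
PRICING_WARNING_DAYS = 60


def review_status(last_reviewed: date, today: date) -> str:
    """Map the age of the last review to the state shown on the entry."""
    age_days = (today - last_reviewed).days
    if age_days > REVIEW_ARCHIVE_DAYS:
        return "archived"   # hidden from browse results pending review
    if age_days > REVIEW_WARNING_DAYS:
        return "warning"    # stronger warning in the header
    if age_days > REVIEW_NOTE_DAYS:
        return "note"       # "last reviewed" note in the header
    return "current"


def pricing_is_stale(last_verified: date, today: date) -> bool:
    """True when pricing was last verified more than sixty days ago."""
    return (today - last_verified).days > PRICING_WARNING_DAYS


# Hypothetical usage with a fixed reference date.
today = date(2025, 1, 1)
print(review_status(today - timedelta(days=200), today))
print(pricing_is_stale(today - timedelta(days=45), today))
```

Pricing is checked separately from the entry as a whole, mirroring the text: an entry can be recently reviewed and still carry a stale-pricing warning.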


Corrections

If you find something wrong in a catalog entry, tell me. Every entry has a "suggest a correction" link in its footer. Submissions come directly to me. I verify against primary sources and, if the correction is valid, I update the entry with a note in the changelog. If a submitted correction is valid and I can't resolve it within thirty days, I either update the entry with whatever I can confirm or I pull the affected field entirely until I can.

That's not a customer service pitch. It's a credibility commitment. A catalog that doesn't correct itself isn't worth trusting, and I'm not going to build one that doesn't.


When to Verify

Everything in this catalog is a translation layer. It's designed to help you navigate, compare, and shortlist, not to be your final source of truth. Before you commit to a model for a real business use case, go verify the specifics at the source. Read the actual license; don't just take my summary. Check today's pricing on the provider's page. Confirm the model is available in your region and for your use case.

The catalog tells you what to look at and what questions to ask. The model's creator tells you what's currently true. You need both, and you need to know which is which.

— Caleb Bryant, Founder, hard AIs, LLC