# Hugging Face Provider Guide

Access 100,000+ open-source AI models through Hugging Face's free inference API.
## Overview

Hugging Face is the world's largest platform for open-source AI models, hosting over 100,000 models spanning text generation, code generation, translation, summarization, and more. NeuroLink's Hugging Face provider gives you free access to this vast ecosystem through a unified interface.

**Free Tier Advantage:** Hugging Face's inference API is free for most public models, within generous per-model rate limits. Perfect for development, testing, and low-to-medium production workloads without cost concerns.
### Key Benefits
- 🆓 Free Access: No API costs - completely free to use
- 🌍 100,000+ Models: Largest collection of open-source models
- 🔓 Open Source: All models are open and transparent
- ⚡ Quick Start: No credit card required
- 🎯 Specialized Models: Models fine-tuned for specific tasks
- 🔬 Research-Friendly: Access to latest research models
### Use Cases
- Experimentation: Try different models without cost concerns
- Research: Access cutting-edge research models
- Budget-Constrained: Production usage without API costs
- Specialized Tasks: Fine-tuned models for specific domains
- Learning: Perfect for students and developers learning AI
## Quick Start
### 1. Get Your API Token
1. Visit [Hugging Face](https://huggingface.co)
2. Create a free account (no credit card required)
3. Go to Settings → Access Tokens
4. Click "New token"
5. Give it a name (e.g., "NeuroLink")
6. Select "Read" permissions
7. Copy the token (it starts with `hf_...`)
### 2. Configure NeuroLink

Add to your `.env` file:
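```bash
# Hugging Face API token (required) - see Configuration Options below for optional settings
HUGGINGFACE_API_KEY=hf_your_token_here
```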
**Security Best Practice:** Never commit your API token to version control. Always use environment variables and add `.env` to your `.gitignore` file.
### 3. Test the Setup
```bash
# CLI - Test with default model
npx @juspay/neurolink generate "Hello from Hugging Face!" --provider huggingface

# CLI - Use specific model
npx @juspay/neurolink generate "Write a poem" --provider huggingface --model "mistralai/Mistral-7B-Instruct-v0.2"

# SDK
node -e "
const { NeuroLink } = require('@juspay/neurolink');
(async () => {
  const ai = new NeuroLink();
  const result = await ai.generate({
    input: { text: 'Hello from Hugging Face!' },
    provider: 'huggingface'
  });
  console.log(result.content);
})();
"
```
## Model Selection Guide

### Popular Models by Category

#### 1. General Text Generation
| Model | Size | Description | Best For |
| --- | --- | --- | --- |
| `mistralai/Mistral-7B-Instruct-v0.2` | 7B | High-quality instruction following | General tasks, fast responses |
| `meta-llama/Llama-2-7b-chat-hf` | 7B | Meta's open chat model | Conversational AI |
| `tiiuae/falcon-7b-instruct` | 7B | Efficient, multilingual | Multiple languages |
| `google/flan-t5-xxl` | 11B | Google's instruction-tuned model | Q&A, summarization |
#### 2. Code Generation

| Model | Description | Best For |
| --- | --- | --- |
| `bigcode/starcoder` | Code generation specialist | Writing code |
| `Salesforce/codegen-16B-mono` | Python-focused | Python development |
| `WizardLM/WizardCoder-15B-V1.0` | Code instruction following | Complex coding tasks |
#### 3. Summarization

| Model | Description | Best For |
| --- | --- | --- |
| `facebook/bart-large-cnn` | News summarization | Articles, news |
| `sshleifer/distilbart-cnn-12-6` | Faster BART variant | Quick summaries |
| `google/pegasus-xsum` | Extreme summarization | Very brief summaries |
#### 4. Translation

| Model | Languages | Best For |
| --- | --- | --- |
| `facebook/mbart-large-50-many-to-many-mmt` | 50 languages | Multi-language translation |
| `Helsinki-NLP/opus-mt-*` | Language pairs | Specific language pairs |
#### 5. Question Answering

| Model | Description | Best For |
| --- | --- | --- |
| `deepset/roberta-base-squad2` | SQuAD-trained | Factual Q&A |
| `distilbert-base-cased-distilled-squad` | Faster QA | Quick answers |
### Model Selection by Use Case
```javascript
// General conversation
const general = await ai.generate({
  input: { text: "Explain quantum computing" },
  provider: "huggingface",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
});

// Code generation
const code = await ai.generate({
  input: { text: "Write a Python function to sort a list" },
  provider: "huggingface",
  model: "bigcode/starcoder",
});

// Summarization
const summary = await ai.generate({
  input: { text: "Summarize: [long article text]" },
  provider: "huggingface",
  model: "facebook/bart-large-cnn",
});

// Translation
const translation = await ai.generate({
  input: { text: "Translate to French: Hello, how are you?" },
  provider: "huggingface",
  model: "facebook/mbart-large-50-many-to-many-mmt",
});
```
## Free Tier Details

### What's Included

- ✅ No cost - completely free to use
- ✅ No credit card required
- ✅ Free access to 100,000+ public models
- ✅ Generous rate limits (~1,000 requests/day per model)
### Rate Limits
- Per Model: ~1,000 requests/day
- Strategy: Use different models to scale
- Best Practice: Combine with other providers for production
```javascript
// Rate limit friendly approach
const ai = new NeuroLink({
  providers: [
    { name: "huggingface", priority: 1 }, // Free tier first
    { name: "google-ai", priority: 2 }, // Fallback to Google AI
  ],
});
```
### Limitations

**⚠️ Free Tier Constraints:**

- Models load on-demand (first request may be slow)
- Rate limits per model (use multiple models to scale)
- No guaranteed uptime (community infrastructure)
- Some popular models may have queues

**💡 For Production:**

- Use Hugging Face for experimentation
- Consider paid inference for critical workloads
- Combine with other providers for reliability
## SDK Integration

### Basic Usage
```javascript
import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Simple generation
const result = await ai.generate({
  input: { text: "Write a haiku about coding" },
  provider: "huggingface",
});

console.log(result.content);
```
### With Specific Model
```javascript
// Use Mistral for instruction following
const mistral = await ai.generate({
  input: { text: "Explain Docker in simple terms" },
  provider: "huggingface",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
});

// Use StarCoder for code generation
const starcoder = await ai.generate({
  input: { text: "Create a REST API endpoint in Express.js" },
  provider: "huggingface",
  model: "bigcode/starcoder",
});
```
### Multi-Model Strategy
```javascript
// Try multiple models for best results
const models = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "meta-llama/Llama-2-7b-chat-hf",
  "tiiuae/falcon-7b-instruct",
];

for (const model of models) {
  try {
    const result = await ai.generate({
      input: { text: "Your prompt here" },
      provider: "huggingface",
      model,
    });
    console.log(`${model}: ${result.content}`);
  } catch (error) {
    console.log(`${model} failed, trying next...`);
  }
}
```
### With Streaming
```javascript
// Stream responses for better UX
for await (const chunk of ai.stream({
  input: { text: "Write a long story about space exploration" },
  provider: "huggingface",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
})) {
  process.stdout.write(chunk.content);
}
```
### With Error Handling
```javascript
try {
  const result = await ai.generate({
    input: { text: "Your prompt" },
    provider: "huggingface",
    maxTokens: 500,
    temperature: 0.7,
  });
  console.log(result.content);
} catch (error) {
  if (error.message.includes("rate limit")) {
    console.log("Rate limited - try another model or wait");
  } else if (error.message.includes("loading")) {
    console.log("Model is loading - try again in a moment");
  } else {
    console.error("Error:", error.message);
  }
}
```
## CLI Usage

### Basic Commands
```bash
# Generate with default model
npx @juspay/neurolink generate "Hello world" --provider huggingface

# Use specific model
npx @juspay/neurolink gen "Write code" --provider huggingface --model "bigcode/starcoder"

# Stream response
npx @juspay/neurolink stream "Tell a story" --provider huggingface

# Check available models
npx @juspay/neurolink models --provider huggingface
```
### Advanced Usage
```bash
# With temperature control
npx @juspay/neurolink gen "Creative story" \
  --provider huggingface \
  --model "mistralai/Mistral-7B-Instruct-v0.2" \
  --temperature 0.9 \
  --max-tokens 1000

# Save output to file
npx @juspay/neurolink gen "Technical documentation" \
  --provider huggingface \
  --model "tiiuae/falcon-7b-instruct" \
  > output.txt

# Interactive mode
npx @juspay/neurolink loop --provider huggingface
```
### Model Comparison
```bash
# Compare different models
for model in "mistralai/Mistral-7B-Instruct-v0.2" \
  "meta-llama/Llama-2-7b-chat-hf" \
  "tiiuae/falcon-7b-instruct"; do
  echo "Testing $model:"
  npx @juspay/neurolink gen "What is AI?" \
    --provider huggingface \
    --model "$model"
  echo "---"
done
```
## Configuration Options

### Environment Variables
```bash
# Required
HUGGINGFACE_API_KEY=hf_your_token_here

# Optional
HUGGINGFACE_BASE_URL=https://api-inference.huggingface.co  # Custom endpoint
HUGGINGFACE_DEFAULT_MODEL=mistralai/Mistral-7B-Instruct-v0.2  # Default model
HUGGINGFACE_TIMEOUT=60000  # Request timeout (ms)
```
### Programmatic Configuration
```javascript
const ai = new NeuroLink({
  providers: [
    {
      name: "huggingface",
      config: {
        apiKey: process.env.HUGGINGFACE_API_KEY,
        defaultModel: "mistralai/Mistral-7B-Instruct-v0.2",
        timeout: 60000,
      },
    },
  ],
});
```
## Troubleshooting

### Common Issues

#### 1. "Model is currently loading"

**Problem:** Model hasn't been used recently and needs to load.

**Solution:**
```bash
# Wait 20-30 seconds and retry
# Or use a popular model that's always loaded
npx @juspay/neurolink gen "test" \
  --provider huggingface \
  --model "mistralai/Mistral-7B-Instruct-v0.2"
```
2. "Rate limit exceeded"¶
Problem: Hit the ~1,000 requests/day limit for a model.
Solution:
```javascript
// Switch to a different model
const alternativeModels = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "tiiuae/falcon-7b-instruct",
  "meta-llama/Llama-2-7b-chat-hf",
];

// Or use multi-provider fallback
const ai = new NeuroLink({
  providers: [
    { name: "huggingface", priority: 1 },
    { name: "google-ai", priority: 2 }, // Fallback
  ],
});
```
3. "Invalid API token"¶
Problem: Token is incorrect or expired.
Solution:
- Verify token at https://huggingface.co/settings/tokens
- Ensure token has "Read" permissions
- Check for typos in the `.env` file
- Token should start with `hf_`
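If the token still fails, you can sanity-check it outside NeuroLink against the Hub's `whoami` endpoint; a minimal sketch using Node 18+'s built-in `fetch`:

```javascript
// Quick token check against the Hugging Face Hub (Node 18+).
// HTTP 200 means the token is valid; 401 means it is invalid or expired.
const res = await fetch("https://huggingface.co/api/whoami-v2", {
  headers: { Authorization: `Bearer ${process.env.HUGGINGFACE_API_KEY}` },
});
console.log(res.ok ? "Token is valid" : `Token check failed: HTTP ${res.status}`);
```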
4. "Model not found"¶
Problem: Model name is incorrect or private.
Solution:
```bash
# Verify the model exists at huggingface.co
# Use the exact model ID: username/model-name
npx @juspay/neurolink gen "test" \
  --provider huggingface \
  --model "mistralai/Mistral-7B-Instruct-v0.2" # ✅ Correct format
```
#### 5. Slow Response Times

**Problem:** Model is loading or under high load.

**Solution:**
- Use popular models (always loaded)
- Add timeout handling
- Consider caching results
- Use streaming for long responses
```javascript
const result = await ai.generate({
  input: { text: "Your prompt" },
  provider: "huggingface",
  timeout: 120000, // 2 minute timeout
});
```
## Best Practices

### 1. Model Selection
```javascript
// ✅ Good: Use an appropriate model for the task
const code = await ai.generate({
  input: { text: "Write a function" },
  model: "bigcode/starcoder", // Code specialist
});

// ❌ Avoid: Using a general model for specialized tasks
const badCode = await ai.generate({
  input: { text: "Write a function" },
  model: "google/flan-t5-xxl", // General model
});
```
### 2. Rate Limit Management
```javascript
// ✅ Good: Rotate between models to spread requests across rate limits
const models = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "tiiuae/falcon-7b-instruct",
  "meta-llama/Llama-2-7b-chat-hf",
];
let requestCount = 0; // Track the number of requests

async function generateRotating(prompt) {
  // Pick the next model in round-robin order, then increment the counter
  const model = models[requestCount++ % models.length];
  return ai.generate({
    input: { text: prompt },
    provider: "huggingface",
    model,
  });
}
```
### 3. Error Handling
```javascript
// ✅ Good: Handle model loading gracefully
async function generateWithRetry(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await ai.generate({
        input: { text: prompt },
        provider: "huggingface",
      });
    } catch (error) {
      if (error.message.includes("loading") && i < maxRetries - 1) {
        console.log("Model loading, waiting 30s...");
        await new Promise((resolve) => setTimeout(resolve, 30000));
      } else {
        throw error;
      }
    }
  }
}
```
### 4. Production Deployment
```javascript
// ✅ Good: Use Hugging Face with fallback
const ai = new NeuroLink({
  providers: [
    {
      name: "huggingface",
      priority: 1,
      config: {
        defaultModel: "mistralai/Mistral-7B-Instruct-v0.2",
      },
    },
    {
      name: "google-ai", // Free tier fallback
      priority: 2,
    },
    {
      name: "anthropic", // Paid fallback for critical workloads
      priority: 3,
    },
  ],
});
```
## Performance Optimization

### 1. Model Warm-Up
```javascript
// Keep popular models warm with periodic requests
setInterval(async () => {
  await ai.generate({
    input: { text: "ping" },
    provider: "huggingface",
    model: "mistralai/Mistral-7B-Instruct-v0.2",
    maxTokens: 1,
  });
}, 300000); // Every 5 minutes
```
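Keep in mind that warm-up pings count against the same per-model rate limit: one request every 5 minutes is roughly 288 requests/day, about a third of the ~1,000/day budget for that model.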
### 2. Caching
```javascript
// Cache responses for repeated queries
const cache = new Map();

async function cachedGenerate(prompt) {
  if (cache.has(prompt)) {
    return cache.get(prompt);
  }
  const result = await ai.generate({
    input: { text: prompt },
    provider: "huggingface",
  });
  cache.set(prompt, result);
  return result;
}
```
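Note that this sketch keys the cache on the prompt alone; if you vary `model`, `temperature`, or other options per call, include them in the cache key so different configurations don't collide.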
### 3. Parallel Requests
```javascript
// Use different models in parallel to avoid rate limits
const prompts = ["prompt1", "prompt2", "prompt3"];
const models = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "tiiuae/falcon-7b-instruct",
  "meta-llama/Llama-2-7b-chat-hf",
];

const results = await Promise.all(
  prompts.map((prompt, i) =>
    ai.generate({
      input: { text: prompt },
      provider: "huggingface",
      model: models[i],
    }),
  ),
);
```
## Related Documentation
- Provider Setup Guide - General provider configuration
- SDK API Reference - Complete API documentation
- CLI Commands - CLI reference
- Multi-Provider Failover - Enterprise patterns
## Additional Resources
- Hugging Face Models - Browse all models
- Hugging Face Inference API - API documentation
- Model Cards - Understanding model capabilities
- Hugging Face Hub - Platform documentation
**Need Help?** Join our GitHub Discussions or open an issue.