Hugging Face Provider Guide

Access 100,000+ open-source AI models through Hugging Face's free inference API


Overview

Hugging Face is the world's largest platform for open-source AI models, hosting over 100,000 models spanning text generation, code generation, translation, summarization, and more. NeuroLink's Hugging Face provider gives you free access to this vast ecosystem through a unified interface.

Free Tier Advantage

Hugging Face's Inference API is free to use, with generous per-model rate limits. Perfect for development, testing, and low-to-medium production workloads without cost concerns.

Key Benefits

  • 🆓 Free Access: No API costs - completely free to use
  • 🌍 100,000+ Models: Largest collection of open-source models
  • 🔓 Open Source: All models are open and transparent
  • ⚡ Quick Start: No credit card required
  • 🎯 Specialized Models: Models fine-tuned for specific tasks
  • 🔬 Research-Friendly: Access to latest research models

Use Cases

  • Experimentation: Try different models without cost concerns
  • Research: Access cutting-edge research models
  • Budget-Constrained: Production usage without API costs
  • Specialized Tasks: Fine-tuned models for specific domains
  • Learning: Perfect for students and developers learning AI

Quick Start

1. Get Your API Token

  1. Visit https://huggingface.co
  2. Create a free account (no credit card required)
  3. Go to Settings → Access Tokens
  4. Click "New token"
  5. Give it a name (e.g., "NeuroLink")
  6. Select "Read" permissions
  7. Copy the token (starts with hf_...)

2. Configure Your Environment

Add the token to your .env file:

HUGGINGFACE_API_KEY=hf_your_token_here
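If you want to fail fast on a missing or malformed token, a small startup check helps (a sketch assuming the dotenv package; NeuroLink may also read the variable on its own):

// Hypothetical startup check: ensure the token is present and well-formed
require("dotenv").config();

const token = process.env.HUGGINGFACE_API_KEY;
if (!token || !token.startsWith("hf_")) {
  throw new Error("HUGGINGFACE_API_KEY is missing or does not start with hf_");
}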

Security Best Practice

Never commit your API token to version control. Always use environment variables and add .env to your .gitignore file.

3. Test the Setup

# CLI - Test with default model
npx @juspay/neurolink generate "Hello from Hugging Face!" --provider huggingface

# CLI - Use specific model
npx @juspay/neurolink generate "Write a poem" --provider huggingface --model "mistralai/Mistral-7B-Instruct-v0.2"

# SDK
node -e "
const { NeuroLink } = require('@juspay/neurolink');
(async () => {
  const ai = new NeuroLink();
  const result = await ai.generate({
    input: { text: 'Hello from Hugging Face!' },
    provider: 'huggingface'
  });
  console.log(result.content);
})();
"

Model Selection Guide

1. General Text Generation

| Model | Size | Description | Best For |
| --- | --- | --- | --- |
| mistralai/Mistral-7B-Instruct-v0.2 | 7B | High-quality instruction following | General tasks, fast responses |
| meta-llama/Llama-2-7b-chat-hf | 7B | Meta's open chat model | Conversational AI |
| tiiuae/falcon-7b-instruct | 7B | Efficient, multilingual | Multiple languages |
| google/flan-t5-xxl | 11B | Google's instruction-tuned model | Q&A, summarization |

2. Code Generation

| Model | Description | Best For |
| --- | --- | --- |
| bigcode/starcoder | Code generation specialist | Writing code |
| Salesforce/codegen-16B-mono | Python-focused | Python development |
| WizardLM/WizardCoder-15B-V1.0 | Code instruction following | Complex coding tasks |

3. Summarization

| Model | Description | Best For |
| --- | --- | --- |
| facebook/bart-large-cnn | News summarization | Articles, news |
| sshleifer/distilbart-cnn-12-6 | Faster BART variant | Quick summaries |
| google/pegasus-xsum | Extreme summarization | Very brief summaries |

4. Translation

| Model | Languages | Best For |
| --- | --- | --- |
| facebook/mbart-large-50-many-to-many-mmt | 50 languages | Multi-language translation |
| Helsinki-NLP/opus-mt-* | Language pairs | Specific language pairs |

5. Question Answering

| Model | Description | Best For |
| --- | --- | --- |
| deepset/roberta-base-squad2 | SQuAD-trained | Factual Q&A |
| distilbert-base-cased-distilled-squad | Faster QA | Quick answers |

Model Selection by Use Case

// General conversation
const general = await ai.generate({
  input: { text: "Explain quantum computing" },
  provider: "huggingface",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
});

// Code generation
const code = await ai.generate({
  input: { text: "Write a Python function to sort a list" },
  provider: "huggingface",
  model: "bigcode/starcoder",
});

// Summarization
const summary = await ai.generate({
  input: { text: "Summarize: [long article text]" },
  provider: "huggingface",
  model: "facebook/bart-large-cnn",
});

// Translation
const translation = await ai.generate({
  input: { text: "Translate to French: Hello, how are you?" },
  provider: "huggingface",
  model: "facebook/mbart-large-50-many-to-many-mmt",
});

Free Tier Details

What's Included

  • Free requests to public models (subject to per-model rate limits)
  • No cost - completely free
  • No credit card required
  • Rate limits: ~1,000 requests/day per model (generous)
  • Access to 100,000+ public models

Rate Limits

  • Per Model: ~1,000 requests/day
  • Strategy: Use different models to scale
  • Best Practice: Combine with other providers for production

// Rate limit friendly approach
const ai = new NeuroLink({
  providers: [
    { name: "huggingface", priority: 1 }, // Free tier first
    { name: "google-ai", priority: 2 }, // Fallback to Google AI
  ],
});

Limitations

⚠️ Free Tier Constraints:

  • Models load on-demand (first request may be slow)
  • Rate limits per model (use multiple models to scale)
  • No guaranteed uptime (community infrastructure)
  • Some popular models may have queues

💡 For Production:

  • Use Hugging Face for experimentation
  • Consider paid inference for critical workloads
  • Combine with other providers for reliability

SDK Integration

Basic Usage

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Simple generation
const result = await ai.generate({
  input: { text: "Write a haiku about coding" },
  provider: "huggingface",
});

console.log(result.content);

With Specific Model

// Use Mistral for instruction following
const mistral = await ai.generate({
  input: { text: "Explain Docker in simple terms" },
  provider: "huggingface",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
});

// Use StarCoder for code generation
const starcoder = await ai.generate({
  input: { text: "Create a REST API endpoint in Express.js" },
  provider: "huggingface",
  model: "bigcode/starcoder",
});

Multi-Model Strategy

// Try multiple models for best results
const models = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "meta-llama/Llama-2-7b-chat-hf",
  "tiiuae/falcon-7b-instruct",
];

for (const model of models) {
  try {
    const result = await ai.generate({
      input: { text: "Your prompt here" },
      provider: "huggingface",
      model,
    });
    console.log(`${model}: ${result.content}`);
  } catch (error) {
    console.log(`${model} failed, trying next...`);
  }
}

With Streaming

// Stream responses for better UX
for await (const chunk of ai.stream({
  input: { text: "Write a long story about space exploration" },
  provider: "huggingface",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
})) {
  process.stdout.write(chunk.content);
}

With Error Handling

try {
  const result = await ai.generate({
    input: { text: "Your prompt" },
    provider: "huggingface",
    maxTokens: 500,
    temperature: 0.7,
  });
  console.log(result.content);
} catch (error) {
  if (error.message.includes("rate limit")) {
    console.log("Rate limited - try another model or wait");
  } else if (error.message.includes("loading")) {
    console.log("Model is loading - try again in a moment");
  } else {
    console.error("Error:", error.message);
  }
}

CLI Usage

Basic Commands

# Generate with default model
npx @juspay/neurolink generate "Hello world" --provider huggingface

# Use specific model
npx @juspay/neurolink gen "Write code" --provider huggingface --model "bigcode/starcoder"

# Stream response
npx @juspay/neurolink stream "Tell a story" --provider huggingface

# Check available models
npx @juspay/neurolink models --provider huggingface

Advanced Usage

# With temperature control
npx @juspay/neurolink gen "Creative story" \
  --provider huggingface \
  --model "mistralai/Mistral-7B-Instruct-v0.2" \
  --temperature 0.9 \
  --max-tokens 1000

# Save output to file
npx @juspay/neurolink gen "Technical documentation" \
  --provider huggingface \
  --model "tiiuae/falcon-7b-instruct" \
  > output.txt

# Interactive mode
npx @juspay/neurolink loop --provider huggingface

Model Comparison

# Compare different models
for model in "mistralai/Mistral-7B-Instruct-v0.2" \
             "meta-llama/Llama-2-7b-chat-hf" \
             "tiiuae/falcon-7b-instruct"; do
  echo "Testing $model:"
  npx @juspay/neurolink gen "What is AI?" \
    --provider huggingface \
    --model "$model"
  echo "---"
done

Configuration Options

Environment Variables

# Required
HUGGINGFACE_API_KEY=hf_your_token_here

# Optional
HUGGINGFACE_BASE_URL=https://api-inference.huggingface.co  # Custom endpoint
HUGGINGFACE_DEFAULT_MODEL=mistralai/Mistral-7B-Instruct-v0.2  # Default model
HUGGINGFACE_TIMEOUT=60000  # Request timeout (ms)

Programmatic Configuration

const ai = new NeuroLink({
  providers: [
    {
      name: "huggingface",
      config: {
        apiKey: process.env.HUGGINGFACE_API_KEY,
        defaultModel: "mistralai/Mistral-7B-Instruct-v0.2",
        timeout: 60000,
      },
    },
  ],
});

Troubleshooting

Common Issues

1. "Model is currently loading"

Problem: Model hasn't been used recently and needs to load.

Solution:

# Wait 20-30 seconds and retry
# Or use a popular model that's always loaded
npx @juspay/neurolink gen "test" \
  --provider huggingface \
  --model "mistralai/Mistral-7B-Instruct-v0.2"

2. "Rate limit exceeded"

Problem: Hit the ~1,000 requests/day limit for a model.

Solution:

// Switch to a different model
const alternativeModels = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "tiiuae/falcon-7b-instruct",
  "meta-llama/Llama-2-7b-chat-hf",
];

// Or use multi-provider fallback
const ai = new NeuroLink({
  providers: [
    { name: "huggingface", priority: 1 },
    { name: "google-ai", priority: 2 }, // Fallback
  ],
});

3. "Invalid API token"

Problem: Token is incorrect or expired.

Solution:

  1. Verify token at https://huggingface.co/settings/tokens
  2. Ensure token has "Read" permissions
  3. Check for typos in .env file
  4. Token should start with hf_
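To confirm the token outside NeuroLink, you can call Hugging Face's whoami endpoint directly (a quick sketch; assumes Node 18+ for the built-in fetch):

// Prints whether the token is accepted by the Hugging Face API
(async () => {
  const res = await fetch("https://huggingface.co/api/whoami-v2", {
    headers: { Authorization: `Bearer ${process.env.HUGGINGFACE_API_KEY}` },
  });
  console.log(res.ok ? "Token is valid" : `Token rejected (HTTP ${res.status})`);
})();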

4. "Model not found"

Problem: Model name is incorrect or private.

Solution:

# Verify model exists at huggingface.co
# Use exact model ID: username/model-name
npx @juspay/neurolink gen "test" \
  --provider huggingface \
  --model "mistralai/Mistral-7B-Instruct-v0.2"  # ✅ Correct format

5. Slow Response Times

Problem: Model is loading or under high load.

Solution:

  • Use popular models (always loaded)
  • Add timeout handling
  • Consider caching results
  • Use streaming for long responses

const result = await ai.generate({
  input: { text: "Your prompt" },
  provider: "huggingface",
  timeout: 120000, // 2 minute timeout
});

Best Practices

1. Model Selection

// ✅ Good: Use appropriate model for task
const code = await ai.generate({
  input: { text: "Write a function" },
  model: "bigcode/starcoder", // Code specialist
});

// ❌ Avoid: Using general model for specialized tasks
const badCode = await ai.generate({
  input: { text: "Write a function" },
  model: "google/flan-t5-xxl", // General model
});

2. Rate Limit Management

// ✅ Good: Rotate between models to spread requests across daily limits
const models = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "tiiuae/falcon-7b-instruct",
  "meta-llama/Llama-2-7b-chat-hf",
];

let requestCount = 0; // module-level counter, persists across calls

async function generateRotating(prompt) {
  const model = models[requestCount++ % models.length];
  return ai.generate({
    input: { text: prompt },
    provider: "huggingface",
    model,
  });
}

3. Error Handling

// ✅ Good: Handle model loading gracefully
async function generateWithRetry(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await ai.generate({
        input: { text: prompt },
        provider: "huggingface",
      });
    } catch (error) {
      if (error.message.includes("loading") && i < maxRetries - 1) {
        console.log("Model loading, waiting 30s...");
        await new Promise((resolve) => setTimeout(resolve, 30000));
      } else {
        throw error;
      }
    }
  }
}
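Usage is the same as a direct generate call; the helper retries transparently while the model spins up:

const result = await generateWithRetry("Explain transformers in one paragraph");
console.log(result.content);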

4. Production Deployment

// ✅ Good: Use Hugging Face with fallback
const ai = new NeuroLink({
  providers: [
    {
      name: "huggingface",
      priority: 1,
      config: {
        defaultModel: "mistralai/Mistral-7B-Instruct-v0.2",
      },
    },
    {
      name: "google-ai", // Free tier fallback
      priority: 2,
    },
    {
      name: "anthropic", // Paid fallback for critical
      priority: 3,
    },
  ],
});

Performance Optimization

1. Model Warm-Up

// Keep popular models warm with periodic requests
setInterval(async () => {
  try {
    await ai.generate({
      input: { text: "ping" },
      provider: "huggingface",
      model: "mistralai/Mistral-7B-Instruct-v0.2",
      maxTokens: 1,
    });
  } catch {
    // Ignore warm-up failures; the next interval simply retries
  }
}, 300000); // Every 5 minutes

2. Caching

// Cache responses for repeated queries
const cache = new Map();

async function cachedGenerate(prompt) {
  if (cache.has(prompt)) {
    return cache.get(prompt);
  }

  const result = await ai.generate({
    input: { text: prompt },
    provider: "huggingface",
  });

  cache.set(prompt, result);
  return result;
}
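The Map above grows without bound and never expires entries. A minimal TTL variant keeps memory in check (a sketch; the ten-minute window is an arbitrary choice):

// TTL cache sketch: entries expire after a fixed window
const TTL_MS = 10 * 60 * 1000; // assumed 10-minute lifetime
const ttlCache = new Map(); // prompt -> { result, expiresAt }

async function cachedGenerateWithTTL(prompt) {
  const hit = ttlCache.get(prompt);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.result;
  }

  const result = await ai.generate({
    input: { text: prompt },
    provider: "huggingface",
  });

  ttlCache.set(prompt, { result, expiresAt: Date.now() + TTL_MS });
  return result;
}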

3. Parallel Requests

// Use different models in parallel to avoid rate limits
const prompts = ["prompt1", "prompt2", "prompt3"];
const models = [
  "mistralai/Mistral-7B-Instruct-v0.2",
  "tiiuae/falcon-7b-instruct",
  "meta-llama/Llama-2-7b-chat-hf",
];

const results = await Promise.all(
  prompts.map((prompt, i) =>
    ai.generate({
      input: { text: prompt },
      provider: "huggingface",
      model: models[i],
    }),
  ),
);
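Note that Promise.all rejects the whole batch if any single model is still loading or rate limited. Promise.allSettled (standard JavaScript) lets the successful generations through:

// Collect successful generations even when individual models fail
const settled = await Promise.allSettled(
  prompts.map((prompt, i) =>
    ai.generate({
      input: { text: prompt },
      provider: "huggingface",
      model: models[i],
    }),
  ),
);

const succeeded = settled
  .filter((s) => s.status === "fulfilled")
  .map((s) => s.value);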


Additional Resources

  • Model Hub: https://huggingface.co/models
  • Inference API documentation: https://huggingface.co/docs/api-inference
  • Access tokens: https://huggingface.co/settings/tokens

Need Help? Join our GitHub Discussions or open an issue.