Multimodal Chat Experiences¶
NeuroLink 7.47.0 introduces full multimodal pipelines so you can mix text, URLs, and local images in a single interaction. The CLI, SDK, and loop sessions all use the same message builder, ensuring parity across workflows.
What You Get¶
- Unified CLI flag – `--image` accepts multiple file paths or HTTPS URLs per request.
- SDK parity – pass `input.images` (buffers, file paths, or URLs) and stream structured outputs.
- Provider fallbacks – orchestration automatically retries compatible multimodal models.
- Streaming support – `neurolink stream` renders partial responses while images upload in the background.
Format Support

The image input accepts three formats: Buffer objects (from `readFileSync`), local file paths (relative or absolute), or HTTPS URLs. All formats are automatically converted to the provider's required encoding.
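For illustration, a minimal sketch combining all three formats in one request; the file paths and URL are placeholders, and the setup mirrors the SDK Usage section below:

```typescript
import { readFileSync } from "node:fs";
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({ enableOrchestration: true });

// All three accepted image formats in a single request (placeholder paths/URL).
const result = await neurolink.generate({
  input: {
    text: "Compare these three images",
    images: [
      readFileSync("./assets/logo.png"), // Buffer from readFileSync
      "./assets/screenshot.png", // local file path, resolved relative to process.cwd()
      "https://example.com/assets/banner.png", // HTTPS URL, downloaded and encoded automatically
    ],
  },
  provider: "google-ai", // any vision-capable provider
});
console.log(result.content);
```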
Supported Providers & Models¶
Provider Compatibility

Not all providers support multimodal inputs. Verify your chosen model has the `vision` capability using `npx @juspay/neurolink models list --capability vision`. Unsupported providers will return an error or ignore image inputs.
| Provider | Recommended Models | Notes |
| --- | --- | --- |
| `google-ai`, `vertex` | `gemini-2.5-pro`, `gemini-2.5-flash` | Local files and URLs supported. |
| `openai`, `azure` | `gpt-4o`, `gpt-4o-mini` | Requires `OPENAI_API_KEY` or Azure deployment name + key. |
| `anthropic`, `bedrock` | `claude-3.5-sonnet`, `claude-3.7-sonnet` | Bedrock needs region + credentials. |
| `litellm` | Any upstream multimodal model | Ensure LiteLLM server exposes vision capability. |
Use `npx @juspay/neurolink models list --capability vision` to see the full list from `config/models.json`.
Prerequisites¶
- Provider credentials with vision/multimodal permissions.
- Latest CLI (`npm`, `pnpm`, or `npx`) or SDK `>=7.47.0`.
- Optional: Redis if you want images stored alongside loop-session history.
CLI Quick Start¶
```bash
# Attach a local file (auto-converted to base64)
npx @juspay/neurolink generate "Describe this interface" \
  --image ./designs/dashboard.png --provider google-ai

# Reference a remote URL (downloaded on the fly)
npx @juspay/neurolink generate "Summarise these guidelines" \
  --image https://example.com/policy.pdf --provider openai --model gpt-4o

# Mix multiple images and enable analytics/evaluation
npx @juspay/neurolink generate "QA review" \
  --image ./screenshots/before.png \
  --image ./screenshots/after.png \
  --enableAnalytics --enableEvaluation --format json
```
Streaming & Loop Sessions¶
```bash
# Stream while uploading a diagram
npx @juspay/neurolink stream "Explain this architecture" \
  --image ./diagrams/system.png

# Persist images inside loop mode (Redis auto-detected when available)
npx @juspay/neurolink loop --enable-conversation-memory
> set provider google-ai
> generate Compare the attached charts --image ./charts/q3.png
```
SDK Usage¶
```typescript
import { readFileSync } from "node:fs";
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({ enableOrchestration: true }); // (1)!

const result = await neurolink.generate({
  input: {
    text: "Provide a marketing summary of these screenshots", // (2)!
    images: [
      // (3)!
      readFileSync("./assets/homepage.png"), // (4)!
      "https://example.com/reports/nps-chart.png", // (5)!
    ],
  },
  provider: "google-ai", // (6)!
  enableEvaluation: true, // (7)!
  region: "us-east-1",
});

console.log(result.content);
console.log(result.evaluation?.overallScore);
```
1. Enable provider orchestration for automatic multimodal fallbacks
2. Text prompt describing what you want from the images
3. Array of images in multiple formats
4. Local file as Buffer (auto-converted to base64)
5. Remote URL (downloaded and encoded automatically)
6. Choose a vision-capable provider
7. Optionally evaluate the quality of multimodal responses
Use `stream()` with the same structure when you need incremental tokens:
```typescript
const stream = await neurolink.stream({
  input: {
    text: "Walk through the attached floor plan",
    images: ["./plans/level1.jpg"], // (1)!
  },
  provider: "openai", // (2)!
});

for await (const chunk of stream) { // (3)!
  process.stdout.write(chunk.text ?? "");
}
```
1. Accepts file path, Buffer, or HTTPS URL
2. OpenAI's GPT-4o and GPT-4o-mini support vision
3. Stream text responses while images upload in the background
Configuration & Tuning¶
- Image sources – Local paths are resolved relative to `process.cwd()`. URLs must be HTTPS.
- Size limits – Providers cap images at ~20 MB. Resize or compress large assets before sending (see the sketch after this list).
- Multiple images – Order matters; the builder interleaves captions in the order provided.
- Region routing – Set `region` on each request (e.g., `us-east-1`) for providers that enforce locality.
- Loop sessions – Images uploaded during `loop` are cached per session; call `clear session` to reset.
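A small pre-flight sketch, not part of the SDK, that mirrors the `process.cwd()` resolution and flags files near the ~20 MB cap before you send them; the helper name and threshold are assumptions for illustration:

```typescript
import { resolve } from "node:path";
import { statSync } from "node:fs";

// Hypothetical pre-flight check: resolve paths the same way the builder does
// (relative to process.cwd()) and flag anything near the ~20 MB provider cap.
const MAX_IMAGE_BYTES = 20 * 1024 * 1024;

function checkImage(pathOrUrl: string): string {
  if (pathOrUrl.startsWith("https://")) return pathOrUrl; // URLs are downloaded by NeuroLink
  const absolute = resolve(process.cwd(), pathOrUrl);
  const { size } = statSync(absolute); // throws if the file does not exist
  if (size > MAX_IMAGE_BYTES) {
    throw new Error(`${absolute} is ${(size / 1e6).toFixed(1)} MB; compress it before sending`);
  }
  return absolute;
}

const images = ["./charts/q3.png", "https://example.com/reports/nps-chart.png"].map(checkImage);
```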
Best Practices¶
- Provide short captions in the prompt describing each image (e.g., "see `before.png` on the left").
- Combine analytics + evaluation to benchmark multimodal quality before rolling out widely.
- Cache remote assets locally if you reuse them frequently to avoid repeated downloads (see the sketch below).
- Stream when presenting content to end-users; use `generate` when you need structured JSON output.
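One way to cache remote assets, sketched with Node 18+'s built-in `fetch`; the cache directory and helper name are arbitrary choices, not NeuroLink APIs:

```typescript
import { mkdirSync, existsSync } from "node:fs";
import { writeFile } from "node:fs/promises";
import { join, basename } from "node:path";

// Download a remote image once and reuse the local copy on later requests.
async function cacheRemoteAsset(url: string, cacheDir = "./.image-cache"): Promise<string> {
  mkdirSync(cacheDir, { recursive: true });
  const localPath = join(cacheDir, basename(new URL(url).pathname));
  if (!existsSync(localPath)) {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`Download failed: ${response.status}`);
    await writeFile(localPath, Buffer.from(await response.arrayBuffer()));
  }
  return localPath; // pass this path (or a Buffer) as an image input
}
```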
CSV File Support¶
Quick Start¶
```bash
# Auto-detect CSV files
npx @juspay/neurolink generate "Analyze sales trends" \
  --file ./sales_2024.csv

# Explicit CSV with options
npx @juspay/neurolink generate "Summarize data" \
  --csv ./data.csv \
  --csv-max-rows 500 \
  --csv-format raw
```
SDK Usage¶
```typescript
// Auto-detect (recommended)
await neurolink.generate({
  input: {
    text: "Analyze this data",
    files: ["./data.csv", "./chart.png"],
  },
});

// Explicit CSV
await neurolink.generate({
  input: {
    text: "Compare quarters",
    csvFiles: ["./q1.csv", "./q2.csv"],
  },
  csvOptions: {
    maxRows: 1000,
    formatStyle: "raw",
  },
});
```
Format Options¶
- `raw` (default) – Best for large files, minimal token usage
- `json` – Structured data, easier parsing, higher token usage
- `markdown` – Readable tables, good for small datasets (<100 rows); see the sketch below
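For example, a small roster might use `markdown` while a large export keeps the `raw` default; a sketch reusing the `neurolink` instance from SDK Usage, with placeholder file names:

```typescript
// Small dataset (<100 rows): readable markdown tables are fine.
await neurolink.generate({
  input: { text: "Summarize this roster", csvFiles: ["./team.csv"] },
  csvOptions: { formatStyle: "markdown", maxRows: 100 },
});

// Large export: keep the raw default and cap rows to control token usage.
await neurolink.generate({
  input: { text: "Find anomalies", csvFiles: ["./transactions.csv"] },
  csvOptions: { formatStyle: "raw", maxRows: 1000 },
});
```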
Best Practices¶
- Use raw format for large files to minimize token usage
- Use JSON format for structured data processing
- Limit to 1000 rows by default (configurable up to 10K)
- Combine CSV with visualization images for comprehensive analysis
- Works with ALL providers (not just vision-capable models)
Troubleshooting¶
| Symptom | Action |
| --- | --- |
| Image not found | Check relative paths from the directory where you invoked the CLI. |
| Provider does not support images | Switch to a model listed in the table above or enable orchestration. |
| Error downloading image | Ensure the URL responds with status 200 and does not require auth. |
| Large response latency | Pre-compress images and reduce resolution to under 2 MP when possible. |
| Streaming ends early | Disable tools (`--disableTools`) to avoid tool calls that may not support vision. |
Related Features¶
Q4 2025 Features:
- Guardrails Middleware – Content filtering for multimodal outputs
- Auto Evaluation – Quality scoring for vision-based responses
Documentation:
- CLI Commands – CLI flags & options
- SDK API Reference – Generate/stream APIs
- Troubleshooting – Extended error catalogue