
Import from Hugging Face

Import Q&A datasets directly from Hugging Face Hub into your FloTorch workspace. This guide explains what you need and how to use the import form.

Prerequisites

A dataset of type Question and Answer Pair (Q&A Pair)
During dataset configuration, select Import Question & Answer Pair from HuggingFace as the creation method
The dataset repository ID from Hugging Face
Knowledge of the file structure in the target dataset (file names and column names)

Where to Import

Open your Workspace → Datasets
Click Create Dataset
Select Question and Answer Pair (Q&A Pair)
Click Configure
Choose Import Question & Answer Pair from HuggingFace as the creation method
Complete the import form and click Create Dataset

Form Fields

Required

Field	Description	Example
Repository	Hugging Face dataset repository ID in `username/dataset-name` format	`allenai/mmlu`
GT File	Filename of the ground truth / evaluation file in the repo	`test.jsonl`, `validation.jsonl`
Question Map From	Name of the source column that contains the question/prompt	`input`, `question`, `prompt`
Answer Map From	Name of the source column that contains the expected answer	`output`, `answer`, `target`
Limit	Number of Q&A pairs to import	`100`

Optional

Field	Description	Example
Access Token	Hugging Face token with read access, needed only for private repositories. Leave empty for public datasets.	`hf_xxxx...`
Examples File	Filename for few-shot examples (used for N-shot prompts)	`train.jsonl`
Examples Question Map From	Column name for questions in the examples file	`input`
Examples Answer Map From	Column name for answers in the examples file	`output`
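
To make the mapping fields concrete, here is a minimal sketch of how Question Map From, Answer Map From, and Limit relate to the rows of a JSONL file. It is an illustration only, not FloTorch's implementation: the function name `map_qa_pairs`, the output keys `question` and `answer`, and the sample rows are all hypothetical.

```python
import json


def map_qa_pairs(jsonl_lines, question_col, answer_col, limit):
    """Pick two columns from JSONL rows and stop once `limit` pairs are collected.

    Hypothetical helper for illustration; the real import may behave differently.
    """
    pairs = []
    for line in jsonl_lines:
        if len(pairs) >= limit:
            break
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        pairs.append({"question": row[question_col], "answer": row[answer_col]})
    return pairs


# Made-up rows standing in for the contents of a GT file.
sample = [
    '{"input": "What is 2 + 2?", "target": "4"}',
    '{"input": "Capital of France?", "target": "Paris"}',
    '{"input": "Largest planet?", "target": "Jupiter"}',
]

pairs = map_qa_pairs(sample, question_col="input", answer_col="target", limit=2)
print(pairs)
```

A column name that does not exist in the file raises a `KeyError` in this sketch, which is why it helps to confirm the column names before filling in the form.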

How to Find Values

1. Repository ID

Browse datasets: huggingface.co/datasets
Copy it from the dataset page URL: https://huggingface.co/datasets/allenai/mmlu → use `allenai/mmlu`
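
If you have many URLs to convert, the same extraction can be scripted. The sketch below assumes the URL follows the `https://huggingface.co/datasets/<username>/<dataset-name>` shape shown above; `repo_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def repo_id_from_url(url):
    """Return `username/dataset-name` from a Hugging Face dataset page URL."""
    # Split the path into non-empty segments, e.g. ["datasets", "allenai", "mmlu"].
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"Not a dataset page URL with a username: {url}")
    return f"{parts[1]}/{parts[2]}"


print(repo_id_from_url("https://huggingface.co/datasets/allenai/mmlu"))
```

Extra path segments after the dataset name (for example a trailing `/tree/main`) are ignored by this sketch.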

2. File Names and Columns

Open the dataset page on Hugging Face
Go to the Files and versions tab to see available files (e.g. train.jsonl, test.jsonl)
Use the Preview or Explore feature to inspect the structure and column names
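
The same inspection can be done from a script. The sketch below is one possible approach, not part of FloTorch: `jsonl_columns` reads the first few rows of a JSONL file you have already downloaded and reports the column names, and `list_dataset_files` wraps `list_repo_files` from the third-party `huggingface_hub` package (it needs that package installed and network access, so it is defined here but not called). Both helper names and the sample rows are hypothetical.

```python
import json


def jsonl_columns(lines, sample_size=5):
    """Return the sorted set of keys found in the first `sample_size` JSONL rows."""
    columns = set()
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        columns.update(json.loads(line).keys())
        seen += 1
        if seen >= sample_size:
            break
    return sorted(columns)


def list_dataset_files(repo_id, token=None):
    """List file names in a dataset repo (requires `pip install huggingface_hub`)."""
    from huggingface_hub import list_repo_files  # imported lazily: third-party

    return list_repo_files(repo_id, repo_type="dataset", token=token)


# Made-up rows standing in for a downloaded file such as test.jsonl.
sample_lines = [
    '{"input": "What is 2 + 2?", "target": "4"}',
    '{"input": "Capital of France?", "target": "Paris", "subject": "geography"}',
]
columns = jsonl_columns(sample_lines)
print(columns)
```

The names this prints are the candidates for Question Map From and Answer Map From.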

3. Access Token (Private Repos Only)

Go to Hugging Face → Settings → Access Tokens
Create a token with read access
Paste it in the Access Token field when importing a private dataset

Example: allenai/mmlu

Field	Value
Repository	`allenai/mmlu`
GT File	`test.jsonl` (or the file for the split you want to import)
Question Map From	`input`
Answer Map From	`target`
Limit	`100`
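
Before submitting, it can help to sanity-check the values. The sketch below is a hypothetical pre-check, not a FloTorch API: the dictionary keys, the function `check_import_values`, the repository pattern (an approximation of the `username/dataset-name` format), and the assumption that Limit must be a positive integer are all illustrative.

```python
import re

# Approximate shape of `username/dataset-name`; an assumption, not an official rule.
REPO_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


def check_import_values(values):
    """Return a list of problems found in the required form values."""
    problems = []
    if not REPO_ID_PATTERN.match(values.get("repository", "")):
        problems.append("repository must look like username/dataset-name")
    for key in ("gt_file", "question_map_from", "answer_map_from"):
        if not values.get(key):
            problems.append(f"{key} is required")
    limit = values.get("limit")
    if not isinstance(limit, int) or limit <= 0:
        problems.append("limit must be a positive integer")
    return problems


# The example values from the table above, under hypothetical key names.
example = {
    "repository": "allenai/mmlu",
    "gt_file": "test.jsonl",
    "question_map_from": "input",
    "answer_map_from": "target",
    "limit": 100,
}
problems = check_import_values(example)
print(problems or "all required values look plausible")
```

This only checks the shape of the values. It does not confirm that the repository, file, or columns actually exist on Hugging Face.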

After Import

An import job runs in the background
The new dataset appears in the datasets list once created
When complete, imported Q&A pairs are available in that dataset