
Import from Hugging Face

Import Q&A datasets directly from Hugging Face Hub into your FloTorch workspace. This guide explains what you need and how to use the import form.

What you need

  • A dataset created with HuggingFace as the source (in Dataset Form → Source)
  • The dataset repository ID from Hugging Face
  • Knowledge of the file structure in the target dataset (file names and column names)

Open the import form

  1. Open your Workspace → Datasets
  2. Open a dataset with HuggingFace source
  3. Go to the Files tab → click Add → Import from Hugging Face
  4. Or: Dataset detail page → Add → Import from Hugging Face

Required fields

| Field | Description | Example |
| --- | --- | --- |
| Repository | Hugging Face dataset repository ID in username/dataset-name format | allenai/mmlu |
| GT File | Filename of the ground truth / evaluation file in the repo | test.jsonl, validation.jsonl |
| Question Map From | Name of the source column that contains the question/prompt | input, question, prompt |
| Answer Map From | Name of the source column that contains the expected answer | output, answer, target |
| Limit | Number of Q&A pairs to import | 100 |
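Conceptually, the mapping fields amount to a column rename plus truncation: each record's question column becomes the question, the answer column becomes the expected answer, and Limit caps how many pairs are taken. A minimal sketch of that idea (the function name and sample data are hypothetical; the real import runs in FloTorch's backend):

```python
import json

def map_qa_records(jsonl_lines, question_col, answer_col, limit):
    """Map source columns to Q&A pairs, stopping once `limit` records are taken."""
    pairs = []
    for line in jsonl_lines:
        if len(pairs) >= limit:
            break
        record = json.loads(line)
        pairs.append({
            "question": record[question_col],  # e.g. Question Map From = "input"
            "answer": record[answer_col],      # e.g. Answer Map From = "target"
        })
    return pairs

# Hypothetical two-record ground-truth file
lines = [
    '{"input": "What is 2 + 2?", "target": "4"}',
    '{"input": "Capital of France?", "target": "Paris"}',
]
print(map_qa_records(lines, "input", "target", limit=1))
```

If a mapped column name does not exist in the file, the record cannot be converted, which is why the column names must match the repository's files exactly.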

Optional fields

| Field | Description | Example |
| --- | --- | --- |
| Access Token | Hugging Face token for private repositories; leave empty for public datasets | hf_xxxx... |
| Examples File | Filename for few-shot examples (used for N-Shot Prompts) | train.jsonl |
| Examples Question Map From | Column name for questions in the examples file | input |
| Examples Answer Map From | Column name for answers in the examples file | output |
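The examples-file fields feed N-Shot Prompts: mapped question/answer pairs from the examples file are prepended to the actual question. A rough illustration of that assembly, with hypothetical data and prompt formatting (FloTorch's actual prompt template may differ):

```python
import json

def build_n_shot_prompt(example_lines, q_col, a_col, question, n=2):
    """Prepend up to n mapped examples to a question, forming an N-shot prompt."""
    shots = []
    for line in example_lines[:n]:
        rec = json.loads(line)
        shots.append(f"Q: {rec[q_col]}\nA: {rec[a_col]}")
    return "\n\n".join(shots + [f"Q: {question}\nA:"])

# Hypothetical examples file (e.g. train.jsonl) with input/output columns
examples = [
    '{"input": "2 + 2?", "output": "4"}',
    '{"input": "3 * 3?", "output": "9"}',
]
print(build_n_shot_prompt(examples, "input", "output", "5 - 1?"))
```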

Finding the repository ID

  • Browse datasets: huggingface.co/datasets
  • From the dataset page URL: https://huggingface.co/datasets/allenai/mmlu → use allenai/mmlu
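The repository ID is simply the path segment after /datasets/ in the page URL. A small helper (the function name is made up for illustration):

```python
from urllib.parse import urlparse

def repo_id_from_url(url):
    """Return the username/dataset-name portion of a Hugging Face dataset URL."""
    path = urlparse(url).path.strip("/")  # e.g. "datasets/allenai/mmlu"
    prefix = "datasets/"
    if not path.startswith(prefix):
        raise ValueError(f"not a dataset page URL: {url}")
    return path[len(prefix):]

print(repo_id_from_url("https://huggingface.co/datasets/allenai/mmlu"))  # allenai/mmlu
```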

Checking file and column names

  • Open the dataset page on Hugging Face
  • Go to the Files and versions tab to see available files (e.g. train.jsonl, test.jsonl)
  • Use the Preview or Explore feature to inspect the structure and column names
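If you have a split downloaded locally, the column names needed for the Map From fields are just the keys of each JSON record (with the datasets library, a loaded split also exposes them via its column_names attribute). A quick check, shown here on made-up data:

```python
import json

def column_names(jsonl_lines):
    """Infer column names from the first record of a JSONL file."""
    first = json.loads(jsonl_lines[0])
    return sorted(first.keys())

# Hypothetical first line of a downloaded test.jsonl
sample = ['{"input": "What is 2 + 2?", "target": "4", "subject": "math"}']
print(column_names(sample))  # ['input', 'subject', 'target']
```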

Example: importing allenai/mmlu

| Field | Value |
| --- | --- |
| Repository | allenai/mmlu |
| GT File | test.jsonl (or the appropriate split) |
| Question Map From | input |
| Answer Map From | target |
| Limit | 100 |

After you submit

  • An import job runs in the background
  • Progress is shown in the dataset Files tab
  • When complete, the imported Q&A pairs appear as ground truth (and optionally examples) data