
Import from Hugging Face

Import Q&A datasets directly from Hugging Face Hub into your FloTorch workspace. This guide explains what you need and how to use the import form.

Prerequisites:
  • A dataset of type Question and Answer Pair (Q&A Pair)
  • Import Question & Answer Pair from HuggingFace selected as the creation method during dataset configuration
  • The dataset repository ID from Hugging Face
  • Knowledge of the file structure in the target dataset (file names and column names)
  1. Open your Workspace → Datasets
  2. Click Create Dataset
  3. Select Question and Answer Pair (Q&A Pair)
  4. Click Configure
  5. Choose Import Question & Answer Pair from HuggingFace as the creation method
  6. Complete the import form and click Create Dataset
Field | Description | Example
Repository | Hugging Face dataset repository ID in username/dataset-name format | allenai/mmlu
GT File | Filename of the ground truth / evaluation file in the repo | test.jsonl, validation.jsonl
Question Map From | Name of the source column that contains the question/prompt | input, question, prompt
Answer Map From | Name of the source column that contains the expected answer | output, answer, target
Limit | Number of Q&A pairs to import | 100
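
To make the mapping fields concrete, here is a minimal Python sketch of what they describe: each line of a JSONL file is read, the two named source columns are picked out, and reading stops once the limit is reached. This is an illustration only, not FloTorch's importer. The function name `map_qa_pairs`, the output keys, and the sample rows are all assumptions.

```python
import json
from typing import Iterable


def map_qa_pairs(lines: Iterable[str], question_col: str, answer_col: str, limit: int) -> list[dict]:
    """Turn JSONL lines into question/answer pairs, keeping at most `limit` pairs.

    Hypothetical helper: it mirrors the Question Map From, Answer Map From,
    and Limit fields, but it is not FloTorch code.
    """
    pairs = []
    for line in lines:
        if len(pairs) >= limit:
            break  # the Limit field caps how many pairs are kept
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # A missing column raises KeyError, which is why the column names matter.
        pairs.append({"question": row[question_col], "answer": row[answer_col]})
    return pairs


# Made-up sample rows standing in for the contents of a ground truth file.
sample = [
    '{"input": "What is 2 + 2?", "target": "4", "subject": "math"}',
    '{"input": "Capital of France?", "target": "Paris", "subject": "geography"}',
    '{"input": "Boiling point of water in Celsius?", "target": "100", "subject": "science"}',
]

pairs = map_qa_pairs(sample, question_col="input", answer_col="target", limit=2)
print(pairs)
```

Columns that are not mapped (here `subject`) are simply ignored in this sketch.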
Field | Description | Example
Access Token | Hugging Face token for private repositories. Leave empty for public datasets. | hf_xxxx...
Examples File | Filename for few-shot examples (used for N-shot prompts) | train.jsonl
Examples Question Map From | Column name for questions in the examples file | input
Examples Answer Map From | Column name for answers in the examples file | output
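
As a rough illustration of what "N-shot" means for the examples file, the sketch below prepends the first N example pairs to a new question. The prompt layout (`Question:` / `Answer:` labels) and the function name `build_n_shot_prompt` are assumptions made for this example; the format FloTorch actually uses is not described here.

```python
def build_n_shot_prompt(examples: list[dict], question: str, n: int) -> str:
    """Build a prompt that shows `n` solved examples before the new question.

    Hypothetical helper with an assumed prompt layout.
    """
    shots = examples[:n]  # take the first n example pairs
    parts = [f"Question: {ex['question']}\nAnswer: {ex['answer']}" for ex in shots]
    parts.append(f"Question: {question}\nAnswer:")  # the answer is left open
    return "\n\n".join(parts)


# Made-up pairs standing in for rows mapped from an examples file.
examples = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

prompt = build_n_shot_prompt(examples, "Largest planet in the solar system?", n=1)
print(prompt)
```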
To find the repository ID:
  • Browse datasets: huggingface.co/datasets
  • From the dataset page URL: https://huggingface.co/datasets/allenai/mmlu → use allenai/mmlu
To find file and column names:
  • Open the dataset page on Hugging Face
  • Go to the Files and versions tab to see available files (e.g. train.jsonl, test.jsonl)
  • Use the Preview or Explore feature to inspect the structure and column names
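
The two lookups above can also be sketched in Python: deriving the repository ID from a dataset page URL, and reading the column names from one line of a JSONL file you have already downloaded. Both helper names (`repo_id_from_url`, `jsonl_columns`) and the sample line are hypothetical, and the sketch assumes the namespaced `username/dataset-name` URL form.

```python
import json
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract username/dataset-name from a Hugging Face dataset page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<username>/<dataset-name>[/...]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"Not a namespaced dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def jsonl_columns(first_line: str) -> list[str]:
    """Return the sorted column names found in one JSONL record."""
    return sorted(json.loads(first_line).keys())


repo_id = repo_id_from_url("https://huggingface.co/datasets/allenai/mmlu")
# Made-up record; inspect a real line from your chosen file instead.
columns = jsonl_columns('{"input": "What is 2 + 2?", "target": "4"}')
print(repo_id, columns)
```

Checking the column names this way before filling in the form helps avoid mapping a column that does not exist in the file.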
Field | Value
Repository | allenai/mmlu
GT File | test.jsonl (or the appropriate split)
Question Map From | input
Answer Map From | target
Limit | 100
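
Before submitting values like these, a quick sanity check can catch obvious mistakes. The sketch below is a hypothetical pre-flight check, not part of FloTorch: the `ImportForm` class, its field names, and the validation rules (repository in username/dataset-name form, non-empty file and column names, positive limit) are assumptions inferred from the field descriptions.

```python
import re
from dataclasses import dataclass

# Assumed shape of a repository ID: two segments separated by a single slash.
REPO_ID_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


@dataclass
class ImportForm:
    """Hypothetical container for the import form values."""
    repository: str
    gt_file: str
    question_map_from: str
    answer_map_from: str
    limit: int

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if not REPO_ID_PATTERN.fullmatch(self.repository):
            issues.append("Repository must look like username/dataset-name")
        text_fields = [
            ("GT File", self.gt_file),
            ("Question Map From", self.question_map_from),
            ("Answer Map From", self.answer_map_from),
        ]
        for label, value in text_fields:
            if not value.strip():
                issues.append(f"{label} must not be empty")
        if self.limit < 1:
            issues.append("Limit must be a positive number")
        return issues


form = ImportForm("allenai/mmlu", "test.jsonl", "input", "target", 100)
print(form.problems())
```

This check only looks at the shape of the values. It cannot confirm that the file or columns exist in the repository, so inspecting the dataset on Hugging Face is still necessary.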
After you click Create Dataset:
  • An import job runs in the background
  • The new dataset appears in the datasets list once created
  • When complete, imported Q&A pairs are available in that dataset