
# Do LLMs hallucinate more in Czech?

This repository contains the code needed to run experiments comparing the hallucination rates of LLMs in English and Czech using the TruthfulQA dataset.

## How to use

  1. Install dependencies with `uv sync`.
  2. Open `config.py` and choose the models for translation and evaluation, as well as the models to evaluate. You can use any model available via LiteLLM.
  3. Set the required API keys as environment variables.
  4. To translate the TruthfulQA dataset, run `uv run translate`.
  5. (Optional) To tune a prompt for answer evaluation using DSPy, run `uv run train-eval`.
  6. To generate answers with a specific model and then evaluate them, run `uv run hal -l [cs|en] -m [model_name] -o [output_path]`.
  7. To compute hallucination rates, run `uv run analyze [results_path]`.
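As a rough illustration of step 2, a model configuration might look like the sketch below. The variable names and model strings here are assumptions for illustration, not the repository's actual `config.py`; any model string accepted by LiteLLM should work in its place.

```python
# Hypothetical sketch of a config.py — variable names and model strings
# are illustrative assumptions, not the repository's actual settings.

# Model used to translate the TruthfulQA dataset into Czech.
TRANSLATION_MODEL = "gpt-4o"

# Model that judges whether a generated answer is truthful.
EVALUATOR_MODEL = "gpt-4o-mini"

# Models whose hallucination rates will be compared across languages.
MODELS_TO_EVALUATE = [
    "gpt-4o-mini",
    "claude-3-5-haiku-20241022",
]
```

Each entry would then be passed through LiteLLM, so provider-specific API keys (step 3) must match the providers of the models you list here.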