
# Do LLMs hallucinate more in Czech?

This repository contains the code needed to run experiments comparing the hallucination rates of LLMs in English and Czech using the TruthfulQA dataset.

## How to use

  1. Install dependencies with `uv sync`.
  2. Open `config.py` and choose the models for translation and evaluation, as well as the models to evaluate. You can use any model available via LiteLLM.
  3. Set the required API keys as environment variables.
  4. To translate the TruthfulQA dataset, run `uv run translate`.
  5. (Optional) To tune a prompt for answer evaluation using DSPy, run `uv run train-eval`.
  6. To generate answers with a specific model and then evaluate them, run `uv run hal -l [cs|en] -m [model_name] -o [output_path]`.
  7. To compute hallucination rates, run `uv run analyze [results_path]`.
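As a rough illustration of step 2, a model configuration might look like the sketch below. The variable names and model strings here are assumptions for illustration, not the repository's actual `config.py`; any model string accepted by LiteLLM should work in its place.

```python
# Hypothetical sketch of a config.py — variable names and model strings
# are illustrative assumptions, not the repository's actual settings.

# Model used to translate the TruthfulQA dataset into Czech.
TRANSLATION_MODEL = "gpt-4o"

# Model that judges whether a generated answer is truthful.
EVALUATOR_MODEL = "gpt-4o-mini"

# Models whose hallucination rates will be compared across languages.
MODELS_TO_EVALUATE = [
    "gpt-4o-mini",
    "claude-3-5-haiku-20241022",
]
```

Each entry would then be passed through LiteLLM, so provider-specific API keys (step 3) must match the providers of the models you list here.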