# Do LLMs hallucinate more in Czech?
This repository contains the code necessary to run experiments comparing the hallucination rates of LLMs in English and Czech using the TruthfulQA dataset.
## How to use
- Install dependencies with `uv sync`.
- Open the `config.py` file and choose the models for translation and evaluation, as well as the models to evaluate. You can use any model available via LiteLLM.
- Set the API keys as environment variables.
- To translate the TruthfulQA dataset, run `uv run translate`.
- (Optional) To tune a prompt for answer evaluation using DSPy, run `uv run train-eval`.
- To generate answers using a specific model and then evaluate them, run `uv run hal -l [cs|en] -m [model_name] -o [output_path]`.
- To compute hallucination rates, run `uv run analyze [results_path]`.
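
The `config.py` step might look something like the sketch below. The variable names (`TRANSLATION_MODEL`, `EVALUATION_MODEL`, `MODELS_TO_EVALUATE`) are illustrative assumptions, not necessarily the repository's actual names; any model identifier that LiteLLM accepts should work.

```python
# Hypothetical sketch of config.py — variable names are assumptions,
# check the actual file for the names the scripts expect.

# Model used to translate the TruthfulQA dataset into Czech.
TRANSLATION_MODEL = "gpt-4o"

# Judge model that decides whether a generated answer is truthful.
EVALUATION_MODEL = "gpt-4o"

# Models whose English vs. Czech hallucination rates are compared.
MODELS_TO_EVALUATE = [
    "gpt-4o-mini",
    "claude-3-5-sonnet-20241022",
]
```

API keys for these providers (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are read from the environment, per LiteLLM's usual convention.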