This tool allows you to explore the usage of variables in ehrQL datasets across all OpenSAFELY studies. It is a list of all ehrQL variable names as extracted on 2025-10-10.
By default, each occurrence of a variable name is listed separately. Tick the grouping checkbox to group all occurrences of the same variable name together; each group shows its total number of occurrences and its number of distinct variants (distinguished by a hash of their internal representation). Clicking a group expands it to show every file-level occurrence, with a link to the exact line in the repository. Variants are distinguished by differently coloured labels.
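As a rough illustration of what grouping by name computes, here is a minimal Python sketch. The record layout, the hash values, and the `group_by_name` helper are assumptions made up for this example; they are not taken from the tool's own code.

```python
from collections import defaultdict

# Hypothetical records: (variable name, variant hash, location of the occurrence).
occurrences = [
    ("age", "a1f3", "study-a/analysis/dataset_definition.py#L12"),
    ("age", "a1f3", "study-b/analysis/dataset_definition.py#L8"),
    ("age", "9c2e", "study-c/analysis/dataset_definition.py#L20"),
    ("sex", "77b0", "study-a/analysis/dataset_definition.py#L13"),
]


def group_by_name(records):
    """Group occurrences by identical name and summarise each group."""
    groups = defaultdict(list)
    for name, variant_hash, location in records:
        groups[name].append((variant_hash, location))
    return {
        name: {
            "occurrences": len(items),                 # total rows with this name
            "variants": len({h for h, _ in items}),    # distinct variant hashes
            "locations": [loc for _, loc in items],    # shown when the group is expanded
        }
        for name, items in groups.items()
    }


summary = group_by_name(occurrences)
```

In this made-up data, `age` appears three times but with only two distinct hashes, so it would be shown as one group with three occurrences and two variants.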
Once grouping is enabled, you can also collapse groups that share a variant using the "Collapse groups by common variant" checkbox. Any groups with at least one variant (hash) in common are merged into a super-group that combines their names and occurrences. This can help identify variables that are likely to be the same but have been given different names in different studies.
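One way to picture the collapsing step is as finding connected components: two names are linked if they share a hash. The sketch below uses a small union-find for this. It is only an illustration under assumptions: the function name, the input shape, and the choice to merge transitively (A shares with B, B shares with C, so all three merge) are not confirmed by the tool's documentation.

```python
def collapse_by_common_variant(groups):
    """Merge name groups that share any variant hash.

    `groups` maps a variable name to its set of variant hashes (hypothetical shape).
    Returns a sorted list of (names, hashes) pairs, one per super-group.
    """
    parent = {name: name for name in groups}

    def find(name):
        # Follow parent links to the representative, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every name to the first name seen with the same hash.
    first_seen = {}
    for name, hashes in groups.items():
        for h in hashes:
            if h in first_seen:
                union(first_seen[h], name)
            else:
                first_seen[h] = name

    # Collect names and hashes per representative.
    merged = {}
    for name, hashes in groups.items():
        names, all_hashes = merged.setdefault(find(name), (set(), set()))
        names.add(name)
        all_hashes.update(hashes)
    return sorted((sorted(n), sorted(h)) for n, h in merged.values())


# Made-up example: "age" and "age_years" share no hash directly,
# but both share one with "age_at_index".
groups = {
    "age": {"h1", "h2"},
    "age_at_index": {"h2", "h3"},
    "age_years": {"h3"},
    "sex": {"h4"},
}
super_groups = collapse_by_common_variant(groups)
```

Here the three age-related names end up in a single super-group, while `sex` stays on its own.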
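You can also try the "Fuzzy variant match" checkbox, which switches to a variant hash that ignores codelists. Two variables whose definitions are identical apart from their codelists then count as the same variant, which can help identify variables that differ only in the codelists they use.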
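The idea behind a hash that ignores codelists can be sketched as follows. Everything here is an assumption for illustration: the nested-dict representation of a variable, the `codelist` key, and the use of SHA-256 over JSON are invented for this example and may differ from how the tool actually derives its hashes.

```python
import hashlib
import json


def strip_codelists(node):
    """Return a copy of the representation with every codelist replaced by a placeholder."""
    if isinstance(node, dict):
        return {
            key: "<codelist>" if key == "codelist" else strip_codelists(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_codelists(item) for item in node]
    return node


def variant_hash(node, fuzzy=False):
    """Hash a (hypothetical) variable representation; fuzzy mode ignores codelists."""
    if fuzzy:
        node = strip_codelists(node)
    payload = json.dumps(node, sort_keys=True)  # stable key order gives a stable hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Two made-up variables that differ only in their codelist.
asthma = {"table": "clinical_events", "op": "exists", "codelist": ["A1", "A2"]}
copd = {"table": "clinical_events", "op": "exists", "codelist": ["C1"]}

exact_match = variant_hash(asthma) == variant_hash(copd)
fuzzy_match = variant_hash(asthma, fuzzy=True) == variant_hash(copd, fuzzy=True)
```

With exact hashing the two variables are different variants; with the fuzzy hash they collapse into one, because only the structure around the codelist is compared.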
The full method and script for extracting the variable names can be found in this repo, but a high-level overview is as follows: