Efficiently check for unique identifiers using C plugins. This is a fast
alternative to Stata's isid. It checks whether a set of variables uniquely
identifies observations in a dataset. It can additionally take if and
in, but it cannot check an external data set or sort the data.
Run gtools, upgrade to update
gtools to the latest stable version.
gisid varlist [if] [in] [, missok ]
missok indicates that missing values are permitted in varlist.
(Note: These are common to every gtools command.)
compress tries to compress strL to str#. The Stata Plugin Interface has only limited support for strL variables. In Stata 13 and earlier (version 2.0) there is no support, and in Stata 14 and later (version 3.0) there is read-only support. The user can try to compress strL variables using this option.
forcestrl skips the binary variable check and forces gtools to read strL variables (Stata 14 and above only). Gtools gives incorrect results when there is binary data in strL variables. This option was included because on some Windows systems Stata detects binary data even when there is none. Only use this option if you are sure you do not have binary data in your strL variables.
verbose prints some useful debugging info to the console.
bench(level) prints how long, in seconds, various parts of the program take to execute. Level 1 is the same as benchmark. Levels 2 and 3 additionally print benchmarks for internal plugin steps.
hashmethod(str) sets the hash method to use. The accepted values are:
    default automagically chooses the algorithm.
    biject tries to biject the inputs into the natural numbers.
    spooky hashes the data and then uses the hash.
oncollision(str) sets how to handle collisions. A collision should never happen, but just in case it does, gtools will try to use native commands. The user can instead specify that it throw an error.
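To give a rough sense of what "bijecting the inputs into the natural numbers" can mean, the sketch below builds a single integer key from two integer variables using only built-in Stata commands. It illustrates the general idea (a mixed-radix encoding), not gtools' actual internals, which this page does not describe. The variable name key and the choice of foreign and rep78 are assumptions made purely for the example.

```stata
sysuse auto, clear

* Illustration only: keep rows where both inputs are non-missing integers
drop if missing(rep78)

* Smallest value and number of distinct slots needed for rep78
summarize rep78, meanonly
local rmin   = r(min)
local rrange = r(max) - r(min) + 1

* Smallest value of foreign
summarize foreign, meanonly
local fmin = r(min)

* Mixed-radix encoding: each (foreign, rep78) pair maps to exactly one integer
generate long key = (foreign - `fmin') * `rrange' + (rep78 - `rmin')

* Checking the single key is then equivalent to checking the pair
capture isid foreign rep78
local rc_pair = _rc
capture isid key
local rc_key = _rc
assert `rc_pair' == `rc_key'
```

Because the mapping is one-to-one, two observations share a value of key exactly when they share both foreign and rep78, so uniqueness can be checked on one integer instead of on several columns.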
. sysuse auto, clear
(1978 Automobile Data)

. gisid mpg
variable mpg does not uniquely identify the observations
r(459);

. gisid make

. replace make = "" in 1
(1 real change made)

. gisid make
variable make should never be missing
r(459);

. gisid make, missok
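Since a failed check exits with return code 459, as in the session above, gisid can serve as a guard inside a do-file. The following is a minimal sketch of that pattern using Stata's built-in capture and _rc; the fallback to make is only an assumption chosen for the illustration.

```stata
sysuse auto, clear

* capture suppresses the error message and stores the return code in _rc
capture gisid mpg

if _rc == 459 {
    * mpg failed the uniqueness check; verify a different candidate key
    display as text "mpg is not a unique identifier; checking make instead"
    gisid make
}
else if _rc {
    * Any other non-zero return code is unexpected, so stop with that code
    exit _rc
}
```

Without capture, the do-file would simply stop at the first gisid call that fails, which is often the desired behavior when the check is meant as an assertion.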
gisid can also be restricted to a subset of observations with in or if, that is
. gisid mpg in 1
. gisid mpg if _n == 1
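The options described earlier can be combined with these restrictions. The calls below are a sketch that assumes the option values are passed exactly as they are listed in the option descriptions; the output is omitted because it depends on the machine and the gtools version.

```stata
sysuse auto, clear

* Time the main steps (level 1), then also the internal plugin steps (level 3)
gisid make, bench(1)
gisid make, bench(3)

* Print debugging information to the console
gisid make, verbose

* Ask for the hash-based method instead of letting gtools choose
gisid make, hashmethod(spooky)

* Options can be combined with if and in
gisid make if foreign == 1, missok bench(1)
```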