gisid
Efficiently check for unique identifiers using C plugins. This is a fast
alternative to Stata's isid. It checks whether a set of variables uniquely
identifies observations in a dataset. It can additionally take if and in,
but it cannot check an external data set or sort the data.
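The two isid features that gisid lacks can be approximated with ordinary Stata commands. The following is a minimal sketch, assuming gtools is installed and using the auto and census datasets that ship with Stata; the tempfile name autocopy is only illustrative.

```stata
sysuse auto, clear

* isid make, sort would check uniqueness and then sort by make.
* gisid only performs the check, so sort in a separate step.
gisid make
sort make

* isid make using filename would check a dataset on disk.
* With gisid, load that dataset first; preserve/restore keeps the
* data currently in memory intact.
tempfile autocopy
save `autocopy'
sysuse census, clear
preserve
use `autocopy', clear
gisid make
restore
```

After restore, the census data are back in memory and the check has run against the saved copy of auto.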
Important
Run gtools, upgrade to update gtools to the latest stable version.
Syntax
gisid varlist [if] [in] [, missok ]
Options
missok indicates that missing values are permitted in varlist.
Gtools options
(Note: These are common to every gtools command.)
- compress: Try to compress strL to str#. The Stata Plugin Interface has only limited support for strL variables. In Stata 13 and earlier (version 2.0) there is no support, and in Stata 14 and later (version 3.0) there is read-only support. The user can try to compress strL variables using this option.
- forcestrl: Skip the binary variable check and force gtools to read strL variables (Stata 14 and above only). Gtools gives incorrect results when there is binary data in strL variables. This option was included because on some Windows systems Stata detects binary data even when there is none. Only use this option if you are sure you do not have binary data in your strL variables.
- verbose: Print some useful debugging info to the console.
- benchmark or bench(level): Print how long, in seconds, various parts of the program take to execute. Level 1 is the same as benchmark. Levels 2 and 3 additionally print benchmarks for internal plugin steps.
- hashmethod(str): Hash method to use. default automagically chooses the algorithm. biject tries to biject the inputs into the natural numbers. spooky hashes the data and then uses the hash.
- oncollision(str): How to handle collisions. A collision should never happen, but just in case it does, gtools will try to use native commands. The user can specify that it throw an error instead by passing oncollision(error).
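As a rough illustration of how these options are passed, the calls below use only the option names listed above. This is a sketch, assuming gtools is installed; the variable make_l is hypothetical and is created only to have a strL variable to test compress on.

```stata
sysuse auto, clear

* Debugging output and timings
gisid make, verbose
gisid make, benchmark
gisid make, bench(2)

* Force the hashing method instead of letting gtools choose
gisid make, hashmethod(spooky)

* Throw an error on a hash collision instead of falling back to native commands
gisid make, oncollision(error)

* make_l is a hypothetical strL copy of make; compress asks gtools
* to try converting it to str# before the plugin reads it
generate strL make_l = make
gisid make_l, compress
```

forcestrl is left out on purpose, since the text above advises using it only when the strL variables are known to contain no binary data.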
Examples
You can download the raw code for the examples below here
. sysuse auto, clear
(1978 Automobile Data)

. gisid mpg
variable mpg does not uniquely identify the observations
r(459);

. gisid make

. replace make = "" in 1
(1 real change made)

. gisid make
variable make should never be missing
r(459);

. gisid make, missok
gisid can also restrict the check to a subset of observations with in or if:
. gisid mpg in 1

. gisid mpg if _n == 1
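The syntax also allows several variables in varlist and lets if, in, and missok be combined. The calls below are a small sketch along those lines, assuming gtools is installed; the particular conditions and ranges are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Check whether two variables jointly identify the observations
gisid foreign make

* Restrict the check to a subsample or to a range of observations
gisid make if foreign == 1
gisid make in 1/10

* Allow missing values while restricting the sample
replace make = "" in 1
gisid make if price > 4000, missok
```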