Three Questions to Ask When Working with Databases
- Factchequeado

- Feb 5
By Factchequeado

Before working with figures, statistics, and databases, it is essential to define your purpose as narrowly as possible: What exactly do you want to measure? Are there public databases for this? What data will you not be able to find?
Question 1: Origin
Who produced this data, who collected it, how, and why? In other words, what interests does the database serve?
● Is the source reliable?
● What methodology was used to compile the data? Is that methodology visible and transparent? Do we understand it? Is it replicable?
● What is the purpose of the person or institution that generates this data? Check whether the data come from official or private institutions, what limitations they have, and whether they are credible.
● What critical phenomena are left out of this measurement?
Examples:
1. CRIME DATA: A police department could issue a statement saying that crime increased, based on the number of arrests, without mentioning that its own records include a variable showing that police operations also increased during the same period.
2. POVERTY DATA: Poverty figures vary depending on who defines the "poverty line" and how income is measured.
3. BORDER MIGRATION DATA: Suppose U.S. Customs and Border Protection (CBP) publishes a table with "encounters" of migrants at the southwest border by month and nationality.
○ Who generates this data? Does it come from CBP at the national level, from a regional office, or from a third-party contractor?
○ How is "encounter" defined? Check the official definitions.
○ Does it include only apprehensions between ports of entry? Does it count repeat encounters of the same person in the same month?
○ How is the data collected? Is it a real-time administrative record, a sampling, or an estimate?
○ Why is this data produced? To justify more budget, to show that migration increased or "decreased", to comply with a legal mandate?
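To see why the definition of "encounter" matters, here is a minimal Python sketch that counts the same records two ways: as events and as unique people per month. The records, field names, and person IDs are invented for illustration; a real published table may not include a person-level identifier at all, in which case this check is only possible with the agency's own explanation of its method.

```python
from collections import defaultdict

# Hypothetical records: (month, person_id, nationality). Not real CBP data.
records = [
    ("2024-01", "A1", "MX"),
    ("2024-01", "A1", "MX"),  # same person encountered twice in one month
    ("2024-01", "B2", "GT"),
    ("2024-02", "A1", "MX"),
    ("2024-02", "C3", "HN"),
]

def count_encounters(rows):
    """Return {month: (events, unique_people)}."""
    events = defaultdict(int)
    people = defaultdict(set)
    for month, person_id, _nationality in rows:
        events[month] += 1
        people[month].add(person_id)
    return {m: (events[m], len(people[m])) for m in sorted(events)}

summary = count_encounters(records)
for month, (n_events, n_people) in summary.items():
    print(month, "events:", n_events, "unique people:", n_people)
```

If the two counts differ, a headline based on "encounters" describes events, not individuals.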
Question 2: Quality
Is the data complete, up-to-date, and accurate?
Review geographic coverage, the time period, seasonal adjustments, and possible errors or gaps. This prevents misinterpretation.
● What period is the data from? (fiscal year vs. calendar, full or partial months)
● What is the sample size?
● Do all regions report in the same way? (Is there a state or sector missing?)
● Did the counting method change between periods?
● Are there months with no data and no explanation?
● Do the numbers match between different official sources of the same phenomenon?
● Are they using the same definition as in previous years?
When comparing different years, always ask yourself: Has the way this is measured changed?
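Two of the checks above, unexplained gaps and fiscal versus calendar years, are easy to automate. The following is a minimal sketch, assuming a monthly series keyed by "YYYY-MM" strings; the counts and the function names are hypothetical. The fiscal-year rule shown is the US federal one (October through September, named for the year in which it ends).

```python
def month_range(start, end):
    """List every 'YYYY-MM' from start to end, inclusive."""
    y, m = map(int, start.split("-"))
    end_y, end_m = map(int, end.split("-"))
    months = []
    while (y, m) <= (end_y, end_m):
        months.append(f"{y:04d}-{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months

def missing_months(series):
    """Return months absent between the first and last month reported."""
    reported = sorted(series)
    return [mo for mo in month_range(reported[0], reported[-1]) if mo not in series]

def us_fiscal_year(month):
    """US federal fiscal year for a 'YYYY-MM' month (October starts the next FY)."""
    y, m = map(int, month.split("-"))
    return y + 1 if m >= 10 else y

# Hypothetical monthly counts with one month silently missing.
monthly = {"2023-09": 120, "2023-10": 135, "2023-12": 128, "2024-01": 140}

gaps = missing_months(monthly)
print("Missing months:", gaps)
print("2023-09 belongs to FY", us_fiscal_year("2023-09"))
print("2023-10 belongs to FY", us_fiscal_year("2023-10"))
```

A gap is not proof of an error, but it is a question to put to the source before comparing totals.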
Examples:
1. ICE DETENTIONS: Database of people in ICE custody by country of origin and gender.
○ Coverage: Adults only, or does it include minors? Only large detention centers, or facilities in all counties?
○ Temporality: Fiscal year (October-September) or calendar year? Are they up to date?
○ Accuracy: Are there errors in the nationality identification? Is there a daily breakdown or only a monthly one?
2. POPULATION IN CUSTODY: Imagine you see this figure in an official statement: "The population in ICE custody is down 20% this quarter."
○ Question: Did it go down because there are fewer new arrests or because those already in custody were deported en masse?
○ It may be that there are fewer arrests at the border because of changes in immigration policy.
○ Or the number of arrests may be the same, with more deportations because there are more operations.
If you review the data carefully, it could turn out that the population in custody fell by 20% because of mass deportations rather than because fewer people arrived.
To check, cross-reference the data on "population in custody" with "deportations executed" for the same period. If deportations went up while arrests held steady, you have your answer.
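The cross-check described here can be sketched in a few lines of Python. All figures below are invented for illustration, and the variable names are hypothetical. The sketch also ignores releases and transfers, which in real custody data change the population as well.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old

# Hypothetical quarterly figures, not real ICE data.
previous = {"in_custody": 40000, "new_arrests": 30000, "deportations": 30000}
current = {"in_custody": 32000, "new_arrests": 30000, "deportations": 38000}

# Compare each indicator quarter over quarter.
changes = {key: round(pct_change(previous[key], current[key]), 1) for key in previous}

for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")

# Rough reading: custody fell, arrests were flat, deportations rose.
custody_fell = changes["in_custody"] < 0
arrests_flat = abs(changes["new_arrests"]) < 5  # 5% threshold is an arbitrary assumption
deportations_rose = changes["deportations"] > 0
print("Drop coincides with more deportations:", custody_fell and arrests_flat and deportations_rose)
```

In this made-up case, the 20% drop in custody sits next to flat arrests and a rise in deportations, which points to the second explanation rather than the first.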
Question 3: Analysis
What patterns, trends, or anomalies do the data reveal when cross-referencing them?
Compare periods, cross-reference variables, and look for patterns to generate narrative hypotheses.
Example:
ANALYSIS BY CRIME: The FBI's "Uniform Crime Reporting" (UCR) program publishes data that lets you see trends and cross-reference variables: fbi.gov/services/cjis/ucr. Suppose that database lists homicide rates per 100,000 inhabitants by type of weapon and by large city. Analyze, compare, and find anomalies.
○ Compare by periods:
- 2019 (before the pandemic): 5.0 homicides per 100,000.
- 2020: Homicides rise 30%, to 6.5 per 100,000, amid the COVID-19 pandemic and the protests over the murder of George Floyd.
- Analyze one year against another: 2025 vs. 2024.
○ Cross variables:
- Year vs. weapon type.
- City vs. Democratic or Republican mayor.
- Month vs. specific periods (pandemic, protests, etc.).
○ Spot anomalies: In 2023, Washington, D.C. recorded a homicide rate of 40 per 100,000, the highest in 20 years, but the rate dropped 35% in 2024. Why? Cross-reference those figures with gun sales data: Were there more gun purchases during periods of protest? Did car thefts go up while homicides went down?
It's important to see trends like this: Between 2022 and 2025, Hyundai and Kia car thefts rose by more than 400% in cities such as Chicago, Milwaukee, and Los Angeles, according to data from the FBI and the National Insurance Crime Bureau (NICB). So, while homicides are down 21% in 2025, total robberies are up about 10% nationally.
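The period comparisons in this section rest on two small calculations: a rate per 100,000 inhabitants and a percentage change. Here is a minimal Python sketch that reproduces the figures quoted in the example. The homicide count and population passed to the rate function are round, hypothetical numbers chosen only to yield a rate of 40; they are not official figures.

```python
def rate_per_100k(count, population):
    """Events per 100,000 inhabitants."""
    return count * 100_000 / population

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old

def apply_pct_change(value, pct):
    """Value after a percentage change (negative pct means a drop)."""
    return value * (100 + pct) / 100

# 2019 vs. 2020 national rates quoted in the example above.
change_2020 = pct_change(5.0, 6.5)
print(f"2019 to 2020: {change_2020:+.0f}%")

# A rate of 40 per 100,000 that then drops 35%.
rate_after_drop = apply_pct_change(40, -35)
print(f"Rate after a 35% drop: {rate_after_drop:.1f} per 100,000")

# Hypothetical count and population, to show how a rate is built.
example_rate = rate_per_100k(270, 675_000)
print(f"Example rate: {example_rate:.1f} per 100,000")
```

Working with rates rather than raw counts is what makes cities of different sizes, or the same city in different years, comparable.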


