06 Introduction to the Analysis of Patterns & Relationships.
Analysis of Police Demand and Resource Management
Introduction to Police Demand
- The demand for police services has surged faster than the resources available, prompting forces to adopt new methods for prioritizing responses.
- This observation is rooted in a 1993 UK Audit Commission report, which remains relevant today due to ongoing economic challenges affecting law enforcement budgets.
Role of Analysts in Policing
- Analysts play a crucial role in managing police responses by identifying significant patterns and trends from large datasets.
- Their work aims to optimize the use of limited resources by focusing on areas where they can have the greatest impact.
Understanding Mathematical Operations
Importance of Order in Mathematics
- The order of mathematical operations is essential; without it, different individuals may arrive at varying answers for the same problem.
- A mnemonic device, "Please Excuse My Dear Aunt Sally," helps remember this order: Parentheses, Exponents, Multiplication/Division (left to right), Addition/Subtraction (left to right).
Example Calculations
- An example illustrates how following the correct order leads to accurate results: 30 / 5 * 2 + 1 = 6 * 2 + 1 = 12 + 1 = 13.
- Performing the division and multiplication from left to right before the addition ensures everyone arrives at the same final answer.
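The worked example above can be checked directly in Python, which follows the same precedence rules (parentheses, exponents, multiplication/division left to right, then addition/subtraction):

```python
# 30 / 5 * 2 + 1: division and multiplication evaluate left to right,
# then the addition is applied last.
# (30 / 5) * 2 + 1 = 6 * 2 + 1 = 12 + 1 = 13
result = 30 / 5 * 2 + 1
print(result)  # 13.0
```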
Complex Mathematical Examples
Advanced Calculation Techniques
- A more complex example demonstrates using all six operations: 5 + (4 - 2)^2 * 3 / 6 - 1.
- Step-by-step: the parentheses come first (4 - 2 = 2), then the exponent (2^2 = 4), then multiplication and division left to right (4 * 3 / 6 = 2), and finally addition and subtraction (5 + 2 - 1 = 6).
Statistics in Policing
Perception of Statistics
- Many people recoil at the mention of statistics due to a lack of understanding; this includes researchers and police officers alike.
- Despite their complexity, basic statistical tools are used daily by most individuals without realizing it.
Understanding Statistics as Tools
- Statistics serve as tools for data reduction and demonstrating relationships between variables. They encompass collection, organization, analysis, interpretation, and presentation of data.
Historical Context and Development
Evolution of Data Collection in Law Enforcement
- Since Sir Robert Peel's era, law enforcement agencies have collected data that often goes underutilized, especially in developing countries.
- Advances in technology have enhanced data collection capabilities but highlight the need for understanding basic statistical tools.
Branches of Statistics
Descriptive vs. Inferential Statistics
- There are two main branches: descriptive statistics summarize data through tabular or graphical methods while inferential statistics draw conclusions about larger populations based on sample data.
Understanding Descriptive and Inferential Statistics
The Role of Descriptive Statistics
- Descriptive statistics are essential for interpreting data subject to random variation, making it easier to visualize and understand large datasets.
- This branch of statistics employs various tools to summarize quantitative data, helping identify patterns within the dataset.
- It utilizes tabulated descriptions (tables), graphical representations (graphs/charts), and statistical commentary to present data meaningfully.
- However, descriptive statistics only summarize the measured data without allowing conclusions beyond that specific dataset.
- Despite their limitations, they help visualize characteristics of voluminous data, aiding in understanding.
Transitioning to Inferential Statistics
- Inferential statistics extend beyond summarizing data; they allow us to draw conclusions about a population based on sample analysis.
- A population encompasses all individuals or items of interest; for example, adults in a housing estate or burglaries in a specific division.
- Parameters represent properties of populations (like mean or range), while statistics refer to similar properties derived from samples due to limited access to entire populations.
- Techniques in inferential statistics enable generalizations about larger populations using smaller representative samples.
- It's crucial that samples accurately reflect the population's characteristics for valid inferential conclusions.
Exploring Measures of Central Tendency
- In this module, we will delve deeper into descriptive statistics focusing on measures like central tendency and spread through tables and graphs.
- Measures of location summarize datasets by identifying a central value around which the data clusters; common measures include mean, median, and mode.
- The average (mean), calculated as the sum of all measurements divided by observations, serves as a balance point representing typical values within the dataset.
Understanding Measures of Central Tendency
The Mean: Definition and Calculation
- The mean (average) is calculated by summing all data points (Σx) and dividing by the number of observations (n). In an example with 20 data points totaling 708, the mean is computed as 708 / 20 = 35.4.
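The Σx / n calculation is easy to reproduce in code. The module's original 20 values are not listed here, so the small dataset below is purely illustrative:

```python
from statistics import mean

# Illustrative dataset (the module's original 20 values are not reproduced here)
calls = [12, 15, 9, 20, 14]

# Mean = sum of all values (Σx) divided by the number of observations (n)
total = sum(calls)   # Σx = 70
n = len(calls)       # n = 5
print(total / n)     # 14.0
print(mean(calls))   # same result via the standard library
```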
Advantages and Disadvantages of the Mean
- The mean utilizes all values in a dataset, allowing for precise calculations and further statistical analysis. However, it can be skewed by outliers, which may misrepresent the dataset.
Understanding the Median
- The median represents the middle value in an ordered dataset. It separates higher from lower halves and varies based on whether there’s an odd or even number of observations.
Calculating the Median
- For an odd number of variables, identify the middle value directly. For even numbers, average the two central values to find the median.
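Both median cases can be sketched with the standard library, which handles the odd/even distinction automatically:

```python
from statistics import median

# Odd number of observations: the middle value of the ordered data
odd = sorted([7, 3, 9, 1, 5])        # [1, 3, 5, 7, 9]
print(median(odd))                   # 5

# Even number: the average of the two central values
even = sorted([7, 3, 9, 1, 5, 11])   # [1, 3, 5, 7, 9, 11]
print(median(even))                  # (5 + 7) / 2 = 6.0
```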
Advantages and Disadvantages of the Median
- The median is unaffected by extreme values, which makes it a robust measure; however, it does not take account of all data points and can be less representative in small datasets.
Exploring the Mode
- The mode is defined as the most frequently occurring value(s). Identifying it can be easier when data is ordered; however, no formula exists for its calculation.
Types of Modes
- A unimodal dataset has one mode; bimodal has two modes; multimodal contains multiple modes. If all values are unique or occur equally often, there is no mode.
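Although no formula exists for the mode, counting frequencies makes the unimodal/bimodal/no-mode cases above mechanical. A minimal sketch (the `modes` helper is illustrative, not a standard function):

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] if every value occurs equally often."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == min(counts.values()):
        return []  # no mode: all values equally common (or all unique)
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 3]))     # [2]      (unimodal)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]   (bimodal)
print(modes([1, 2, 3]))        # []       (no mode)
```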
Summary of Advantages and Disadvantages of Mode
- The mode identifies common values effectively, but like the median it does not take all data points into account, nor does it support further statistical calculation.
Other Measures of Central Tendency
Brief Overview of Less Familiar Measures
- Additional measures include:
- Truncated Mean: Average after discarding certain high/low values.
- Mid-range: Average of maximum and minimum values.
More Complex Measures
- Advanced measures consist of:
- Weighted Mean: each value contributes according to an assigned weight.
- Geometric Mean: the nth root of the product of n values.
- Harmonic Mean: the reciprocal of the arithmetic mean of the reciprocals.
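These less familiar measures can be sketched in a few lines; the dataset and weights below are illustrative assumptions, and the geometric and harmonic means come from the standard library:

```python
from statistics import geometric_mean, harmonic_mean

data = [2, 4, 8]  # illustrative values

# Mid-range: average of the maximum and minimum values
mid_range = (max(data) + min(data)) / 2  # (8 + 2) / 2 = 5.0

# Truncated mean: average after discarding extremes (here, one from each end)
trimmed = sorted(data)[1:-1]
truncated_mean = sum(trimmed) / len(trimmed)  # 4.0

# Weighted mean: each value contributes according to its (assumed) weight
weights = [1, 2, 1]
weighted = sum(v * w for v, w in zip(data, weights)) / sum(weights)  # 18 / 4 = 4.5

# Geometric mean: cube root of 2*4*8 = 64, i.e. 4.0
# Harmonic mean: 3 / (1/2 + 1/4 + 1/8)
print(mid_range, truncated_mean, weighted,
      geometric_mean(data), harmonic_mean(data))
```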
Understanding Measures of Location in Statistics
The Importance of Choosing the Right Measure
- When analyzing data, it's crucial to remember that if a dataset is normally distributed, the mean, median, and mode will be identical. The choice of which measure to use depends on the context and intended application.
- The mean is often most relevant when interested in total values since it represents the total divided by the number of variables. For instance, mean income indicates how much each family member can spend on necessities.
Case Study: Family Savings for a Holiday
- A family's ability to save for a holiday depends on their total savings after regular expenses. This total is calculated using the mean savings multiplied by the number of family members.
- In a family with five members (two earners and three children), while the mean savings might suggest all are saving, it could mislead as only adults contribute financially.
Misleading Statistics and Their Implications
- Using measures that support one's viewpoint can mislead audiences; this tactic is often exploited by politicians and unscrupulous individuals.
- An example from UK wage statistics shows that while the average salary was reported as £30,000, this figure obscures significant disparities in actual earnings among different workers.
Analyzing Wage Distributions
- The average salary does not accurately reflect what most people earn; for instance, while some earn minimum wage (£12,200), others make significantly more.
- Because the distribution is skewed towards lower salaries, relying solely on averages can create an illusion of greater affluence than actually exists.
Limitations of Individual Measures
- Isolated means can be misleading due to susceptibility to outliers. To gain accurate insights into data trends, multiple measures should be examined together.
Suitability of Different Measures for Various Data Types
- Not all measures are appropriate for every type of data; categorical data requires different approaches compared to ordinal or cardinal data.
- For categorical data (e.g., names or types), only mode is valid. Ordinal data allows both mode and median usage; cardinal data permits all three measures.
Exploring Measures of Spread
- Understanding how spread affects representation is vital; wide spreads indicate less reliability in means as representatives of datasets.
Understanding Measures of Spread in Data Analysis
Introduction to Measures of Spread
- Variation between individual values is often viewed as a negative characteristic; in contrast to measures of location, which show how data are similar, measures of spread highlight how data differ, indicating variability.
- Combining measures of dispersion with measures of location provides a more comprehensive view than using either alone. The first stage in analysis is data description.
Standard Deviation and Range
- The standard deviation is a complex calculation defined as the square root of variance, which represents average squared deviations from the mean. However, its practical calculation is straightforward.
- The range is simply the difference between maximum and minimum values in a dataset. It will be revisited when discussing the five-number summary.
Additional Measures of Spread
- Other measures include the interquartile range (IQR), calculated by subtracting lower quartile from upper quartile, providing insight into central data spread.
- Quartiles divide an ordered dataset into four equal parts: the lower quartile (Q1) marks the boundary between the smallest values and the median, the median is the second quartile (Q2), and the upper quartile (Q3) marks the boundary between the median and the highest values.
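The range, standard deviation, quartiles, and IQR described above can all be computed with the standard library; the dataset is illustrative, and note that `quantiles` uses one of several common quartile conventions, so other tools may give slightly different cut points:

```python
from statistics import pstdev, quantiles

data = [4, 8, 15, 16, 23, 42, 7, 11, 19, 25]  # illustrative values

# Range: difference between the maximum and minimum values
data_range = max(data) - min(data)  # 42 - 4 = 38

# Population standard deviation: square root of the variance
# (the average of the squared deviations from the mean)
sd = pstdev(data)

# Quartiles split the ordered data into four equal parts;
# IQR = upper quartile (Q3) minus lower quartile (Q1)
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
print(data_range, round(sd, 2), iqr)
```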
Distribution Shape Characteristics
- The form or shape of a distribution describes how the data appear when graphed, particularly how closely they follow a bell-shaped (normal) curve.
- Characteristics such as peaks, skewness, and kurtosis are essential for understanding distribution shapes; parametric statistics assume specific probability distributions.
Skewness Explained
- Skewness indicates symmetry; a symmetrical dataset has zero skewness. Distributions can be negatively or positively skewed if they deviate from this symmetry.
- A skewness value greater than +1 or less than -1 indicates high skewness; values between -1 and -0.5 or +0.5 to +1 indicate moderate skewness; values within -0.5 to +0.5 suggest approximate symmetry.
Implications of Sample Skewness
- Analyzing sample skewness helps infer population characteristics; significant sample skewness may indicate underlying population skewness despite random chance variability.
Understanding Kurtosis
- Kurtosis assesses whether a distribution is peaked or flat compared to the normal distribution; high kurtosis produces a distinct peak near the mean, while low kurtosis produces a flatter top.
- A normal distribution has a kurtosis value of three; values above three indicate a more peaked distribution, while values below three suggest a flatter one. Excess kurtosis is the kurtosis minus three.
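Using the standard moment-based definitions (skewness as the average cubed deviation divided by sd³, kurtosis as the average fourth-power deviation divided by sd⁴), both can be sketched without external libraries; note that spreadsheet packages often use slightly different sample-adjusted formulas:

```python
from math import sqrt

def skewness(data):
    """Moment-based skewness: mean of (x - mean)^3, divided by sd^3.
    Zero for perfectly symmetric data."""
    n = len(data)
    m = sum(data) / n
    sd = sqrt(sum((x - m) ** 2 for x in data) / n)
    return sum((x - m) ** 3 for x in data) / (n * sd ** 3)

def kurtosis(data):
    """Moment-based kurtosis: mean of (x - mean)^4, divided by variance^2.
    Equals 3 for a normal distribution."""
    n = len(data)
    m = sum(data) / n
    var = sum((x - m) ** 2 for x in data) / n
    return sum((x - m) ** 4 for x in data) / (n * var ** 2)

symmetric = [1, 2, 3, 4, 5]
print(skewness(symmetric))  # 0.0 (symmetric data has zero skewness)
print(kurtosis(symmetric))  # 1.7 (flatter than a normal distribution)
```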
Understanding Statistical Tools: Skewness and Kurtosis
The Role of Excel in Reporting Data
- Microsoft Excel reports excess kurtosis (the actual value minus three), which can differ significantly from the actual value. For instance, an actual kurtosis of 5.5 corresponds to an excess value of 2.5, while an actual value of 1.4 gives an excess of -1.6.
Understanding Skewness and Kurtosis
- Skewness and kurtosis are statistical tools that help describe the distribution shape of a dataset, providing insights into its histogram representation.
- Interpretation of skewness and kurtosis is context-dependent; generally, if skewness is between -1 and +1 and kurtosis is between 2 and 4, the data can be considered normally distributed.
Five Figure Summary: A Concise Data Overview
- The five figure summary includes key statistics: median, maximum, minimum, upper quartile (Q3), and lower quartile (Q1). This summary avoids clutter by focusing on essential data points.
- It is suitable for ordinal, cardinal, interval, or ratio data but not for nominal data due to lack of order.
Identifying Outliers with Box Plots
- The five figure summary aids in identifying outliers—data points significantly different from others—and assessing skewness without assuming distribution type.
- Box plots visually represent the five figure summary characteristics such as median (50% mark), maximum (highest value), minimum (lowest value), Q3 (upper quartile), and Q1 (lower quartile).
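The five figure summary can be computed directly. The sketch below uses one common quartile convention (quartiles as the medians of the lower and upper halves); the `five_figure_summary` helper and its dataset are illustrative:

```python
from statistics import median

def five_figure_summary(data):
    """Minimum, lower quartile, median, upper quartile, and maximum.
    Quartiles are taken as medians of the lower/upper halves (one
    common convention; other tools may differ slightly)."""
    s = sorted(data)
    n = len(s)
    lower, upper = s[:n // 2], s[(n + 1) // 2:]
    return {"min": s[0], "Q1": median(lower), "median": median(s),
            "Q3": median(upper), "max": s[-1]}

print(five_figure_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))
# {'min': 3, 'Q1': 6.0, 'median': 12, 'Q3': 16.0, 'max': 21}
```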
Analyzing Distribution with Box Plots
- Box plots illustrate the range of scores for the middle 50% of data through interquartile ranges while whiskers indicate overall data range excluding outliers.
- If one whisker is longer than another, it indicates skewness; longer top whisker suggests positive skew while a longer bottom whisker indicates negative skew.
Comparative Analysis Using Box Plots
- When comparing datasets using box plots labeled A through D (e.g., police beats or restaurants), visual differences can reveal significant variations in underlying variables like crime rates or sales figures.
- Observations from box plots can highlight similarities or discrepancies among groups that warrant further investigation.
Variability Insights from Box Plot Sections
- Uneven sections within box plots may indicate varying distributions across different parts; for example, long upper whiskers suggest greater variability at higher values compared to lower ones.
Analysis of Box Plots and Descriptive Statistics
Understanding Box Plots for Data Comparison
- The box plot technique is essential for comparing data sets, focusing on the distribution patterns and characteristics of individual sets.
- Data from three regions over a 20-month period is analyzed to identify similarities and differences, which could pertain to various contexts like police beats or restaurant outlets.
- While box plots can be drawn by hand, using Excel simplifies the process; however, it requires specific adjustments as it's not a standard chart type.
Key Observations from Box Plots
- Comparing the box plots reveals that Region A has significantly higher values than Regions B and C, while Region B shows lower and more tightly packed values.
- In terms of burger sales, Region A leads with higher sales figures compared to Regions B and C. Conversely, Region B exhibits the least fluctuation in sales.
- When analyzing police calls for service, Region A again has the highest number of calls while Region B maintains a consistent low level.
Insights into Variability and Analysis Tools
- The analysis raises questions about why these differences exist between regions; understanding these variances can lead to valuable insights.
- Microsoft Excel's analytical tools are recommended for calculating descriptive statistics efficiently without manual computation.
- The Analysis ToolPak add-in in Excel enhances statistical analysis capabilities by providing additional functions such as ANOVA tests and t-tests.
Descriptive Statistics Overview
- The tool pack calculates key descriptive statistics including measures of location (mean, median), spread (range), kurtosis, and skewness which help in understanding data distributions.
- Although not perfectly normal, the data suggests a close approximation with similar mean, median, mode values indicating low skewness.
Adding Analytical Tools in Excel
- Instructions for adding the Analysis ToolPak in Excel 2013 involve navigating through File > Options > Add-Ins to enable the advanced statistical functions.
Introduction to Percentages
- Percentages represent numbers divided by 100; they are commonly used in everyday life but can sometimes convey misleading information regarding scale or value changes.
Understanding Percentages and Their Contexts
The Importance of Context in Percentages
- A 100% increase from 1 to 2 involves a very different absolute change from a 100% increase from 1,000 to 2,000; context matters significantly when interpreting percentages.
- Percentages can be expressed as fractions or decimals; for example, 20% equals 20 parts out of 100 or as the decimal 0.20.
- To understand a percentage's value, it’s crucial to know what it represents; providing the base number alongside the percentage offers necessary context.
Comparing Discounts: A Case Study
- Two stores offer different discounts on items priced at £16 and £10 respectively; despite appearances, both items end up costing £8 after applying their respective discounts.
- When comparing data with different sample sizes, converting figures into percentages allows for easier comparison across datasets.
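The two-store comparison can be reproduced numerically. The exact discount rates are not stated in the text, so the figures below (50% off £16 and 20% off £10) are illustrative assumptions chosen to match the £8 final price:

```python
# Assumed discounts (not given in the text): 50% off £16, 20% off £10
price_a, discount_a = 16.00, 0.50
price_b, discount_b = 10.00, 0.20

final_a = price_a * (1 - discount_a)  # 16 * 0.50 = 8.0
final_b = price_b * (1 - discount_b)  # 10 * 0.80 = 8.0

# Both items cost £8.00 despite very different headline percentages
print(final_a, final_b)
```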
Analyzing Crime Data Through Percentages
- Total acquisitive crime data from different years (2012 vs. 2013) can be misleading without proper context; using percentages helps clarify changes in crime types.
- For instance, burglaries represented a specific fraction of total crimes in each year, making it easier to compare trends over time.
Calculating Percentage Changes
- The decrease in robberies between two years can be quantified as a percentage change by determining the difference and expressing it relative to the initial value.
- It’s essential to identify the correct starting value when calculating percentage changes since moving from higher to lower values yields different results than vice versa.
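The point about starting values can be made concrete with a small helper (illustrative, not from the source): percentage change is always the difference divided by the value you started from, so the same two numbers give different percentages depending on direction.

```python
def percentage_change(old, new):
    """Change relative to the STARTING value: (new - old) / old * 100."""
    return (new - old) / old * 100

# Direction matters because the divisor is always the starting value:
print(percentage_change(100, 80))  # -20.0 (100 -> 80 is a 20% decrease)
print(percentage_change(80, 100))  # 25.0  (80 -> 100 is a 25% increase)
```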
Practical Applications of Percentages
- In retail and banking contexts, prices and interest rates are often expressed as percentages; understanding these terms is vital for consumers.
Calculating Percentage Changes and Understanding Statistics
Methods for Calculating Discounted Prices
- The calculation of a discounted price involves multiplying the original price by 0.85 (for a 15% discount, leaving 85% of the price). For example, 0.85 * £180 equals £153.
- Two methods can be used to calculate percentage increases:
- Calculate the actual increase (5% of £100 = £5), then add it to the original amount (£100 + £5 = £105).
- Alternatively, calculate the total percentage (100% + 5% = 105%) and multiply by the original value (1.05 * £100 = £105).
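The two methods above can be sketched side by side to confirm they agree:

```python
price = 100.00
rate = 0.05  # a 5% increase

# Method 1: compute the actual increase, then add it to the original
increase = price * rate      # £5.00
method_1 = price + increase  # £105.00

# Method 2: multiply by (100% + 5%) = 1.05 in a single step
method_2 = price * (1 + rate)  # £105.00

print(method_1, method_2)  # 105.0 105.0
```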
Understanding Investment Growth
- Both methods for calculating investment growth yield the same result, demonstrating consistency in mathematical approaches.
- Basic statistics covered include measures of location and spread, which help describe data effectively.
Summary of Statistical Concepts
- The five-number summary is introduced as a tool for identifying features within a dataset and comparing different datasets.