Descriptive statistics and inferential statistics

  1. Introduction

This report analyses data processed in SPSS and presents the results in charts and tables. It first presents the results of selected descriptive statistical analyses and then summarises them in tables or graphs, together with an interpretation of each. The fourth section of the report presents the numerical results of the inferential statistics. A discussion of those results follows, and a summative conclusion is presented in the last section.

  • Selected descriptive statistics

Descriptive statistics are the summary measures that analysts and researchers use to present the characteristics of the sample used in a study. According to Kothari (2004), they are also used to check whether the chosen variables violate any assumptions the researcher has made, since such violations could affect the findings. In this section, descriptive statistics also help to answer the core research questions.

In the present study, descriptive statistics were selected for the public use microdata area code (PUMA), house weight (WHTP), state code (ST), number of persons (NP), rooms (RMS), bedrooms (BDS), and household income (HINCP). The results are presented in Table 1 below.

Table 1: PUMA, ST, BDS, RMS, mean, median, and standard deviation

                   RMS     BDS      ST     PUMA
N  Valid          4911    4911    4911     4911
   Missing           0       0       0        0
Mean              4.87    2.61   15.00   248.05
Median            5.00    3.00   15.00   302.00
Std. Deviation   1.933   1.197    .000   81.573
Minimum              1       0      15      100
Maximum              9       5      15      307

Table 2: PUMA, NP, RMS, and BDS, frequency table

PUMA     Frequency   Percent   Valid Percent   Cumulative Percent
100            951      19.4            19.4                 19.4
200            782      15.9            15.9                 35.3
301            407       8.3             8.3                 43.6
302            412       8.4             8.4                 52.0
303            425       8.7             8.7                 60.6
304            536      10.9            10.9                 71.5
305            365       7.4             7.4                 79.0
306            456       9.3             9.3                 88.3
307            577      11.7            11.7                100.0
Total         4911     100.0           100.0

NP       Frequency   Percent   Valid Percent   Cumulative Percent
0              452       9.2             9.2                  9.2
1              970      19.8            19.8                 29.0
2             1491      30.4            30.4                 59.3
3              711      14.5            14.5                 73.8
4              619      12.6            12.6                 86.4
5              318       6.5             6.5                 92.9
6              161       3.3             3.3                 96.2
7               94       1.9             1.9                 98.1
8               30        .6              .6                 98.7
9               20        .4              .4                 99.1
10              18        .4              .4                 99.5
11               7        .1              .1                 99.6
12               7        .1              .1                 99.7
13               5        .1              .1                 99.8
15               1        .0              .0                 99.9
16               2        .0              .0                 99.9
17               1        .0              .0                 99.9
19               2        .0              .0                100.0
20               2        .0              .0                100.0
Total         4911     100.0           100.0

RMS      Frequency   Percent   Valid Percent   Cumulative Percent
1              185       3.8             3.8                  3.8
2              345       7.0             7.0                 10.8
3              677      13.8            13.8                 24.6
4              896      18.2            18.2                 42.8
5             1110      22.6            22.6                 65.4
6              768      15.6            15.6                 81.1
7              438       8.9             8.9                 90.0
8              234       4.8             4.8                 94.7
9              258       5.3             5.3                100.0
Total         4911     100.0           100.0

BDS      Frequency   Percent   Valid Percent   Cumulative Percent
0              211       4.3             4.3                  4.3
1              683      13.9            13.9                 18.2
2             1208      24.6            24.6                 42.8
3             1810      36.9            36.9                 79.7
4              688      14.0            14.0                 93.7
5              311       6.3             6.3                100.0
Total         4911     100.0           100.0

Several observations follow from Table 1 above. The means of RMS, BDS, ST and PUMA are 4.87, 2.61, 15, and 248.05 respectively. The number of rooms ranged from 1 to 9 with a median of 5, meaning that half of the households have five rooms or fewer. Together with the mean of 4.87, this indicates that a typical household has about five rooms.

For the number of bedrooms, the median was 3 and the mean was 2.61, which indicates that a typical household has about three bedrooms. The state code was 15 for all respondents, so it does not vary in this sample. For the public use microdata area code, the mean was 248.05 and the median was 302, whereas the minimum and maximum values were 100 and 307 respectively. Because ST and PUMA are area codes rather than measured quantities, their means and medians describe the mix of codes in the sample and not a magnitude.
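As a cross-check, the summary figures for rooms in Table 1 can be recomputed from the RMS frequency counts in Table 2. The sketch below uses Python rather than SPSS, which is an assumption made purely for illustration, and the names `rms_counts` and `expand` are hypothetical. It expands the counts into individual observations and applies the standard library's `statistics` functions.

```python
import statistics

# RMS frequency counts copied from Table 2 (rooms -> number of households).
rms_counts = {1: 185, 2: 345, 3: 677, 4: 896, 5: 1110,
              6: 768, 7: 438, 8: 234, 9: 258}


def expand(counts):
    """Turn a {value: frequency} table into a flat list of observations."""
    return [value for value, freq in counts.items() for _ in range(freq)]


rms = expand(rms_counts)

n = len(rms)                          # number of valid cases
mean_rms = statistics.mean(rms)       # arithmetic mean
median_rms = statistics.median(rms)   # middle value of the sorted data
sd_rms = statistics.stdev(rms)        # sample standard deviation (n - 1)

print(n, round(mean_rms, 2), median_rms, round(sd_rms, 3))
```

The rounded values should agree with the N, mean, median and standard deviation reported for RMS in Table 1.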

Several observations can also be made from Table 2, beginning with PUMA. For the public use microdata area code, 19.4% of the respondents fall in category 100, which makes it the largest category, whereas 15.9% fall in category 200, making it the second largest. By comparison, category 305 was the smallest at 7.4%.

For the number of bedrooms, the largest group of respondents (36.9%) reported three bedrooms, followed by those with two bedrooms at 24.6%. Households with no bedrooms and with five bedrooms were the least common, at 4.3% and 6.3% respectively.
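The percentage and cumulative percentage columns of a frequency table follow directly from the raw counts. As a minimal sketch (in Python, assumed here only for illustration; `frequency_table` is a hypothetical helper, not an SPSS function), the BDS rows of Table 2 can be rebuilt as follows.

```python
from itertools import accumulate

# BDS frequency counts copied from Table 2 (bedrooms -> number of households).
bds_counts = {0: 211, 1: 683, 2: 1208, 3: 1810, 4: 688, 5: 311}


def frequency_table(counts):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    running = accumulate(counts.values())   # running total of frequencies
    return [(value, freq, 100 * freq / total, 100 * cum / total)
            for (value, freq), cum in zip(counts.items(), running)]


table = frequency_table(bds_counts)
for value, freq, pct, cum_pct in table:
    print(f"{value:>5} {freq:>6} {pct:>6.1f} {cum_pct:>6.1f}")

# The mode is the category with the highest frequency.
modal_bedrooms = max(bds_counts, key=bds_counts.get)
print("mode:", modal_bedrooms)
```

The most frequent category is the mode. It is the largest single group, which is not the same as a majority of respondents.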

This is in line with the data on rooms, which show that 22.6% of respondents live in a five-room dwelling, followed by 18.2% and 15.6% who live in four-room and six-room dwellings respectively. Given the number of rooms and bedrooms in their houses, one plausible reading is that most respondents live with other people or expect frequent visitors, which is why they have extra rooms and bedrooms, although the figures presented here do not test this directly.

Additionally, the data suggest that most respondents fall between the rich and the poor: those who live in studio apartments with no bedrooms (4.3%) are about as few as those who live in luxury apartments with at least five bedrooms (6.3%).

  • Selected inferential statistical analyses

Inferential statistics are the data analysis methods with which a researcher or analyst uses a sample to determine whether there is a link between the variables being studied. They allow the researcher to judge whether an apparent relationship between variables is likely to be real or could have arisen by chance. According to Kothari (2004), a number of measures and techniques can be used for this purpose. The two used in this report are correlation and regression analysis.

Correlation was assessed using Pearson correlation analysis, which measures the strength of the linear relationship between two variables. The Pearson coefficient ranges between -1 and +1, with -1 indicating a perfect negative correlation, 0 indicating no linear correlation and +1 indicating a perfect positive correlation. The closer the value is to -1 or +1, the stronger the relationship between the variables (Saunders, Lewis & Thornhill, 2007). For this study, the results are as shown below.

According to Table 3, all four variables are significantly correlated at the 0.01 significance level: Sig. (2-tailed) is .000 for every pair except PUMA and BDS, for which it is .003. The Pearson correlation between PUMA and NP is .110, whereas the correlations between PUMA and BDS and between PUMA and RMS are .042 and .067 respectively. This shows that there is a weak but positive relationship between PUMA and each of the independent variables, with the weakest being that between PUMA and BDS.

Table 3: Correlations

                                 PUMA          NP         BDS         RMS
PUMA   Pearson Correlation          1    .110(**)    .042(**)    .067(**)
       Sig. (2-tailed)                       .000        .003        .000
       N                         4911        4911        4911        4911
NP     Pearson Correlation   .110(**)           1    .447(**)    .396(**)
       Sig. (2-tailed)           .000                    .000        .000
       N                         4911        4911        4911        4911
BDS    Pearson Correlation   .042(**)    .447(**)           1    .878(**)
       Sig. (2-tailed)           .003        .000                    .000
       N                         4911        4911        4911        4911
RMS    Pearson Correlation   .067(**)    .396(**)    .878(**)           1
       Sig. (2-tailed)           .000        .000        .000
       N                         4911        4911        4911        4911

**  Correlation is significant at the 0.01 level (2-tailed).
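To illustrate how a coefficient and significance value like those in Table 3 arise, the sketch below computes Pearson's r from its definition and approximates the two-tailed p-value. It is written in Python for illustration only. The toy data are hypothetical and not taken from the study, and the p-value uses the normal distribution in place of Student's t, an assumption that is reasonable only because N is large.

```python
import math


def pearson_r(x, y):
    """Pearson r: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p(r, n):
    """Approximate two-tailed p-value for r (normal approximation to t)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical toy data, NOT taken from the report's data set.
persons = [1, 2, 2, 3, 4, 5]
bedrooms = [1, 2, 3, 3, 4, 4]
r_toy = pearson_r(persons, bedrooms)

# Check against Table 3: r = .042 between PUMA and BDS with N = 4911.
p_puma_bds = two_tailed_p(0.042, 4911)
print(round(r_toy, 3), round(p_puma_bds, 3))
```

With N = 4911, even a coefficient as small as .042 yields a p-value of about .003, which is why very weak correlations in Table 3 are still flagged as significant.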

Regression analysis helps to estimate and investigate the association between variables. R Square shows the proportion of variation in the dependent variable that is explained by the independent variables. Its value ranges between 0 and 1, and the closer the value is to 1, the greater the degree to which variation in the independent variables explains variation in the dependent variable (Seber and Lee, 2012).

In the model summary in Table 4, R stands for the multiple correlation coefficient, which depicts the association between the dependent variable and the independent variables taken together. The R value of 0.126 indicates that an association exists between the dependent variable and the independent variables.

However, the relationship is a very weak one. The three independent variables (RMS, BDS and NP) together explain 1.6% of the variation in PUMA, as represented by the value of R Square. Factors not examined in this study therefore account for the remaining 98.4% of the variation in PUMA. These other factors are very important and need to be taken into account in any effort to explain PUMA. This research therefore identifies the three independent variables studied here as non-critical determinants of PUMA boundaries.

Table 4: regression analysis results

                                 Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .126(a)   .016       .015                80.945

a  Predictors: (Constant), RMS, NP, BDS

Further, the analysis of variance shows a significance value of .000, which is less than 0.01, so the model is statistically significant in predicting how NP, RMS, and BDS relate to PUMA groupings. The calculated F statistic was 26.501 on 3 and 4907 degrees of freedom. Because this is greater than the F critical value at the 0.01 level of significance (approximately 3.78), the overall model was significant (Seber and Lee, 2012).

                                                         ANOVA(b)

Model            Sum of Squares     df     Mean Square   F        Sig.
1  Regression        520911.120      3      173637.040   26.501   .000(a)
   Residual        32151092.845   4907        6552.087
   Total           32672003.965   4910

a  Predictors: (Constant), RMS, NP, BDS

b  Dependent Variable: PUMA
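The figures in the model summary and the ANOVA table are tied together by a few standard formulas. As a minimal sketch (Python is assumed here for illustration, and the variable names are hypothetical), the F statistic, R Square, adjusted R Square and standard error of the estimate can all be recovered from the sums of squares and degrees of freedom.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 520911.120, 3
ss_residual, df_residual = 32151092.845, 4907
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual

f_stat = ms_regression / ms_residual            # F statistic
r_square = ss_regression / ss_total             # share of variation explained
adj_r_square = 1 - (1 - r_square) * df_total / df_residual
std_error = math.sqrt(ms_residual)              # std. error of the estimate
multiple_r = math.sqrt(r_square)                # multiple correlation R

print(round(f_stat, 3), round(multiple_r, 3), round(r_square, 3),
      round(adj_r_square, 3), round(std_error, 3))
```

Recomputing these values is a quick way to confirm that a transcribed SPSS table is internally consistent.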

At the same time, the regression coefficients also give significant inferential information. According to the Coefficients table below, when all independent variables (the number of persons (NP), the number of rooms (RMS), and the number of bedrooms (BDS)) are held at zero, the predicted public use microdata area code (PUMA) is 231.130. In terms of the standardized coefficients, a one standard deviation increase in the number of persons is associated with a 0.114 standard deviation increase in PUMA, whereas a one standard deviation increase in BDS is associated with a 0.121 standard deviation decrease in PUMA.

By comparison, a one standard deviation increase in RMS is associated with a 0.128 standard deviation increase in PUMA. Of the three variables, RMS therefore has the largest standardized effect on PUMA when the three independent variables are considered together. The statistical significance of each independent variable was tested at the 0.01 level of significance, and all three p-values (.000) fall below it.

                                                      Coefficients(a)

Model           Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                B          Std. Error         Beta
1  (Constant)   231.130    3.161                                          73.128   .000
   NP           4.700      .654               .114                        7.181    .000
   BDS          -8.222     2.068              -.121                       -3.977   .000
   RMS          5.384      1.248              .128                        4.315    .000

a  Dependent Variable: PUMA
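The columns of the Coefficients table are related to one another: each t value is the unstandardized B divided by its standard error, and each standardized Beta is B rescaled by the standard deviations of the predictor and of the dependent variable. The sketch below, written in Python for illustration with hypothetical helper names, checks this using the standard deviations from Table 1. NP is left out of the Beta check because Table 1 does not report its standard deviation.

```python
# Unstandardized B and standard errors copied from the Coefficients table.
coefficients = {
    "NP":  {"b": 4.700,  "se": 0.654},
    "BDS": {"b": -8.222, "se": 2.068},
    "RMS": {"b": 5.384,  "se": 1.248},
}

# Standard deviations copied from Table 1 (NP is not reported there).
sd = {"BDS": 1.197, "RMS": 1.933, "PUMA": 81.573}


def t_value(name):
    """t statistic: coefficient divided by its standard error."""
    return coefficients[name]["b"] / coefficients[name]["se"]


def standardized_beta(name):
    """Beta: B rescaled to standard deviation units of predictor and outcome."""
    return coefficients[name]["b"] * sd[name] / sd["PUMA"]


for name in coefficients:
    print(name, "t =", round(t_value(name), 2))
for name in ("BDS", "RMS"):
    print(name, "Beta =", round(standardized_beta(name), 3))
```

Small differences from the printed t values are expected, because the table shows B and the standard errors rounded to three decimals.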

In general form, the equation used to determine the link between the public use microdata area code and the number of persons, rooms and bedrooms is:

Y = β0 + β1X1 + β2X2 + β3X3 + ε

In the equation, β0 is a constant, whereas β1 to β3 are the coefficients of the independent variables. X1, X2 and X3 are the independent variables number of persons (NP), bedrooms (BDS) and rooms (RMS) respectively, whereas ε is an error term. The dependent variable Y represents the public use microdata area code. Substituting the unstandardized coefficients (B) from the Coefficients table gives the fitted model:

Y = 231.130 + 4.700X1 - 8.222X2 + 5.384X3

This means that the public use microdata area code = 231.130 + (4.700 x number of persons) - (8.222 x bedrooms) + (5.384 x rooms).
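A fitted value is obtained by substituting household figures into the estimated equation with the unstandardized B values from the Coefficients table. The following is a minimal sketch in Python, assumed for illustration. The function name `predict_puma` and the example household are hypothetical.

```python
# Unstandardized coefficients (B) copied from the Coefficients table.
INTERCEPT = 231.130
B_NP, B_BDS, B_RMS = 4.700, -8.222, 5.384


def predict_puma(persons, bedrooms, rooms):
    """Fitted value of PUMA from the estimated regression equation."""
    return INTERCEPT + B_NP * persons + B_BDS * bedrooms + B_RMS * rooms


# Hypothetical household: 2 persons, 3 bedrooms, 5 rooms.
example = predict_puma(persons=2, bedrooms=3, rooms=5)
print(round(example, 3))
```

Because the model explains only 1.6% of the variation in PUMA, such a fitted value illustrates the arithmetic rather than giving a reliable prediction.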

References

Kothari, C. (2004). Research methodology: Methods & techniques (2nd ed.). New Delhi: Wishwa Prakashan.

Saunders, M., Lewis, P. & Thornhill, A. (2007). Research methods for business students (4th ed.). England: Prentice Hall.

Seber, G. A. F. & Lee, A. J. (2012). Linear regression analysis (2nd ed.). Hoboken, New Jersey: John Wiley & Sons.
