UNIT – 2
Data collection and Processing
Data acquisition is the collection of data, or the addition of data to existing holdings. Data can be acquired in several ways:
- Gathering new data
- Using your own previously gathered data
- Reusing someone else's data
- Buying data
- Acquiring data online (texts, social media, photos)
Data processing: A series of data actions or steps to verify, organize, transform, integrate, and extract information in a suitable output form for subsequent use. To ensure the utility and integrity of the data, methods of processing must be rigorously documented.
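The verify, organize and transform steps described above can be sketched in a few lines of Python. This is only an illustration: the records, the field names and the validity rule (age must be a whole number) are hypothetical and not taken from any real data set.

```python
# Hypothetical raw survey records; all field names and values are invented.
raw_records = [
    {"id": 1, "age": "34", "city": " Pune "},
    {"id": 2, "age": "", "city": "Delhi"},      # missing age: fails verification
    {"id": 3, "age": "51", "city": "mumbai"},
]


def is_valid(record):
    """Verify: keep only records whose age is a whole number."""
    return record["age"].isdigit()


def transform(record):
    """Transform: convert the age to an integer and normalise the city name."""
    return {
        "id": record["id"],
        "age": int(record["age"]),
        "city": record["city"].strip().title(),
    }


# Verify, transform, then organise the surviving records by age.
processed = sorted(
    (transform(r) for r in raw_records if is_valid(r)),
    key=lambda r: r["age"],
)

for row in processed:
    print(row)
```

Keeping each step in its own small function is one way to make the processing easy to document, which is the point the paragraph above stresses.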
Data analysis: Actions and methods applied to data that help describe facts, identify patterns, develop explanations, and test hypotheses. This includes data quality assurance, statistical analysis, modeling, and interpretation of results.
Results: The results of the actions above are published, for example as a research paper. If the research data is to be made openly accessible, the data set must also be prepared for release.
Sources of data can be classified into two types. Statistical sources are data gathered for official purposes, such as censuses and officially administered surveys. Non-statistical sources are data collected for other administrative purposes or for the private sector.
What are the various data sources?
The two data sources are below:
1. Internal Source
- When data is collected from the organization's own reports and records, the source is known as an internal source.
- For instance, a company publishes an 'Annual Report' covering profit and loss, total sales, loans, wages, etc.
2. External Source
- When data is collected from outside the organization, the source is known as an external source.
- External data may be primary or secondary, as described below.
A) Primary Data
- Primary data is 'first-hand' data gathered by the investigator.
- It is collected for the first time.
- It is original and generally more reliable.
- For instance, the population census conducted every 10 years by the Government of India.
B) Secondary Data
- Secondary data is 'second-hand' data.
- It is not collected originally by the investigator, but obtained from published or unpublished sources.
- For example, a person's address taken from a company's telephone directory, or a phone number taken from 'Just Dial'.
Methods of collecting primary data:
1. Direct personal inquiry
2. Indirect oral investigation
3. Information via correspondents
4. Telephone interview
5. Mailed questionnaire
6. Questionnaire filled in by enumerators
Primary data
An advantage of primary data is that researchers collect it for the specific purposes of their study. In essence, the questions the researchers ask are tailored to elicit the information that will help them with their research. Researchers collect the data themselves, using surveys, interviews and direct observations.
For instance, in workplace health research, direct observation may involve a researcher watching people at work. The researcher could count and code the number of times she sees practices or behaviors relevant to her interest, such as instances of improper lifting posture, or the number of hostile or disrespectful interactions that employees have with customers and clients over a period of time.
To take another example, suppose a research team wants to find out about employees' experiences of returning to work after a work-related injury. Part of the study may involve telephone interviews with workers about how long they have been off work and their experiences with the return-to-work process. The workers' responses, which are primary data, give the researchers specific information about the return-to-work process; for example, they may learn how often job accommodations are offered and why some employees refuse such offers.
Secondary data
Several types of secondary data exist. They can include information collected by Statistics Canada from the national population census, and other government information. Administrative data is one type of secondary data that is increasingly used. The term refers to information routinely collected as part of the day-to-day operations of an organization, institution or agency. Examples include motor vehicle registrations, hospital intake and discharge records, and workers' compensation claims records.
Compared with primary data, secondary data tends to be readily available and inexpensive to obtain. Moreover, because the data collection is comprehensive and routine, administrative data tends to have large samples. In addition, administrative data (like many kinds of secondary data) is collected over a long period of time, which makes it possible for researchers to detect changes over time.
Returning to the return-to-work study mentioned above, the researchers could examine secondary data in addition to their primary data (i.e. the survey results). To determine how long workers received wage replacement benefits, they could look at workers' compensation lost-time claims data.
By combining these two data sources, the researchers may be able to determine which factors predict a shorter absence from work among injured workers. Such findings could then help improve the return to work of other injured workers.
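As a rough illustration of combining the two sources, the sketch below links interview answers (primary data) to claim records (secondary data) by a worker ID and compares average days lost. All IDs, field names and numbers are invented for the example, and a simple comparison of averages like this is descriptive only; it does not by itself show which factors predict a shorter absence.

```python
# Primary data: hypothetical telephone-interview answers, keyed by worker ID.
interviews = {
    "W01": {"accommodation_offered": True},
    "W02": {"accommodation_offered": False},
    "W03": {"accommodation_offered": True},
}

# Secondary data: hypothetical lost-time claim records (days off work).
claims = {"W01": 12, "W02": 45, "W03": 20, "W04": 30}


def mean_days_lost(offered):
    """Average days lost for interviewed workers with the given answer."""
    days = [
        claims[worker]
        for worker, answers in interviews.items()
        if worker in claims and answers["accommodation_offered"] == offered
    ]
    return sum(days) / len(days)


print("Accommodation offered:    ", mean_days_lost(True))
print("Accommodation not offered:", mean_days_lost(False))
```

Worker W04 appears only in the claims data and is therefore left out, which mirrors a common practical issue: the two sources rarely cover exactly the same people.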
Data collection in statistics is the process of gathering information from all relevant sources in order to solve the research problem and assess its outcome. Data collection methods allow a researcher to arrive at an answer to the relevant question. Most organizations use data collection methods to make assumptions about future probabilities and trends. Once the data is collected, it must be organized.
Data is the raw material of every data collection method. It can be classified into two types, namely primary data and secondary data. In any research or business process, data collection is important chiefly because it helps to determine many important things about the organization, especially its performance. The data collection process therefore plays an important role in every field. Data collection methods are divided into two categories according to the type of data, namely:
- Primary methods of data collection
- Secondary methods of data collection
The various methods of data collection, with their benefits and limitations, are explained below.
Primary methods for data collection
Primary data or raw data is a type of information that is obtained by experiments, surveys or observations straight from the first-hand source. The primary method of data collection is further categorized into two types. They are
- Quantitative methods of data collection
- Qualitative methods of data collection
Let us discuss the methods used in each of these two categories.
Methods for Quantitative data collection
Quantitative methods are based on mathematical calculations and use formats such as closed-ended questions, correlation and regression methods, and measures such as the mean, median or mode. They are cheaper than qualitative methods of collecting data and can be applied within a short period of time.
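The measures named above (mean, median, mode and correlation) can be computed with Python's standard library. The sketch below uses two small invented variables; the variable names and numbers are assumptions made purely for illustration, and the Pearson correlation is written out by hand so that the calculation is visible.

```python
import statistics
from math import sqrt

# Hypothetical quantitative data from five respondents.
hours_trained = [2, 4, 4, 6, 9]
units_sold = [10, 14, 13, 18, 25]

mean_sold = statistics.mean(units_sold)
median_sold = statistics.median(units_sold)
mode_hours = statistics.mode(hours_trained)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


r = pearson_r(hours_trained, units_sold)
print(mean_sold, median_sold, mode_hours, round(r, 3))
```

A coefficient close to 1 indicates a strong positive linear relationship in this sample; it does not establish that one variable causes the other.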
Methods for Qualitative data collection
Qualitative methods do not involve mathematical calculations. They are closely associated with non-quantifiable elements. Interviews, questionnaires, observations, case studies, etc. are included among the qualitative data collection methods. There are several methods for collecting this type of data. They are:
Observation Method
The observation method is used when the study relates to behavioral science. It is systematically planned and subject to many controls and checks. The different kinds of observation are:
- Structured and unstructured observation
- Controlled and uncontrolled observation
- Participant, non-participant and disguised observation
Interview Method
In this technique, data is collected through oral or verbal responses. It is carried out in two ways:
- Personal interview: A person known as the interviewer asks the other person questions face to face. A personal interview may take the form of a direct investigation, a focused conversation, etc., and may be structured or unstructured.
- Telephone interview: An interviewer obtains information by contacting individuals on the telephone and asking for answers or opinions orally.
Questionnaire Method
In this method, a set of questions is mailed to the respondent, who should read, answer and then return the questionnaire. The questions are printed on the form in a specified order. A good questionnaire should have the following characteristics:
- Short and straightforward
- Follows a logical sequence
- Provides adequate space for responses
- Avoids technical terms
- Has a good physical appearance, such as the colour and quality of the paper, to attract the attention of the respondent
Schedules
This method is similar to the questionnaire method, with a slight difference: enumerators are specially appointed to fill in the schedules. The enumerator explains the aims and objectives of the investigation and can clear up any misunderstandings. Enumerators should be trained to carry out their work with diligence and patience.
Secondary Data Collection Methods
Secondary data is data collected by someone other than the actual user. It means that the data is already available and has already been analyzed by someone else. Secondary data sources include magazines, journals, books, newspapers, etc. The data may be either published or unpublished.
Published information is available in multiple resources, including
- Government publications
- Public records
- Historical and statistical documents
- Business documents
- Technical and trade journals
Unpublished data includes
- Diaries
- Letters
- Unpublished biographies, etc.
Observation is a technique that uses vision as its main means of data collection. It relies on the eyes rather than the voice and the ears. Observation is the accurate watching and noting of phenomena as they occur, with regard to cause and effect or mutual relationships. It means watching other people's behaviour as it happens, without controlling it. Recording data without asking questions is therefore called the observation method. The following are some examples of the observation method of data collection:
- Observing the behaviour of salesmen on sales calls.
- Observing how customers respond to advertisements.
- Observing the consumer's response to the display of a specific product.
- Observing the retailers' stocking pattern.
a) Observation
The observational method of data collection is systematic viewing or deliberate study by eye. The observation may be done in a natural or a simulated situation, and either openly or via hidden cameras. This technique is useful for gathering data that individuals are unwilling or unable to provide. The merits of observation are as follows:
- It provides information on consumers' actual behaviour, and therefore greater accuracy than other methods.
- Limited chances of bias.
- It is the simplest and least technical method. A small amount of training can make the researcher a competent observer.
- It helps to obtain data that consumers are reluctant or unable to provide.
- The researcher is not bound to any rigid position and can adapt as the problem changes.
- It allows the researcher to record events as they happen.
- More satisfactory and in-depth material can be collected through observation.
- The researcher is in a better position to classify the data.
Limitations of Observation Method:
- It is useful for only certain aspects of consumer behaviour.
- Due to the nature of certain events, planned observation is very often not possible. Observation is also restricted by the duration of events.
- It only provides data on how customers behave, not on why they behave that way.
- Observational data is often difficult to quantify.
- Observer bias is possible, which can lead to incorrect results.
- It is time-consuming and costly.
b) Experimental
Experiments are more effective than other methods at establishing cause-and-effect relationships. In an experiment, data collection is carried out in such a way that a relatively unambiguous interpretation is possible. In the sciences, experiments are used to determine and demonstrate cause-and-effect relationships. They are also used in present-day marketing research, as they were during the last century.
Experimentation has become the basis of marketing science. In addition, experimental design makes it possible to analyze research findings rationally. It also provides a standard against which other research designs can be compared. Boyd, Westfall, and Stasch define an experiment as a research process in which one or more variables are manipulated under conditions that allow data to be collected showing the effects, if any, of such variables in an unconfused manner. According to this definition, the difference between experimental and non-experimental research may therefore be a matter of degree rather than kind.
Experiments create artificial conditions in order to obtain the data needed to complete the research. Such conditions may also be necessary in order to measure the data as precisely as possible. Because the situations in which experiments are conducted are invariably created for the purpose, respondents may not feel at ease while cooperating with researchers.
Nevertheless, this approach has the benefit that researchers can analyze the actual cause-and-effect relationship between any two variables relevant to the study. Other variables are either absent or present only to a minimal extent.
Thus, the data gathered by the researchers are representative of the actual cause-and-effect relationship between the two variables. In addition, in a controlled experimental setting it is possible to change one of the variables and to measure the effect of the change on the other. Experiments are therefore popular among researchers, particularly in scientific fields.
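The idea of changing one variable and measuring the effect on another can be sketched as a two-group experiment. The example below is a simplified illustration, not a full experimental design: the subject IDs, the outcome scores and the function names are all hypothetical, and the effect is estimated simply as a difference between group means.

```python
import random
import statistics


def assign_groups(subject_ids, seed=0):
    """Randomly split subjects into a control and a treatment group."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(control_outcomes, treatment_outcomes):
    """Difference between the mean outcome of the two groups."""
    return statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)


control, treatment = assign_groups(range(1, 9))

# Hypothetical purchase-intent scores measured after the experiment.
control_scores = [3, 4, 5, 4]      # did not see the advertisement
treatment_scores = [6, 7, 5, 6]    # saw the advertisement

effect = estimated_effect(control_scores, treatment_scores)
print("Estimated effect of the advertisement:", effect)
```

Random assignment is what keeps the other variables, on average, equal across the two groups, so that the difference in outcomes can be attributed to the manipulated variable.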
There are two types of experiments: laboratory and field. Laboratory experiments are conducted in laboratories. Test subjects are brought to the laboratory and different tests are administered. They could be shown a TV commercial or a journal advertisement, or a short programme prepared by trained artists. The responses of the test subjects are then measured, either on a recording medium or in writing. This approach is artificial, but it provides results quickly, is less expensive and requires less effort.
Field experiments are conducted in the field, where test subjects are normally found. Test subjects are asked questions, shown TV commercials or given leaflets to read. They may even be asked to try some products. Their responses are measured and recorded on the spot. This approach is reliable, but it is time-consuming, more expensive and involves more effort.
If cost and operational problems are not the deciding factors, field experiments are the ones the researcher should choose. In addition, Boyd et al. report that laboratory experiments produce results broadly similar to those of field experiments, but not comparable to those of descriptive studies. The same problems can also be studied by non-experimental methods, but researchers should be careful when using them because the results may not be very useful or reliable.
c) Interview
1. DIRECT PERSONAL INTERVIEWS
The investigator personally meets the individuals concerned and collects the required information from them. This method can prove very expensive and time-consuming if the area to be covered is vast. Nevertheless, it is useful for certain laboratory experiments or localized inquiries. The investigator's personal bias is likely to introduce errors into the results.
2. INDIRECT PERSONAL INTERVIEWS
When direct sources do not exist, or the informants hesitate to respond for some reason, we interview third parties or witnesses who have the information. Reliance is not placed on the evidence of one witness alone, because some informants are likely to give wrong information intentionally.
3. COLLECTION THROUGH QUESTIONNAIRES
In general, questionnaires containing the relevant questions are sent to the informants by email. The questionnaires provide space for entering the requested data. The informants are asked to return the questionnaires to the investigator within a certain period. This technique is inexpensive, reasonably quick and good for comprehensive inquiries. However, when no incentive is involved, only a small percentage of recipients respond.
4. COLLECTION THROUGH ENUMERATORS
In this method, data is collected by trained enumerators. They help the informants make the entries in the schedules or questionnaires correctly. If the enumerators are well trained, experienced and tactful, this method can yield the most reliable information. The enumerator-driven approach works best for a large-scale governmental or organizational inquiry. Private individuals or institutions cannot adopt this method, as its cost would be prohibitive for them.
5. COLLECTION THROUGH LOCAL SOURCES
In this method, agents or local correspondents gather and send the requested information, using their own judgment as to the best way to obtain it; there is no formal data collection. This technique is cheap and quick, but it provides only estimates, and it may reflect the bias of the local agents.
d) Survey
The survey method serves two main purposes:
1. Describing certain features or characteristics of a population
2. Testing assumptions about the nature of relationships within a population.
- Online Surveys
Online surveys are the most cost-effective survey medium and can reach more people than any other. Their reach is much wider than that of other data collection methods. Where several questions need to be put to the target sample, some researchers prefer online surveys to traditional face-to-face or telephone surveys.
Online surveys can use computational logic and branching to collect survey data more accurately than traditional means of surveying. They are straightforward to implement and take respondents very little time. Compared with the other methods, the investment required to collect survey data online is also negligible. Results are gathered in real time, so researchers can analyze them and decide on corrective measures.
A very good example of an online survey is a hotel chain that uses an online survey after a stay or an event at the property to gather guest satisfaction metrics.
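The branching logic mentioned above can be sketched as a small lookup table in which each answer decides which question comes next. The questions, IDs and rules below are hypothetical and loosely follow the hotel example; a real survey platform would provide its own way of configuring this.

```python
# Each question maps an answer to the ID of the next question.
# A missing entry (or None) ends the survey. All content is illustrative.
SURVEY = {
    "stayed": {
        "text": "Did you stay with us in the last month?",
        "next": {"yes": "rating", "no": None},
    },
    "rating": {
        "text": "Rate your stay from 1 (poor) to 5 (excellent).",
        "next": {"1": "problem", "2": "problem"},
    },
    "problem": {
        "text": "What went wrong during your stay?",
        "next": {},
    },
}


def questions_asked(answers, start="stayed"):
    """Return the IDs of the questions a respondent sees, given their answers."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        answer = answers.get(current)
        current = SURVEY[current]["next"].get(answer)
    return path


print(questions_asked({"stayed": "yes", "rating": "2", "problem": "Noisy room"}))
print(questions_asked({"stayed": "yes", "rating": "5"}))
print(questions_asked({"stayed": "no"}))
```

Only dissatisfied guests are asked the follow-up question, so each respondent sees just the questions that apply to them.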
Online surveys are safe to conduct and secure. They are quite useful in times of global crisis, as there is no in-person interaction or any direct form of communication. During the pandemic, for example, many organizations shifted to contactless surveys. It helped them ensure that the staff did not experience any symptoms of COVID-19 before they came to the office.
- Face-to-face Surveys
Collecting data from respondents face to face is often more effective than using other media, because respondents generally tend to trust the surveyor and give honest, clear feedback on the subject in hand.
Researchers can easily tell whether respondents are uncomfortable with the questions asked, which is especially valuable when the discussion involves sensitive topics. This method of data collection costs more than the other methods. Researchers can be trained, according to the geographic or psychographic segmentation, to obtain accurate information.
For example, a job assessment survey between an HR representative or a manager and an employee is conducted in person. This method works best face to face, as it allows the most accurate information possible to be gathered.
- Telephone Surveys
Telephone surveys require much less investment than face-to-face surveys. Depending on the required reach, they cost as much as, or a little more than, online surveys. Contacting respondents by telephone takes less effort and manpower than surveying them face to face.
If the interviewers are at the same location, their questions can be cross-checked to ensure that the target audience is asked error-free questions. The main drawback of telephone surveys is that the distance imposed by the medium makes it difficult to build a rapport with the respondent. Respondents are also highly likely to prefer to remain anonymous when giving feedback over the phone, because they may doubt the credibility of the researcher.
For example, if a retail giant wants to understand buying decisions, it can conduct a telephone survey on motivation and purchasing experience to gather information about the entire buying experience.
- Paper Surveys
Paper surveys are the other commonly used survey method. They can be used where laptops, computers and tablets cannot go, and rely on the age-old data collection method: pen and paper. This approach helps to collect survey data in field research, and helps to increase both the number of responses collected and their validity.
A popular use case for paper surveys is a fast-food restaurant survey, where the chain would like to gather feedback on its customers' dining experience.
The survey method can be broadly divided into three categories: mail survey, telephone survey, and personal interview. Each of these methods is briefly described in the following table [2]:
Survey method | Description |
Mail survey | A written survey that is self-administered |
Telephone survey | A survey conducted by telephone in which the questions are read to the respondents |
Personal interview | A face-to-face interview of the respondent |
Major survey methods and their descriptions
Alternatively, from the perspective of practicality, the most popular variations of surveys are questionnaires, interviews and documentation review. The main advantages and disadvantages of these primary data collection methods are as follows:
Method | Purpose | Advantages | Disadvantages |
Questionnaires | Gather a large amount of information in a short period of time | Members of the sample group can remain anonymous; considerably cheaper than most other primary data collection methods; possibility of generating a large amount of data | Difficulty of ensuring greater depth for the research; the problem of 'first choice selection' |
Interviews | Reflect emotions and experiences, and explore issues with a greater focus | Possibility to direct the process of data collection; possibility to collect the specific type of information required | Great amount of time required to arrange and conduct interviews; additional costs may be incurred for arranging and conducting interviews, travelling, etc.; potential for interviewee bias |
Documentation review | Study issues that have developed over a specific period | Possibility to retrieve comprehensive information | Challenges associated with access to documentation; inflexibility of the research process |
Advantages of Survey Method
1. Compared to other primary data collection methods, such as observation and experiments, surveys can be carried out quicker and cheaper.
2. It is relatively easy to analyze primary data collected through surveys.
Disadvantages of Survey Method
1. In some cases, respondents' reluctance or inability to provide information
2. Respondents' human bias, i.e. respondents providing inaccurate information
3. Differences in understanding: it is difficult to formulate questions so that every respondent understands them in exactly the same way.
e) Survey tool & Question Types
- A questionnaire is a research tool consisting of a list of questions, along with the choice of answers, printed or typed in sequence on a form that respondents fill in to provide specific information.
- In general, questionnaires are sent by post or email to the individuals concerned, asking them to answer the questions and return the questionnaire.
- Informants are expected to read and understand the questions and to reply in the space provided in the questionnaire itself.
- The questionnaire is prepared in such a way that it translates the information required into a series of questions that informants can and will answer.
The following are characteristics of good questionnaires:
- It should consist of a well-written list of questions.
- To create interest among respondents, the questionnaire should deal with an important or significant topic.
- Only data that cannot be obtained from other sources should be requested.
- It should be as short as possible, but sufficiently detailed.
- It should be attractive in appearance.
- Directions should be straightforward and complete.
- Questions should be presented in a good psychological order, from general to more specific.
- Double negatives in questions should be avoided.
- Putting two questions into one should also be avoided. Each question should seek only one piece of information.
- It should be designed to collect information that can subsequently be used as data for analysis.
The questions asked can take two forms:
Restricted questions, also referred to as closed-ended questions, ask the respondent to make a yes-or-no decision, check items on a list, or choose an answer from multiple choices.
Restricted questions are easy to tabulate and compile.
Unrestricted questions are open-ended and allow respondents to share feelings and opinions about the matter at hand that are important to them.
Unrestricted questions are not easy to tabulate and compile, but they allow respondents to reveal the depth of their emotions.
If the goal is to compile data from all respondents, it is better to stick to restricted questions, which are easily quantified.
If it is necessary to study degrees of emotion or depth of feeling, develop a scale to quantify those feelings.
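To show why restricted questions are easy to tabulate, the sketch below counts the answers to a single yes/no question and converts the counts to percentages. The responses are invented for illustration.

```python
from collections import Counter

# Hypothetical answers to one closed-ended (yes/no) question.
responses = ["yes", "no", "yes", "yes", "no", "yes"]

# Frequency table: how many respondents chose each option.
counts = Counter(responses)

# Share of respondents per option, as a percentage.
total = len(responses)
percentages = {option: 100 * n / total for option, n in counts.items()}

for option, n in counts.most_common():
    print(f"{option}: {n} ({percentages[option]:.1f}%)")
```

Open-ended answers cannot be counted this directly; they usually have to be read and coded into categories first.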
- Questionnaires are a common and inexpensive research tool used, depending on the need, by private businesses, government departments, individuals, groups, NGOs, etc. to obtain feedback, conduct research, and collect customer or general public data.
- Questionnaires are the most important part of primary surveys.
Advantages of Questionnaire
- One of the biggest advantages of questionnaires is their uniformity: all respondents see exactly the same questions.
- It is an inexpensive technique, regardless of the size of the universe.
- It is free from interviewer bias, as the respondents answer the questions in their own words.
- Respondents have sufficient time to think and reply.
- Because of its wide coverage, respondents living in distant areas can also be reached easily.
Limitations of Questionnaire
- The risk of collecting inaccurate and incomplete information is high, as people may not understand the questions correctly.
- The non-response rate is high.
f) Scaling techniques
A survey scale is a set of answer options that cover a range of opinions on a subject, either numeric or verbal. It is always part of a question that is closed-ended (a question that presents respondents with pre-populated answer choices).
So what is a Likert scale survey question? It is a question that uses a 5- or 7-point scale, sometimes referred to as a satisfaction scale, that ranges from one extreme attitude to another. The Likert survey question typically includes a moderate or neutral option in its scale.
Likert scales (named after their creator, American social scientist Rensis Likert) are quite popular because they are one of the most reliable ways to measure opinions, perceptions, and behaviors.
Likert-type questions will give you more granular feedback on whether your product was just "good enough" or (hopefully) "excellent," compared to binary questions, which give you only two answer options. And Likert questions can help you decide whether a recent company outing left employees feeling "very satisfied," "somewhat dissatisfied," or maybe just neutral.
This technique will allow you to uncover degrees of opinion that could make a real difference in understanding the feedback you get. And it can also identify the areas where you may want to improve your product or service.
With the Likert scale, individuals state how much they agree or disagree with a specific statement; with the semantic differential scale, individuals who complete the questionnaire decide how much of a characteristic or quality the item has.
A Likert scale measures agreement or disagreement with a statement. The scale ranges from "strongly agree" to "strongly disagree", with "neutral" in the centre. The results are easy to quantify. It is customary to rate the highest agreement as 5, neutral as 3 and the lowest agreement (or no agreement) as 1. This makes comparing outcomes easy. An example of a Likert scale question would be:
My job is drudgery. Strongly disagree / Disagree / Neutral / Agree / Strongly Agree
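The customary 5-to-1 scoring can be sketched as a simple lookup from answer label to number. The answers below are invented, and the names `LIKERT_SCORES` and `score_responses` are illustrative only.

```python
# Customary scoring: highest agreement = 5, neutral = 3, lowest agreement = 1.
LIKERT_SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def score_responses(responses):
    """Convert Likert answer labels into their numeric scores."""
    return [LIKERT_SCORES[label] for label in responses]


# Hypothetical answers from five employees to "My job is drudgery."
answers = ["Agree", "Neutral", "Strongly agree", "Disagree", "Agree"]

scores = score_responses(answers)
mean_score = sum(scores) / len(scores)

print("Scores:", scores)
print("Mean score:", mean_score)
```

Once the answers are numeric, results from different groups or different survey rounds can be compared directly.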
The semantic differential scale, on the other hand, is very subjective on the part of the user. Plus, there is no "neutral" answer, which makes it hard to quantify. With this type of scale, you can't pinpoint direct metrics. For example, you couldn't find out whether "on hold wait times are less than five minutes". You can only ask how the respondent feels about wait times (acceptable/unacceptable). A potential benefit of the semantic differential scale is that the user is presented with only two opposite options, whereas the Likert scale offers a range of intensities to choose from.
The two scales have some middle ground. Technically, you can design a Likert scale with opposites, such as love/hate or happy/sad. This is similar to the idea of polar opposite adjectives in the semantic differential scale. However, although a pair such as "like/hate" may be polar opposites, the words are not adjectives* and therefore cannot be included in a semantic differential questionnaire.
Which scale you choose depends largely on what you want to find out. As for the three types of semantic differential scale, a rule of thumb is: choose the unlabeled scale if you think your respondents are good at abstract thinking, and the labeled one if you are surveying the general public. If you are sure that your respondents have numerical aptitude, choose the numerical scale.
* The word "like" can be an adjective, as in "the twins are very like." But in this like/hate context it is a verb.
The semantic differential scale measures the connotative meaning of things. For example, while the word "heart" denotes the organ that pumps blood around the body, its connotative meaning is love or heartache. In surveys, the scale is used to gauge people's feelings towards a specific subject.
Connotation vs. Denotation
Denotation is the exact meaning of a word, the one you would find in a dictionary. A few instances of denotation:
- Sweater: a knitted garment worn to keep warm.
- Abyss: a deep or seemingly bottomless chasm.
- Diamond: a precious stone, clear and colorless, made from pure carbon.
- Lion: a big, fawn-colored cat that lives in prides.
Connotation is an idea or a feeling that the word evokes. In pop culture and literature, the above words have many implied meanings, including:
- Sweater: friendship, fireplaces and hot cocoa.
- Abyss: a really bad situation.
- Diamond: anyone who stands out and “shines.”
- Lion: bravery.
The Semantic Differential Scale
A semantic differential scale measures attitudes towards something. For example, with the following scale, you could measure the attitude of a person to the word 'Work':
The terms to the left and right are polar opposite adjectives. For example, “necessary” is the opposite of “unnecessary.” There are usually five intervals, although some scales have seven. You could have radio buttons or boxes to check rather than blank spaces to mark.
Three types of semantic differential scale exist:
1. Scale points are unlabeled.
2. Scale points are labeled.
3. Scale points are numbered (i.e. from 1 to 5).
Garland found that respondents preferred labeled scale points because they are easier to understand, easier to complete, and better for expressing opinions. It seems that labeling the scales does not introduce bias.
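For the numbered form of the scale, responses can be summarised as an average position for each adjective pair. The sketch below assumes a five-interval scale where 1 is the left-hand adjective and 5 the right-hand one. Apart from unnecessary/necessary, the adjective pairs are hypothetical, as are all the ratings.

```python
# Polar opposite adjective pairs: position 1 = left word, 5 = right word.
PAIRS = [
    ("unnecessary", "necessary"),
    ("boring", "interesting"),
    ("unpleasant", "pleasant"),
]

# Hypothetical ratings of the word 'Work' from three respondents,
# one number per adjective pair, in the same order as PAIRS.
ratings = [
    [5, 2, 3],
    [4, 3, 3],
    [5, 1, 2],
]


def attitude_profile(all_ratings):
    """Average position on each adjective pair across respondents."""
    n = len(all_ratings)
    return [sum(r[i] for r in all_ratings) / n for i in range(len(PAIRS))]


profile = attitude_profile(ratings)
for (left, right), average in zip(PAIRS, profile):
    print(f"{left} 1 .. 5 {right}: {average:.2f}")
```

An average near 5 means respondents lean towards the right-hand adjective, an average near 1 towards the left-hand one.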
Reference Books:
- Marketing Research: An Applied Orientation, Naresh K. Malhotra, Pearson
- Statistics for Management, Levin and Rubin, Prentice Hall
- Research Methods for Management, S. Shajahan, Jaico Publishing