Introduction
Data is the foundation of every analysis and model in data science. Before you analyze data or build machine learning models, you need to understand the different types of data and how data is organized. These concepts are the building blocks for more complex data science techniques.
What are Data Types?
A data type defines the values a variable can hold, how those values are stored in memory, and what operations can be performed on them.
Commonly used data types:
1. Numerical Data
- Integers: Whole numbers (example: -3, -2, -1, 0, 1, 2, 3)
- Floats: Decimal numbers (example: 3.14, 2.177, -0.75)
2. Categorical Data
- Binary: Data with only two possible values (example: True/False, 0/1)
- Nominal: Data without a specific order (example: colors, categories)
- Ordinal: Data with a specific order (example: low, medium, high)
3. Text Data
- Strings: Sequences of characters (example: “Hello, Welcome to Visionsofai.com”)
4. Date/Time Data
- Timestamps: Specific points in time (example: 19-March-2024 10:45:55)
Knowing the data type matters because it determines which operations you can perform on the data. For example, you can add, subtract, multiply, or divide numerical data, but these arithmetic operations are not meaningful for string (text) data.
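To make the link between data types and operations concrete, here is a minimal sketch using only Python's built-in types and the standard `datetime` module. The variable names and sample values are hypothetical, chosen purely for illustration.

```python
from datetime import datetime

# Hypothetical sample values, one per data type described above.
count = 42                                  # integer
price = 3.14                                # float
is_active = True                            # binary (boolean)
size = "medium"                             # ordinal category stored as text
greeting = "Hello"                          # string
stamp = datetime(2024, 3, 19, 10, 45, 55)   # timestamp

# Numerical values support arithmetic; int + float gives a float.
total = count + price

# Strings can be joined together, which is not numeric addition.
shout = greeting + "!"

# Mixing text and numbers in arithmetic raises a TypeError.
try:
    greeting + count
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Ordinal data needs an explicit order; comparing the raw strings
# would sort them alphabetically instead of by meaning.
size_order = {"low": 0, "medium": 1, "high": 2}
is_bigger = size_order["high"] > size_order[size]

print(type(count).__name__, type(price).__name__, type(stamp).__name__)
print(shout, mixed_ok, is_bigger)
```

The `size_order` mapping is just one simple way to encode an ordinal scale; libraries such as pandas offer dedicated categorical types for the same purpose.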
What are Data Structures?
Data structures are the ways data is organized and stored in a computer’s memory. Choosing the right data structure can make your code more efficient.
Here are some common data structures in data science:
- Lists
- Ordered collection of elements
- Allows duplicate values
- Example in Python: my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
- Tuples
- Ordered collection of elements
- Allows duplicate values
- Elements cannot be modified
- Example in Python: my_tuple = (1,2,3,4,5,6,7,8,9)
- Sets
- Unordered collection of unique elements
- Does not allow duplicate values
- Example in Python: my_set = {1,2,3,4,5,6,7,8,9}
- Dictionaries
- Collection of key-value pairs (insertion order is preserved since Python 3.7)
- Keys must be unique
- Example in Python: my_dict = {'apple': 2, 'banana': 3, 'orange': 1}
- NumPy Arrays
- Collection of elements with the same data type
- Efficient for numerical operations
- Example in Python:
- import numpy as np
- my_array = np.array([1, 2, 3, 4, 5])
- Pandas DataFrames
- Tabular data with rows and columns
- Can have different data types in each column
- Powerful for data manipulation and analysis
- Example in Python:
- import pandas as pd
- data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
- df = pd.DataFrame(data)
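The properties listed above for the built-in structures can be checked directly. The following is a small sketch with hypothetical sample values, showing how duplicates, modification, and key lookup behave for lists, tuples, sets, and dictionaries.

```python
# Lists keep duplicates and can be modified in place.
my_list = [3, 1, 3, 2]
my_list.append(4)               # my_list is now [3, 1, 3, 2, 4]

# Tuples also keep duplicates, but their elements cannot be reassigned.
my_tuple = (3, 1, 3, 2)
try:
    my_tuple[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets drop duplicates automatically.
my_set = set(my_list)           # contains 1, 2, 3 and 4 exactly once

# Dictionaries look values up by key; assigning to a key updates it.
my_dict = {"apple": 2, "banana": 3, "orange": 1}
my_dict["apple"] += 1

print(my_list, tuple_changed, my_set, my_dict["apple"])
```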
The choice of data structure depends on your specific needs, such as memory efficiency, ease of access, or the need for specific operations.
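As one illustration of how the need for specific operations can drive this choice, the sketch below compares the same `*` operator on a plain list and on a NumPy array, then computes a column statistic on a small DataFrame. It assumes NumPy and pandas are installed; the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4, 5]

# On a list, * repeats the elements.
repeated = values * 2                  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# On a NumPy array, * multiplies element by element.
doubled = np.array(values) * 2         # values 2, 4, 6, 8, 10

# A DataFrame stores each column with its own data type,
# so numeric columns support statistics directly.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})
mean_age = df["Age"].mean()            # 30.0

print(repeated)
print(doubled)
print(df.dtypes)
print(mean_age)
```

If the goal is arithmetic over many numbers, the array is usually the more natural fit; if the data mixes text and numeric columns, a DataFrame tends to be more convenient.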
In data science, understanding data types and data structures is essential for working with data effectively. It forms the foundation for more advanced techniques, such as data preprocessing, feature engineering, and model building. By mastering these concepts, you’ll be better equipped to handle and analyze data, leading to more accurate and insightful results.