Pandas How to Choose Columns to Read

    In [1]: import pandas as pd

  • Titanic data

    This tutorial uses the Titanic data set, stored as CSV. The data consists of the following columns:

    • PassengerId: Id of every passenger.

    • Survived: This feature takes the values 0 and 1: 0 for not survived and 1 for survived.

    • Pclass: There are 3 classes: Class 1, Class 2 and Class 3.

    • Name: Name of passenger.

    • Sex: Gender of passenger.

    • Age: Age of passenger.

    • SibSp: Indication that the passenger has siblings and a spouse.

    • Parch: Whether a passenger is alone or has family.

    • Ticket: Ticket number of passenger.

    • Fare: Indicating the fare.

    • Cabin: The cabin of passenger.

    • Embarked: The embarked category.


        In [2]: titanic = pd.read_csv("data/titanic.csv")

        In [3]: titanic.head()
        Out[3]:
           PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
        0            1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
        1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
        2            3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
        3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
        4            5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S

        [5 rows x 12 columns]

How do I select a subset of a DataFrame?

How do I select specific columns from a DataFrame?

  • I'm interested in the age of the Titanic passengers.

        In [4]: ages = titanic["Age"]

        In [5]: ages.head()
        Out[5]:
        0    22.0
        1    38.0
        2    26.0
        3    35.0
        4    35.0
        Name: Age, dtype: float64

    To select a single column, use square brackets [] with the name of the column of interest.

Each column in a DataFrame is a Series. As a single column is selected, the returned object is a pandas Series. We can verify this by checking the type of the output:

    In [6]: type(titanic["Age"])
    Out[6]: pandas.core.series.Series

And have a look at the shape of the output:

    In [7]: titanic["Age"].shape
    Out[7]: (891,)

DataFrame.shape is an attribute (remember the tutorial on reading and writing: do not use parentheses for attributes) of a pandas Series and DataFrame containing the number of rows and columns: (nrows, ncolumns). A pandas Series is 1-dimensional, so only the number of rows is returned.
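
As a quick check of the points above, here is a minimal sketch that does not need the CSV file. The table `people` and its values are made up for illustration and are not the Titanic data.

```python
import pandas as pd

# Hypothetical miniature table standing in for the Titanic data.
people = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cleo"],
        "Age": [22.0, 38.0, 26.0],
        "Sex": ["female", "male", "female"],
    }
)

# A single column label inside [] returns a pandas Series.
ages = people["Age"]

print(type(ages).__name__)  # class name of the returned object
print(ages.shape)           # 1-dimensional: rows only
print(people.shape)         # 2-dimensional: (nrows, ncolumns)
```

Note that `shape` is written without parentheses because it is an attribute, not a method.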

  • I'm interested in the age and sex of the Titanic passengers.

        In [8]: age_sex = titanic[["Age", "Sex"]]

        In [9]: age_sex.head()
        Out[9]:
            Age     Sex
        0  22.0    male
        1  38.0  female
        2  26.0  female
        3  35.0  female
        4  35.0    male

    To select multiple columns, use a list of column names within the selection brackets [].

Note

The inner square brackets define a Python list with column names, whereas the outer brackets are used to select the data from a pandas DataFrame as seen in the previous example.

The returned data type is a pandas DataFrame:

    In [10]: type(titanic[["Age", "Sex"]])
    Out[10]: pandas.core.frame.DataFrame

    In [11]: titanic[["Age", "Sex"]].shape
    Out[11]: (891, 2)

The selection returned a DataFrame with 891 rows and 2 columns. Remember, a DataFrame is 2-dimensional with both a row and column dimension.
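
The difference between a column label and a list of labels can be sketched on a small made-up table (the name `people` and its contents are assumptions for illustration). Even a list holding just one name yields a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical miniature table, not the real Titanic data.
people = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cleo"],
        "Age": [22.0, 38.0, 26.0],
        "Sex": ["female", "male", "female"],
    }
)

as_series = people["Age"]          # label          -> Series
as_frame = people[["Age"]]         # list, 1 label  -> DataFrame with 1 column
age_sex = people[["Age", "Sex"]]   # list, 2 labels -> DataFrame with 2 columns

print(as_series.shape, as_frame.shape, age_sex.shape)
```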

How do I filter specific rows from a DataFrame?

  • I'm interested in the passengers older than 35 years.

        In [12]: above_35 = titanic[titanic["Age"] > 35]

        In [13]: above_35.head()
        Out[13]:
            PassengerId  Survived  Pclass                                               Name     Sex  ...  Parch    Ticket     Fare Cabin  Embarked
        1             2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  ...      0  PC 17599  71.2833   C85         C
        6             7         0       1                            McCarthy, Mr. Timothy J    male  ...      0     17463  51.8625   E46         S
        11           12         1       1                           Bonnell, Miss. Elizabeth  female  ...      0    113783  26.5500  C103         S
        13           14         0       3                        Andersson, Mr. Anders Johan    male  ...      5    347082  31.2750   NaN         S
        15           16         1       2                   Hewlett, Mrs. (Mary D Kingcome)   female  ...      0    248706  16.0000   NaN         S

        [5 rows x 12 columns]

    To select rows based on a conditional expression, use a condition inside the selection brackets [].

The condition inside the selection brackets titanic["Age"] > 35 checks for which rows the Age column has a value larger than 35:

    In [14]: titanic["Age"] > 35
    Out[14]:
    0      False
    1       True
    2      False
    3      False
    4      False
           ...
    886    False
    887    False
    888    False
    889    False
    890    False
    Name: Age, Length: 891, dtype: bool

The output of the conditional expression (>, but also ==, !=, <, <=, … would work) is actually a pandas Series of boolean values (either True or False) with the same number of rows as the original DataFrame. Such a Series of boolean values can be used to filter the DataFrame by putting it in between the selection brackets []. Only rows for which the value is True will be selected.
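
The two steps (build the boolean Series, then use it as a filter) can be sketched separately on a small made-up table. The names `people`, `mask` and `older` are assumptions for illustration. One age is left missing to show that a comparison against a missing value evaluates to False, so that row is not selected.

```python
import pandas as pd

# Hypothetical miniature table; the third age is missing on purpose.
people = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cleo", "Dan"],
        "Age": [22.0, 38.0, None, 54.0],
    }
)

# Step 1: the comparison yields a boolean Series, one value per row.
mask = people["Age"] > 35
print(mask.tolist())

# Step 2: the boolean Series between the brackets keeps only the True rows.
older = people[mask]
print(older)
```

The original row labels are kept in the result, which is why the filtered Titanic output above shows labels such as 1, 6 and 11 rather than 0, 1 and 2.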

We know from before that the original Titanic DataFrame consists of 891 rows. Let's have a look at the number of rows which satisfy the condition by checking the shape attribute of the resulting DataFrame above_35:

    In [15]: above_35.shape
    Out[15]: (217, 12)

  • I'm interested in the Titanic passengers from cabin class 2 and 3.

        In [16]: class_23 = titanic[titanic["Pclass"].isin([2, 3])]

        In [17]: class_23.head()
        Out[17]:
           PassengerId  Survived  Pclass                            Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
        0            1         0       3         Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
        2            3         1       3          Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
        4            5         0       3        Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
        5            6         0       3                Moran, Mr. James    male   NaN      0      0            330877   8.4583   NaN        Q
        7            8         0       3  Palsson, Master. Gosta Leonard    male   2.0      3      1            349909  21.0750   NaN        S

    Similar to the conditional expression, the isin() conditional function returns True for each row whose value is in the provided list. To filter the rows based on such a function, use the conditional function inside the selection brackets []. In this case, the condition inside the selection brackets titanic["Pclass"].isin([2, 3]) checks for which rows the Pclass column is either 2 or 3.

The above is equivalent to filtering by rows for which the class is either 2 or 3 and combining the two statements with an | (or) operator:

    In [18]: class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]

    In [19]: class_23.head()
    Out[19]:
       PassengerId  Survived  Pclass                            Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
    0            1         0       3         Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
    2            3         1       3          Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
    4            5         0       3        Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
    5            6         0       3                Moran, Mr. James    male   NaN      0      0            330877   8.4583   NaN        Q
    7            8         0       3  Palsson, Master. Gosta Leonard    male   2.0      3      1            349909  21.0750   NaN        S

Note

When combining multiple conditional statements, each condition must be surrounded by parentheses (). Moreover, you cannot use or/and but need to use the or operator | and the and operator &.
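
The following sketch, on a made-up table (`people` and its values are assumptions), checks that isin() and the | form select the same rows, combines two different conditions with &, and shows what happens when the Python keyword or is used instead. The parentheses matter because | and & bind more tightly than comparison operators such as == and >.

```python
import pandas as pd

# Hypothetical miniature table, not the real Titanic data.
people = pd.DataFrame(
    {
        "Pclass": [1, 2, 3, 3, 1],
        "Age": [40.0, 30.0, 22.0, 50.0, 19.0],
    }
)

# Two equivalent ways to keep classes 2 and 3.
via_isin = people[people["Pclass"].isin([2, 3])]
via_or = people[(people["Pclass"] == 2) | (people["Pclass"] == 3)]
print(via_isin.equals(via_or))

# Both conditions must hold: class 3 and older than 35.
older_class_3 = people[(people["Pclass"] == 3) & (people["Age"] > 35)]
print(older_class_3)

# The keyword `or` asks for the truth value of a whole Series, which fails.
keyword_or_failed = False
try:
    people[(people["Pclass"] == 2) or (people["Pclass"] == 3)]
except ValueError as err:
    keyword_or_failed = True
    print("ValueError:", err)
```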

  • I want to work with passenger information for which the age is known.

        In [20]: age_no_na = titanic[titanic["Age"].notna()]

        In [21]: age_no_na.head()
        Out[21]:
           PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
        0            1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
        1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
        2            3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
        3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
        4            5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S

        [5 rows x 12 columns]

    The notna() conditional function returns True for each row whose value is not a null value. As such, this can be combined with the selection brackets [] to filter the data table.

You might wonder what actually changed, as the first 5 lines still show the same values. One way to verify is to check if the shape has changed:

    In [22]: age_no_na.shape
    Out[22]: (714, 12)
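
The same check can be sketched on a small made-up table (`people` and its values are assumptions). The comparison with `dropna(subset=...)` is an addition that the tutorial text does not cover; it is shown only as one other way to reach the same rows.

```python
import pandas as pd

# Hypothetical miniature table with two missing ages.
people = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cleo", "Dan"],
        "Age": [22.0, None, 26.0, float("nan")],
    }
)

# notna() marks rows with a known age; the mask then filters the table.
known_age = people[people["Age"].notna()]
print(people.shape, "->", known_age.shape)

# Alternative (not from the tutorial): drop rows where Age is missing.
same_rows = people.dropna(subset=["Age"])
print(known_age.equals(same_rows))
```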

To user guide

For more dedicated functions on missing values, see the user guide section about handling missing data.

How do I select specific rows and columns from a DataFrame?

  • I'm interested in the names of the passengers older than 35 years.

        In [23]: adult_names = titanic.loc[titanic["Age"] > 35, "Name"]

        In [24]: adult_names.head()
        Out[24]:
        1     Cumings, Mrs. John Bradley (Florence Briggs Th...
        6                               McCarthy, Mr. Timothy J
        11                             Bonnell, Miss. Elizabeth
        13                          Andersson, Mr. Anders Johan
        15                     Hewlett, Mrs. (Mary D Kingcome)
        Name: Name, dtype: object

    In this case, a subset of both rows and columns is made in one go and just using selection brackets [] is not sufficient anymore. The loc/iloc operators are required in front of the selection brackets []. When using loc/iloc, the part before the comma is the rows you want, and the part after the comma is the columns you want to select.

When using the column names, row labels or a condition expression, use the loc operator in front of the selection brackets []. For both the part before and after the comma, you can use a single label, a list of labels, a slice of labels, a conditional expression or a colon. Using a colon specifies you want to select all rows or columns.
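
A few of the combinations listed above can be sketched on a small made-up table (`people` and its values are assumptions). Note that a slice of labels with loc includes both the start and the end label.

```python
import pandas as pd

# Hypothetical miniature table, not the real Titanic data.
people = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cleo", "Dan"],
        "Age": [22.0, 38.0, 26.0, 54.0],
        "Sex": ["female", "male", "female", "male"],
    }
)

# Condition for the rows, single label for the column -> Series.
older_names = people.loc[people["Age"] > 35, "Name"]

# Condition for the rows, list of labels for the columns -> DataFrame.
older_name_sex = people.loc[people["Age"] > 35, ["Name", "Sex"]]

# Colon for all rows, slice of labels for the columns (both ends included).
name_to_age = people.loc[:, "Name":"Age"]

print(older_names.tolist())
print(older_name_sex.shape)
print(list(name_to_age.columns))
```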

  • I'm interested in rows 10 till 25 and columns 3 to 5.

        In [25]: titanic.iloc[9:25, 2:5]
        Out[25]:
            Pclass                                 Name     Sex
        9        2  Nasser, Mrs. Nicholas (Adele Achem)  female
        10       3      Sandstrom, Miss. Marguerite Rut  female
        11       1             Bonnell, Miss. Elizabeth  female
        12       3       Saundercock, Mr. William Henry    male
        13       3          Andersson, Mr. Anders Johan    male
        ..     ...                                  ...     ...
        20       2                 Fynney, Mr. Joseph J    male
        21       2                Beesley, Mr. Lawrence    male
        22       3          McGowan, Miss. Anna "Annie"  female
        23       1         Sloper, Mr. William Thompson    male
        24       3        Palsson, Miss. Torborg Danira  female

        [16 rows x 3 columns]

    Again, a subset of both rows and columns is made in one go and just using selection brackets [] is not sufficient anymore. When specifically interested in certain rows and/or columns based on their position in the table, use the iloc operator in front of the selection brackets [].
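
Positions with iloc start at 0 and a slice excludes its end position, which is why "rows 10 till 25 and columns 3 to 5" is written as 9:25 and 2:5. A minimal sketch on a made-up table (`table` and the column names `col1` to `col6` are assumptions) confirms the resulting size:

```python
import pandas as pd

# Hypothetical table with 30 rows and 6 columns named col1..col6.
table = pd.DataFrame({f"col{i}": list(range(30)) for i in range(1, 7)})

# Rows at positions 9..24 (the 10th to 25th row),
# columns at positions 2..4 (the 3rd to 5th column).
part = table.iloc[9:25, 2:5]

print(part.shape)
print(list(part.columns))
print(part.index[0], part.index[-1])
```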

When selecting specific rows and/or columns with loc or iloc, new values can be assigned to the selected data. For example, to assign the name anonymous to the first 3 elements of the fourth column:

    In [26]: titanic.iloc[0:3, 3] = "anonymous"

    In [27]: titanic.head()
    Out[27]:
       PassengerId  Survived  Pclass                                          Name  ...            Ticket     Fare  Cabin  Embarked
    0            1         0       3                                     anonymous  ...         A/5 21171   7.2500    NaN         S
    1            2         1       1                                     anonymous  ...          PC 17599  71.2833    C85         C
    2            3         1       3                                     anonymous  ...  STON/O2. 3101282   7.9250    NaN         S
    3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
    4            5         0       3                      Allen, Mr. William Henry  ...            373450   8.0500    NaN         S

    [5 rows x 12 columns]
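
Assignment works the same way with loc. The sketch below uses a made-up table (`people`, its values, and the replacement text "unknown" are assumptions) and assigns once by position and once by condition.

```python
import pandas as pd

# Hypothetical miniature table, not the real Titanic data.
people = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4],
        "Name": ["Ann", "Bob", "Cleo", "Dan"],
    }
)

# By position: first 2 rows of the second column (position 1).
people.iloc[0:2, 1] = "anonymous"

# By label and condition: the Name of the row whose Id equals 4.
people.loc[people["Id"] == 4, "Name"] = "unknown"

print(people)
```

Both assignments modify `people` in place, so it can be worth working on a copy when the original values are still needed.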

REMEMBER

  • When selecting subsets of data, square brackets [] are used.

  • Inside these brackets, you can use a single column/row label, a list of column/row labels, a slice of labels, a conditional expression or a colon.

  • Select specific rows and/or columns using loc when using the row and column names.

  • Select specific rows and/or columns using iloc when using the positions in the table.

  • You can assign new values to a selection based on loc / iloc .


Source: https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
