Pandas

Subset a df according to string present in columns name

df.loc[:, df.columns.str.startswith('alp')]
df.loc[:, df.columns.str.contains('alp')]

Select a df according to column positions

Here from column seven till the end

df.iloc[:, 7:]

Rename columns

df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Rename columns or index

df_new = df.rename(columns={'A': 'Col_1'}, index={'ONE': 'Row_1'})

Didnt work for me

df_merged_selected.index.rename('Row ID', inplace = True) does

Rename columns by position

df.rename(columns={ df.columns[1]: "your value" }, inplace = True)

Add prefix (or suffix) to colnames

DataFrame.add_prefix(prefix)

Check df datatype

df.dtypes

Convert to a specific type

df.year.astype(int)

From continuous to categorical

pd.cut(df.Age,bins=[0,2,17,65,99],labels=['Toddler/Baby','Child','Adult','Elderly'])

Merge two df based on index

pd.merge(df1, df2, left_index=True, right_index=True)

Merge two df based on columns

pd.merge(student_df, staff_df, how='left', left_on='Name', right_on='Name')

replace specific string in values

df['Column2'] = df.Column2.str.replace('b,?' , '')

Replace string by Nan

metadata_lat_lon_df = metadata_lat_lon_df.replace('nd', np.nan)

Here we observe full NANA rows and full NAN columns in the latest df. We drop them

selected_samples.dropna(how='all', inplace = True) selected_samples.dropna(how='all', axis = 1, inplace = True)

display all infos of a df

data_train.info(verbose=True, show_counts=True)

drop column according to regex

df = df[df.columns.drop(list(df.filter(regex='Test')))]

drop columns according to list

# %%
colsToDrop = [     'BARCODE',            'PLATESET',                'WELL',
            'SUBSTANCE_NAME',        'Full_Species',               'Genus',
                  'Sp_alone',             'Species',             'Famille']

df_merged_selected=df_merged.drop(colsToDrop, axis=1)
df_merged_selected

If Na replace with value of the same row but another column

https://stackoverflow.com/a/29177664

df.Temp_Rating.fillna(df.Farheit, inplace=True)
del df['Farheit']
df.columns = 'File heat Observations'.split()

Extract digits from a string

https://stackoverflow.com/a/37683738

df.A.str.extract('(\d+)')

Create multiples columns values conditionally using np.where

https://stackoverflow.com/a/19913845

df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)

drop duplicates according to multiple coloumns

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False) https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html

df.drop_duplicates(subset=['a', 'b'], keep='first', inplace=True, ignore_index=False)

append df created in a for loop

https://stackoverflow.com/a/28670223

appended_data = [] for infile in glob.glob("*.xlsx"): data = pandas.read_excel(infile)

# store DataFrame in list
appended_data.append(data)

see pd.concat documentation for more info

appended_data = pd.concat(appended_data)

write DataFrame to an excel sheet

appended_data.to_excel('appended.xlsx')