Parsing from Excel multisheet file: List comprehension between columns

Question

I'm attempting to parse a large Excel file with numerous sheets. Each sheet has a column that contains the following data (three sheets = three columns):

ReceivedEmail    OpenedEmail    ClickedURL
aaaa@aaa.com     gggg@aaa.com   aaaa@aaa.com
bbbb@aaa.com     dddd@aaa.com   rrrr@aaa.com
cccc@aaa.com     rrrr@aaa.com
dddd@aaa.com     aaaa@aaa.com
eeee@aaa.com     oooo@aaa.com
ffff@aaa.com
gggg@aaa.com
rrrr@aaa.com
qqqq@aaa.com
oooo@aaa.com

What I'm looking for is a single table that keeps the first column of sheet one, ReceivedEmail (everyone we mass e-mailed). Instead of repeating the emails from the later sheets, I want to use a list comprehension to check whether each ReceivedEmail address appears in OpenedEmail, and output 1 if it does and 0 otherwise.

Here's what I did so far:

import pandas as pd
xl = pd.ExcelFile(path_to_file)
xl.sheet_names
['ReceivedEmail', 'OpenedEmail', 'ClickedURL']
df = xl.parse(sheet_name=xl.sheet_names[0], header=None)
df.rename(columns={df.columns[0]:xl.sheet_names[0]}, inplace=True);
df.columns[0]
['ReceivedEmail']
# then I created a buffer dataframe to check next columns
df_buffer = xl.parse(sheet_name=xl.sheet_names[1], header=None)
df_buffer.rename(columns={df_buffer.columns[0]:xl.sheet_names[1]}, inplace=True);
But then when I run a list comprehension like this:

df[df_buffer.columns[0]] = [1 if x in df[df.columns[0]] else 0 for x in df_buffer[df_buffer.columns[0]]]
I get an error:

ValueError: Length of values does not match length of index

Any clue how to solve this error or handle the problem in a smarter way? I am doing this manually to see if it works and could turn it into a loop later, but I am stuck on the error.

End result should be:

ReceivedEmail    OpenedEmail    ClickedURL
aaaa@aaa.com         1              1
bbbb@aaa.com         0              0      
cccc@aaa.com         0              0     
dddd@aaa.com         1              0
eeee@aaa.com         0              0    
ffff@aaa.com         0              0  
gggg@aaa.com         1              0
rrrr@aaa.com         1              1
qqqq@aaa.com         0              0
oooo@aaa.com         1              0

narikkadan · Answer
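
Before the fix, it may help to see why the comprehension in the question fails. The sketch below uses a shortened subset of the question's addresses; the names `received` and `opened` are illustrative and not from the original code. It shows two separate problems: the comprehension iterates over the shorter column, so it produces one value per opened email rather than one per recipient, and `in` applied to a Series tests the index labels rather than the stored values.

```python
import pandas as pd

# Shortened stand-ins for the first columns of the first two sheets.
received = pd.Series(
    ["aaaa@aaa.com", "bbbb@aaa.com", "cccc@aaa.com", "dddd@aaa.com"],
    name="ReceivedEmail",
)
opened = pd.Series(["dddd@aaa.com", "aaaa@aaa.com"], name="OpenedEmail")

# Same shape as the comprehension in the question: it loops over `opened`,
# so it yields 2 values, which cannot be assigned to a 4-row column.
flags_wrong = [1 if x in received else 0 for x in opened]

# `in` on a Series checks the index labels (0..3 here), not the values.
label_hit = 0 in received               # an index label
value_hit = "aaaa@aaa.com" in received  # a stored value, not a label

# isin compares values and returns one boolean per row of `received`.
flags_right = received.isin(opened).astype(int).tolist()

print(len(flags_wrong), flags_wrong)
print(label_hit, value_hit)
print(flags_right)
```

Under these assumptions the length mismatch alone triggers the ValueError, and even with equal lengths the flags would all be 0 because the addresses are never index labels.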

You can use read_excel with the parameter sheet_name=None to return all sheets as an ordered dictionary of DataFrames (older pandas versions spelled this parameter sheetname).

Note: each sheet has one column.

dfs = pd.read_excel('file.xlsx', sheet_name=None)
print (dfs)
OrderedDict([('ReceivedEmail',               a
0  aaaa@aaa.com
1  bbbb@aaa.com
2  cccc@aaa.com
3  dddd@aaa.com
4  eeee@aaa.com
5  ffff@aaa.com
6  gggg@aaa.com
7  rrrr@aaa.com
8  qqqq@aaa.com
9  oooo@aaa.com), ('OpenedEmail',               a
0  gggg@aaa.com
1  dddd@aaa.com
2  rrrr@aaa.com
3  aaaa@aaa.com
4  oooo@aaa.com), ('ClickedURL',               a
0  aaaa@aaa.com
1  rrrr@aaa.com)])
Then join the sheets together, restore the sheet order by subsetting with [], check membership for each column from the second onward with isin, and finally convert the boolean mask to integers.

cols = list(dfs.keys())
df = pd.concat({k: v.iloc[:, 0] for k, v in dfs.items()}, axis=1)[cols]

df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: df.iloc[:, 0].isin(x)).astype(int)
print (df)
  ReceivedEmail OpenedEmail ClickedURL
0  aaaa@aaa.com           1          1
1  bbbb@aaa.com           0          0
2  cccc@aaa.com           0          0
3  dddd@aaa.com           1          0
4  eeee@aaa.com           0          0
5  ffff@aaa.com           0          0
6  gggg@aaa.com           1          0
7  rrrr@aaa.com           1          1
8  qqqq@aaa.com           0          0
9  oooo@aaa.com           1          0
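
For anyone who wants to try the idea without an Excel file, here is a minimal sketch that builds the same table from in-memory data. The `sheets` dictionary is a hand-written stand-in for what `pd.read_excel(..., sheet_name=None)` would return, filled with the addresses from the question, and `build_flag_table` is a hypothetical helper name, not part of pandas. It differs slightly from the code above: instead of concatenating and overwriting columns, it computes each flag column directly against the first sheet.

```python
import pandas as pd

# Stand-in for the dict of DataFrames returned by read_excel(sheet_name=None).
sheets = {
    "ReceivedEmail": pd.DataFrame({0: [
        "aaaa@aaa.com", "bbbb@aaa.com", "cccc@aaa.com", "dddd@aaa.com",
        "eeee@aaa.com", "ffff@aaa.com", "gggg@aaa.com", "rrrr@aaa.com",
        "qqqq@aaa.com", "oooo@aaa.com",
    ]}),
    "OpenedEmail": pd.DataFrame({0: [
        "gggg@aaa.com", "dddd@aaa.com", "rrrr@aaa.com",
        "aaaa@aaa.com", "oooo@aaa.com",
    ]}),
    "ClickedURL": pd.DataFrame({0: ["aaaa@aaa.com", "rrrr@aaa.com"]}),
}


def build_flag_table(sheets):
    """Keep the first sheet's column; add one 0/1 column per later sheet."""
    names = list(sheets)
    base = sheets[names[0]].iloc[:, 0]
    result = pd.DataFrame({names[0]: base})
    for name in names[1:]:
        # One boolean per row of `base`, converted to 0/1.
        result[name] = base.isin(sheets[name].iloc[:, 0]).astype(int)
    return result


table = build_flag_table(sheets)
print(table)
```

Because each flag column is derived from the first sheet's column, its length always matches the number of recipients, regardless of how many rows the other sheets have.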
