Is there a hierarchy inferring algorithm available in python

0 votes

We receive a chart of accounts with more than a thousand rows, and we spend hours categorizing this data by adding its values to hierarchies (trial and error). We are unable to achieve this using account names because they can be written in a variety of languages and formats. Exists a ready-made algorithm in Python that can infer hierarchy based on the values given?

The sample input and output are shown below; I've coloured them to help you understand.

Input Data

Output Data

I am aware that no algorithm can ever be 100% accurate, but it would still be a success if we could classify 70% to 80% of the accounts. If anyone has a suggestion for how to do this, I can also code in VBA.

I've tried running cumulative sums and differences, but I can't seem to find steps ahead.

Jan 15, 2023 in Others by Kithuzzz
• 38,000 points
416 views

1 answer to this question.

0 votes

Convert the list of values to the desired table assuming:

  • Each subgroup begins with the word total
  • The Last row has the name of the Level 1 group (i.e. Total Revenue)

Code

Process input data as a data frame

import math

import pandas as pd
from tabulate import tabulate

def grouping(df):
    ''' Convert tablular values into nested dictionary
        Assumptions
            Subgroup begin with word Total
            Overall Group is the last row
    '''
     # Empty Dataframe
    result = []
    
    # Empty data frame for subgroup
    subgroup_df = pd.DataFrame({'Level 1': pd.Series(dtype='str'),
                   'Level 2': pd.Series(dtype='str'),
                   'Level 3': pd.Series(dtype='str'),
                   'Values': pd.Series(dtype='float')})

    subgroup_total = 0
    for index, (account, value) in enumerate(zip(df['Account'], df['Values']), start = 1):
        if index == len(df):
            continue
        if math.isclose(subgroup_total, value, abs_tol  = 0.02):
            # current value equals sum of previous rows in subgroup
            subgroup_df['Level 2'] = account

            result.append(subgroup_df)
            # New empty dataframe for subgroup
            subgroup_df = pd.DataFrame({'Level 1': pd.Series(dtype='str'),
                   'Level 2': pd.Series(dtype='str'),
                   'Level 3': pd.Series(dtype='str'),
                   'Values': pd.Series(dtype='float')})
            subgroup_total = 0
        else:
            # within group
            subgroup_df.loc[len(subgroup_df.index)] = ["", "", account, value]
            subgroup_total += value
            
    result_df = pd.concat(result, ignore_index = True)
        
    result_df['Level 1'] = account
   
    return result_df

Example Usage

# Usage
# Get data from excel (workbook test.xlsx with worksheet name test
df = pd.read_excel('test.xlsx', 'test')

# Generate table from nested dictionary
table = grouping(df)

print(tabulate(table, headers='keys', tablefmt='psql'))

Output

+----+---------------+----------------+-----------+----------+
|    | Level 1       | Level 2        | Level 3   |   Values |
|----+---------------+----------------+-----------+----------|
|  0 | Total Revenue | Total Service  | Service A |    10.79 |
|  1 | Total Revenue | Total Products | Product A |     4.93 |
|  2 | Total Revenue | Total Products | Product B |     4.81 |
|  3 | Total Revenue | Total Products | Product C |     4.18 |
|  4 | Total Revenue | Total Products | Product D |     4.52 |
|  5 | Total Revenue | Total Food     | Food A    |     5.09 |
|  6 | Total Revenue | Total Food     | Food B    |     5.6  |
|  7 | Total Revenue | Total Food     | Food C    |     5.12 |
|  8 | Total Revenue | Total Food     | Food D    |     5.24 |
|  9 | Total Revenue | Total Food     | Food E    |     5.19 |
| 10 | Total Revenue | Total Food     | Food F    |     5.75 |
+----+---------------+----------------+-----------+----------+
answered Jan 15, 2023 by narikkadan
• 63,600 points

Related Questions In Others

0 votes
1 answer

Is there a color code for transparent in HTML?

There is no Transparent color code, but ...READ MORE

answered Feb 11, 2022 in Others by Soham
• 9,710 points
1,453 views
0 votes
1 answer

Is there a maximum number of formula fields allowed in Excel (2010)

See http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP010073849.aspx for limits on specs it doesn't indicate ...READ MORE

answered Sep 30, 2022 in Others by narikkadan
• 63,600 points
852 views
0 votes
1 answer

Is there a way in Microsoft Excel to give specific bins different bin widths when making a histogram plot?

Excel built-in histogram tool only allows equal ...READ MORE

answered Oct 2, 2022 in Others by narikkadan
• 63,600 points
1,986 views
0 votes
1 answer

Is there a coalesce-like function in Excel?

To have Excel analyze the formula as ...READ MORE

answered Oct 21, 2022 in Others by narikkadan
• 63,600 points
3,846 views
0 votes
2 answers
+1 vote
2 answers

how can i count the items in a list?

Syntax :            list. count(value) Code: colors = ['red', 'green', ...READ MORE

answered Jul 7, 2019 in Python by Neha
• 330 points

edited Jul 8, 2019 by Kalgi 4,434 views
0 votes
1 answer
0 votes
1 answer

Is there any way in python to auto-correct spelling mistake in multiple rows of an excel files of a single column?

Use Spellchecker for doing your stuff: import pandas ...READ MORE

answered Oct 14, 2022 in Others by narikkadan
• 63,600 points
2,054 views
0 votes
1 answer

Is there a function to unhide columns in excel through python

Excel file : df.to_excel('demofile.xlsx',index=False) import openpyxl py = openpyxl.load_workbook('demofile.xlsx') exlsheet = ...READ MORE

answered Oct 18, 2022 in Others by narikkadan
• 63,600 points
2,256 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP