How to efficiently hash SHA 256 in golang data where only the last few bytes changes

0 votes

Assuming you had 80 bytes of data and only the last 4 bytes was constantly changing, how would you efficiently hash the total 80 bytes using Go. In essence, the first 76 bytes are the same, while the last 4 bytes keeps changing. Ideally, you want to keep a copy of the hash digest for the first 76 bytes and just keep changing the last 4.

Jun 27, 2018 in Blockchain by charlie_brown
• 7,720 points
5,594 views

1 answer to this question.

0 votes

You can try the following examples on the Go Playground. Benchmark results is at the end.

Note: the implementations below are not safe for concurrent use; I intentionally made them like this to be simpler and faster.

Fastest when using only public API (always hashes all input)

The general concept and interface of Go's hash algorithms is the hash.Hash interface. This does not allow you to save the state of the hasher and to return or rewind to the saved state. So using the public hash APIs of the Go standard lib, you always have to calculate the hash from start.

What the public API offers is to reuse an already constructed hasher to calculate the hash of a new input, using the Hash.Reset() method. This is nice so that no (memory) allocations will be needed to calculate multiple hash values. Also you may take advantage of the optional slice that may be passed to Hash.Sum() which is used to append the current hash to. This is nice so that no allocations will be needed to receive the hash results either.

Here's an example that takes advantage of these:

type Cached1 struct {
    hasher hash.Hash
    result [sha256.Size]byte
}

func NewCached1() *Cached1 {
    return &Cached1{hasher: sha256.New()}
}

func (c *Cached1) Sum(data []byte) []byte {
    c.hasher.Reset()
    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}

Test data

We'll use the following test data:

var fixed = bytes.Repeat([]byte{1}, 76)
var variantA = []byte{1, 1, 1, 1}
var variantB = []byte{2, 2, 2, 2}

var data = append(append([]byte{}, fixed...), variantA...)
var data2 = append(append([]byte{}, fixed...), variantB...)

var c1 = NewCached1()

First let's get authentic results (to verify if our hasher works correctly):

fmt.Printf("%x\n", sha256.Sum256(data))
fmt.Printf("%x\n", sha256.Sum256(data2))

Output:

fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c

Now let's check our Cached1 hasher:

fmt.Printf("%x\n", c1.Sum(data))
fmt.Printf("%x\n", c1.Sum(data2))

Output is the same:

fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c

Even faster but may break (in future Go releases): hashes only the last 4 bytes

Now let's see a less flexible solution which truly calculates the hash of the first 76 fixed part only once.

The hasher of the crypto/sha256 package is the unexported sha256.digest type (more precisely a pointer to this type):

// digest represents the partial evaluation of a checksum.
type digest struct {
    h     [8]uint32
    x     [chunk]byte
    nx    int
    len   uint64
    is224 bool // mark if this digest is SHA-224
}

A value of the digest struct type basically holds the current state of the hasher.

What we may do is feed the hasher the fixed, first 76 bytes, and then save this struct value. When we need to caclulate the hash of some 80 bytes data where the first 76 is the same, we use this saved value as a starting point, and then feed the varying last 4 bytes.

Note that it's enough to simply save this struct value as it contains no pointers and no descriptor types like slices and maps. Else we would also have to make a copy of those, but we're "lucky". So this solution would need adjustment if a future implementation of crypto/sha256 would add a pointer or slice field for example.

Since sha256.digest is unexported, we can only use reflection (reflect package) to achieve our goals, which inherently will add some delays to computation.

Example implementation that does this:

type Cached2 struct {
    origv   reflect.Value
    hasherv reflect.Value
    hasher  hash.Hash

    result [sha256.Size]byte
}

func NewCached2(fixed []byte) *Cached2 {
    h := sha256.New()
    h.Write(fixed)

    c := &Cached2{origv: reflect.ValueOf(h).Elem()}
    hasherv := reflect.New(c.origv.Type())
    c.hasher = hasherv.Interface().(hash.Hash)
    c.hasherv = hasherv.Elem()

    return c
}

func (c *Cached2) Sum(data []byte) []byte {
    // Set state of the fixed hash:
    c.hasherv.Set(c.origv)

    c.hasher.Write(data)
    return c.hasher.Sum(c.result[:0])
}

Testing it:

var c2 = NewCached2(fixed)
fmt.Printf("%x\n", c2.Sum(variantA))
fmt.Printf("%x\n", c2.Sum(variantB))

Output is again the same:

fb8e69bdfa2ad15be7cc8a346b74e773d059f96cfc92da89e631895422fe966a
10ef52823dad5d1212e8ac83b54c001bfb9a03dc0c7c3c83246fb988aa788c0c

So it works.

answered Jun 27, 2018 by aryya
• 7,460 points

Related Questions In Blockchain

0 votes
1 answer

In a Blockchain, how difficult is it to modify the third to last block?

Technically, it's not difficult at all, all ...READ MORE

answered Apr 20, 2018 in Blockchain by Christine
• 15,790 points
1,060 views
0 votes
1 answer

How to check the data integrity logic in proof of work mining?

The proof of work is actually works ...READ MORE

answered May 8, 2018 in Blockchain by Johnathon
• 9,090 points
854 views
0 votes
1 answer

How to set the hex-encoded data field in a Web3j Ethereum transaction?

You can use the "data" field of ...READ MORE

answered Oct 15, 2018 in Blockchain by Omkar
• 69,220 points
3,036 views
0 votes
1 answer
0 votes
1 answer

Truffle tests not running after truffle init

This was a bug. They've fixed it. ...READ MORE

answered Sep 11, 2018 in Blockchain by Christine
• 15,790 points
1,969 views
+1 vote
1 answer

Protocols used in a distributed/dlt system for the nodes to establish communication

yes all are over TCP/IP connections secured ...READ MORE

answered Aug 6, 2018 in Blockchain by aryya
• 7,460 points
1,529 views
+1 vote
3 answers

Removing double quotes from a string from JSON response in PHP

Just remove the json_encode call, and it should work: $resp ...READ MORE

answered Sep 12, 2018 in Blockchain by digger
• 26,740 points
45,411 views
0 votes
1 answer

Hyperledger Sawtooth vs Quorum in concurrency and speed Ask

Summary: Both should provide similar reliability of ...READ MORE

answered Sep 26, 2018 in IoT (Internet of Things) by Upasana
• 8,620 points
1,499 views
0 votes
1 answer

How to prevent the smart contract from being modified and deployed in the blockchain network?

To expand on Matthew's answer, each state ...READ MORE

answered Jul 6, 2018 in Blockchain by aryya
• 7,460 points
850 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP