Application of Hashing

What is Hashing

Hashing is the practice of taking plain text and converting it into a digest of that data in such a way that it is not meant to be decrypted. The output of hashing is known as a hash, hash value, or message digest. Hashing is a branch of cryptography that is distinct from encryption: it produces a scrambled result that cannot easily be restored to the original. Mathematically, a hash function produces a fixed-length value that is fairly simple to compute in one direction but almost impossible to reverse.

Hashing is a method of chopping up and mixing data, as the name loosely implies. A standard hash function converts a string of characters into a shorter, fixed-length value that represents the original string. This reduced-length value is often known as a key. Searching for individual records by their shorter hashed keys, rather than by their original values, is typically much faster and simpler for database indexing and retrieval. For small enough data sets, it is possible for each record to have a unique hash key. Hashing a long string of characters (such as a letter or a data set) produces a compact value for cryptographic purposes that can be as distinctive as a fingerprint.
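As a rough illustration of the fixed-length property, the Python sketch below uses the standard-library hashlib module with SHA-256; the choice of SHA-256, the helper name sha256_hex, and the sample inputs are assumptions for the example. Whatever the input size, the hexadecimal digest is 64 characters (32 bytes) long, and hashing the same input again gives the same digest.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_digest = sha256_hex("a")
long_digest = sha256_hex("a very long letter " * 10_000)

# Both digests have the same fixed length, regardless of input size.
print(len(short_digest), len(long_digest))  # 64 64
# Hashing is deterministic: the same input always yields the same digest.
print(sha256_hex("a") == short_digest)  # True
```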

Applications of Hashing

Hashing is a way of storing data in a data structure so that the basic operations on that data (insert, delete, and search) can be performed in O(1) time on average. Because it can speed up code considerably, it is one of the most important techniques for any programmer or developer to know.

Hashing is used to implement programming languages, file systems, pattern searching, distributed key-value storage, cryptography, and more. There are many cases in which the principle of hashing is applied.

Hashing has other uses as well, including the hash functions of modern cryptography. Here are some of these applications:

  • Message Digest
  • Password Verification
  • Data Structures
  • Compiler Operation
  • Rabin-Karp Algorithm
  • Linking File name and path together

Message Digest:

This is an application of cryptographic hash functions. A cryptographic hash function produces an output from which it is practically infeasible to recover the input. This property of hash functions is called irreversibility.

Let’s see an example

Suppose you have to store files with one of the available cloud providers, and you must make sure that no third party tampers with the files you keep there. You can do this by calculating the “hash” of each file with a cryptographic hash algorithm. SHA-256 is one of the most widely used cryptographic hash algorithms, and the hash it produces is always 32 bytes long, so keeping the hashes of even a large number of files is not a problem. These hashes are saved on your local machine.

Later, when you retrieve the files, you compute the hash again and compare it with the hash you saved earlier. This tells you whether or not the files have been tampered with: if anyone modifies a file, its hash value will almost certainly change, and it is practically impossible to tamper with the file without changing the hash.
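The tamper check described above could be sketched in Python roughly as follows, assuming SHA-256 via hashlib. The helper names file_sha256 and is_unchanged and the demo file are hypothetical; reading in fixed-size chunks is simply a way to avoid loading a large file into memory at once.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_unchanged(path: Path, saved_digest: str) -> bool:
    """Recompute the file's hash and compare it with the digest saved earlier."""
    return file_sha256(path) == saved_digest

# Demo with a hypothetical file in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    report = Path(folder) / "report.txt"
    report.write_bytes(b"quarterly figures")
    saved = file_sha256(report)                # computed before uploading
    print(is_unchanged(report, saved))         # True
    report.write_bytes(b"quarterly figures!")  # simulate tampering
    print(is_unchanged(report, saved))         # False
```

In practice the saved digest should live somewhere the storage provider cannot modify, such as the local machine mentioned above.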

Password Verification:

Cryptographic hash functions are very widely used in password verification. Let’s look at an example:

Whenever you use an online service that requires a user login, you enter your email (or username) and password to prove that the account you are trying to access belongs to you. When the password is entered, its hash is calculated and compared with the hash held by the server. The passwords stored on the server are only the calculated hash values of the original passwords, not the passwords themselves, so the original passwords are not exposed even if the stored data is read by someone else.
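Real systems typically go further than a single plain hash: each password gets a random salt and is run through a deliberately slow password-hashing function. The sketch below assumes PBKDF2-HMAC-SHA256 from Python's hashlib; the iteration count and the function names hash_password and verify_password are illustrative assumptions, not a recommendation for any particular service.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; real deployments tune this

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store instead of the plain password."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the results."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, stored)  # constant-time comparison

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The salt means two users with the same password still get different stored hashes.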

Data Structures:

Various programming languages provide data structures based on hash tables. The basic idea is to store key-value pairs in which each key is unique, while different keys may map to the same value. This is implemented in C++ as unordered_set and unordered_map, in Java as HashSet and HashMap, in Python as dict, etc.
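In everyday code you would simply use dict, unordered_map, or HashMap. To show what such a structure does internally, here is a hypothetical from-scratch sketch in Python using separate chaining (each bucket holds a list of key-value pairs). The class name, starting capacity, and resize threshold are assumptions chosen for clarity; note that Python's own dict uses a different collision strategy.

```python
class ChainedHashTable:
    """Minimal hash table with separate chaining (illustrative sketch only)."""

    def __init__(self, capacity: int = 8) -> None:
        self._buckets = [[] for _ in range(capacity)]  # each bucket: list of (key, value)
        self._size = 0

    def _bucket(self, key):
        # Built-in hash() gives an integer; % maps it to a bucket index.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):  # average chain length above 2
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self, new_capacity: int) -> None:
        # Rehash every entry into a larger bucket array: O(n), but rare.
        entries = [entry for bucket in self._buckets for entry in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in entries:
            self._bucket(k).append((k, v))

    def __len__(self) -> int:
        return self._size

table = ChainedHashTable()
table.put("apple", 3)
table.put("pear", 5)
print(table.get("apple"))  # 3
table.delete("apple")
print(table.get("apple"))  # None
```

The occasional _resize call is why a single insertion can sometimes take time proportional to the number of entries, even though the average cost stays constant.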

Compiler Operation:

The keywords of a programming language are processed differently from other identifiers. To distinguish the keywords of a programming language (if, else, for, return, etc.) from other identifiers, and so compile the program successfully, the compiler stores all the keywords in a set that is implemented using a hash table.
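A minimal Python sketch of that lookup, assuming a small hypothetical keyword set for a toy language (Python's own keyword list is available from the standard-library keyword module). A frozenset is hash-based, so each membership test takes constant time on average.

```python
# Hypothetical keyword set for a toy language. A frozenset is hash-based,
# so checking membership takes constant time on average.
KEYWORDS = frozenset({"if", "else", "for", "while", "return"})

def classify(token: str) -> str:
    """Label a token as a keyword or an ordinary identifier."""
    return "keyword" if token in KEYWORDS else "identifier"

for token in ["if", "total", "return", "for_each"]:
    print(token, classify(token))
# if keyword / total identifier / return keyword / for_each identifier
```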

Rabin-Karp Algorithm:

The Rabin-Karp algorithm is one of the best-known applications of hashing. It is a string-searching algorithm that uses hashing to find any one of a set of patterns in a string. A practical application of this algorithm is detecting plagiarism. See “Searching for Patterns | Set 3 (Rabin-Karp Algorithm)” to learn more about Rabin-Karp.
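Below is a minimal Python sketch of the single-pattern form of Rabin-Karp, assuming a polynomial rolling hash with base 256 and the prime modulus 1,000,000,007 (both are common illustrative choices, not values taken from this article). Sliding the window updates the hash in constant time, and a character-by-character check confirms each hash match so that collisions cannot produce false positives.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return every index where pattern occurs in text, using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes are only candidates; compare characters to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], add text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches

print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```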

Linking File name and path together:

When browsing data on our local machine, we notice two very important components of a file: the file name and the file path. To store the correspondence between file name and file path, the system uses a map (file_name, file_path) that is implemented using a hash table.
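As a small illustration, this Python sketch builds such a map with a dict (Python's built-in hash table). Because the same file name can appear in several directories, the sketch stores a list of paths per name; that choice, the function name index_by_name, and the sample paths are assumptions for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def index_by_name(paths: list[str]) -> dict[str, list[str]]:
    """Map each file name to the full paths where it occurs, using a dict (a hash table)."""
    index = defaultdict(list)
    for path in paths:
        index[PurePosixPath(path).name].append(path)
    return dict(index)

# Hypothetical paths, for illustration only.
paths = ["/home/ana/notes.txt", "/home/ana/work/report.pdf", "/tmp/notes.txt"]
index = index_by_name(paths)
print(index["notes.txt"])  # ['/home/ana/notes.txt', '/tmp/notes.txt']
```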

Basics of Hashing

A hash (also called a hash value or message digest) is the value a hashing algorithm produces as output when it is given plaintext or ciphertext as input. The hash has a predetermined length and always keeps that length, no matter what is fed into the hashing algorithm; the size of the resulting hash is fixed by the specification of the algorithm itself. A hash, usually written as a string of hexadecimal digits, is sometimes referred to as a fingerprint of a file or message. Hashes are used for digital signatures, for verifying files and messages, and to protect the privacy of sensitive data.

A hash works as a one-way function. This means that a hash is easy to calculate but hard to calculate in reverse. The forward calculation, producing a condensed version of a message as its hash, is relatively straightforward, but the original message cannot be re-created from the hash.

Cryptographic Hashing Functions

The most straightforward way to hash a message is to split it into blocks and process each block in turn with the same calculation. This method is called iterative hashing. It relies on a compression function that converts an input into a smaller output, ideally such that two different inputs give two different outputs. Hash functions whose compression functions are based on block ciphers are one family of cryptographic hash functions.
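To make the block-by-block idea concrete, here is a deliberately toy Python sketch of iterative hashing in the Merkle-Damgård style used by MD5 and SHA-1: the message is padded (a 0x80 byte, zeros, then the message length), split into fixed-size blocks, and each block is folded into a running state by a compression function. The 8-byte block size, initial value, multiplier, compression function, and all names are made up for illustration and offer no real security.

```python
BLOCK_SIZE = 8                      # toy block size in bytes (MD5 and SHA-1 use 64-byte blocks)
MASK64 = (1 << 64) - 1              # keep the running state at 64 bits
INITIAL_STATE = 0x0123456789ABCDEF  # arbitrary toy initial value
MULTIPLIER = 0x100000001B3          # an odd constant, chosen only for illustration

def pad(message: bytes) -> bytes:
    """Append 0x80, then zeros, then the message length in bits (8 bytes, big-endian)."""
    zeros = (-(len(message) + 1)) % BLOCK_SIZE
    return message + b"\x80" + b"\x00" * zeros + (8 * len(message)).to_bytes(8, "big")

def compress(state: int, block: bytes) -> int:
    """Toy compression function that mixes one block into the state. NOT secure."""
    x = ((state ^ int.from_bytes(block, "big")) * MULTIPLIER) & MASK64
    return x ^ (x >> 29)

def toy_iterative_hash(message: bytes) -> bytes:
    """Process the padded message one block at a time and return an 8-byte digest."""
    state = INITIAL_STATE
    padded = pad(message)
    for offset in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[offset:offset + BLOCK_SIZE])
    return state.to_bytes(8, "big")

print(toy_iterative_hash(b"abc").hex())
print(len(toy_iterative_hash(b"x" * 1000)))  # 8: the output length never changes
```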

Examples of cryptographic hash functions include algorithms based on Message Digest (MD), such as MD5, and algorithms based on the Secure Hash Algorithm (SHA), such as SHA-1.

Message-Digest algorithm 5 (MD5):

Ron Rivest developed this cryptographic hash algorithm in 1991. It takes a message of variable length as input and produces a 128-bit fixed-length message digest. It uses a little-endian scheme, in which the least significant byte of a 32-bit word is stored at the low-address byte position. The algorithm runs through four rounds of 16 steps each, for a total of 64 steps, and maintains a 128-bit buffer (four 32-bit words) for its intermediate state. It is less secure but faster than SHA-1. By brute force, finding a message that matches a given digest takes on the order of 2^128 operations, and finding two messages that produce the same digest takes on the order of 2^64 operations. (Practical collision attacks against MD5 are now known, so it should not be relied on where collision resistance matters.)

Secure Hash Algorithm (SHA-1):

This algorithm takes a message of variable length as input and produces a 160-bit fixed-length message digest. It treats the message as a sequence of 32-bit words using a big-endian scheme, in which the most significant byte of a 32-bit word is stored at the low-address byte position. The algorithm goes through four stages of 20 steps each, for a total of 80 steps, and maintains a 160-bit buffer (five 32-bit words). Compared with MD5, it is more secure but slower. By brute force, finding a message that matches a given digest takes on the order of 2^160 operations, and finding two messages with the same digest takes on the order of 2^80 operations. (A practical SHA-1 collision was publicly demonstrated in 2017, so SHA-1 is also no longer considered collision resistant.)
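For comparison, Python's standard hashlib module exposes both algorithms. This short sketch hashes the same sample input (b"abc", chosen for the example) with each and reports the digest size, which matches the 128-bit and 160-bit lengths stated for MD5 and SHA-1. Neither algorithm should be chosen for new security-sensitive designs; this is only for checking lengths and known values.

```python
import hashlib

message = b"abc"  # sample input chosen for the example
md5 = hashlib.md5(message)
sha1 = hashlib.sha1(message)

# digest_size is reported in bytes: 16 bytes = 128 bits, 20 bytes = 160 bits.
print("MD5:", md5.digest_size * 8, "bits", md5.hexdigest())
# MD5: 128 bits 900150983cd24fb0d6963f7d28e17f72
print("SHA-1:", sha1.digest_size * 8, "bits", sha1.hexdigest())
# SHA-1: 160 bits a9993e364706816aba3e25717850c26c9cd0d89d
```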

We commonly use hash functions in public-key encryption, message authentication, key agreement protocols, digital signatures, integrity checking, identity protection, and many other cryptographic contexts. Whether we encrypt an email, send a message from a phone, connect to an HTTPS website, or connect to a remote machine through IPsec or SSH, there is always a hash function somewhere under the hood.

Real-world examples of where we use hash functions

  • Cloud storage services use hash functions to identify duplicate data and to locate modified files.
  • The Git version control system uses hash functions to identify the objects in a repository.
  • Bitcoin uses a hash algorithm in its proof-of-work scheme.
  • Forensic experts use hash values to confirm that digital evidence has not been altered.
  • Network intrusion detection systems (NIDS) use hashes to identify suspected malicious data passing across a network.
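To show the Git case concretely, this Python sketch computes the ID Git gives a file's contents: Git hashes a short header ("blob", the content length, and a zero byte) followed by the content. It assumes a repository using Git's default SHA-1 object format (newer Git versions can also use SHA-256), and the function name git_blob_id is hypothetical. For the same content, the result should match what `git hash-object` prints.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to file contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```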

Hashing Values

In a hashing operation, each character string is reduced to its corresponding hash value according to a particular formula that is applied to every string in the data set or message.

When looking up unique database entries, or when storing and sending the cryptographic form of a document, translating strings of varying length into reduced keys of fixed length has the benefit of reducing complexity.

To find a match, a search for any entry starts by calculating its hash value and then looking through the results. In cryptography, hashing is used to create and verify the digital signatures that authenticate the senders and recipients of messages. The hash values here (also known as message digests) are the reduced forms of the data from which the digital signatures are computed.

Hashing Functions

Hash functions, or hashing algorithms, are the mathematical methods used to calculate hash values from input strings of digits and letters. They can be highly complex, and it is almost impossible to recover the original input data from a hash value without knowing which hash function was applied. For this reason, hashing algorithms also play a part in deriving cryptographic keys.

Hash functions are used to index the original data values or keys, and they are applied again whenever the information associated with a given value or key needs to be retrieved. In this sense they are one-way operations: the hash values are never meant to be “reverse engineered” back into the original data, and hash functions are designed to resist this kind of analysis.

Here are several simple hash functions that have been used:

  • Division-remainder method
  • Folding method
  • Radix transformation method
  • Digit rearrangement method

Division-remainder method:

The number of items to be stored in the table is estimated. That number is then used as a divisor for each original value or key, giving a quotient and a remainder. The remainder is the hashed value. (Because this method can produce many collisions, any search mechanism built on it must be able to detect a collision and provide an alternative way to continue the search.)
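A minimal Python sketch of the division-remainder method, assuming a table size of 97 (a prime, which is a common but not required choice) and a few sample keys; the function name division_hash is hypothetical. Keys 1234 and 1331 land in the same slot, which is exactly the kind of collision the text warns about.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division-remainder method: the remainder after dividing by the table size."""
    return key % table_size

TABLE_SIZE = 97  # illustrative table size
for key in (1234, 1331, 5678):
    print(key, "->", division_hash(key, TABLE_SIZE))
# 1234 -> 70, 1331 -> 70 (collision), 5678 -> 52
```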

Folding method:

This method divides the original value into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits that will work) as the hashed value or key.
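A minimal sketch of folding in Python, assuming the key is a non-negative integer split from the left into 3-digit parts and that the last four digits of the sum are kept, as in the description above; the part size and the function name folding_hash are assumptions.

```python
def folding_hash(key: int, part_len: int = 3, keep: int = 4) -> int:
    """Folding method: split the digits into parts, add them, keep the last digits."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % 10 ** keep  # keep only the last `keep` digits

print(folding_hash(123456789012))  # 123 + 456 + 789 + 12 = 1380
print(folding_hash(987654321987654321, part_len=6))  # sum 1963962, keep 3962
```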

Radix transformation method:

If the value or key is numeric, the number base (or radix) can be changed, resulting in a different sequence of digits. (For example, a decimal key could be converted into a hexadecimal key.) High-order digits can then be discarded so that the hash value has a uniform length.
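A minimal Python sketch of radix transformation, assuming a non-negative integer key, base 16 as in the decimal-to-hexadecimal example, and a 4-digit result; the function name radix_hash and the width are assumptions. Converting the digits and then keeping only the low-order ones gives a fixed-length value.

```python
def radix_hash(key: int, base: int = 16, width: int = 4) -> str:
    """Radix transformation: rewrite the key in another base, keep the low-order digits."""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"  # supports bases up to 36
    if key == 0:
        return "0" * width
    digits = []
    while key > 0:
        key, remainder = divmod(key, base)
        digits.append(alphabet[remainder])
    converted = "".join(reversed(digits))
    return converted[-width:].rjust(width, "0")  # discard high-order digits

print(radix_hash(123456789))  # 123456789 is 0x75bcd15, keep 'cd15'
```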

Digit rearrangement method:

This method takes part of the original value or key, such as the digits in positions 3 through 6, reverses their order, and then uses that sequence of digits as the hash value or key.
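A minimal Python sketch of digit rearrangement, using positions 3 through 6 as in the example above (counted from 1, from the left). The function name rearrangement_hash is hypothetical, and the key is assumed to be a non-negative integer with at least six digits.

```python
def rearrangement_hash(key: int, start: int = 3, end: int = 6) -> int:
    """Digit rearrangement: reverse the digits in positions start..end (1-based)."""
    digits = str(key)
    if len(digits) < end:
        raise ValueError("key has too few digits for the chosen positions")
    segment = digits[start - 1:end]  # positions 3-6 are string indices 2..5
    return int(segment[::-1])

print(rearrangement_hash(9876543210))  # positions 3-6 are "7654", reversed: 4567
```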

Collision of Hashing

A well-designed hashing algorithm rarely produces the same hash value from two different inputs; when that does happen, it is known as a collision. In data handling, hash functions are usually used to determine where a given data string is stored in an array, by computing its hash value or key. If two or more different keys hashed to the same location in the table and the collision were not handled, the result would be confusion (and possibly a program crash).
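One common way to handle collisions is open addressing with linear probing: when a key's slot is taken, the table tries the next slot until it finds the key or an empty one. The Python class below is a hypothetical minimal sketch that assumes integer keys and the division-remainder method as the hash. It omits deletion, which in open addressing needs extra bookkeeping ("tombstone" markers) to avoid breaking probe sequences.

```python
class LinearProbingTable:
    """Open addressing with linear probing for integer keys (no deletion)."""

    _EMPTY = object()  # marker for a slot that has never been used

    def __init__(self, capacity: int = 11) -> None:
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity

    def _home_slot(self, key: int) -> int:
        return key % len(self._keys)  # division-remainder hash

    def put(self, key: int, value) -> None:
        i = self._home_slot(key)
        for _ in range(len(self._keys)):
            if self._keys[i] is self._EMPTY or self._keys[i] == key:
                self._keys[i], self._values[i] = key, value
                return
            i = (i + 1) % len(self._keys)  # collision: probe the next slot
        raise RuntimeError("hash table is full")

    def get(self, key: int, default=None):
        i = self._home_slot(key)
        for _ in range(len(self._keys)):
            if self._keys[i] is self._EMPTY:
                return default  # reached an empty slot: key is absent
            if self._keys[i] == key:
                return self._values[i]
            i = (i + 1) % len(self._keys)
        return default

table = LinearProbingTable(11)
table.put(5, "apple")
table.put(16, "pear")  # 16 % 11 == 5 too, so it is stored in slot 6
print(table.get(16))   # pear
```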

Advantages and Disadvantages of Hashing

Advantages:

The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the right size and never needs to be resized.

If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), the average lookup cost can be reduced by a careful choice of the hash function, bucket table size, and internal data structures. In fact, one may be able to devise a hash function that is collision-free, or even perfect. In this case, the keys do not need to be stored in the table.
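For a small fixed set of integer keys, one simple way to get such a collision-free (perfect) hash function is to search for a table size m where key mod m gives every key its own slot. The Python sketch below assumes distinct non-negative integer keys and uses a made-up key set; the function name is hypothetical, and real perfect-hashing schemes (for example, those generated by tools such as gperf) are more sophisticated.

```python
def find_collision_free_modulus(keys: list[int], max_size: int = 10_000) -> int:
    """Return the smallest table size m such that key % m is different for every key."""
    for m in range(max(1, len(keys)), max_size + 1):
        if len({k % m for k in keys}) == len(keys):
            return m  # every key gets its own slot: no collisions
    raise ValueError("no collision-free table size up to max_size")

keys = [10, 23, 37, 41, 58]  # hypothetical fixed key set
m = find_collision_free_modulus(keys)
print(m, [k % m for k in keys])  # 10 [0, 3, 7, 1, 8]
```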

Disadvantages:

Hash tables can be more difficult to implement than self-balancing binary search trees. Choosing an effective hash function for a specific application is more an art than a science. In open-addressed hash tables, it is fairly easy to create a poor hash function.

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some situations the high cost of computing the hash function can be reduced by saving the hash value together with the key.)

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then instead of a hash table one may use the key directly as the index into an array of values. Note that there are no collisions in this case.

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.

If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.

Benefits of Hashing

A key application of hashing is comparing two files for equality. Without opening two documents to compare them word for word, the owner can compare their calculated hash values and know immediately whether the files are different.

Hashing is also commonly used, for example in a backup application like SyncBack, to check the integrity of a file after it has been copied from one location to another. To make sure the transferred file is not corrupted, a user can compare the hash values of both files. If they are the same, the transferred file is an identical copy.

In some cases, an encrypted file may be designed never to change its file size or the date and time of its last modification (for example, container files for virtual drives). In such cases it is impossible to tell at a glance whether two similar files are different, but their hash values will clearly tell the files apart if they are.

Uses of Hashing

Here we discuss the use of hashing in 2BrightSparks software:

In the backup and synchronization software SyncBackPro/SE/Free, hashing is mainly used for file integrity checks during or after a data transfer session. For example, a SyncBack user can turn on file verification (Modify profile > Copy/Delete) or use a slower but more reliable method (Modify profile > Compare Options) that enables hashing to check for file differences. Different hash functions are used depending on which option is selected and where the backup files are located.

Other areas where hashing is used are resuming transfers in FTP, scripting, and occasionally verification in Cloud profiles (scripting and cloud backup are supported by SyncBackPro only).

2BrightSparks also has a utility program called HashOnClick that can be used to check that files are identical. HashOnClick is available as freeware as well as a licensed program. Several types of hashing algorithms are available in HashOnClick.

Hashing Facts

A hash is a function that takes a variable-length string (message) and compresses and transforms it into a fixed-length value. Important facts about hashes are:

  • Hashes ensure the data integrity of files and messages.
  • Hashes do not ensure confidentiality (in other words, hashes are not used to encrypt data).
  • The hash value (output) is also referred to as a message digest or digital fingerprint.
  • Hashes are one-way functions. You cannot recover the message by running the hash back through the function.
  • The sender and the recipient use the same hashing algorithm.
  • Different hashing algorithms can be used for different types of data to increase data security.

Be aware of the following regarding hashes:

  • Good hashing algorithms have high amplification, also known as the avalanche effect: a small change in the message results in a large change in the hash value.
  • Strong hash outputs should contain a large number of bits. This makes it more difficult for an attacker to duplicate the hash value.
  • Hashes should be generated from the entire message, not just a portion of it.
  • Collision is the term used to describe a situation in which two different messages produce the same hash value. If collisions can be found for a hashing algorithm, a stronger algorithm should be used.
  • A birthday attack is a brute-force attack in which the attacker hashes messages until one with the same hash is found. This type of attack relies on the statistic that there is more than a 50% chance that two out of 23 people in a room will share a birthday. By contrast, for a better than even chance that someone matches one chosen day, 253 people would need to be in the room.
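The points above can be tried out in a short Python sketch using hashlib. It assumes SHA-256 and a few made-up inputs, and all function names are hypothetical. It counts how many output bits change after a one-letter edit (typically around half of the 256 bits), reproduces the 23-person and 253-person birthday figures, and runs a birthday attack against SHA-256 truncated to 16 bits. That size is chosen to be toy-small so that a collision usually turns up after a few hundred attempts, rather than the roughly 2^128 attempts needed against the full 256-bit output.

```python
import hashlib
import math

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Avalanche effect: a one-letter change to the input.
d1 = hashlib.sha256(b"hello").digest()
d2 = hashlib.sha256(b"hellp").digest()
print("bits changed:", bit_difference(d1, d2), "of", len(d1) * 8)

def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(round(shared_birthday_probability(23), 4))  # 0.5073

# Smallest n with 1 - (364/365)**n >= 0.5, i.e. someone matches one chosen day.
print(math.ceil(math.log(0.5) / math.log(364 / 365)))  # 253

def truncated_hash(message: bytes, bits: int) -> int:
    """The top `bits` bits of SHA-256: a deliberately weakened toy hash."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") >> (256 - bits)

def birthday_attack(bits: int = 16) -> tuple[bytes, bytes]:
    """Hash distinct messages until two share the same truncated hash value."""
    seen: dict[int, bytes] = {}
    counter = 0
    while True:
        message = f"message-{counter}".encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1

first, second = birthday_attack()
print(first, second, "share a 16-bit truncated hash")
```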
