subject = Information Theory
title = Data Compression
Data Compression merely sounds challenging. Don't be afraid; compression is our good friend for many reasons. It saves hard drive space. It makes data files easier to handle. It also cuts those immense file download times from the Internet. Wouldn't it be nice if we could compress all files down to just a few bytes?
There is a limit to how much you can compress data. How random the file is, is the determining factor in how far it can be compressed. If the file is completely random and no pattern can be found, then the shortest representation of the file is the file itself. A proof of this is at the end of my paper. The key to compressing a file is to find some kind of exploitable pattern. Most of this paper will be spent explaining the patterns that are commonly used.
Null suppression is the oldest form of data compression that I could find. Essentially, it says that if you have different fields that data sits in (perhaps a spreadsheet), and some of them contain only zeros, then the program simply eliminates the data and moves straight from the empty data set to the next one.
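As a sketch of that idea (my own illustration, not any particular historical implementation), a record's zero fields can be dropped and only the non-zero fields kept along with their positions:

```python
def null_suppress(fields):
    """Keep only the non-zero fields, remembering where they were."""
    return [(i, f) for i, f in enumerate(fields) if f != 0]

def null_expand(pairs, length):
    """Rebuild the full record, filling the suppressed spots with zeros."""
    fields = [0] * length
    for i, f in pairs:
        fields[i] = f
    return fields

row = [0, 0, 7, 0, 3, 0, 0, 0]
packed = null_suppress(row)  # [(2, 7), (4, 3)] -- only 2 entries instead of 8
assert null_expand(packed, len(row)) == row
```

When most fields are empty, the list of (position, value) pairs is much shorter than the original record, which is the whole trick.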
A step up from null suppression is Run Length Encoding. Run length encoding simply tells you how many of what you have in a row. It would change a set of binary data like 0011100001 into what the computer reads as (2)zeros, (3)ones, (4)zeros, (1)one. As you can see, it works on the same basic idea of finding a series of 0s (as in null suppression), and of 1s in this case too, and abbreviating it.
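The example above can be run through a minimal run-length coder. This is a sketch of the technique, not a production file format:

```python
def rle_encode(bits):
    """Encode a string as (symbol, run-length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))  # one pair per run
        i = j
    return runs

def rle_decode(runs):
    """Expand the (symbol, run-length) pairs back into the string."""
    return "".join(sym * count for sym, count in runs)

# The paper's example: 2 zeros, 3 ones, 4 zeros, 1 one.
print(rle_encode("0011100001"))  # [('0', 2), ('1', 3), ('0', 4), ('1', 1)]
```

Note that run length encoding only wins when the runs are long; for data that alternates quickly, the (symbol, count) pairs can come out larger than the input.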
When the whole concept of data compression caught on, more people started working on programs for it. From these people we got some new premises to work with. Substitutional encoding is a big one. It was invented jointly by two people: Abraham Lempel and Jacob Ziv. Most compression algorithms (a big word meaning, roughly, "program") using substitutional encoding start with "LZ" for Lempel-Ziv.
LZ77 is a really neat compression in which the program starts off just copying the source file over to the new target file, but when it recognizes a phrase of data that it has previously written, it replaces the second copy of the data in the target file with directions on how to reach the first occurrence of it, and the decoder copies the data from the place the directions point to. This is more commonly called sliding-window compression, because the focus of the program is always sliding through the file.
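A toy version of that scheme looks like this. It is an illustrative sketch, not any shipping LZ77 implementation; the window size and the (offset, length, next-character) triple format are my own choices:

```python
def lz77_encode(data, window=20):
    """Emit (offset-back, match-length, next-char) triples."""
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window behind position i for the longest match.
        for off in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) - 1
                   and data[off + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - off, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Follow each triple's directions back into the already-written output."""
    data = []
    for off, length, nxt in triples:
        for _ in range(length):
            data.append(data[-off])  # copy from the earlier occurrence
        data.append(nxt)
    return "".join(data)

s = "abracadabra abracadabra"
assert lz77_decode(lz77_encode(s)) == s
```

The second "abracadabra" comes out as a single triple pointing 12 characters back, which is exactly the "directions to the first occurrence" the text describes.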
LZ78 is the compression that most people have in their homes. Some of the more common ones are ZIP, LHA, ARJ, ZOO, and GZIP. The main idea behind LZ78 is a "dictionary". Yet it works quite a bit like LZ77. For every phrase it comes across, it indexes the string with a number and writes it in the dictionary. When the program comes across the same string again, it uses the associated number from the dictionary instead of the string. The dictionary is then written along with the compressed file, to be used in decoding.
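A minimal sketch of the dictionary idea (one common textbook formulation, with entry 0 standing for the empty phrase; the details vary between real implementations):

```python
def lz78_encode(data):
    """Emit (dictionary-index, next-char) pairs; index 0 means 'no prefix'."""
    dictionary = {}  # phrase -> index
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending a known phrase
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1  # learn new phrase
            phrase = ""
    if phrase:  # flush whatever phrase is left at the end
        out.append((dictionary[phrase], ""))
    return out

def lz78_decode(pairs):
    """Rebuild the dictionary on the fly while expanding each pair."""
    phrases, out = [""], []
    for idx, ch in pairs:
        p = phrases[idx] + ch
        phrases.append(p)
        out.append(p)
    return "".join(out)

s = "ababcbababaa"
assert lz78_decode(lz78_encode(s)) == s
```

Notice that the decoder rebuilds the same dictionary from the pair stream itself, which is why this family stores the dictionary (or enough information to regrow it) with the compressed file.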
There is a combined variation of LZ77 and LZ78. It is known as LZFG. It only writes to the dictionary when it finds a repeated phrase, not on every phrase. Then, instead of replacing the second set of data with directions on how to reach the first occurrence of it, LZFG puts in the number reference for the dictionary's translation. Not only is it faster, but it compresses better because it does not have as big of a dictionary attached.
Statistical coding is another one of the new compression concepts. It is an offshoot of the LZ family of compressors. It uses basically the same style as LZFG, but instead of assigning the numbers in the order that the strings emerge from the source document, statistical compressors do some research. The coder counts the number of times each string is used and then ranks the string with the most uses at the top of the hash table. The string with the fewest uses is ranked at the bottom. (A hash table is where the list is kept.) The higher up a string is on this list, the smaller the reference number it gets, to minimize the total bit use. This gives this compression only a slight edge on the others, but every little bit helps. (ha ha -bit-)
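The ranking step can be sketched in a few lines. The token list and function name are my own illustration; real statistical coders work on strings found in the source, exactly as described above:

```python
from collections import Counter

def build_codebook(tokens):
    """Rank tokens by frequency; the most common token gets the smallest index."""
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(ranked)}

tokens = "the cat and the dog and the bird".split()
book = build_codebook(tokens)
# 'the' appears most often, so it gets index 0 -- the cheapest reference.
assert book["the"] == 0 and book["and"] == 1
```

Small indices can be written in fewer bits than large ones, so handing them to the most frequent strings is where the "slight edge" comes from.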
Beware! There are a few compression programs out there that claim wonderful compression ratios, ratios that beat the compression limit for that file's randomness. These programs aren't really compression programs. They are OWS and WIC. Never compress anything with these. What they actually do is split up the file that you wanted to compress and hide most of it in another part of your hard drive. OWS puts it in a particular spot on the physical hard disk. WIC puts the extra information in a hidden file called winfile.dll. The real problem with these programs is that if you don't have the winfile.dll, or the information in that specific spot on your drive, then the program won't put the file back together.
My first intent with this project was to invent a new compression algorithm. I started with the idea that if you took the file in its pure binary form and laid it out in a matrix, there were certain rows and columns that you could add up to get a result that would be capable of recreating the original matrix. I was close, too. I had four different outputs, which would make up the compressed file, that combined together to produce one output for each bit. From this single output I could determine whether the bit was a 1 or a 0. It worked perfectly for matrices of 1x1, 2x2, and 3x3. Except that with this small a matrix, I wasn't compressing it at all. It was more of a coding scheme that used more space than the original file did. I even found a way to reduce the size of the four outputs, but it was not enough to even break even on bit count. When I got to the 4x4s I came across an overlap. An overlap is a term I made up for this algorithm. It means that I got the same single output for a 1 as I did for a 0. When that happens, I can't figure out which it really is: a 1 or a 0. Since you can't recreate the original file, the compression has failed. It is lossy. I would have needed a fifth original output. If you want more information on how I thought the algorithm would have worked, please refer to my Inventor's Log, which I included. It's way too much to re-type here and it serves no real purpose in this paper.
If you were paying attention earlier, you would be saying, "Why don't you find a pattern? Otherwise you can't compress it. You are treating it like a random file." I didn't realize that it was impossible to compress random data until about the time my algorithm was failing. Because of my troubles I started looking for a completely new method to compress data using a pattern of some sort. I got to thinking of all of the existing algorithms. I wanted to combine a hash table, a statistical coder, and a run length coder. The only hard part that I would find in that would be trying to reach the patent holders of each of those algorithms to allow me to combine them and also modify them slightly.
In its current form, the statistical coder only accepts alphanumeric phrases. I would like to modify it to read not the characters the binary code spells out, but the binary code itself. I don't know what form the output would take besides compressed, but for my purposes it wouldn't matter what shape the output is. I would program into the system all thirty-two combinations of 5 bits (2^5). All of the combinations would be labeled inside the program 1 through 32. I would then make a hash table of all of the 5-bit combinations. This would give me an output which I would run through a run length coder. Since the entire algorithm relies on binary code and not on characters, it is recursable; that is, it can further compress an already compressed file. LZs can't do that, because after they convert a string into their dictionary/sliding-window form, it ceases to be one of the characters that they compress.
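Since this design was never built, the following is only a hypothetical sketch of its first step, with names I made up: enumerating the 32 possible 5-bit blocks and labeling each block of the input 1 through 32. The resulting index stream is what would then be handed to the run length coder:

```python
def index_5bit_blocks(bits):
    """Label each 5-bit block of the input with a number from 1 to 32."""
    assert len(bits) % 5 == 0, "input must split evenly into 5-bit blocks"
    # All 32 combinations of 5 bits (2^5), numbered 1 through 32.
    codes = {format(n, "05b"): n + 1 for n in range(32)}
    return [codes[bits[i:i + 5]] for i in range(0, len(bits), 5)]

# '00000' is block 1, '00001' is block 2.
assert index_5bit_blocks("0000000001") == [1, 2]
```

Whether the run length coder could then squeeze the index stream enough to beat the LZ family is exactly the open question the author leaves; this fragment only shows that the block-labeling step is mechanical.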
Now that you are acquainted with our good friend Data Compression, I hope he will be able to serve you better. Now you can download programs faster, save space, and, who knows, you might create the next new compression algorithm. Until then, keep your mice happy.
Proof that random data is not compressible:
Let us suppose the file being compressed is 8 bits long (any x works, but this is easier) and is random.
There are exactly 2^8 different possible 8-bit data strings. (2^x)
To compress the file, the program must shrink it by at least a single bit. (x-1)
So there are at most 2^7 different possible compressed files. (2^(x-1))
Thus at least two different source files compress down to the same file.
Therefore the compression cannot be lossless.
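The counting argument can be checked mechanically. This sketch is even a little more generous than the proof above: it counts every file strictly shorter than 8 bits, not just the 7-bit ones, and there are still not enough of them:

```python
# Every possible 8-bit source file.
sources = 2 ** 8                          # 256

# Every possible output strictly shorter than 8 bits:
# 1 empty file + 2 one-bit files + ... + 128 seven-bit files.
targets = sum(2 ** k for k in range(8))   # 255

assert sources > targets
# By the pigeonhole principle, any scheme that shrinks every 8-bit file
# must send two different source files to the same output, so at least
# one of them cannot be recovered -- the scheme cannot be lossless.
```

The same count works for any length x: there are 2^x sources but only 2^x - 1 strictly shorter targets.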