

What is Journaling and Logs
Dear Diary, now I'm going to make a change...
     By: David K. Every
Kind: Article
Created: 2002-10-16 08:19:36
Size: 12 KB
 
Technically, Journaling is the concept of writing a journal (or logs). Basically it is like being followed around by the FBI or a secretary, and having them write down every little thing you do, every day: a minute-by-minute diary of events. While we might think of this as an annoying invasion of privacy, it could occasionally come in handy during a lawsuit or something, where you want to know exactly what you were doing when something bad happened, or if you lost something and needed to retrace your steps.

Basically, a Journaled Filing System does this for your disk drive. The problem is there are many types of journaling, and degrees of what that means.




Imagine a file as one big sheet of paper with all your stuff written on it. You can't just add a bunch to the front, because there is no room (there's already writing on it). If you add a lot to the top of the page, you are overwriting what used to be there. Which means you have to also write that further down; but then that is overwriting what used to be there as well, and so on. So basically, whenever you add to the front or middle of a file, you have to rewrite everything from that point to the end in order to make room. If you just tack on to the end of a file then everything works fine. Files can grow from the end, but not really from the middle or front.
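To make the sheet-of-paper picture concrete, here is a minimal Python sketch. The function names `insert_at_front` and `append_at_end` are made up for this illustration; they just count how many bytes have to be written in each case.

```python
import os
import tempfile


def insert_at_front(path, new_bytes):
    """Prepend new_bytes; every existing byte has to be written again."""
    with open(path, "r+b") as f:
        old = f.read()       # pick up everything already on the "page"
        f.seek(0)
        f.write(new_bytes)   # this lands on top of the old beginning...
        f.write(old)         # ...so the old content is rewritten further down
    return len(new_bytes) + len(old)   # bytes physically written


def append_at_end(path, new_bytes):
    """Appending writes only the new bytes; nothing existing moves."""
    with open(path, "ab") as f:
        f.write(new_bytes)
    return len(new_bytes)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"world")
    print(insert_at_front(path, b"hello "), "bytes written for the front insert")
    print(append_at_end(path, b"!"), "byte written for the append")
    os.remove(path)
```

The front insert costs as many writes as the whole file is long, and a crash between the two `write` calls is exactly the half-old, half-new state described next.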

So if every time you make a change (to the beginning), you have to rewrite the entire file (from the beginning), what happens if you lose power or have a software problem part way through the write? The answer is not pretty: the file is in a corrupt state, with part of it reflecting the old state and part reflecting the new, and the parts that weren't written out, or were partially written over, are just gone; the dreaded data-loss. People hate that.

These "partially complete" errors are a big deal, because stuff happens, and sometimes you have bugs or lose power. And the ramifications are bad. So there was an evolution of how to fix or avoid that.



A common way to cure the problem is the old "programmer fixes it" way. Instead of writing over the old file as they go, or opening a file for read and write access, a programmer can open the original file for read-only access. Every time they save, all changes go into a new file (or temp files) that they are writing to. When they're completely done writing the file, they rename the old file (to something like "to be deleted"), rename the new file to the original file's name, and then finally delete the old file.

Because things are done in this order, they are really reducing the likelihood that you can lose data, and even if you do lose something, it is just the changes between the last and the newest version, and not the entire file. And if you know the exact sequence, you can reverse it, or figure out when something was only partly completed.
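The save sequence above can be sketched in a few lines of Python. This is an illustration under assumptions, not anyone's actual implementation: the function name `safe_save` and the `.new` / `.to-be-deleted` suffixes are invented here.

```python
import os
import tempfile


def safe_save(path, data):
    """Save data to path without ever writing over the original in place."""
    new_path = path + ".new"             # hypothetical naming convention
    old_path = path + ".to-be-deleted"   # hypothetical naming convention

    # 1. Write the complete new version into a separate file.
    with open(new_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # ask the OS to really put it on the disk

    # 2. Move the old version out of the way, if there is one.
    had_old = os.path.exists(path)
    if had_old:
        os.replace(path, old_path)

    # 3. Give the new file the original file's name.
    os.replace(new_path, path)

    # 4. Only now delete the old version.
    if had_old:
        os.remove(old_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "letter.txt")
        safe_save(path, b"first draft")
        safe_save(path, b"second draft")
        print(sorted(os.listdir(folder)))
```

At every step there is at least one complete copy of the file on disk, which is the whole point. On many systems a single `os.replace(new_path, path)` swaps the files in one step, which would shrink the window between steps 2 and 3, but the longer sequence is the one described in the text.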

While this is better, and good programming practice can avoid the vast majority of these problems, it is not perfect. It is a lot of work, and so many programmers don't do it. More than that, even if you do, there are still cases where you've renamed the old file but haven't yet renamed the new one, or where you've done that but haven't yet deleted the old file. If you lose power, or crash, then bad things happen. The former causes a lost relationship to the data, so it can be hard to find what you want (and fix it). The latter just leaves crap littered on your drive, and fills it up with stuff you don't want (orphaned temp files). You can fix both, but it still isn't perfect.

The way to fix that is to write a "log" of what you are about to change, and then change it. Then if you lose power, that log can tell you exactly where you were when things messed up, and that tells you how to fix it. This journaling (or logging) is what logged filing systems are about; but I'm going to describe other types of journaling as well.



Some other programmers decided to avoid ever having to completely rewrite things. Why can't every file just be a journal? Instead of keeping a file as a complex structure, where the file is written as the entire structure of the document (and in the order of the document), it was written as a sequence of changes, as a journal or log of work. Basically every command the user did was written out to the disk. And in fact, these programs often didn't have a "save" command; they just saved continuously (each keystroke or command). It was like an infinite undo filing system.

When you open one of these journal files, it can be quite bizarre to watch your entire document get typed in very quickly, complete with every spelling mistake and every correction that fixes it later. Your file isn't a structure of the final output; it is a sequence of events that will eventually get you to the final output.

What makes these systems handy is that you're never writing over the file, you're just adding to it. So you can't really lose (much) work. Since it is written over time, whatever you lose is most likely just the very last command/keystroke that you did; which you probably remember anyways.
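A file-as-journal can be sketched like this in Python. The one-JSON-object-per-line format and the `type` / `backspace` command names are assumptions made up for the example; real programs used their own formats.

```python
import json
import os
import tempfile


def record(journal_path, command, text=""):
    """Append one command to the journal; nothing already written is touched."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"cmd": command, "text": text}) + "\n")


def replay(journal_path):
    """Rebuild the document by re-running every command, in order."""
    doc = []
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                break   # a half-written last line: only that command is lost
            if entry["cmd"] == "type":
                doc.append(entry["text"])
            elif entry["cmd"] == "backspace" and doc:
                doc.pop()
    return "".join(doc)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        journal = os.path.join(folder, "note.journal")
        for key in "teh":
            record(journal, "type", key)
        record(journal, "backspace")   # fix the typo the slow way
        record(journal, "backspace")
        for key in "he":
            record(journal, "type", key)
        print(replay(journal))
```

Opening the file replays the typo and its correction, just as described, and a crash in the middle of an append costs at most the last command.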

As you can imagine, these journals may have a lot of unnecessary crap in them. Imagine I type 20 pages, then slowly pare it down to 10, and then restructure those a few times. In a normal file it is kept by logical arrangement and the end result is only 10 pages long. But when ordered by series/sequence/time it can be more like 30 or 40 pages long, and each time you open it, you have to wait until the program has read and executed every command; not very efficient.

Some programs hybridized this a little, and only wrote out the necessary commands each time you saved. Since you sometimes edited the same thing 3 or 4 times before you saved, the program knew enough to write only the last edit (since that is the only one that counts), so this is slightly more efficient. But still, full journaling is bloated and big. Ironically, when people are talking about journaled filing systems, they are not talking about this complete journaling.
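The "only write the last one" trick is easy to sketch in Python. The function name `coalesce` and the idea of keying edits by paragraph number are assumptions for the illustration.

```python
def coalesce(edits):
    """Keep only the last edit made to each paragraph since the previous save.

    edits is a list of (paragraph_number, new_text) pairs in the order
    they happened.
    """
    latest = {}
    for paragraph, text in edits:
        latest.pop(paragraph, None)   # re-insert so order follows the last touch
        latest[paragraph] = text
    return list(latest.items())


if __name__ == "__main__":
    pending = [
        (1, "Helo"),
        (1, "Hello"),
        (2, "wrld"),
        (1, "Hello,"),
        (2, "world"),
    ]
    # Five edits were made, but only two entries need to reach the journal.
    for paragraph, text in coalesce(pending):
        print(paragraph, text)
```

The journal still grows with every save, just more slowly, which is why this only takes the edge off the bloat.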



OpenDoc had a pretty neat hybrid system, or at least they had conceptualized it and written white papers (concept documents) on it. Basically, what it did was write journals constantly, right up until the time you pressed save or snapshot. When you saved, you were saving an entire version or a snapshot of a file (like a traditional file). That snapshot got written as a whole file, and arranged in logical instead of sequential order; and that snapshot got attached to the original file, then your journal was removed (cleaned) and you started over.

Now each file was really many files; each saved in their own area (or fork). In a forked filing system you really have many files that are attached to each other. So when you drag a file, you are dragging all the files related to that one file; it is sort of like a specialized folder where each fork is a sub-file inside.

The power of this OpenDoc / Bento based system was that you could go back and look at previous versions/snapshots of the file. This is called version control, and it is occasionally very nice to have that built in. An older operating system (DEC's VMS) also tried to do a similar thing in a much more primitive way; and many users loved that feature. But OpenDoc really seemed to find the balance, and was far more modern. With OpenDoc, you could clean older versions if you wanted, or keep them around, and you had a history of what you’d done.

Since your last fork (journal) was only the difference since your last snapshot, it didn't get that big and bloated. If it did, you could always clean/fix it by saving a snapshot. Since each file was really many parts (forks), if anything happened you were only likely to corrupt the last version, and still have the others as backup. And since it was journaled, all your changes could be saved constantly (no manual saving necessary), and you had an infinite level of undos, right up to the last snapshot.
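Here is a toy in-memory Python model of the snapshot-plus-journal idea. It is only a sketch of the concept as described above; the class and method names are invented, and none of this is the actual OpenDoc or Bento API.

```python
class VersionedDocument:
    """Complete snapshots, plus a journal of changes since the newest one."""

    def __init__(self):
        self.snapshots = [""]   # each entry is one whole saved version
        self.journal = []       # changes made since the newest snapshot

    def type_text(self, text):
        self.journal.append(text)   # changes are recorded as they happen

    def undo(self):
        if self.journal:            # undo reaches back to the last snapshot
            self.journal.pop()

    def current(self):
        return self.snapshots[-1] + "".join(self.journal)

    def snapshot(self):
        """Save a whole version, then clean the journal and start over."""
        self.snapshots.append(self.current())
        self.journal.clear()
        return len(self.snapshots) - 1   # version number of the new snapshot


if __name__ == "__main__":
    doc = VersionedDocument()
    doc.type_text("Dear ")
    doc.type_text("Diary")
    version = doc.snapshot()
    doc.type_text(", oops")
    doc.undo()
    doc.type_text("!")
    print(doc.current())
    print(doc.snapshots[version])   # the earlier version is still there
```

The journal never holds more than the work since the last snapshot, and every older snapshot stays around as a version you can go back to.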

But alas, OpenDoc died for political reasons. And it seems that some of its ideas have died as well. But not to fret, good ideas tend to return. For now, when people are talking about journaled filing systems, most have never heard of OpenDoc, or don't know how this worked; so they are talking about something else.



Now not only do programmers have to be careful of what they are doing when creating the files, but there are many commands that work on files as well. When you copy files, move them, rename them, and so on, it affects files. This is all the OS's (Operating System's) responsibility. And most programmers don't want to talk about what they are supposed to do, but instead talk about what the OS is supposed to do.

Many UNIX people think very low level, and forget about the high level. So when they are talking about journaled filing systems, it is not what programs and programmers do at the higher level (writing their own files). Many of them are just talking about what the OS does, or what is done only at the lowest levels.

When the Operating System (Filing System) is doing something to a file like copying it, or moving/renaming it, there are changes being made. Some of those changes require many lower level writes or changes. If the system crashes part way through that sequence, then again, there’s an incomplete state, and you lose data.

Normally, on UNIX (or other OS's) many of the things that are partially done can be fixed. At boot time, if you've lost power, a utility called fsck is run (one letter off from what you're going to say right before it has to run). Fsck basically walks every file (and fragment) on the disk, looks for things that are in the "in-between" state, and tries to fix them, or just cleans them up (deletes the parts if they aren't recoverable). And if you have a large disk, with many files, this can take a while. Most operating systems have some variant of these utilities, and many third party utilities are just fancy versions of this.

What "journaled filing system" thus means is just that before the Operating System performs any action that could cause data-loss, the change is written to a separate log file or journal. Then the operation is performed, and finally the log file is cleaned up. This way, if the system crashes at any time during the operation, you can dive into the log file (instead of scanning the entire disk) and reverse and/or repeat the process so the changes get made. This makes recovery much quicker, and almost guarantees no data-loss.
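The log-first, act-second, clean-up-last order can be sketched in Python for a single rename. Real filing systems do this inside the OS on disk blocks, not with files and JSON; the function names, the log format, and the `crash_before_rename` switch (which fakes a power failure) are all assumptions for the illustration.

```python
import json
import os
import tempfile


def journaled_rename(log_path, src, dst, crash_before_rename=False):
    # 1. Write down what is about to happen, and force it to disk first.
    with open(log_path, "w", encoding="utf-8") as log:
        json.dump({"op": "rename", "src": src, "dst": dst}, log)
        log.flush()
        os.fsync(log.fileno())
    if crash_before_rename:
        return   # pretend the power went out right here
    # 2. Perform the operation.
    os.replace(src, dst)
    # 3. Clean up the log: the operation is complete.
    os.remove(log_path)


def recover(log_path):
    """Run at startup: look at the log instead of scanning the whole disk."""
    if not os.path.exists(log_path):
        return "clean"
    try:
        with open(log_path, encoding="utf-8") as log:
            entry = json.load(log)
    except ValueError:
        os.remove(log_path)   # the log itself was torn, so nothing had changed yet
        return "clean"
    if os.path.exists(entry["src"]):
        os.replace(entry["src"], entry["dst"])   # repeat the interrupted step
    os.remove(log_path)
    return "repaired"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        log = os.path.join(folder, "journal.log")
        a = os.path.join(folder, "a.txt")
        b = os.path.join(folder, "b.txt")
        with open(a, "w") as f:
            f.write("data")
        journaled_rename(log, a, b, crash_before_rename=True)
        print(recover(log), sorted(os.listdir(folder)))
```

Recovery only has to read one small log entry, which is why it is so much quicker than walking the whole disk. The `fsync` in step 1 is also where the extra time goes.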

But there is a cost. Each time you are going to make a change you need to make sure the log is written first. And each small change has to update the log; so the log is always in sync with what is happening on the disk. All this synchronization takes time. And so most journaled filing systems are slower than non-journaled variants would be.

Also you can see, that just because the filing system is journaled at the lowest level, that doesn’t mean that the Applications (programs) are journaled and safe; that is still left to the programmers.



You can see that there is a lot to journaling and logging. There are many levels, and implementations. I'm glad that Apple is talking about creating a journaled filing system in OS X for the OS. And I may or may not use it, depending on speed and reliability. Actually, my data loss caused by the OS is already pretty low; I'm more concerned about Applications.

I really think that UNIX's version of journaling addresses only part of the problem. I am interested in higher-level journaling and version control being brought up to the programmer level, where programmers will use them. And there are varying degrees there as well. I'm hoping that someday people go back and read some of the white papers, or reinvent the wheel, and we get much higher-level functionality in those areas as well. However, every step towards more stability and more security is generally a good thing.


Copyright 2003 DKE • All rights reserved • www.iGeek.com