Methods of data access
Files are useful not just for the information they hold, but for the way they hold that information
There are a few different structures files use to make their data accessible
Serial Files
- Serial files don’t make any effort to make searching easier – they just store records in the order the records are added
- New records are always just added on to the end (this is called appending)
- This type of storage is usually used for small files where the order of the records doesn't matter (or the only order they're needed in is the order they were entered in)
- Finding records again can be difficult if there are a lot of them
- The computer has to search through every single record until it finds the one it’s looking for
- Sometimes you don’t need to find single records individually – you either need all of them or none of them – in this case, serial files are completely fine
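A minimal Python sketch of a serial file, assuming each record is one line of comma-separated fields with the key field first. The file and the names `append_record` and `serial_search` are hypothetical. Appending uses Python's "a" file mode, and finding a record is a linear search from the first record onwards.

```python
import os
import tempfile

def append_record(path, record):
    """Add a record (a list of fields) to the end of a serial file."""
    # Mode "a" opens the file for appending, so new records always go on the end
    with open(path, "a") as f:
        f.write(",".join(record) + "\n")

def serial_search(path, key):
    """Linear search: check every record in turn until the key is found."""
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split(",")
            if fields[0] == key:  # assumption: the first field is the key
                return fields
    return None  # reached the end of the file without finding it

# Usage example with a throwaway file
path = os.path.join(tempfile.mkdtemp(), "members.txt")
append_record(path, ["1042", "Patel"])
append_record(path, ["0917", "Jones"])
print(serial_search(path, "0917"))  # ['0917', 'Jones']
print(serial_search(path, "5555"))  # None
```

Note that the records stay in the order they were added, so a missing key is only confirmed after reading the whole file.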
Sequential Files
- In this type of structure, the records are stored in order
- Each record has several fields, but one of them is the key field, and the records are stored in order of that field
- This type of storage is great for data that needs to be processed in a specific order
- Searching through sequential files is a bit faster than searching serial files, because the computer knows when to give up
- If you're searching for the key 345 in a file sorted in numerical order, you check each record from the beginning. If you reach a bigger key, such as 346, without finding 345, you know 345 isn't there and can stop. In a serial file, you'd have to search right up to the end
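Here is a sketch of that early-stopping search in Python, assuming the records have already been read into a list of (key, name) tuples sorted by key. `sequential_search` is a hypothetical name; it also returns how many records it checked, so you can see it giving up early.

```python
def sequential_search(records, key):
    """Search records sorted by key, stopping once we pass where the key would be."""
    checked = 0
    for record in records:
        checked += 1
        if record[0] == key:
            return record, checked  # found it
        if record[0] > key:
            return None, checked    # passed its position, so it isn't in the file
    return None, checked            # reached the end of the file

# Records already sorted by their key field (the first item)
records = [(101, "Ali"), (230, "Bea"), (346, "Cal"), (512, "Dev")]
print(sequential_search(records, 346))  # ((346, 'Cal'), 3)
print(sequential_search(records, 345))  # (None, 3): stops at 346
```

A serial search for 345 in the same records would have to check all 4 before giving up.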
Random Files
- This is a really weird way of storing files, but there is actually a reason for it
- Most file types have their records all stored next to each other in storage (contiguously)
- Random files spread their data all over the disk
- This is done 'randomly' using a hashing algorithm, which does a calculation on the key field that scrambles it, producing a result that's an address on the disk
- The record with that key field is then stored at that address
- To find records again, put the key field into the hashing algorithm and you’ll get the address
- (So it's not random in the sense of the hashing algorithm producing a different output each time: it produces the same output every time for the same input)
- Hashing algorithms are complicated because they have to avoid 'collisions' – where two different key fields produce the same address, and two records can't be stored in the same location
- However complicated the hashing algorithm is, collisions can still happen, and dealing with them usually means setting aside extra storage space that may never be used (this is called redundancy)
- The good thing about random files is that all you need to find a record is its key field – you don’t have to go searching through loads of them
- This means random files are faster than indexed sequential files for finding individual records when you have massive databases
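The notes don't say which hashing algorithm or collision method is used, so this Python sketch assumes two common simple choices: the division-remainder hash (key mod the number of slots) and, on a collision, storing the record in the next free slot (linear probing). The disk is simulated as a list of fixed-size slots, and `NUM_SLOTS`, `hash_address`, `store` and `find` are hypothetical names.

```python
NUM_SLOTS = 11  # hypothetical number of record-sized slots in the file

def hash_address(key):
    """Division-remainder hash: the same key always gives the same address."""
    return key % NUM_SLOTS

def store(slots, key, data):
    """Store a record at its hashed address, or the next free slot on a collision."""
    address = hash_address(key)
    for step in range(NUM_SLOTS):
        probe = (address + step) % NUM_SLOTS
        if slots[probe] is None:
            slots[probe] = (key, data)
            return probe
    raise RuntimeError("file is full")

def find(slots, key):
    """Recompute the address from the key, then check there and the slots after it."""
    address = hash_address(key)
    for step in range(NUM_SLOTS):
        probe = (address + step) % NUM_SLOTS
        if slots[probe] is None:
            return None  # empty slot: this key was never stored
        if slots[probe][0] == key:
            return slots[probe][1]
    return None

slots = [None] * NUM_SLOTS       # empty file: every slot is free
print(store(slots, 345, "Cal"))  # 345 % 11 = 4, so slot 4
print(store(slots, 356, "Dev"))  # 356 % 11 = 4 as well: collision, so slot 5
print(find(slots, 356))          # Dev
```

Finding a record needs only its key: one calculation, then usually one or two slots checked, however many records the file holds.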
Opening and Closing
- Before you can access a file, you have to open it
- You can't read from and write to a file at the same time, so you open it either for reading or for writing, not both
- When you’ve finished with a file, you have to close it (which is very easy to forget)
- If you don’t close it, your changes might not be saved
- Each record tends to be a separate line in the file (or records are separated by delimiter characters, such as semicolons or commas) to make them easier to retrieve
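A short Python sketch of opening, writing, closing and reading, assuming one record per line with commas as the delimiter; the file name is hypothetical. Python itself does offer combined read/write modes such as "r+", but this sketch sticks to the read-or-write approach described above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.txt")  # hypothetical file name

# Open for writing, write one record per line (fields separated by commas), then close
f = open(path, "w")
f.write("1001,Amir,12A\n")
f.write("1002,Beth,12B\n")
f.close()  # closing makes sure buffered writes actually reach the file

# Open the same file for reading, split each line back into fields, then close
f = open(path, "r")
records = [line.rstrip("\n").split(",") for line in f]
f.close()
print(records)  # [['1001', 'Amir', '12A'], ['1002', 'Beth', '12B']]

# A with block closes the file automatically, even if an error occurs
with open(path, "r") as f:
    first = f.readline().rstrip("\n").split(",")
print(first)  # ['1001', 'Amir', '12A']
```

The with block is the usual Python answer to forgetting to close a file.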
Inserting, Updating and Deleting
- To add something to a serial file, you just append it to the end – easy
- Adding something to a sequential file is trickier, though, because it has to go in the right place to keep the records in key order
- (Adding things to indexed sequential files or random files won’t be tested on, apparently)
- To insert a record:
  - Open the file for reading
  - Create a new file
  - Open the new file for writing
  - Copy all the records that come before the new record's position from the original file to the new file
  - Write the new record to the new file
  - Copy all the remaining records from the original file to the new file
  - Close both files
  - Delete the old file
  - Rename the new file to the old file's name
- The same technique is used to update an existing record: copy the records before it, write the updated version in its place, then copy the rest
- Similarly, to delete a record from a sequential file:
  - Open the file for reading
  - Create a new file
  - Open the new file for writing
  - Copy all the records before the one you want to get rid of to the new file
  - Don't copy the one you want to get rid of – just skip it
  - Copy the rest of the records to the new file
  - Close both files
  - Delete the original file
  - Rename the new file to the old file's name
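The insert, update and delete procedures above can be sketched in Python as follows. This assumes one record per line, with a numeric key as the first comma-separated field; `insert_record`, `update_record`, `delete_record`, `_key`, `_swap` and the ".new" suffix for the new file are all hypothetical names. Python's `os.replace` could do the delete-and-rename in one step, but the sketch follows the separate delete and rename steps listed above.

```python
import os
import tempfile

def _key(line):
    """Key field = first comma-separated field, compared as a number (assumption)."""
    return int(line.split(",")[0])

def _swap(path, new_path):
    os.remove(path)            # delete the old file
    os.rename(new_path, path)  # rename the new file to the old file's name

def insert_record(path, new_line):
    """Copy records to a new file, writing new_line in its key-order position."""
    new_path = path + ".new"
    inserted = False
    with open(path, "r") as old, open(new_path, "w") as new:
        for line in old:
            # The new record goes just before the first record with a bigger key
            if not inserted and _key(line) > _key(new_line):
                new.write(new_line + "\n")
                inserted = True
            new.write(line)     # copy the original record across
        if not inserted:        # new key is the largest, so it goes at the end
            new.write(new_line + "\n")
    _swap(path, new_path)       # both files are closed once the with block ends

def update_record(path, key, new_line):
    """Copy every record, writing the updated version in place of the old one."""
    new_path = path + ".new"
    with open(path, "r") as old, open(new_path, "w") as new:
        for line in old:
            new.write(new_line + "\n" if _key(line) == key else line)
    _swap(path, new_path)

def delete_record(path, key):
    """Copy every record except the one with the given key."""
    new_path = path + ".new"
    with open(path, "r") as old, open(new_path, "w") as new:
        for line in old:
            if _key(line) != key:  # skip only the record being deleted
                new.write(line)
    _swap(path, new_path)

# Usage example with a throwaway file already in key order
path = os.path.join(tempfile.mkdtemp(), "members.txt")
with open(path, "w") as f:
    f.write("101,Ali\n230,Bea\n512,Dev\n")
insert_record(path, "346,Cal")
delete_record(path, 230)
with open(path) as f:
    print(f.read().splitlines())  # ['101,Ali', '346,Cal', '512,Dev']
```

Every change rewrites the whole file, which is why sequential files suit batch processing better than frequent single-record changes.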