Today we were given 10 sample files and our task was to explore the metadata for each file.
The two suggested ways of doing this were to look at the properties of each file and then to check further using the site http://www.extractmetadata.com.
The 10 sample files are as follows:
Adobe Acrobat document (EDRM-TalentTaskMatrix-v1.pdf)
The properties confirmed the file type as an Adobe Acrobat document, modified on 04/02/2013 at 15:29. They gave the file location, a size of 97KB, a compressed size of 93KB, the method (deflated), the CRC-32 value 8ABE110F (a cyclic redundancy check used to validate the integrity of the data) and Index: 6. However, when I checked the same file with the online extract metadata tool, it showed that the document was originally created in Excel (hence the title showing the .xlsl extension) on a Mac, by an author called George Socha. None of this information would have been obvious otherwise.
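Values such as compressed size, method (deflated), CRC-32 and index are the kind of per-entry details a ZIP archive records, so these properties may have been read from inside a compressed folder. A minimal Python sketch, assuming the samples were delivered as a ZIP; the archive here is built in memory from placeholder content, and `describe_entries` is a hypothetical helper name, not part of any tool used in the task:

```python
import io
import zipfile
import zlib


def describe_entries(archive_bytes):
    """Return the per-entry metadata stored in a ZIP archive's directory."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for index, info in enumerate(zf.infolist()):
            deflated = info.compress_type == zipfile.ZIP_DEFLATED
            rows.append({
                "index": index,                      # position in the archive
                "name": info.filename,
                "size": info.file_size,              # uncompressed bytes
                "compressed_size": info.compress_size,
                "method": "deflated" if deflated else "other",
                "crc32": format(info.CRC, "08X"),    # eight hex digits
                "modified": info.date_time,          # the only date stored per entry
            })
    return rows


# Stand-in archive (placeholder content, not one of the real sample files).
payload = b"placeholder text " * 200
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", payload)

entries = describe_entries(buffer.getvalue())
for entry in entries:
    print(entry)

# The stored CRC-32 can be recomputed from the content to check integrity.
recomputed = format(zlib.crc32(payload), "08X")
print("CRC matches:", recomputed == entries[0]["crc32"])
```

Run against a real archive, the same loop would list one row per sample file, which is one way to compare the stored values with what a properties box displays.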
DWG file (civil_example-imperial.dwg)
The properties confirmed the file type as a DWG file (usually a graphics file associated with AutoCAD), modified on 21/05/2011 at 20:23. They gave the file location, a size of 166KB, a compressed size of 67KB, method: deflated, CRC-32: 9F5AFC3E and Index: 1. The online extract metadata tool yielded no results, reporting only “Max. filesize: 5MB !”
GIF image (Wrinkled_Paper.gif)
I started this task on campus using a PC in a computer lab. I am not sure which version of Windows it was running, but as I did not finish there, I completed the task at home on my own PC running Windows 7 Home Premium. This made a difference to the available metadata, as a comparison of the properties in the image below shows:
The first properties box (illustrated in grey) gave three pieces of information that were not available when I checked the same file later on my home PC: CRC-32: 3FF3001B, Index: 4 and method: deflated. The second properties box has two values, ‘size’ and ‘size on disk’, which are very slightly different. I first assumed these would correspond to ‘size’ and ‘compressed size’ in the grey box, but the second value is larger than the first (15,063 and 15,360 bytes), so ‘size on disk’ more likely describes the space allocated to the file on the disk than a compressed size. The second box also gives more exact values than the first, whose size appears to be rounded. The location of the file is obviously different, as it was accessed from two different places. The second properties box offers three dates (created, modified and accessed), whereas the first gives only the modified date. In addition, the second box gives the owner, attributes, bit depth and even the dimensions of the image in pixels, none of which is present in the first. The file has not changed (apart from its location), so I conclude that the newer system shows the user more information, which must already have existed but was not available to view on the older system. The second properties box also gives the option to ‘remove properties and personal information’ attached to the file, which is not offered on the campus system.
When I checked this file with the online extract metadata tool, the format was shown as MPEG-1 and the mimetype as audio/mpeg, with a duration of 0m00. A GIF is an image format that can hold an animation but no audio, so the tool appears to have misidentified the file rather than described it accurately.
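The gap between ‘size’ and ‘size on disk’ for this GIF can be reproduced by rounding the file size up to whole allocation units, which is how file systems usually reserve space. A minimal Python sketch using the 15,063-byte figure from the properties box; the allocation unit sizes tried here are assumptions, as the actual value on either PC is not known, and `size_on_disk` is a hypothetical helper name:

```python
def size_on_disk(size_bytes, allocation_unit):
    """Round a file size up to a whole number of allocation units."""
    units_needed = -(-size_bytes // allocation_unit)  # ceiling division
    return units_needed * allocation_unit


# Try a few common allocation unit sizes (assumed, not taken from the task).
for unit in (512, 1024, 4096):
    print(unit, size_on_disk(15063, unit))
```

With 512- or 1,024-byte units the result is 15,360 bytes, which matches the reported ‘size on disk’; a 4,096-byte unit would give 16,384 bytes instead.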
JPEG image (chimp at typewriter.jpg)
The properties shown here are interesting. Apart from the automatically generated properties (file type, name, size, dates of creation and so on, location, owner, resolution, bit depth, attributes and dimensions), there is the option to add your own metadata, as this is an image file. You can add information such as author, date taken, copyright, camera make and model, lens make and model, camera serial number, light source, contrast and so on. There is even the option to rate the image, give it a title and add tags, which would be a useful finding aid. The only additional information yielded by the online extract metadata tool was that the thumbnail was binary and 16019 bytes.
JPEG image (huh.jpg)
As with the other JPEG file, there is a lot of useful metadata. The image below shows the option to remove all personal information relating to the file, or to pick specific properties to remove.
In addition to this, the properties of this file also include GPS information which (if accurate) should pinpoint where the photograph was taken.
Latitude 38; 53; 51.669499999989057
Longitude 77; 2; 11.309899999992918
The JPEG image is of the Eiffel Tower seen from the Seine. As this is a recognisable landmark, I expected the GPS location to be Paris. However, when I put the co-ordinates into http://www.gps-coordinates.net/ the result was in China, and when I tried again using http://www.nearby.org.uk/ the result was Washington in the USA. The properties box shows no hemisphere (N/S, E/W), so the two sites may simply have read the longitude as 77° east and 77° west respectively. Either way, the attached metadata was clearly incorrect, as shown in the maps below:
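The latitude and longitude above are given as degrees, minutes and seconds, and most mapping sites want signed decimal degrees. A minimal Python sketch of the conversion, using the values from the properties box (seconds shortened to four decimal places); the hemisphere letters are assumptions, since the properties box does not show them, and `dms_to_decimal` is a hypothetical helper name:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # By convention, south latitudes and west longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


# Hemispheres are assumed: the file properties list only the numbers.
lat = dms_to_decimal(38, 53, 51.6695, "N")
lon_west = dms_to_decimal(77, 2, 11.3099, "W")
lon_east = dms_to_decimal(77, 2, 11.3099, "E")

print(f"west reading: {lat:.4f}, {lon_west:.4f}")
print(f"east reading: {lat:.4f}, {lon_east:.4f}")
```

The two readings differ only in the sign of the longitude, which is enough to move the point from one side of the world to the other.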
JPEG image (PICT0460)
The metadata attached here shows that the file was originally created in Adobe Photoshop CS3 (Windows). However, this information does not appear when the file is checked with the online extract metadata tool. The other attached metadata appears to be consistent.
MS Excel worksheet (97-03 version) ARMA-Speakers_list
The metadata shows that the file is indeed an Excel file, and the online metadata tool confirms that it was created by Excel software. However, the mimetype is given as “vnd-ms.dds”, which I understood to be associated with DirectX. I am also a little confused by the seeming contradiction between the creation and modification dates, which are 2009 and 2011, as seen below:
MS Word document (97-03 version) Proposed_ED_Rules_and_Standards_2004.doc
This Word document gives lots of additional information, such as the word, character, line and page counts (which agree in both versions of the metadata). The properties tell us the company that created the file, whereas the online tool gives us the individual's name. The dates are consistent, but the times differ by one hour, which I have often seen happen and which may reflect a time zone or daylight saving adjustment. The properties also tell us when the file was last printed, so in this instance the properties box holds more metadata than the online tool.
Text document (20100501-0721 The SEC v. Goldman Sachs the case in a nutshell.txt)
This plain text document yields very little information from the properties other than the title, file type, size, date created and modified. The online metadata extraction tool does not generate any metadata whatsoever.
Wave sound file (bonds.wav)
The metadata is slightly contradictory again: the properties give a duration of 12 seconds, whereas the online metadata extraction tool says 10 seconds. Both agree that it is a WAV audio file, but the properties describe the bitrate as 20 kbps and the online tool as 21 kbps. Overall, there is very little information available for this type of file.
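Duration and bitrate are not stored as fixed labels in a WAV file; tools derive them from the header, and different derivations or rounding could explain small disagreements like these. A minimal Python sketch of one such derivation, assuming uncompressed PCM audio. That assumption may not hold for bonds.wav, whose bitrate of about 20 kbps suggests a compressed encoding that Python's `wave` module cannot read, so the example builds its own two-second file in memory; `wav_summary` is a hypothetical helper name:

```python
import io
import wave


def wav_summary(wav_bytes):
    """Derive duration (seconds) and bitrate (kbps) from a PCM WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample
    duration = frames / rate
    bitrate_kbps = rate * channels * sample_width * 8 / 1000
    return duration, bitrate_kbps


# Stand-in file: two seconds of 8-bit mono silence at 8 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(1)
    wf.setframerate(8000)
    wf.writeframes(bytes([128]) * 8000 * 2)  # 128 is silence in 8-bit PCM

duration, bitrate = wav_summary(buffer.getvalue())
print(duration, "seconds at", bitrate, "kbps")
```

A tool that instead estimated duration from the file size and a rounded bitrate would not necessarily land on the same figure.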
In conclusion, it was difficult to find all 15 metadata elements required for Dublin Core, and even more so to find the selected elements required for PREMIS. It was interesting to see the variety of metadata included for the different file types, and that some information was available on one operating system but not on another. Having the option to add or remove metadata also made me question the validity of some of the information. How do we know if the attached attributes are correct or false? There were also some inconsistencies between the two versions of the metadata, a couple of which I could not explain. Although the two ways of extracting metadata were useful, neither was entirely consistent or gave complete values, so gaps in the information remain.