One of the first things to strike me about this meeting was that it was in the old UMIST building, which is a fantastic piece of Victorian Gothic complete with a huge hall with stained glass windows. It’s certainly a very fitting venue for a meeting that brought together records managers, suppliers, archives managers, local councils, cloud experts and publicly listed companies to discuss what storing information in the cloud means for them, but more on that later. The workshop-based unconference was part of a project currently run by the Department of Information Studies at Aberystwyth University and funded by the Society of Archivists, looking into the security, operational and governance issues of storing information in the cloud. Their aim for the day was to generate debate and to highlight some of the security and governance issues surrounding the storage of information in a virtual environment. Very helpfully, they used a hashtag of #soacloud for the day and I seem to remember that they were going to archive it on Twapper Keeper, so I’ll record my impressions here, but please do see the feed for the ‘official’ notes.
So, one of the first jobs was for everyone to introduce themselves. What was particularly noticeable was:
- The diversity of job roles represented at the meeting. As you can see from above there was a wide range of people represented. It was fair to say they were mostly from a records management background but that was probably to be expected given the remit of the day and the speakers;
- New titles – many people had new titles or had moved departments and this was particularly the case in universities. Conversations over coffee revealed that many people had changed their title to something like, say, digital archives manager, but that practice at their institution and support for the new demands of the role were not changing as quickly, leading to a challenging environment;
- There was a range of experience with the cloud; most people were fairly new to it and had come to the meeting to find out more and explore the challenges and opportunities represented by using it. Very few had extensive experience and that was mostly from either selling cloud solutions or working in a consultancy or research role.
The unconference went on to explore three different areas so I’ll structure my post on these areas although it is fair to say that the format and the active discussion amongst participants meant there were often not neat boundaries around these areas.
Security and Legal Issues
I’d like to state for this section that I am not a legal expert and that what follows are discussion points and comment. The reader should not act in any way on the basis of the information below without seeking, where necessary, appropriate professional advice concerning their own individual circumstances.
One of the first key discussion points was raised here, which was the perpetual issue of how you define the cloud. What everyone appeared to agree on was that it was important to be clear about the definition you use, that there are a lot of definitions around and that the definition very much affects everything else you say about it. In terms of this session, it was very much defined by the issues, so the focus was on the cloud as it existed on the open internet rather than private clouds such as the GCloud. My personal opinion is that it would be better to stick with the NIST definition so we’re all talking about the same thing and then specify what aspects you want to cover rather than inventing a new definition; I find that useful and it prevents starting every discussion with a long debate about definition.
Most of this session focused on a legal assessment of the risks of using the cloud, on the basis that there were already lots of people willing to promote the positives. Many people, myself included, stressed that there was a need for balance and that presenting all the negatives was just as bad as presenting all the positives. So, the points in summary were:
- There is a risk v value equation to purchasing a cloud service and consumers are very reliant on trust in the case of the service not being available. Under a standard contract with most cloud providers there is not much acceptance of risk and quite often no acceptance at all. It was quite evident that this caused a great deal of concern to the legal profession compared to a normal outsourcing contract or providing the service internally. Interestingly, though, many contributors in the room reported that universities negotiated their own contracts with cloud service providers, giving them the acceptance of risk they felt was appropriate, and there was a feeling that sharing that experience with others would lead to cloud providers amending their terms appropriately. A good example raised was Leeds Metropolitan University negotiating their own terms for the use of Google Apps (further information on Leeds Met use of Google Apps is here). This could be an interesting trend to follow as I feel it goes right to the heart of adoption of cloud by institutions. However, there were many in the room that felt there were a number of solutions on offer. As mentioned, some institutions are likely to see value in doing their own legal work given the amount they could save. There is also potential for collective negotiation where a number of organisations or institutions are happy to agree common terms with cloud providers. An interesting scenario that was presented was whether there would be some trickle-down: what is in place now is based on what cloud providers are used to, so if they had enough customers of high enough value, would a revised SLA be possible at a slightly greater cost for the cloud service? On a final note, many argued that it was important to understand what it meant to get services from a cloud provider, as often the service they offered was better than that provided internally. Internal services also often didn’t come with an SLA or acceptance of risk;
- The cloud is very easy to get into, whether that be storing data on it as a service or storing data as part of the service that is offered (e-mail is a very good example of this). This is not necessarily a good thing, particularly in light of the standard (and it has to be said understandable in some ways) answer from the legal and IS departments of ‘no’ when they are asked by end users whether they can use the cloud, which tends to lead to users doing it anyway. What was also highlighted is that those tasked with information management of whatever sort are very often not even consulted, which can be potentially even more catastrophic as information both leaves the organisation and is generated outside the organisation, possibly never to return. This may seem relatively trivial but data and information is increasingly valuable, whether that be for re-use, exploitation of IP, transfer into other areas such as industry or to ensure it is appropriately archived for future use in all these areas. Many reported existing benign neglect around information management that they felt could get worse as the cloud freed up people to find their own solutions that worked for them but not necessarily the greater good. There were several who felt the answer was to say ‘maybe’ or ‘yes and this provider is great’ and for information management professionals to talk more about what they did and raise awareness of the benefit they brought;
- One risk that was highlighted that I hadn’t even thought of on the legal front was that of data being temporarily in the cloud for processing and what happens to it after that. We didn’t really explore this but I think it’s an area that bears more analysis;
- The Data Protection Act (DPA) raised its head, as always in any discussion where government (local and national) and universities were involved due to the high risk of not complying and the issues around being able to meet its provisions if data is stored in the cloud (but also bear in mind this is with the definition of the cloud being ‘on the internet’ so it excludes a private cloud or the proposed government cloud). There were questions as to whether cloud providers could be certified and whether that would make a difference in compliance; there are certainly examples given over in the US where providers are certified under HIPAA for storage of health data. The main conclusion here was to make a decision as to what data went into the cloud and was processed there.
Records Management in the Cloud
The ever jovial Steve Bailey took a wry look at the implications of the cloud for records management. His driving premise was that every principle that records managers had been able to hold to, even through storage of records on electronic media, was broken apart by the cloud because there was no longer one physical location where records could go to be managed (even if records reside on a server in electronic form then they are in one physical location).
The main points of this session were:
- Records managers know very little about managing records in the cloud;
- There is a difference between those who are interested in developing and maintaining a product in the cloud and those whose primary interest is preserving what is in or on that product. With content fragmented across different formats and services, it is becoming increasingly difficult to search for records in one place. One subject may have a blog, photos online, mail conversations under many different providers in the cloud and discussion documents online. Contrast this with Bailey’s example of Samuel Pepys’s papers, which can be found in one place and are professionally archived for future generations. An interesting quote was ‘maybe Google doesn’t want to be the world’s archivist’;
- Is there a place for an information administrator in this world, similar to the financial administrator appointed when a company goes under? The information administrator could make sure that the relevant information is taken and appropriately archived for future use. Another thought that struck me whilst discussing this was whether we had thought about what happened to all the material we now generate in a Web 2.0 world and how long it was valid for; certainly something to raise with colleagues such as Neil Grindley, who is heading up JISC’s digital preservation and archiving work;
- On the back of this, Bailey suggested that it may be beneficial to have a publicly funded web repository for all this stuff so that it could be archived safely. A contemporary example: who is going to remember what the BIS website looked like before the change of government? Does anyone care? Should they? There was some interest in the Content Management Interoperability Services (CMIS) specification, which could help in this area by ensuring content is interoperable and therefore easier to save. Certainly the vendors were starting to see some join-up around it and mentioned that tenders were starting to request it;
- Linked data, as always, got a mention but a very brief one, which I think was driven in part by the people in the room. It’s maybe a topic that could be covered further (or maybe already has) in terms of its use for finding and helping preserve records;
- Another good question from Bailey was what would happen if some of the big providers started charging; would we start to realise the value in records, and in the records manager as a decision maker on what to keep and what to get rid of, if we had to face that choice?
- The final question is what is the difference between records and information. Users often do not want records management and yet they often need it, unfortunately, often after the fact. Maybe more needs to be done to get the user to appreciate that and to go to them rather than expecting them to arrive at the records manager’s desk. Once that starts to happen then perhaps records management will start to get recognised at a senior level and there will start to be an appreciation that information management is just as vital a consideration as other business drivers when procuring systems. Bailey concluded by putting forward the proposition that maybe there aren’t records managers any more and that the role has become that of an information manager, for which there seemed broad agreement in the room.
Cloud and Security
Paul Miller’s presentation went down a well-worn road for me so I’ll let you pick up on it from the Twitter feed. The key points he made that stood out for me were:
- Software as a Service (SaaS) started off by providing most of what people wanted for most of the time for most of what they need and targeted those who were never going to use that service outside the cloud. So, Google Docs is OK as a basic word processor but I am not going to leave MS Word for it. Interestingly, though, SaaS is now starting to offer features that the incumbents have always had and is catering for more of what users want. So where next? There weren’t any answers but it was an interesting thought experiment as to what could happen;
- Cloud SaaS offers rapid iteration, which is good for the user but bad for those supporting the user. I’m not too sure that it is even good for the user. Sure they get lots of new features but do they know what they are? Do they suffer from feature deluge? I’d certainly agree that support can be problematic if new features arrive without the time to develop training for users and appropriate knowledge in a support team;
- Understanding what you put on the cloud is vital. There is the perception that the cloud is insecure based on the security model you would want for personal data but if what you are putting on there is ideas for papers or general ‘fluff’ do you really need to have high levels of security? I think this proved problematic for the audience but I can see Paul’s point and mostly agreed with it. An interesting counter argument was that the reason many staff did not have e-mail in the cloud whilst their students did was that they had a legal link to their institution so whilst staff interests would be served by going to the cloud, maybe those of the institution weren’t and selecting what data mattered to the institution was just too difficult;
- An interesting point from earlier was a discussion over how auditing could be used to verify the security of a cloud facility. Many argued that a cloud facility could be a lot more secure than an internal facility and that auditing, to some extent, reduced that security because it meant allowing people into the data centre who could then potentially compromise it, even unintentionally. Do we need certification to overcome this need to have an independent audit of security? Which ones do we trust? Is audit very much dependent on what the use is, so do we need to audit differently for different uses? Does it matter that we know someone who has ‘touched the hardware’ or are we happy to trust that the provider has carried out the appropriate checks? There is also the question of distinguishing between access, storage and use – audit often needs to answer who is doing these things and in a virtual environment is this even possible?
- The penultimate point to run through is around the fear of having your information in the cloud and out there available for all to get at. There were a number of good arguments made that the sheer volume of stuff in the cloud tended to militate against anyone being able to find what was yours, a sort of security by obscurity. This even applied to agencies that had legal powers to look at what you had (there is often concern, for example, that the US spies on data held in facilities located there). Another mitigating factor that was mentioned was around the interest these agencies had in data located on the cloud; going back to an earlier argument, what is held is often a large volume of stuff that no-one other than the user has that much interest in. If that can be held on the cloud then it makes a lot of sense. The more sensitive data can then be held elsewhere;
- A good point to come back to finish off this section was around what people expected out of security and their perception of security as opposed to the reality of it. There was a great meme in here about trust and how even if an institution had many users on its network, making it less secure, it was something they owned and felt they trusted more than a network they did not own and yet had far fewer users on it; the harsh fact is that most networks and servers are compromised by people and the more of them you have with access the more likely it is to happen. In a lot of ways, I think the cloud is a very secure place but it is vital to ensure you ask the right questions to be sure that you can answer the right questions about those who have been trusted with what you have put there.
So, I promised at the start of the post to cover why the building was so appropriate and I’ll do that here in conclusion. When the UMIST building on Sackville Street in Manchester was opened in 1902, it represented a triumph of knowledge and certainty. Everything generated in the building was appropriately filed and managed and you can no doubt still find it today as a result of those people who sifted records from information and diligently archived what was important. Are we going to be able to come back to the same building in 100 years’ time and have that same certainty about finding information that we’ve stored in the cloud?