Peer-to-Peer Working Group:

Summary of the Internet2 Spring Member Meeting Session

April 20 from 11:45 am–1:00 pm and 4:00–6:00 pm (EDT)

WG CHAIRS:
David Futey, Stanford University
Linda Roos, OARnet


David Futey opened the meeting and gave a brief overview of the Working Group’s efforts and of the schedule for its meetings at the Spring Member Meeting: a luncheon session with three speakers, and an afternoon session with an in-depth presentation on, and discussion of, the LionShare project. He announced a call for participation (CFP) on a variety of topics for a white paper that he and Linda Roos (co-chairs) would like to develop for the working group’s charge. The outline for the paper will be posted on the website (http://p2p.internet2.edu/). David also called for collaboration on an update to the Campus Bandwidth Management (CBM)/P2P topics, including any projects related to that effort.
NOTE: More in-depth coverage of the questions and presentations is available in the session minutes, which are posted on the P2P WG website.


SPEAKERS
Charles Phelps, University of Rochester
W. Pence, Napster
Russell Vaught, Pennsylvania State University
Alex Valentine, Pennsylvania State University
Derek Moore, Pennsylvania State University


Charles Phelps, provost of the University of Rochester (Rochester), gave an overview of Rochester’s implementation of a file-sharing program. He became involved in P2P file sharing as a member of the Joint Committee on File Sharing (representing higher education and the RIAA), where he was asked to chair the task force on technology. The task force regularly updates Congressional committees on its progress.

Rochester is using Napster to reduce the drain on campus bandwidth, to give students a legitimate file-downloading option, and to address the threats issued by the RIAA and Congress.

He noted that a campus can detect and block P2P traffic, but that strategy does not distinguish legitimate P2P activity from illicit activity. Another strategy is to sample the content, but Charles is very concerned about invasion of privacy. He also doubts that either strategy is very effective.

He feels that encryption does not cause significant problems for discovering illicit P2P activity because, if a shared file is hidden by encryption, it is almost certainly illicit. If Kazaa (or any similar service) built encryption into its file-sharing software, it would destroy its claim to legitimacy.

Q&A:
Q: Steve Wallace agreed that if file sharers started offering encryption, it would de-legitimize them; but he also thinks that encryption is the only way to provide privacy for downloads (one wouldn’t want people to know who was downloading erotica, for example) and for personal records (such as which books someone has checked out of the library).
A: For material to be P2P, someone has to offer it unencrypted, so I don’t think P2P covers this type of private information; that material is better distributed point-to-point.

Q: How about reports that people are using P2P technology and encryption to get information to people in countries where their government restricts freedom of speech?
A: Point-to-point might be a better and more secure mechanism. If content is sent encrypted, the end user needs the key, which must be sent point-to-point. While I don’t see any reason P2P materials should be encrypted, I have a very strong personal inclination not to outlaw encryption.


Russell Vaught gave an overview of Penn State University’s efforts to respond to pressure to limit illicit music sharing via the Internet. Penn State started using Napster in the fall of 2003; despite an initial negative response, it quickly formed tech teams to work out how to implement the service. This was done in partnership with Napster, whose lead architect for the Napster caching service very quickly delivered a well-thought-out, well-architected, well-implemented system for supplying the necessary services.

Penn State put a server on campus to provide the service, and the system was rolled out to students in the dorms. In fall 2004, the service will be rolled out to the entire campus. Students who are off campus won’t ping the on-campus server; they will go directly to Napster for service.

Q&A:
Q: How do you stop students from going to “the dark side” when the music costs $0.95/song?
A: We had already implemented rate limiting in the dorms. Each student is allowed x accesses per day; anyone who exceeds the limit four times is cut off from the network for good.
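The dorm policy in this answer can be sketched as a per-port daily counter plus a strike count. This is a minimal illustration, not Penn State’s implementation: the class name DailyQuota, keying by MAC address (taken from the next answer), and the rule that one violation is recorded per day on which the limit is exceeded are all assumptions, and the daily limit x is left as a parameter because the session did not give a value.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Hypothetical per-port daily access limit with a permanent cutoff after repeated violations."""

    def __init__(self, max_per_day: int, max_strikes: int = 4) -> None:
        self.max_per_day = max_per_day   # the "x accesses per day" (value not given in the session)
        self.max_strikes = max_strikes   # four violations lead to a permanent cutoff
        self.counts = defaultdict(int)   # (mac, day) -> accesses so far that day
        self.strikes = defaultdict(int)  # mac -> number of violations recorded
        self.banned = set()              # MAC addresses cut off for good

    def request(self, mac: str, day: date) -> bool:
        """Record one access attempt; return True if it is allowed."""
        if mac in self.banned:
            return False
        self.counts[(mac, day)] += 1
        count = self.counts[(mac, day)]
        if count <= self.max_per_day:
            return True
        if count == self.max_per_day + 1:
            # Assumption: one violation is recorded per day on which the limit is exceeded.
            self.strikes[mac] += 1
            if self.strikes[mac] >= self.max_strikes:
                self.banned.add(mac)
        return False


quota = DailyQuota(max_per_day=2)
mac = "00:11:22:33:44:55"
for d in range(1, 5):
    for _ in range(3):  # one request over the limit on each of four days
        quota.request(mac, date(2004, 4, d))
print(mac in quota.banned)  # True
```

Counting at most one violation per day keeps a single burst of retries from triggering the permanent cutoff on its own; a stricter policy could count every refused request instead.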

Q: Are you tracking by MAC address or port or card?
A: There is no wireless access in the dorms, and each wired port is bound to a MAC address. Students have to register to use a port, and each port is tracked and limited.

Q: It is obvious why you put the Napster server on campus: students don’t eat up your bandwidth with downloads. Why are you providing coverage for off-campus students?
A: Both Rochester and Penn State purchase x services, so the costs are controlled and come directly from the students. We do client-side caching to reduce the amount of bandwidth used (since most downloads are of the 40 most popular tunes in each musical genre). Whether a student chooses to stream or download depends on the bandwidth available to their machine; the choice can affect the quality of the listening experience.

Q: If I download a library, how long is it “mine”?
A: Until you graduate. After that, you can continue with a monthly account, and the collection you’ve built up is retained if you restart the service at some point in the future. During the summer, student accounts are paused, but students can pay to continue the service.

Q: What was your discussion with the music industry on the price/song? If the price were reduced to around $0.20/song, there would be no need for anyone to obtain music illegally.
A: Alas, pricing is tied to the current contracts, which demand x cents for each participant in the chain; when the contracts are renegotiated to percentages, pricing can get down to around $0.10/song. Napster tries to offer both “a la carte” and “all-access” models; at $0.10/track, “all-access” quickly becomes the best option.

Q: When these two universities negotiated with Napster to bring them onto campus, did you have to do an RFP or could you go directly to Napster (as the “only game in town”)?
A: We didn’t have to issue an RFP at Rochester because (a) it is a private university and (b) we had data showing that only one company could provide the services.
A: Napster is going to use this as part of its advertising; we discovered that many students are getting nervous about attending campuses that are not doing something about music downloads. Napster 2.0 is a purely legitimate business and is not on the “stick” side of the business. Usage at Penn State and Rochester has been high and sustained throughout the semester; we consider this a big success. At this time, over 100 universities are in discussions with us. Often this has been spurred by student government; we’ve been giving out test accounts so students can try the service, and they loved it. Students are moving to downloading instead of purchasing CDs. They care about getting access to the content they want; to them it isn’t “P2P.” P2P and music file sharing have become intertwined in the public eye, to the detriment of P2P development. From Napster’s perspective, we do not offer the service over a P2P system because that is not how students need to access it. Illegitimate file sharing needs to be P2P; none of the legitimate music sharing systems are P2P anymore. With Kazaa, there are very few “sharers” and many “takers”; with our caching system, students download from the central database instead of sharing with one another.


Alex Valentine gave an overview of LionShare, a federated P2P network for academic collaboration. The peer server is a LionShare component that provides persistent storage, so users do not need to stay logged in to keep sharing information. The tool grew out of a digital image study by the Penn State Library, which produced a server-based storage facility that also let users share files from their desktops. This was the beginning of LionShare; the team then wrote a grant proposal to the Andrew W. Mellon Foundation. Participants include Penn State, Internet2, MIT, and Simon Fraser University, and the code is based on contributions from the LimeWire open source project.

The goal was to let people share files online with individuals or groups, within a department or across institutions. The LionShare architecture uses a modified version of the Gnutella protocol and combines P2P with a client/server architecture. It is both decentralized and centralized, using federated authentication with access control. Components include the LionShare PeerServer, a SASL CA, authentication/authorization, DR OSIDs, and GWebCache (Gnutella).

Derek Moore talked about LionShare security. The core of LionShare is its set of AAA components; having these in a P2P network is a relatively new concept, and the P2P networks on the commodity Internet seem to avoid them. Accountability is the crux of the tool: people can see which users are sharing what.

PeerServer features include Search, Share, Collaborate, and Organize. LionShare search can query and retrieve from other peers, peer servers, repositories, other networks, and local libraries; users can also browse other hosts. Searchers are NOT authenticated, because of scaling issues.

Sharing options include sharing locally on a peer, persistently on a PeerServer, or as metadata only, with control over whom you share with. Users cannot share anonymously, which avoids the whole RIAA file-sharing problem.

Collaboration components include one-to-one chat; the group is considering adding a group chat feature in the future, in which each user has access to the profile of the person they are chatting with.

Metadata is a new organizational component. Most metadata is automated in other P2P networks; LionShare has expanded this by extracting metadata from files, file systems, and document properties and then auto-populating the fields with that information. For example, when you take digital photos, you automatically get metadata (time shot, f-stop, etc.); the values will come both from the file and from the user profile. (The metadata uses a schema so that it is extensible: they are using an IEEE standard together with Dublin Core, plus custom schemas that users will be able to create for themselves.) Another organizational tool is the local library search.
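The auto-population step can be illustrated by merging values read from a file with values from the user’s profile into one record. This is a minimal sketch, not LionShare’s schema: the function name auto_metadata, the choice of Dublin Core elements (title, format, creator, publisher, date), and the custom "photo:" prefix are assumptions, and the IEEE schema mentioned in the session is not reproduced.

```python
from pathlib import PurePosixPath
from typing import Optional
import mimetypes

# Hypothetical custom-schema fields for properties read from a photo file.
CUSTOM_FIELDS = {"time_shot": "photo:timeShot", "f_stop": "photo:fNumber"}


def auto_metadata(path: str, file_props: dict, profile: dict,
                  overrides: Optional[dict] = None) -> dict:
    """Build a metadata record from the file, the user profile, and optional user edits."""
    p = PurePosixPath(path)
    mime, _ = mimetypes.guess_type(p.name)
    record = {
        "dc:title": p.stem,                                 # from the file system
        "dc:format": mime or "application/octet-stream",   # from the file extension
        "dc:creator": profile.get("name", ""),              # from the user profile
        "dc:publisher": profile.get("institution", ""),     # from the user profile
    }
    if "time_shot" in file_props:
        record["dc:date"] = file_props["time_shot"]         # from the file's own metadata
    for key, field in CUSTOM_FIELDS.items():
        if key in file_props:
            record[field] = file_props[key]
    record.update(overrides or {})                          # values the user typed in win
    return record


record = auto_metadata(
    "/shared/beach.jpg",
    {"time_shot": "2004-04-18T14:02", "f_stop": 8.0},
    {"name": "A. User", "institution": "Penn State"},
    overrides={"dc:title": "Beach at dusk"},
)
print(record["dc:title"], record["dc:format"])  # Beach at dusk image/jpeg
```

Applying user overrides last reflects the idea that automatic values are only a starting point; keeping custom fields under their own prefix is one way to let users extend the schema without colliding with Dublin Core names.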

The LionShare Peer Server components include a Gnutella core running with Apache Tomcat; features include persistent file sharing and public access. AAA is a major component of the LionShare Peer Server.

He provided a brief overview of the Gnutella Protocol and suggested that it was a fairly robust protocol for P2P use. He showed a diagram of the LionShare Peer Server architecture and described how the data/queries flow.

Peer discovery was a big issue: early versions required users to enter IP addresses, etc.; the current version lets the user connect through automatic authentication from a specific IP address. A cache is maintained in case a user is disconnected. If a server is over-connected, it suggests other locations to which the user can connect. He showed a diagram of the connection process and described how a Gnutella peer gets connected to an ultrapeer.

He described the Gnutella enhancements and why they were added, expanded, or removed; one of them, “swarm downloading,” is being disabled because of negative side effects.
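The cache-and-redirect behavior described here can be sketched as a loop over remembered hosts that follows the suggestions an over-connected host returns. This is an in-memory toy under stated assumptions, not LionShare or Gnutella code: the Ultrapeer class, the connect function, and the addresses (from the 192.0.2.0/24 documentation range) are hypothetical, and authentication and real network handshakes are omitted.

```python
from collections import deque


class Ultrapeer:
    """Hypothetical in-memory stand-in for an ultrapeer that accepts a limited number of leaves."""

    def __init__(self, address, max_leaves, alternates=()):
        self.address = address
        self.max_leaves = max_leaves
        self.alternates = list(alternates)  # other hosts it suggests when full
        self.leaves = set()

    def handshake(self, leaf):
        """Return (accepted, suggested_addresses)."""
        if len(self.leaves) < self.max_leaves:
            self.leaves.add(leaf)
            return True, []
        # Over-connected: refuse, but suggest other locations to try.
        return False, self.alternates


def connect(leaf, host_cache, network):
    """Try cached hosts in order, following suggestions from over-connected hosts.

    host_cache: addresses remembered from earlier sessions (updated in place).
    network: dict of address -> Ultrapeer, standing in for the real network.
    Returns the address joined, or None if no host accepted the leaf.
    """
    queue = deque(host_cache)
    seen = set()
    while queue:
        addr = queue.popleft()
        if addr in seen or addr not in network:
            continue  # already tried, or a stale cache entry that no longer answers
        seen.add(addr)
        accepted, suggestions = network[addr].handshake(leaf)
        if accepted:
            # Move the working host to the front so the next session tries it first.
            if addr in host_cache:
                host_cache.remove(addr)
            host_cache.insert(0, addr)
            return addr
        for alt in suggestions:
            if alt not in host_cache:
                host_cache.append(alt)
            queue.append(alt)
    return None


network = {
    "192.0.2.1": Ultrapeer("192.0.2.1", max_leaves=0, alternates=["192.0.2.2"]),
    "192.0.2.2": Ultrapeer("192.0.2.2", max_leaves=10),
}
cache = ["192.0.2.9", "192.0.2.1"]  # the first entry is stale
print(connect("leaf-a", cache, network))  # 192.0.2.2
```

Persisting the updated cache between sessions is what removes the need for users to type in IP addresses: a reconnecting peer starts from hosts that worked last time and learns new ones from refusals.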

LionShare security relies on the AAA components. Users cannot share files anonymously (this is set in stone); the two other cases are more flexible: searching anonymously should be an option most of the time, but you might have to identify yourself to retrieve a file. They do not use Shibboleth; they use SAML/XACML for authorization. They set out to find a common-denominator certification approach across universities. The University of Michigan has a nice piece of software (KCA) that lets users create a key on their own machine and obtain a certificate valid for 8–10 hours, based on local authentication. Private keys are not permanently stored, and users don’t need to be using Kerberos, etc. The LionShare peer obtains certificates/credentials before joining the network. He presented a “typical” LionShare flow, with diagrams, and described:

  • Queries – standard Gnutella filename-based query or metadata-based search; queries are unsigned, unencrypted.
  • Query Hits – they have metadata, which allows users to set a very concrete list of attributes (so they could agree to release the data only to people in a specific class, etc.). The hits are signed with a valid digital signature (certified).
  • Certificates and Metadata – certificates are destroyed at shutdown, but the metadata is stored on disk; the metadata includes your server certificate, so there is an audit trail.
  • Attribute Release – couldn’t use Shibboleth because you have to define attributes for everything in advance, and we felt there was no way to decide in advance all the possible attributes a user might want to include. Also, we didn’t want to set “roles” for which specific attributes would always be available; we wanted to let each sharer make that decision individually.
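The flow above separates open searching from controlled retrieval, with the sharer choosing which attributes a requester must have. The toy check below illustrates that split together with a short-lived credential like the 8–10 hour KCA certificates mentioned earlier. It is not LionShare’s SAML/XACML machinery and performs no signature verification; the Credential and SharedFile classes, the policy format (attribute name mapped to a set of allowed values), and the functions search and may_retrieve are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Credential:
    """Hypothetical short-lived credential, standing in for a KCA-style certificate."""
    subject: str
    attributes: dict[str, str]
    expires_at: datetime

    def valid(self, now: datetime) -> bool:
        return now < self.expires_at


@dataclass
class SharedFile:
    name: str
    owner: str
    # Conditions chosen by the sharer: each listed attribute must take one of the allowed values.
    release_to: dict[str, set[str]] = field(default_factory=dict)


def search(files: list[SharedFile], term: str) -> list[str]:
    """Searches need no credential, so anyone can learn which file names match."""
    return [f.name for f in files if term.lower() in f.name.lower()]


def may_retrieve(f: SharedFile, cred: Optional[Credential], now: datetime) -> bool:
    """Retrieval needs an unexpired credential whose attributes satisfy the sharer's conditions."""
    if cred is None or not cred.valid(now):
        return False  # anonymous or expired requesters cannot retrieve
    return all(cred.attributes.get(attr) in allowed for attr, allowed in f.release_to.items())


now = datetime(2004, 4, 20, 12, 0)
cred = Credential("student1", {"course": "ART 101"}, now + timedelta(hours=8))
notes = SharedFile("lecture-notes.pdf", "alice", {"course": {"ART 101"}})
print(search([notes], "notes"))                              # ['lecture-notes.pdf']
print(may_retrieve(notes, None, now))                        # False
print(may_retrieve(notes, cred, now))                        # True
print(may_retrieve(notes, cred, now + timedelta(hours=9)))   # False
```

Because the policy lives on each shared file rather than in predefined roles, a sharer can restrict one file to a single class and leave another open to any identified user (an empty release_to).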


Q&A:
Q: After x amount of hops, Jabber doesn’t pass the message through the server, it passes through P2P. What are you looking for in instant messaging?
A: We have just started our secure-chat discussions, and we’re not up to date on secure federated Jabber methods. Everyone on the network is going to have a certificate, so we’re interested in whatever you’re doing.

Q: You said a lot of the implementation is in Java – what is the rest of it in?
A: We are trying to leverage the existing Shibboleth attribute authority, which is in Java. Everything else is in Java. We’re having some concerns about the availability of SASL libraries in Java, but we’re working with that.

Q: Do you have any dependencies in there about IP address space?
A: The existing Gnutella protocol only allows IPv4 addresses, but we’re not designed to be backward compatible.

The speaker provided a list of locations for more information; also, see http://lionshare.its.psu.edu/main/.


ATTENDEES
John C. Fowler, Tim Chown, Don Spicer, Kevin Gamble, Charles Hedrick, Laurie Burns, Joe Askins, Gordon Springer, John Bielec, Ken Blackney, Cheryl Munn-Fremon, John Streck, Susan Evett, Milt Halem, Kilnam Chon, Tom Knab, Garret Sern, Linda Roos, David Futey, Steven Wallace, Dave Pokorney, James Deaton, Todd Reed, Greg Scibert, John Siegle, Scott Fullerton, Renee Shuey, Keven Morooney, Gary Auguston, Roman Jimenez, Mark Earest, Charles Ball, and Sung Lee.