Monday, January 31, 2005

QDN

QDN at USC is "Querical Data Network", an interesting concept.
It uses two approaches to solve the data-location problem in complex data systems such as peer-to-peer networks or sensor networks: the first is small-world theory, and the second is percolation theory, which supports scalable flooding.
http://www-scf.usc.edu/~banaeika/papers/NSF.pdf
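The claim that percolation theory supports scalable flooding can be illustrated with a minimal sketch. It assumes a random graph as the overlay, not QDN's actual topology. Instead of forwarding a query to every neighbor, each node forwards it over each edge independently with probability p. The set of nodes reached then behaves like the bond-percolation cluster containing the originator, so lowering p cuts the message count, while above the percolation threshold the query can still reach a large part of the network. The names `random_graph`, `probabilistic_flood` and `avg_degree` are hypothetical.

```python
import random
from collections import deque


def random_graph(n, avg_degree, seed=0):
    """G(n, p) random graph with edge probability avg_degree / (n - 1), as adjacency sets.
    A stand-in overlay for illustration, not the topology used by QDN."""
    rng = random.Random(seed)
    p_edge = avg_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p_edge:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def probabilistic_flood(adj, source, p, seed=0):
    """Forward the query over each edge independently with probability p.

    Returns (nodes_reached, messages_sent). With p = 1.0 this is plain flooding;
    the reached set is distributed like the bond-percolation cluster of `source`.
    """
    rng = random.Random(seed)
    reached = {source}
    frontier = deque([source])
    messages = 0
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if rng.random() < p:  # this edge is "open" for the query
                messages += 1
                if v not in reached:
                    reached.add(v)
                    frontier.append(v)
    return len(reached), messages
```

Calling `probabilistic_flood(adj, 0, p)` for several values of p on the same graph shows the trade-off between reach and message cost; with p = 1.0 every reached node forwards to all of its neighbors.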

The indexable part of QDN captures my own idea of how to run XML queries over XML data in a P2P network. Several points are worth mentioning:
1. The centralized design pattern of QDN is essentially data shipping, while the decentralized design is query shipping. The efficiency of the decentralized design comes from in-network processing.
2. The evaluation metrics include precision, recall, and hop count.
3. The authors argue that, as a model, DHT fails to respect the natural characteristics/requirements of QDNs. For example, data-to-node assignment via the virtual identifier space violates the natural distribution and replication of the data, where each node autonomously maintains its own and only its own data. Also, the regular topology of a DHT imposes strict connectivity rules on autonomous nodes.
4. Each node is represented as multiple virtual nodes, taking each tuple t_k as a virtual identity; this increases the size of the QDN but makes it more accurate.
5. With flooding, the node that receives the first hit during the selective walk marks the query for scope-limited flooding and continues forwarding the query by originating the flooding.
6. Percolation theory analyzes the statistical and geometrical properties of such clusters as the probability p changes. Properties of interest in percolation theory include the statistical distribution of cluster size or cluster mass (i.e., the number of sites within the cluster), cluster surface (i.e., the length of the cluster perimeter), and cluster shape (i.e., the fractal geometry of the cluster boundaries).
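The two-phase search in point 5, together with the metrics in point 2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the selective walk greedily moves to the unvisited neighbor with the highest value of a hypothetical `score` function, the flooding scope is a hypothetical TTL (`flood_ttl`), and cost is counted as forwarded query messages, one way to measure what hop count captures. Each node keeps only its own data, as point 3 describes.

```python
from collections import deque


def selective_walk_then_flood(adj, data, source, matches, score, max_walk=50, flood_ttl=2):
    """Two-phase search with hypothetical parameters.

    adj:     node -> list of neighbor nodes
    data:    node -> set of items stored locally at that node
    matches: predicate deciding whether an item satisfies the query
    score:   hypothetical heuristic; the walk moves to the best-scoring unvisited neighbor
    Returns (results, messages), where results is a set of (node, item) pairs.
    """
    def is_hit(node):
        return any(matches(item) for item in data[node])

    # Phase 1: selective walk, one message per step, until the first hit.
    visited, current, messages = {source}, source, 0
    while not is_hit(current):
        candidates = [v for v in adj[current] if v not in visited]
        if not candidates or messages >= max_walk:
            return set(), messages  # dead end or step budget exhausted
        current = max(candidates, key=score)
        visited.add(current)
        messages += 1

    # Phase 2: the first-hit node originates scope-limited (TTL-bounded) flooding.
    results, seen = set(), {current}
    frontier = deque([(current, 0)])
    while frontier:
        node, depth = frontier.popleft()
        results.update((node, item) for item in data[node] if matches(item))
        if depth < flood_ttl:
            for v in adj[node]:
                messages += 1  # every forwarded copy costs a message, even to nodes already seen
                if v not in seen:
                    seen.add(v)
                    frontier.append((v, depth + 1))
    return results, messages


def precision_recall(retrieved, relevant):
    """Precision and recall of a result set; 1.0 by convention for an empty denominator."""
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 1.0
    recall = true_pos / len(relevant) if relevant else 1.0
    return precision, recall
```

On a six-node path graph with matching items at nodes 3, 4 and 5, starting at node 0 with `score=lambda v: v`, the walk takes three steps to hit node 3; a TTL of 1 then finds the items at nodes 3 and 4 (recall 2/3), while a TTL of 2 also reaches node 5.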

Sunday, January 30, 2005

IT news

Some evidence that this might be the case exists in the form of the recent announcement that IBM is creating the Power Alliance, and in sales of the (relatively) inexpensive OpenPower Linux/Power-based machines. Additionally, IBM have opened up the Power architecture for licensing and made other gestures to promote adoption of the platform.

Could the end of the WinTel hegemony be coming? Could Linux on Power be the future?


New cases for distributed XML data

1. The idea of social networks gave rise to a project named FOAF (Friend of a Friend). The concept is that the owner of the data (that's you) creates one XML file describing your acquaintances (the info in 'Infoware') and distributes it as you like.
2. Another XML-file-based project I've been following is DOAP (Description of a Project): Edd Dumbill wanted to apply the same idea as FOAF to project descriptions. This is an XML file that contains all the info you'd ever want to know about a software project in one place, so it doesn't have to be duplicated by hand across the handful of open source project sites.
3. Geocaching is an entertaining adventure game for GPS users. Participating in a cache hunt is a good way to take advantage of the features and capabilities of a GPS unit. The basic idea is that individuals and organizations set up caches all over the world and share the locations of these caches on the internet. GPS users can then use the location coordinates to find the caches. Once found, a cache may provide the visitor with a wide variety of rewards.
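To make the FOAF idea concrete, here is a minimal sketch that reads a small FOAF file with Python's standard `xml.etree.ElementTree` and lists the people its owner `foaf:knows`. It assumes the simple nested RDF/XML layout shown in `SAMPLE_FOAF` (the people are made up). Real FOAF files can use other RDF serializations or shapes, for example `rdf:Description` nodes or references by URI, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by FOAF documents written in RDF/XML.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical sample file: one owner who knows two other people.
SAMPLE_FOAF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Alice Example</foaf:name>
    <foaf:knows>
      <foaf:Person><foaf:name>Bob Example</foaf:name></foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person><foaf:name>Carol Example</foaf:name></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def acquaintances(foaf_xml):
    """Return (owner_name, [names of people the owner foaf:knows])."""
    root = ET.fromstring(foaf_xml)
    owner = root.find("foaf:Person", NS)  # first top-level Person is taken as the owner
    owner_name = owner.findtext("foaf:name", default="", namespaces=NS)
    known = [
        person.findtext("foaf:name", default="", namespaces=NS)
        for person in owner.findall("foaf:knows/foaf:Person", NS)
    ]
    return owner_name, known
```

Because each owner publishes and controls their own file, a crawler that follows `foaf:knows` links across files is effectively querying distributed XML data, which is what makes FOAF a case for this topic.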

Saturday, January 29, 2005

Related work on P2P XML

1. Angela Bonifati's paper
Several points are worth noting:
1) More specific fragments cannot be located by more general queries. For example, if the network contains two fragments "/A/B" and "/A/C", the query "/A" cannot find them; in contrast, if the network contains a fragment with path expression "/A" and the query is "/A/B", the two probably match.
2) A notable part of the paper is the use of fingerprinting, rather than a secure hash function, to encode both the peer addresses and the fragment path expressions.
3) The paper provides two categories of search functionality: exact match and partial match. Partial match does not require the matching data to be exhaustive, so the proposal can be treated as an information retrieval model.
4) Another feature of the proposal is its use of an existing structured P2P protocol (i.e., Chord).
5) The proposal supports only a limited set of operators: /, //, and the positional selector [].
6) Schema-less document
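Points 1, 2 and 4 can be tied together in a small sketch. It assumes, rather than reproduces from the paper: a simple polynomial (Rabin-Karp-style) hash modulo a prime standing in for the paper's fingerprint function, a 16-bit Chord identifier space, fragments stored at the Chord successor of their path's fingerprint, and a lookup that probes the query path and each of its ancestor paths. Under that lookup rule, a query "/A/B" reaches a fragment "/A", but a query "/A" never probes "/A/B" or "/A/C", which is the asymmetry described in point 1. All names (`fingerprint`, `successor`, `lookup`, `M`) are hypothetical.

```python
import bisect

M = 16                 # hypothetical identifier-space size: ids in [0, 2**M)
PRIME = (1 << 61) - 1  # Mersenne prime modulus for the polynomial hash
BASE = 257


def fingerprint(text, m=M):
    """Polynomial (Rabin-Karp-style) hash of a string, reduced to m bits.
    A stand-in; the paper's exact fingerprint function is not reproduced here."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * BASE + byte) % PRIME
    return h % (1 << m)


def successor(node_ids, key):
    """Chord-style successor: the first node id >= key on the ring, wrapping around."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key)
    return ids[i % len(ids)]


def prefixes(path):
    """'/A/B/C' -> ['/A/B/C', '/A/B', '/A'] (most specific first)."""
    steps = [s for s in path.split("/") if s]
    return ["/" + "/".join(steps[:k]) for k in range(len(steps), 0, -1)]


def fragment_matches(fragment_path, query_path):
    """A fragment may answer a query if its path is the query path or one of its
    ancestors (fragment '/A' for query '/A/B', but not the reverse)."""
    return fragment_path in prefixes(query_path)


def lookup(query_path, ring, store):
    """Probe the peer responsible for the query path and for each ancestor path.
    ring: list of peer ids; store: peer id -> set of fragment paths it indexes."""
    hits = []
    for candidate in prefixes(query_path):
        peer = successor(ring, fingerprint(candidate))
        if candidate in store.get(peer, set()):
            hits.append((candidate, peer))
    return hits
```

A fragment is indexed by adding its path to `store[successor(ring, fingerprint(path))]`, so any peer can recompute where a path expression lives without a central directory.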

2. E. Pitoura
1) Exploits the structure of XML documents
2) Uses Bloom filters
3) No data placement; unstructured P2P overlay network
4) Supports / and //
5) Uses Manhattan (Hamming) distance between filters, i.e., the number of bits in which they differ
6) Propagation of updates
7) Peers are organized by the similarity of their filters (i.e., local and merged filters), which is called similarity-based clustering
8) Scalability is demonstrated experimentally
9) Schema-less document
10) For the descendant axis, the path is split at // and the sub-paths are processed
11) Nodes in a P2P system may be organized into various topologies, which increases flexibility
12) A threshold needs to be set
13) The depth filter treats the elements as a whole
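The filter mechanics behind points 2, 5 and 10 can be sketched with a single, plain Bloom filter. This is a simplification: the paper's multi-level filters, such as the depth filter mentioned in point 13, are not reproduced, and the filter size `m`, hash count `k` and salted SHA-256 hashing are arbitrary choices. All names are hypothetical. Each document path contributes all of its contiguous child-axis subpaths; a query is split at `//` and each piece is tested; two peers' filters are compared by Hamming distance.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings: m bits, k hash positions per item."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, item):
        # Derive k positions from salted SHA-256 digests (simple, not fast).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def subpaths(path):
    """All contiguous child-axis subpaths: 'a/b/c' -> a, a/b, a/b/c, b, b/c, c."""
    steps = [s for s in path.split("/") if s]
    return ["/".join(steps[i:j]) for i in range(len(steps)) for j in range(i + 1, len(steps) + 1)]


def build_filter(doc_paths, m=1024, k=3):
    """Summarize a peer's documents by inserting every subpath of every path."""
    bf = BloomFilter(m, k)
    for path in doc_paths:
        for sp in subpaths(path):
            bf.add(sp)
    return bf


def query_may_match(bf, query):
    """Split the query at '//' and test each child-axis piece separately.
    Order between pieces is ignored, so this can only over-approximate."""
    pieces = [piece.strip("/") for piece in query.split("//")]
    return all(bf.might_contain(piece) for piece in pieces if piece)


def hamming(a, b):
    """Number of bit positions in which two same-size filters differ."""
    if a.m != b.m:
        raise ValueError("filters must have the same size")
    return bin(a.bits ^ b.bits).count("1")
```

A Bloom filter never gives false negatives, so every child-axis piece actually present in a peer's documents is reported, while unrelated pieces may occasionally be reported as well. A Hamming distance below some threshold (point 12) could serve as the similarity test for clustering peers (point 7).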

3. Garces
1) A generalization of Galanis and David DeWitt's work; provides a key-to-key service
2) The proposal is based on an existing DHT
3) Focuses on a bibliographic database scenario
4) Sibling relationships are not addressed
5) Involves a lot of user interaction, though it could be automated
6) The choice of index is application-dependent
7) Schema-less document
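The key-to-key (query-to-query) idea in point 1 can be sketched with an ordinary dictionary standing in for DHT put/get. The index scheme (author, then author and title, then author, title and year), the record fields and the file names are hypothetical choices for the bibliographic scenario in point 3, not the paper's exact scheme. Each index entry maps a more general query key to more specific keys, and the most specific key maps to a file.

```python
from collections import defaultdict

# Hypothetical index scheme, from most general to most specific query.
INDEX_SCHEME = [("author",), ("author", "title"), ("author", "title", "year")]


def key_for(record, fields):
    """Canonical query key, e.g. 'author=Smith&title=P2P'."""
    return "&".join(f"{field}={record[field]}" for field in fields)


def build_index(records, scheme=INDEX_SCHEME):
    """Map each query key to its more specific keys; the most specific key maps to a file.
    A plain dict stands in for the DHT's put/get interface."""
    dht = defaultdict(set)
    for rec in records:
        for general, specific in zip(scheme, scheme[1:]):
            dht[key_for(rec, general)].add(key_for(rec, specific))
        dht[key_for(rec, scheme[-1])].add(rec["file"])
    return dict(dht)


def resolve(dht, query_key):
    """Follow key-to-key mappings down to files, exploring every branch automatically."""
    files, frontier = set(), list(dht.get(query_key, ()))
    while frontier:
        item = frontier.pop()
        if item in dht:  # another, more specific query key
            frontier.extend(dht[item])
        else:            # no further mapping: treat it as a file name
            files.add(item)
    return files
```

Because the scheme starts from `author`, a query by title alone finds nothing here, which is one way the choice of index becomes application-dependent (point 6). `resolve` also follows every branch automatically, whereas the original design involves the user in choosing among more specific keys (point 5).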