background image
2.
COLLABORATIVE FILTERING
The most important step for creating recommendations us-
ing collaborative filtering for a particular customer u is to
find an ordered set of customers N = {N
0
, .., N
l
} ; u N ,
where the ordering is based on similarity of N
i
and u, this
should be performed for each customer u (i.e. O(n
2
) calcu-
lations of similarities with n customers in the database) [9].
Similarity between two customers u and N
i
can be found
by using a proximity measure.
The essential data used
in the calculation of the similarity is the two customers'
vectors with votes on products or services. Votes are typi-
cally numerical values that are either entered by the user for
products s/he dis/liked, or values added by the e-commerce
provider system (e.g.
if a customer has browsed or pur-
chased a product or service, it may get a good score auto-
matically). Two common proximity measures are the Pear-
son correlation or the cosine measure, shown below for cus-
tomer a and b. One advantage of the cosine measure is that
it is suitable for dimension reduction with sparse data sets
(i.e. few products or services voted for).
corr
ab
=
i
(r
ai
- ¯
r
a
)(r
bi
- ¯
r
b
)
i
(r
ai
- ¯
r
a
)
2
i
(r
bi
- ¯
r
b
)
2
(1)
cos(a, b) =
a · b
a
2
b
2
(2)
When all proximity measures between all customers are cal-
culated, a neighborhood is formed (a reasonable size subset
of customers in N for a particular customer u). If the the
vectors in the data set are sparse, a aggregate neighborhood
approach can be used (calculates the centroid of the neigh-
borhood before adding new customers). If the vectors in
the data set are dense, a center-based approach can be used
(picking the l most similar customers from N ).
When a set of neighborhood vectors is found, recommen-
dations can be calculated by selecting items in the vectors
from the neighborhood that the customer u hasn't voted for,
e.g. if a customer u hasn't voted for particular book b, and
another (i.e. the most similar) customer u in the neigh-
borhood has given the book a high rating, the book will be
recommended for customer u.
3.
PROPOSED APPROACH
The main idea of this approach is to transform the problem
of finding good product and service recommendations using
collaborative filtering, into a search problem that can take
advantage of the scalability and privacy possible in a P2P-
based architecture [8] similar to Gnutella or Freenet.
3.1
Query Flow
3.1.1
Gnutella
In Gnutella, a query (i.e. for a MP3 file) from a Peer node
in are broadcasted to all its neighbours.
The neighbour
Peers (independently) check if the query matches one of their
hosted files, if so, they return a found message back to the
sending node, otherwise they decrease the TTL field (TTL
= Time To Live) and then pass on the query to their neigh-
bour peers. If the TTL count reaches 0, the message is not
Mobile
Customer
Assistant
Agent
Assistant
Agent
Mobile
Customer
Assistant
Agent
Assistant
Agent
Mobile
Customer
Mobile
Customer
Figure 2: P2P Network of Agents
sent any further. In other words, the query is broadcasted
in "all directions" from the quering node, but only the query
matches are sent back the network path they arrived from
[8]. This flow of queries ensures that a peer doesn't know if
a received query was from a neighborhood peer, or another
peer that broadcasted the message, this gives a limited de-
gree of privacy for queries.
3.1.2
Proposed Approach
Queries are broadcasted to neighbour Peers as for Gnutella,
but a query is not a text string (as the case for Gnutella's
file search), but rather a vector with votes on products and
recommendations. This voting vector is the same as used in
collaborative filtering.
3.2
Routing Algorithms
3.2.1
Gnutella
In Gnutella, there is a very simple routing algorithm that
either stops broadcasting from a peer if a match is found, or
continues broadcasting messages to other peers if no match
was found.
3.2.2
Proposed Approach
When a peer receives a query voting vector: It calculates
the proximity (e.g. with Pearson correlation from collab-
orative filtering) between the voting vector in the message
and cached previous messages at the node. If the proximity
measure is higher than a threshold , the cached voting vec-
tor is sent back to the neighbour that sent the query voting
vector (and possibly further back to the origin from there).
If the proximity measure is lower than , the query voting
vector is broadcasted further to the receiving peer's neigh-
bours. If there are too many messages received per time
unit, they are simply passed on to neighbour peers with-
out calculating the proximity measure against cached items.
The size of the cache tries to balance against the traffic, if
the traffic is high the cache is decreased in order to be able
to calculate proximity measures.
Paper B
75

<< - < - > - >>