Notes on the Required Readings (taken directly from the chapters)
IR sections 1.4
This section covers the extended Boolean model versus ranked retrieval.
The Boolean retrieval model contrasts with ranked retrieval models such as the vector space model, in which users largely use free text queries (just typing one or more words rather than using a precise language with operators for building up query expressions), and the system decides which documents best satisfy the query.
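To make the contrast concrete, the Boolean model can be sketched as set operations over an inverted index; the documents and the query below are hypothetical toy examples, not from the chapter:

```python
# Minimal sketch of Boolean retrieval: an inverted index maps each
# term to the set of document IDs containing it, and a query such as
# "brutus AND caesar AND NOT calpurnia" becomes set operations.

docs = {
    1: "antony and cleopatra and brutus and caesar",
    2: "julius caesar and brutus and calpurnia",
    3: "the tempest",
}

# Build the inverted index: term -> set of doc IDs.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

# Evaluate: brutus AND caesar AND NOT calpurnia
result = (index["brutus"] & index["caesar"]) - index.get("calpurnia", set())
print(sorted(result))  # -> [1]
```

Unlike ranked retrieval, every matching document is returned as an unordered set; there is no notion of one match being better than another.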
Chapter 6
This chapter covers scoring, term weighting, and the vector space model.
The chapter develops three main ideas:
1. It introduces parametric and zone indexes in Section 6.1, which serve two purposes. First, they allow us to index and retrieve documents by metadata such as the language in which a document is written. Second, they give us a simple means for scoring (and thereby ranking) documents in response to a query.
2. Next, in Section 6.2 we develop the idea of weighting the importance of a term in a document, based on the statistics of occurrence of the term.
3. In Section 6.3 we show that by viewing each document as a vector of such weights, we can compute a score between a query and each document. This view is known as vector space scoring.
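Ideas 2 and 3 can be sketched together: weight each term by tf-idf, treat documents and the query as sparse vectors of those weights, and rank by cosine similarity. The toy corpus and query below are hypothetical illustrations, not examples from the chapter:

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog chased the fox",
]
N = len(docs)
tokenized = [d.split() for d in docs]

# Document frequency: number of documents containing each term.
df = Counter()
for toks in tokenized:
    df.update(set(toks))

def tfidf_vector(tokens):
    """Map each term to tf * idf, with idf = log10(N / df)."""
    tf = Counter(tokens)
    # df.get(t, 1) avoids division by zero for terms unseen in the corpus.
    return {t: tf[t] * math.log10(N / df.get(t, 1)) for t in tf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

query_vec = tfidf_vector("quick fox".split())
doc_vecs = [tfidf_vector(toks) for toks in tokenized]
ranked = sorted(range(N), key=lambda i: cosine(query_vec, doc_vecs[i]),
                reverse=True)
print(ranked)  # the shorter matching document ranks first
```

Note that "the", which appears in every document, gets idf = log10(3/3) = 0 and so contributes nothing to any score, which is exactly the effect term weighting is meant to achieve.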
Chapters 11 and 12
These chapters cover probabilistic information retrieval and language models for IR.
There is more than one possible retrieval model which has a probabilistic basis. Chapter 11 will introduce probability theory and the Probability Ranking Principle (Sections 11.1–11.2), and then concentrate on the Binary Independence Model (Section 11.3), which is the original and still most influential probabilistic retrieval model. Finally, it will introduce related but extended methods which use term counts, including the empirically successful Okapi BM25 weighting scheme, and Bayesian network models for IR (Section 11.4).
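As a sketch of what a BM25-style score looks like, one common form of the Okapi BM25 formula can be written as below; the parameter defaults k1 = 1.5 and b = 0.75 are widely used conventions, not values stated in the chapter:

```python
import math

def bm25_score(query_terms, doc_terms, df, N, avgdl, k1=1.5, b=0.75):
    """One common form of the Okapi BM25 score of a document for a query.

    df:    dict mapping term -> document frequency in the collection
    N:     number of documents in the collection
    avgdl: average document length in tokens
    """
    dl = len(doc_terms)
    score = 0.0
    for t in query_terms:
        if t not in df or t not in doc_terms:
            continue
        tf = doc_terms.count(t)
        # Smoothed idf; the +1 keeps the value positive for common terms.
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        # Term frequency saturated by k1 and normalized by document length.
        denom = tf + k1 * (1 - b + b * dl / avgdl)
        score += idf * (k1 + 1) * tf / denom
    return score
```

Unlike raw tf-idf, repeated occurrences of a term give diminishing returns (saturation via k1), and long documents are penalized in proportion to b.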
Chapter 12 then presents the alternative probabilistic language modeling approach to IR.
Sunday, January 24, 2010