Required Readings:
Notes for chapters (1,2,3)
IIR sections 1.2, 1.3, chapters 2 and 3
Section (1.2) explains step by step how we can build an inverted index, and section (1.3) explains how we can process a query using an inverted index. Both sections also provide figures that help the reader picture how each step works.
The major steps in inverted index construction:
1. Collect the documents to be indexed.
2. Tokenize the text.
3. Do linguistic preprocessing of tokens.
4. Index the documents that each term occurs in.
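The four steps above, plus the query processing from section 1.3, can be sketched in a few lines of Python. This is only a minimal illustration, not the book's code: the function names (tokenize, build_index, intersect) and the four sample documents are made up for these notes, and the "linguistic preprocessing" is reduced to lowercasing.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Step 2 plus a crude step 3: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Steps 1 and 4: map each term to a sorted list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Walk two sorted postings lists in parallel to answer an AND query."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical sample collection, keyed by document ID.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_index(docs)
july_and_new = intersect(index["july"], index["new"])
```

Here index["july"] is the postings list [2, 3, 4], index["new"] is [1, 4], and the query "july AND new" returns only document 4.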
Chapter (2) The term vocabulary and postings lists
Chapter 2 talks about how the basic unit of a document can be defined and how the character sequence that it comprises is determined. Then it defines tokenization, which is the process of chopping character streams into tokens, while linguistic preprocessing deals with building equivalence classes of tokens, which are the set of terms that are indexed.
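A small sketch may make the difference between tokens and terms easier to see. It assumes a very simple scheme that is only one of many possible choices: tokens are whitespace-separated chunks with edge punctuation trimmed, and the equivalence classes come from lowercasing and deleting periods and hyphens. The function names and the sample sentence are made up for these notes.

```python
import string


def tokenize(text):
    """Chop a character stream into tokens, trimming punctuation at the edges."""
    tokens = []
    for chunk in text.split():
        token = chunk.strip(string.punctuation)
        if token:
            tokens.append(token)
    return tokens


def normalize(token):
    """Map a token to its equivalence class: case-fold, drop periods and hyphens."""
    return token.lower().replace(".", "").replace("-", "")


# Hypothetical sample text.
text = "The U.S.A. and the USA: anti-discriminatory, Antidiscriminatory."
tokens = tokenize(text)
terms = [normalize(t) for t in tokens]
vocabulary = sorted(set(terms))
```

With this scheme the seven tokens collapse into four terms: "U.S.A" and "USA" both become "usa", and the two spellings of "antidiscriminatory" become one term.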
Chapter (3) Dictionaries and tolerant retrieval
- We develop data structures that help the search for terms in the vocabulary of an inverted index.
- We study the idea of a wildcard query: a query such as *a*e*i*o*u*, which seeks documents containing any term that includes all five vowels in sequence.
- Users make spelling errors either by accident, or because the term they are searching for (e.g., Herman) has no unambiguous spelling in the collection. We detail a number of techniques for correcting spelling errors in queries, one term at a time as well as for an entire string of query terms.
- We study a method for seeking vocabulary terms that are phonetically close to the query term(s).
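Two of the ideas in this list can be tried out with a short sketch. It is a simplification under stated assumptions: the wildcard matcher below just scans the whole vocabulary with a regular expression, which is not the special index structure the chapter builds for this purpose, and the spelling part only shows edit distance (Levenshtein distance) between two strings, one common way to measure how close a misspelled query term is to a vocabulary term. The function names and the word list are made up for these notes.

```python
import re


def wildcard_match(pattern, vocabulary):
    """Return the vocabulary terms matching a pattern in which * means any string."""
    parts = [re.escape(part) for part in pattern.split("*")]
    regex = re.compile(".*".join(parts))
    return [term for term in vocabulary if regex.fullmatch(term)]


def edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Hypothetical vocabulary.
vocabulary = ["abstemious", "education", "facetious", "herman", "hermann", "sequoia"]
vowel_terms = wildcard_match("*a*e*i*o*u*", vocabulary)
closest = min(vocabulary, key=lambda term: edit_distance("hermen", term))
```

Here vowel_terms contains "abstemious" and "facetious", the only two words with the five vowels in order, and the misspelled query "hermen" is closest to "herman" (one substitution).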
------------
Notes for FOA Sections 1.2-1.5 in Chapter 1. A PDF version of Ch. 1 is on the author's site: http://www.cs.ucsd.edu/~rik/foa/.
Chapter (1) shows that there are many tools that are useful for searching collections and other media.
People use many tools of FOA:
- Language
- Writing
- But then people started to write a lot, and we ended up with so many books that people had a difficult time finding what they wanted. Therefore, people went to the library to find what they needed.
- Today people have started to use the WWW to find what they need.
- What happens between two people, when one asks a question and the other answers, is the same thing that happens between a person and a search engine.
This chapter also defines indexing and mentions that there are two kinds of indexing:
1) manual indexing
2) automatic indexing.
Saturday, January 16, 2010