Advanced Topics for Information Retrieval: 2010

Tuesday, April 20, 2010

Unit 14 notes

This week's readings discuss information retrieval services.

Paepcke, Andreas. (1996). Digital libraries: Searching is not enough. D-Lib Magazine, May. http://www.dlib.org/dlib/may96/stanford/05paepcke.html
It talks about digital libraries:
"The main research problems derivable from this starting point are scaling and information finding: If only we can provide good performance for the standard information retrieval metrics of recall and precision when accessing very large collections, we will have a Digital Library."

3. Block, Marylaine (2002). "My Rules of Information." Searcher (10) 1: 61-67. (Available online at http://www.infotoday.com/searcher/jan02/block.htm)

This article talks about how librarians know how to locate information: "People assume that librarians must know all the answers, but what we really know is how to ask good questions. We know how to slide up and down that continuum from general to narrow until we find the exact set of parameters that work."

Monday, April 5, 2010

Notes: Unit 13

I could not access the assigned articles, but I learned something about the topic by reading an article on Intelligent Information Retrieval. It is a very good article because it gives an introduction to intelligent information retrieval. Susan Gauch says, “Technological advances have led to new problems and new solutions. The number, size, and contents of online databases has grown. Finding relevant information is truly a “needle in a haystack” proposition. In one study of inexperienced searchers, one-quarter of the subjects were unable to pass a benchmark test of minimum searching skill.” This is why people need intelligent information retrieval.

Monday, March 29, 2010

Unit 12: Retrieving information on the World Wide Web

1. Sullivan, Danny. (2003). Search Engine Watch. Review 2 short pieces: “How Do Search Engines Work?” at http://searchenginewatch.com/webmasters/article.php/2168031 and “How Search Engines Rank Web Pages” at http://searchenginewatch.com/webmasters/article.php/2167961. (Date accessed: 8/19/2005)

- This website explains how search engines work.
- There are two types of search engines: crawler-based search engines and human-powered directories.

- Crawler-based search engines: Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found.
- Human-powered directories: A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.

- It also explains how search engines rank web pages: the search engine will sort through the millions of pages it knows about and present you with ones that match your topic. The matches will even be ranked, so that the most relevant ones come first.

2) Drabenstott, K.M. (2001). Web Search Strategy Development. Online, (25) 4: 18-24. (Available online through Pitt E-Journal http://ug4fn7ck2h.search.serialssolutions.com/ )

I tried to access this site, but it did not work.

3) Blachman, Nancy (n.d.) Google Guide. Review 2 sections listed under "Printable Versions" - I: Query Input and II: Understanding Results. < http://www.googleguide.com/toc.html > (Date Accessed: 10/20/2004). Note: This will help with the assignment!

It is a very helpful and informative site, and I like it. It teaches you how to use Google and answers all your questions related to Google.

Monday, March 22, 2010

Muddiest point for unit 8

I still do not understand how one table can have two primary keys.
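
One possible way to look at this question, as far as I understand it: a table has only one primary key, but that key can be made of two columns together (a composite key). The sketch below uses Python's built-in sqlite3 module; the enrollment table and its columns are hypothetical and are not taken from the course assignment.

```python
import sqlite3

# Hypothetical table: neither column is unique on its own,
# but the pair (student_id, course_id) must be unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course_id  TEXT    NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    )
    """
)
conn.execute("INSERT INTO enrollment VALUES (1, 'IR', 'A')")
conn.execute("INSERT INTO enrollment VALUES (1, 'DB', 'B')")  # same student, other course: allowed
conn.execute("INSERT INTO enrollment VALUES (2, 'IR', 'A')")  # same course, other student: allowed


def try_duplicate():
    """Try to insert a pair that already exists; return True if it was accepted."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (1, 'IR', 'C')")
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_duplicate()
row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

So the two columns act as a single key: each value may repeat by itself, but the same combination cannot be inserted twice.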

Unit 11

Unit 11: Retrieving information in Digital Libraries:

1. Subject-based Information Retrieval within Digital Libraries Employing LCSHs

"In this article, an effort is made to exploit the explicit and implicit semantic expressiveness of subject headings conforming to the LCSH guidelines, in favor of more efficient subject-based, information retrieval modules within digital libraries."

Sunday, February 28, 2010

Notes for unit #8

Philip Greenspun, SQL for Web Nerds, chapter 3, “Simple Queries,” and W3Schools, SQL Tutorial

These two websites explain what SQL is, how to start using it, and how to write queries. They also provide examples and tutorials that teach us how to use SQL to access and manipulate data in a database.
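
A minimal sketch of what "access and manipulate data" can look like, using Python's built-in sqlite3 module so it runs without a database server. The books table and its rows are hypothetical sample data, not examples from either website.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT, year INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO books (title, author, year) VALUES (?, ?, ?)",
    [
        ("Book A", "Adams", 1995),
        ("Book B", "Baker", 2003),
        ("Book C", "Clark", 2008),
    ],
)

# Access: a simple query with a condition and an ordering.
recent = conn.execute(
    "SELECT title FROM books WHERE year >= ? ORDER BY year", (2000,)
).fetchall()

# Manipulate: change one row and delete another.
conn.execute("UPDATE books SET year = ? WHERE author = ?", (1996, "Adams"))
conn.execute("DELETE FROM books WHERE author = ?", ("Baker",))

remaining = conn.execute(
    "SELECT author, year FROM books ORDER BY author"
).fetchall()
```

The SELECT statement only reads data, while UPDATE and DELETE change what is stored in the table.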

Thursday, February 25, 2010

Muddiest points for week #7

Does assignment #2 depend on assignment #1 (is it a continuation of
assignment #1)?

If it depends on assignment #1, could you provide us with the correct
answers for assignment #1 so we can do this assignment?

And are you going to provide us with a video for assignment #2?

Sunday, February 7, 2010

Muddiest point for unit #4

I just want to know: can we retrieve information by searching with sound instead of typing words?

Unit #5

Digital reference: reference librarians' experiences and attitudes

This article reports on a survey of public and academic reference librarians about their experiences with, and opinions and attitudes about, technology in reference.
“This article reports the results of a survey of reference librarians in public and academic libraries of various sizes in the United States, asking them about their experiences with and attitudes towards the use of digital and networked technologies and resources in reference work. A total of 648 responded. In general, respondents were positive and optimistic in their outlook, but not unreservedly so. Among the strongest findings was a correlation between recent experience at doing digital reference and positive attitudes towards it, a clear set of opinions about what such services would be best and worst at, and differing perspectives and patterns of responses between academic and public librarians. In addition, questions asking about characteristics of librarians, their current and planned reference services, and some of their professional choices in doing reference work are reported.”

2. Lipow, A.G. (1999). Serving the remote user: Reference service in the digital environment. Paper presented at the Ninth Australasian Information Online and On Disc Conference and Exhibition. Available: http://www.csu.edu.au/special/online99/proceedings99/200.htm

I tried to access it, but I got this message:
This page cannot be found [ Error 404 ]
The page "http://www.csu.edu.au/special/online99/proceedings99/200.htm" may have been removed, had its name changed, or is temporarily unavailable. We apologise for the inconvenience.

Sunday, January 31, 2010

Muddiest points

No muddiest point for unit #3.

Unit #4

The articles for week #4 talk about:

1- The skills of a good searcher fall into three categories:
- General principles of searching, academic skills, and system-dependent skills.
2- Profile of a good searcher
- Good communication skills and people orientation
- Self confidence
- Patience and perseverance
- Logical and flexible approach to problem solving
- Memory for details
- Good organization and efficient work habits
- Spelling, grammar and typing skills
- Motivation for additional training
- Subject area knowledge
- Willingness to share knowledge with others
3- Interaction in Information Retrieval: Selection and Effectiveness of Search Terms
4- Methodology for Data Collection and the Data Corpus

Sunday, January 24, 2010

Muddiest point and discussion question #2:

Is creating a blog required for this course or not?
I just want to ask to make sure.

Unit #3

Notes about Required Readings: (I took the notes directly from the chapters)

IIR section 1.4
Talks about the extended Boolean model versus ranked retrieval.
The Boolean retrieval model contrasts with ranked retrieval models such as the vector space model, in which users largely use free text queries, that is, just typing one or more words rather than using a precise language with operators for building up query expressions, and the system decides which documents best satisfy the query.

Chapter 6
Talks about scoring, term weighting, and the vector space model.
This chapter consists of three main ideas.
1. It introduces parametric and zone indexes in Section 6.1, which serve two purposes. First, they allow us to index and retrieve documents by metadata such as the language in which a document is written. Second, they give us a simple means for scoring (and thereby ranking) documents in response to a query.
2. Next, in Section 6.2 we develop the idea of weighting the importance of a term in a document, based on the statistics of occurrence of the term.
3. In Section 6.3 we show that by viewing each document as a vector of such weights, we can compute a score between a query and each document. This view is known as vector space scoring.
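
A minimal Python sketch of ideas 2 and 3 above, assuming a simple weighting (raw term frequency times log10(N/df)) and cosine similarity as the score. The function names and sample documents are hypothetical, and the chapter discusses several weighting variants that are not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_vectors(docs):
    """Return (idf, vectors), where each vector maps term -> tf-idf weight."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log10(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_index, score) pairs, best score first."""
    idf, vectors = build_vectors(docs)
    query_tf = Counter(tokenize(query))
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample collection.
docs = [
    "digital library search",
    "web search engine ranking",
    "library catalog subject headings",
]
ranking = rank("library search", docs)
```

With this sample data the first document contains both query words, so it should come first in the ranking, while the other two documents match only one word each.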

Chapters 11 and 12
Talk about probabilistic information retrieval and language models.
There is more than one possible retrieval model which has a probabilistic basis. Chapter 11 introduces probability theory and the Probability Ranking Principle (Sections 11.1–11.2), and then concentrates on the Binary Independence Model (Section 11.3), which is the original and still most influential probabilistic retrieval model. Finally, it introduces related but extended methods which use term counts, including the empirically successful Okapi BM25 weighting scheme, and Bayesian Network models for IR (Section 11.4).
Chapter 12 then presents the alternative probabilistic language modeling approach to IR.
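
A rough sketch of a BM25-style score, written from my understanding of the basic formula: an idf factor log(N/df) multiplied by a term-frequency factor that is normalized by document length. The parameter values k1=1.2 and b=0.75 are assumed defaults, the function name and sample documents are hypothetical, and the long-query variant is left out.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25-style score per document (assumes docs are non-empty)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency of each term

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term missing from the document adds nothing
            idf = math.log(n / df[term])
            # Length normalization: longer documents need more occurrences.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical sample collection.
docs = [
    "okapi bm25 weighting scheme",
    "binary independence model",
    "language model for retrieval",
]
scores = bm25_scores("bm25 weighting", docs)
```

Only the first sample document contains the query terms, so it is the only one that gets a score above zero.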

Saturday, January 16, 2010

Notes for unit #2

Required Readings:

Notes for chapters (1,2,3)
IIR sections 1.2, 1.3, chapters 2 and 3

Section 1.2 explains step by step how we can build an inverted index, and Section 1.3 explains how we can process a query using an inverted index. These two sections also provide figures that help us picture how each step works.
The major steps in inverted index construction:
1. Collect the documents to be indexed.
2. Tokenize the text.
3. Do linguistic preprocessing of tokens.
4. Index the documents that each term occurs in.
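
The four steps above, together with query processing from Section 1.3, can be sketched in Python as follows. This is a minimal sketch under simple assumptions: tokens are runs of word characters, the only linguistic preprocessing is lowercasing, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> sorted list of document IDs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):       # step 1: the collected documents
        tokens = re.findall(r"\w+", text)      # step 2: tokenize the text
        terms = {t.lower() for t in tokens}    # step 3: simple preprocessing
        for term in terms:                     # step 4: record the document
            index[term].append(doc_id)         # IDs arrive in increasing order
    return dict(index)


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def and_query(index, term1, term2):
    """Process the Boolean query: term1 AND term2."""
    return intersect(index.get(term1.lower(), []), index.get(term2.lower(), []))


# Hypothetical sample collection.
docs = [
    "Brutus killed Caesar",
    "Caesar was ambitious",
    "Brutus and Caesar and Calpurnia",
]
index = build_index(docs)
```

For example, and_query(index, "Brutus", "Calpurnia") walks through the two postings lists at the same time and keeps only the document that contains both terms.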

Chapter (2)(The term vocabulary and postings lists)
Chapter 2 talks about how the basic unit of a document can be defined and how the character sequence that it comprises is determined. Then it defines tokenization, which is the process of chopping character streams into tokens, while linguistic preprocessing then deals with building equivalence classes of tokens, which are the set of terms that are indexed.

Chapter (3) (Dictionaries and tolerant retrieval)
- Develops data structures that help the search for terms in the vocabulary in an inverted index.
- We study the idea of a wildcard query: a query such as *a*e*i*o*u*, which seeks documents containing any term that includes all the five vowels in sequence.
- Users make spelling errors either by accident, or because the term they are searching for (e.g., Herman) has no unambiguous spelling in the collection. We detail a number of techniques for correcting spelling errors in queries, one term at a time as well as for an entire string of query terms.
- We study a method for seeking vocabulary terms that are phonetically close to the query term(s).
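
A small Python sketch of two of the ideas above: matching a wildcard query against the vocabulary, and finding vocabulary terms within a small edit distance of a misspelled query term. It scans the whole vocabulary, which is only an assumption for illustration and not the index structures developed in the chapter. The function names and the sample vocabulary are hypothetical, and phonetic matching is not shown.

```python
from fnmatch import fnmatchcase


def wildcard_lookup(pattern, vocabulary):
    """Return vocabulary terms matching a pattern where * means any characters.

    Note: fnmatchcase also treats ? and [...] as special characters.
    """
    return sorted(term for term in vocabulary if fnmatchcase(term, pattern))


def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(term, vocabulary, max_distance=1):
    """Return vocabulary terms within max_distance edits, closest first."""
    scored = sorted((edit_distance(term, v), v) for v in vocabulary)
    return [v for distance, v in scored if distance <= max_distance]


# Hypothetical sample vocabulary.
vocabulary = ["facetious", "education", "herman", "hermann", "retrieval"]
```

For example, wildcard_lookup("*a*e*i*o*u*", vocabulary) keeps only terms with the five vowels in that order, and suggest("hermen", vocabulary) proposes the closest spelling in the vocabulary.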
------------

Notes for FOA Sections 1.2-1.5 in Chapter 1. PDF version of Ch. 1. on author's site http://www.cs.ucsd.edu/~rik/foa/.

Chapter (1) shows that there are many tools that are useful for searching collections and other media.
People use many tools of FOA:
- Language
- Writing
- But then people started to write a lot, and we ended up with so many books that people had a difficult time finding what they wanted. Therefore, people go to the library to find what they need.
- Today people have started to use the WWW to find what they need.
- What happens between two people, when one asks a question and the other answers, is the same thing that happens between a person and a search engine.
This chapter also defines indexing and mentions that there are two kinds of indexing:
1) manual indexing
2) automatic indexing.

Muddiest point and discussion question #1:

As you said, there is a connection between information retrieval (IR) and databases (IR = DATABASE).
So don’t you think that we, as librarians, should have a specific class that focuses just on databases and how we can create a database and write queries?
