Author: Croft  Date: 2014-09-29 21:03:30
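This book introduces the key issues in information retrieval (IR) and how those issues affect the design and implementation of search engines, and it reinforces the important concepts with mathematical models. On the important topic of web search engines, the book mainly covers the search techniques widely used on the web.
This book is suitable for undergraduate and graduate students in computer science or computer engineering; for professionals, it also serves as an ideal introductory text.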
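About the author:
W. Bruce Croft is a Distinguished Professor of Computer Science at the University of Massachusetts Amherst and an ACM Fellow. He founded the Center for Intelligent Information Retrieval, has published more than 200 papers, and has received numerous awards, including the Gerard Salton Award presented by ACM SIGIR in 2003.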
Contents:
1 Search Engines and Information Retrieval
1.1 What Is Information Retrieval?
1.2 The Big Issues
1.3 Search Engines
1.4 Search Engineers
2 Architecture of a Search Engine
2.1 What Is an Architecture?
2.2 Basic Building Blocks
2.3 Breaking It Down
2.3.1 Text Acquisition
2.3.2 Text Transformation
2.3.3 Index Creation
2.3.4 User Interaction
2.3.5 Ranking
2.3.6 Evaluation
2.4 How Does It Really Work?
3 Crawls and Feeds
3.1 Deciding What to Search
3.2 Crawling the Web
3.2.1 Retrieving Web Pages
3.2.2 The Web Crawler
3.2.3 Freshness
3.2.4 Focused Crawling
3.2.5 Deep Web
3.2.6 Sitemaps
3.2.7 Distributed Crawling
3.3 Crawling Documents and Email
3.4 Document Feeds
3.5 The Conversion Problem
3.5.1 Character Encodings
3.6 Storing the Documents
3.6.1 Using a Database System
3.6.2 Random Access
3.6.3 Compression and Large Files
3.6.4 Update
3.6.5 BigTable
3.7 Detecting Duplicates
3.8 Removing Noise
4 Processing Text
4.1 From Words to Terms
4.2 Text Statistics
4.2.1 Vocabulary Growth
4.2.2 Estimating Collection and Result Set Sizes
4.3 Document Parsing
4.3.1 Overview
4.3.2 Tokenizing
4.3.3 Stopping
4.3.4 Stemming
4.3.5 Phrases and N-grams
4.4 Document Structure and Markup
4.5 Link Analysis
4.5.1 Anchor Text
4.5.2 PageRank
4.5.3 Link Quality
4.6 Information Extraction
4.6.1 Hidden Markov Models for Extraction
4.7 Internationalization
5 Ranking with Indexes
6 Queries and Interfaces
7 Retrieval Models
8 Evaluating Search Engines
9 Classification and Clustering
10 Social Search
11 Beyond Bag of Words
References
Index