Like the previous post, this one is also reposted from that rising star's blog. Both articles offer tremendous guidance for doing research.

http://blog.sciencenet.cn/home.php?mod=space&uid=44406&do=blog&id=630745

———— Main text begins ————

Recently a friend asked me: how can you work such long hours every day, with no Saturdays or Sundays, and still stay in good working and physical condition? It is a good question. Many people work hard yet feel they have no sense of accomplishment, or feel they are merely struggling to keep up. How do you work long hours and stay effective? I am happy to share my experience.

When I was a PhD student I basically worked sixteen hours a day. To motivate yourself through hard work, to get more done and to realize your potential, I believe the thing to consider is this: do what genuinely interests you and what connects to your life goals. I find that my productivity is directly related to my interest, and the relationship is not even linear. Faced with something I have no interest in, I might spend 40% of my time on it and get perhaps 20% effectiveness out of it; faced with something that interests me, I might spend 100% of my time and get 200% effectiveness.

Second, do not become a slave to the "urgent"; focus on what is "important". Things differ in importance and urgency, so do not spend all your time on things that merely look urgent; always keep some time for the things that are truly important. Managing your own time means, above all, distinguishing the urgent from the important.

A few supporting suggestions. First, rank. Every day, set priorities for the things you need to do and follow that order. In both work and life there is always more to do than can be done; the only thing you can do is sort it by importance and urgency. Some young people say they "have no time to study"; put another way, studying simply has not been given a high enough priority.

Second, time management goes hand in hand with setting goals and executing them; time management and goal management are inseparable. Completing each small goal tells you how far you are from the big one; your daily commitments are your pressure and your motivation, and each day's commitments must connect to your long-term goals. To work and live according to plan you must manage your time, which is easy to say and not so simple to do.

Third, in managing your time you must apply the 80/20 rule: let 20% of the input produce 80% of the value. Identify the 20% of the day that is your prime time (for some people it is the morning, for others the afternoon or night) and reserve it for thinking about and preparing for the key problems. Some people think managing time just means drawing up a timetable; that is wrong. Our instinct is to do the most urgent thing first, but doing so may leave the important things neglected. One daily method is to decide in the morning which urgent things and which important things you will do today, and to review before bed whether the two were in balance.

With so many "urgent" and "important" things, trying to do every one of them perfectly is unrealistic. Separate what you must do from what you should try to do. Do the "must do" things as well as you possibly can; for the "try to do" things, doing your best is enough. Accept with a good attitude the things you cannot change, and put your attention on the things you can change. Begin with the end in mind: make a long-term plan and move toward your goal step by step. Seeing progress step by step will give you the motivation and confidence to keep going.

The same logic applies to studying. People have asked me how to maintain an efficient studying state over long stretches. My advice: first, be in good spirits, fully focused, with a clear mind. Second, give yourself time to relax. Third, give yourself some pressure; do not stay in a relaxed environment all the time. Fourth, do not do the same thing for too long, because too much repetition becomes dull and tiring and efficiency drops. Fifth, do not start working without preparation. Sixth, repeated practice, recall and memorization are very useful. All of this matches the way work actually gets done.

Finally, note that working flat out when you are young may not matter much, but as you grow older you must take care of your body and balance work, hobbies, family and everything else. I do not believe that exercise alone can fundamentally change your working or physical condition; exercise is a good thing, and being active gives you more energy, but I believe the key to changing your state is psychological rather than physiological. Truly throwing yourself into your work is an attitude, a desire, a will.

I found this on the blog of a rising star who now teaches at Wisconsin.

http://blog.sciencenet.cn/home.php?mod=space&uid=44406&do=blog&id=633648

———— Main text begins ————

Reposter's note: this is an article I read countless times during my master's and PhD years; many of its lines are burned into my memory, and as a guiding framework it spared me many detours. I repost it here to share with friends.

How to be a winner – advice for students starting into research work

http://www.seas.upenn.edu/~andre/general/student_research_advice.html

Don't get hung up trying to understand everything at the outset

The biggest challenge you face at the onset of any new project is that there is a huge (seemingly overwhelming) amount of stuff you need to know to tackle your problem properly. While this phenomenon is true in the small for the beginning researcher, it is also true in the large for any research project. So learning how to cope with this challenge is an important skill to master on the way to becoming a good researcher. In contrast, blocking your action and progress while waiting for complete knowledge is the road to failure.

Coping mechanisms employed by winners include:

  • prioritizing (what do I need to know most)
  • read (everything made available to you, and seek out more; but don't put months of reading between you and getting started doing things.)
  • multithreading (when blocked on one item or path, is there another I can productively pursue?)
  • pursuing multiple, possible solution techniques (maybe some have easier/less blocks paths than others)
  • wishful thinking (ok, let's assume this subproblem is solved, does that allow me to go on and solve other problems?)
  • pester people who might have some of the information you need (you might think they should know what you need to know, but often they don't have a clear idea of what you do and don't know; start by getting them to give you pointers to things you can use to help yourself. Show respect for their time and always follow up on the resources you've been given before asking for a personal explanation.)
  • propose working models — maybe they are wrong or different from others, but they give you something to work with and something concrete to discuss and compare with others. You will refine your models continually, but it's good to have something concrete in mind to work with.

Losers will stop the first time they run into something they don't know, cannot solve a problem, or encounter trouble slightly outside what they consider ``their part'' of the problem, and then offer excuses for why they cannot make any progress. Winners consider the whole problem theirs and look for paths around every hangup. Losers make sure there is someone or something to blame for their lack of progress. Winners find ways to make progress despite complications. Losers know all the reasons it cannot be done. Winners find a way to do it.

 

Communicate and Synchronize Often

Of course, when you do have to build your own models, solve unexpected problems, make assumptions, etc., do make sure to communicate and synchronize with your fellow researchers. Do they have different models from yours? What can you learn from each other's models and assumptions? Let them know what you're thinking, where you're stuck, and how you're trying to get around your problems.

 

Decompose

The whole problem often seems overwhelming. Decompose it into manageable pieces (preferably, with each piece a stable intermediate). Tackle the pieces one at a time. Divide and conquer.

This may sound obvious, but it works. I've turned numerous problems which appeared ``frightening'' in scope into many 1-day or 2-day tasks, and then tackled each nice, contained 1–2 day task as I came to it. As I understood more, new problems and tasks arose, but they could all be broken back down to bite sized pieces which would be tackled one at a time.

 

Be Organized

In computer systems especially, the biggest limitation on our ability to conquer problems is complexity. You need to work continually to structure the problem and your understanding of it to tackle the inherent complexity. Keep careful track of what you have done and what you need to do. Make lists; write it down; don't rely on your memory (or, worse yet, your supervisor's memory) to hold all the things you need to do and all the intermediate issues you need to address.

Prioritize

Make priorities in your efforts and check your priorities with your supervisor. A common occurrence is for your supervisor to ask you to do A, forget about it, and then ask you to do B before you could possibly have finished A. If you are uncertain whether B should take priority over A, definitely ask. Sometimes it will, but often it won't, and your supervisor will be glad that you reminded him you were busy solving A. Keep track of B, and when you finish A, see if B still makes sense to pursue.

Realize that your supervisor is busy

Your professor or graduate student supervisor is busy. He hired you to help him get more accomplished than he could have on his own. Your biggest benefit to him is when you can be self moving and motivating.

Do not expect your supervisor to solve all your problems. Find out what he has thought about and suggests as a starting point, and work from there. But realize there may come a time when you have put more quality thought into something than he has (and this will happen more and more often as you get into your work). So, when you think you see or know a better way to solve a problem, bring it up. In an ideal scenario this is exactly what should happen. Your supervisor gives you the seed and some directions, then goes off to think about other problems. You put in concentrated time on your problem and ultimately come back with more knowledge and insight into your subproblem than your supervisor.

As a supervisor, I work in two modes:

  1. Until a student has demonstrated that he has thought more deeply about the problem than I have, I strongly advocate that he start things my way.
  2. Once a student has examined a problem in depth, then we can discuss it as peers, and generally the student becomes the expert on this subproblem, and I can offer general advice from my experience and breadth.

Deliver

Once you've signed up, you have to deliver. But you do not have to deliver the final solution to everything at once. This, in fact, is a fallacy of many people and research projects.

Losers keep promising a great thing in the future but have nothing to show now. Winners can show workable/usable results along the way to the solution. These pieces can include:

  • solutions to simplified models
  • pieces of a flow
  • intermediate output/data
  • measurements of problem characteristics
  • stable intermediates (see below)

Demonstrate progress

This allows your supervisor to offer early feedback and to help you prioritize your attention; this will often help you both make mid-course corrections, increasing the likelihood that you will end up with interesting results in the end. Requirements and understanding invariably evolve (remember, the key challenge at the beginning is incomplete knowledge). Change and redirection are normal, expected, and healthy (since they are usually a result of greater knowledge and understanding). The incremental model is robust and prepared for this adaptation, while the monolithic (all-at-once) model is brittle and often leads to great solutions which don't address the real problem.


Incrementally grow your solutions (especially software). 

In the new chapters which appear in the 20th Anniversary edition of Mythical Man Month, Brooks identifies incremental development and progressive refinement towards the goal as one of the best, new techniques which he's come to appreciate since the original writing of MMM. From my own experience, I whole heartedly agree with this, and it does have a very positive impact on morale (yours, your team's, your supervisor's).

 

Target stable intermediates

Look for stable intermediate points on your incremental path to solving some problem.

  • points where some clear piece of the problem has been solved (has a nice interface to this subproblem, produces results at this stage)
  • things you can build upon
  • things you can spin-out
  • things you can share with team members (allow them to help)
  • points of accomplishment

Don't turn problems (subtasks) into research problems unnecessarily

Often you'll run into a subtask with no single, obviously right solution. If solving this piece right is key to the overall goals, maybe it will be necessary to devote time to studying and solving this subproblem better than it has ever been solved before. However, for most sub-problems, this is not the case. You want to keep focused on the overall goals of the project and come up with an ``adequate'' solution for this problem. In general, try to do the obvious or simple thing which can be done expediently. Make notes on the possible weaknesses and the alternatives you could explore should these weaknesses prove limiting. Then, if this does become a bottleneck or weak link in the solution chain, you can revisit it and your alternatives and invest more effort exploring them.

Learn to solve your own problems

In general, in life, there won't always be someone to turn to who has all the answers. It is vitally important that you learn how to tackle all the kinds of problems you may encounter. Use your supervisors as a crutch or scaffolding only to get yourself started. Watch them and learn not just the answers they help you find, but how they find the answers you were unable to obtain on your own. Strive for independence. Learn techniques and gain confidence in your own ability to solve problems now.

 

Reposted from: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

This is the officially recommended Hadoop setup guide, and it is written in great detail.

Thanks to the author for his hard work.

———— Main text begins ————

In this tutorial I will describe the required steps for setting up a _pseudo-distributed, single-node_ Hadoop cluster backed by the Hadoop Distributed File System, running on Ubuntu Linux.

Are you looking for the [multi-node cluster tutorial](http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/)? Just [head over there](http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/).

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the [Google File System (GFS)](http://en.wikipedia.org/wiki/Google_File_System) and of the [MapReduce](http://en.wikipedia.org/wiki/MapReduce) computing paradigm. Hadoop's [HDFS](http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html) is a highly fault-tolerant distributed file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets.

The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.

This tutorial has been tested with the following software versions:

* [Ubuntu Linux](http://www.ubuntu.com/) 10.04 LTS (deprecated: 8.10 LTS, 8.04, 7.10, 7.04)
* [Hadoop](http://hadoop.apache.org/) 1.0.3, released May 2012

![](http://www.michael-noll.com/blog/uploads/Yahoo-hadoop-cluster_OSCON_2007.jpeg "Cluster of machines running Hadoop at Yahoo!")

Figure 1: Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)

# Prerequisites

## Sun Java 6

Hadoop requires a working Java 1.5+ (aka Java 5) installation. However, using Java 1.6 (aka Java 6) is recommended for running Hadoop. For the sake of this tutorial, I will therefore describe the installation of Java 1.6.

Important note: the apt instructions below are taken from [this SuperUser.com thread](http://superuser.com/questions/353983/how-do-i-install-the-sun-java-sdk-in-ubuntu-11-10-oneric). I got notified that the previous instructions that I provided no longer work. Please be aware that adding a third-party repository to your Ubuntu configuration is considered a security risk. If you do not want to proceed with the apt instructions below, feel free to install Sun JDK 6 via alternative means (e.g. by [downloading the binary package from Oracle](http://www.oracle.com/technetwork/java/javase/downloads/)) and then continue with the next section in the tutorial.

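The apt commands themselves did not survive the repost; as a rough sketch, the steps amount to adding the third-party repository from the SuperUser.com thread linked above and installing the Sun JDK 6 packages (the actual PPA name must be taken from that thread):

~~~
# Add the third-party Java repository described in the SuperUser.com thread
# (placeholder PPA name -- substitute the one given in the thread)
sudo add-apt-repository ppa:<java-ppa-from-the-thread>
sudo apt-get update

# Install Sun Java 6 and make it the default Java on the machine
sudo apt-get install sun-java6-jdk
sudo update-java-alternatives -s java-6-sun
~~~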

The full JDK will be placed in `/usr/lib/jvm/java-6-sun` (well, this directory is actually a symlink on Ubuntu). After installation, make a quick check whether Sun's JDK is correctly set up:

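~~~
# should report a 1.6.x Sun/Oracle JVM
java -version
~~~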

Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that’s not required it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc).

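The exact commands were dropped from the repost; on Ubuntu, creating the group and user described above typically looks like this:

~~~
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
~~~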

This will add the user `hduser` and the group `hadoop` to your local machine.

## Configuring SSH

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to `localhost` for the `hduser` user we created in the previous section.

I assume that you have SSH up and running on your machine and configured it to allow SSH public key authentication. If not, there are [several online guides](http://ubuntuguide.org/) available.

First, we have to generate an SSH key for the `hduser` user.

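A minimal way to do this is to switch to the `hduser` account and then generate an RSA key with an empty passphrase (as discussed below):

~~~
su - hduser
ssh-keygen -t rsa -P ""
~~~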

The second line will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key without your interaction (you don’t want to enter the passphrase every time Hadoop interacts with its nodes). Second, you have to enable SSH access to your local machine with this newly created key.

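One way to authorize the new key for logins to localhost:

~~~
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
~~~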

The final step is to test the SSH setup by connecting to your local machine with the `hduser` user. The step is also needed to save your local machine's host key fingerprint to the `hduser` user's `known_hosts` file. If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in `$HOME/.ssh/config` (see `man ssh_config` for more information).

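The test itself is simply an SSH login to localhost as the `hduser` user:

~~~
ssh localhost
~~~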

If the SSH connect should fail, these general tips might help:

* Enable debugging with `ssh -vvv localhost` and investigate the error in detail.
* Check the SSH server configuration in `/etc/ssh/sshd_config`, in particular the options `PubkeyAuthentication` (which should be set to `yes`) and `AllowUsers` (if this option is active, add the `hduser` user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with `sudo /etc/init.d/ssh reload`.

## Disabling IPv6

One problem with IPv6 on Ubuntu is that using `0.0.0.0` for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realized that there's no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary.

To disable IPv6 on Ubuntu 10.04 LTS, open `/etc/sysctl.conf` in the editor of your choice and add the following lines to the end of the file:

/etc/sysctl.conf

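The lines themselves were lost in the repost; the standard sysctl switches that disable IPv6 system-wide are:

~~~
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
~~~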

You have to reboot your machine in order to make the changes take effect. You can check whether IPv6 is enabled on your machine with the following command:

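~~~
# reads the kernel flag: 0 = IPv6 enabled, 1 = IPv6 disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
~~~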

A return value of 0 means IPv6 is enabled, a value of 1 means it is disabled (and that is what we want).

### Alternative

You can also disable IPv6 only for Hadoop, as documented in [HADOOP-3437](https://issues.apache.org/jira/browse/HADOOP-3437). You can do so by adding the following line to `conf/hadoop-env.sh`:

conf/hadoop-env.sh

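HADOOP-3437 relies on the JVM's IPv4-preference flag, so the line should be:

~~~
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
~~~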

Hadoop

Installation

[Download Hadoop](http://www.apache.org/dyn/closer.cgi/hadoop/core) from the [Apache Download Mirrors](http://www.apache.org/dyn/closer.cgi/hadoop/core) and extract the contents of the Hadoop package to a location of your choice. I picked `/usr/local/hadoop`. Make sure to change the owner of all the files to the `hduser` user and `hadoop` group, for example:

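A sketch of the extract-and-chown step, assuming the hadoop-1.0.3 tarball was downloaded to `/usr/local`:

~~~
cd /usr/local
sudo tar xzf hadoop-1.0.3.tar.gz
sudo mv hadoop-1.0.3 hadoop
sudo chown -R hduser:hadoop hadoop
~~~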

(Just to give you the idea, YMMV – personally, I create a symlink from `hadoop-1.0.3` to `hadoop`.)

## Update $HOME/.bashrc

Add the following lines to the end of the `$HOME/.bashrc` file of user `hduser`. If you use a shell other than bash, you should of course update the appropriate configuration files instead of `.bashrc`.

$HOME/.bashrc

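A minimal sketch of the variables this step needs to set (the paths assume the installation locations used in this tutorial):

~~~
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Add the Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
~~~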

You can repeat this exercise also for other users who want to use Hadoop.

## Excursus: Hadoop Distributed File System (HDFS)

Before we continue, let us briefly learn a bit more about Hadoop's distributed file system.

> The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop project, which is part of the Apache Lucene project.
>
> **The Hadoop Distributed File System: Architecture and Design** – [hadoop.apache.org/hdfs/docs/…](http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html)

The following picture gives an overview of the most important HDFS components.

![](http://www.michael-noll.com/blog/uploads/HDFS-Architecture.gif "HDFS Architecture (source: Hadoop documentation)")

## Configuration

Our goal in this tutorial is a single-node setup of Hadoop. More information about what we do in this section is available on the [Hadoop Wiki](http://wiki.apache.org/hadoop/GettingStartedWithHadoop).

### hadoop-env.sh

The only required environment variable we have to configure for Hadoop in this tutorial is `JAVA_HOME`. Open `conf/hadoop-env.sh` in the editor of your choice (if you used the installation path in this tutorial, the full path is `/usr/local/hadoop/conf/hadoop-env.sh`) and set the `JAVA_HOME` environment variable to the Sun JDK/JRE 6 directory.

Change conf/hadoop-env.sh

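~~~
# (roughly the shipped default, left commented out)
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
~~~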

to conf/hadoop-env.sh

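~~~
# point JAVA_HOME at the Sun JDK 6 installed earlier
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
~~~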

Note: If you are on a Mac with OS X 10.7 you can use the following line to set up `JAVA_HOME` in`conf/hadoop-env.sh`. conf/hadoop-env.sh (on Mac systems)

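On OS X the JDK location is best resolved with the `java_home` helper, roughly:

~~~
export JAVA_HOME=`/usr/libexec/java_home`
~~~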

conf/*-site.xml

In this section, we will configure the directory where Hadoop will store its data files, the network ports it listens to, etc. Our setup will use Hadoop’s Distributed File System, [HDFS](http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html), even though our little “cluster” only contains our single local machine. You can leave the settings below “as is” with the exception of the `hadoop.tmp.dir` parameter – this parameter you must change to a directory of your choice. We will use the directory `/app/hadoop/tmp` in this tutorial. Hadoop’s default configurations use `hadoop.tmp.dir` as the base temporary directory both for the local file system and HDFS, so don’t be surprised if you see Hadoop creating the specified directory automatically on HDFS at some later point. Now we create the directory and set the required ownerships and permissions:

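One way to do this (the chmod value is an assumption, chosen to keep the directory private to `hduser` and the `hadoop` group):

~~~
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
~~~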

If you forget to set the required ownerships and permissions, you will see a `java.io.IOException` when you try to format the name node in the next section.

Add the following snippets between the `<configuration> ... </configuration>` tags in the respective configuration XML file.

In file `conf/core-site.xml`:

conf/core-site.xml

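The XML was dropped from the repost; the two properties this setup relies on are the temporary directory and the default file system URI (the port number 54310 is an assumption):

~~~
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
</property>
~~~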

In file `conf/mapred-site.xml`: conf/mapred-site.xml

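Again as a sketch, the JobTracker address is the only property needed here (port 54311 is an assumption matching the core-site sketch above):

~~~
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at.</description>
</property>
~~~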

In file `conf/hdfs-site.xml`: conf/hdfs-site.xml

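For a single-node setup the property that matters is the replication factor, which must be 1 because there is only one DataNode:

~~~
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.</description>
</property>
~~~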

See [Getting Started with Hadoop](http://wiki.apache.org/hadoop/GettingStartedWithHadoop) and the documentation in [Hadoop's API Overview](http://hadoop.apache.org/core/docs/current/api/overview-summary.html) if you have any questions about Hadoop's configuration options.

## Formatting the HDFS filesystem via the NameNode

The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your "cluster" (which includes only your local machine if you followed this tutorial). You need to do this the first time you set up a Hadoop cluster. Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS)!

To format the filesystem (which simply initializes the directory specified by the `dfs.name.dir` variable), run the following command as the `hduser` user; with the paths used in this tutorial it is

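~~~
/usr/local/hadoop/bin/hadoop namenode -format
~~~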

The namenode prints its startup log; the run should end with a message saying that the storage directory has been successfully formatted.


Starting your single-node cluster

Run the command:

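With the installation path used in this tutorial, that command is:

~~~
/usr/local/hadoop/bin/start-all.sh
~~~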

This will start up a NameNode, a DataNode, a JobTracker and a TaskTracker on your machine; the script prints a "starting ..." line for each daemon together with the log file it writes to.


A nifty tool for checking whether the expected Hadoop processes are running is `jps` (part of Sun’s Java since v1.5.0). See also [How to debug MapReduce programs](http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms).

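Running it as the `hduser` should list the five Hadoop daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) along with the Jps process itself:

~~~
jps
~~~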

You can also check with `netstat` if Hadoop is listening on the configured ports.

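A typical invocation is:

~~~
sudo netstat -plten | grep java
~~~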

If there are any errors, examine the log files in the `/logs/` directory.

## Stopping your single-node cluster

To stop the cluster, run the stop script; with the paths used in this tutorial the command is

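~~~
/usr/local/hadoop/bin/stop-all.sh
~~~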

This stops all the daemons running on your machine; the script reports each daemon as it shuts down.


Running a MapReduce job

We will now run your first Hadoop MapReduce job. We will use the [WordCount example job](http://wiki.apache.org/hadoop/WordCount) which reads text files and counts how often words occur. The input is text files and the output is text files, each line of which contains a word and the count of how often it occurred, separated by a tab. More information on [what happens behind the scenes](http://wiki.apache.org/hadoop/WordCount) is available at the [Hadoop Wiki](http://wiki.apache.org/hadoop/WordCount).

### Download example input data

We will use three ebooks from Project Gutenberg for this example:

* [The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson](http://www.gutenberg.org/etext/20417)
* [The Notebooks of Leonardo Da Vinci](http://www.gutenberg.org/etext/5000)
* [Ulysses by James Joyce](http://www.gutenberg.org/etext/4300)

Download each ebook as text files in `Plain Text UTF-8` encoding and store the files in a local temporary directory of choice, for example `/tmp/gutenberg`.


Restart the Hadoop cluster

Restart your Hadoop cluster if it’s not running already.

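That is, run the start script again if the daemons are not up:

~~~
/usr/local/hadoop/bin/start-all.sh
~~~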

Copy local example data to HDFS

Before we run the actual MapReduce job, we first [have to copy](http://wiki.apache.org/hadoop/ImportantConcepts) the files from our local file system to Hadoop’s [HDFS](http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html).

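The copy commands were lost in the repost; using the `hadoop dfs` front end of Hadoop 1.x (run from `$HADOOP_HOME` as `hduser`), the step looks roughly like this:

~~~
bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
# verify the upload
bin/hadoop dfs -ls /user/hduser/gutenberg
~~~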

Run the MapReduce job

Now, we actually run the WordCount example job.

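As a sketch (run from `$HADOOP_HOME`; the wildcard is meant to match the examples JAR that ships with the release):

~~~
bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
~~~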

This command will read all the files in the HDFS directory `/user/hduser/gutenberg`, process them, and store the result in the HDFS directory `/user/hduser/gutenberg-output`.

Note: Some people run the command above and get an error because the `hadoop*examples*.jar` wildcard does not resolve cleanly on their installation. In this case, re-run the command with the full name of the Hadoop Examples JAR file, for example `hadoop-examples-1.0.3.jar`. A successful run prints the map and reduce progress in the console, followed by the job counters.


Check if the result is successfully stored in HDFS directory `/user/hduser/gutenberg-output`:

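For example, using the dfs front end again:

~~~
bin/hadoop dfs -ls /user/hduser/gutenberg-output
~~~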

If you want to modify some Hadoop settings on the fly like increasing the number of Reduce tasks, you can use the `"-D"` option:

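For example, to request 16 reducers (the value 16 is just an illustration):

~~~
bin/hadoop jar hadoop*examples*.jar wordcount -D mapred.reduce.tasks=16 /user/hduser/gutenberg /user/hduser/gutenberg-output
~~~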

An important note about `mapred.map.tasks`: Hadoop does not honor `mapred.map.tasks` beyond considering it a hint. But it accepts the user-specified `mapred.reduce.tasks` and doesn't manipulate that. You cannot force `mapred.map.tasks`, but you can specify `mapred.reduce.tasks`.

Retrieve the job result from HDFS

To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the command

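~~~
# the part file may be named part-00000 or part-r-00000, depending on which MapReduce API the example uses
bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000
~~~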

to read the file directly from HDFS without copying it to the local file system. In this tutorial, we will copy the results to the local file system though.

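A sketch of the copy-and-inspect step (the name of the merged local file is an assumption; `-getmerge` names it after the source directory when the target is a directory):

~~~
mkdir /tmp/gutenberg-output
bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
head /tmp/gutenberg-output/gutenberg-output
~~~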

Note that in this specific output the quote signs (") enclosing the words in the `head` output above have not been inserted by Hadoop. They are the result of the word tokenizer used in the WordCount example, and in this case they matched the beginning of a quote in the ebook texts. Just inspect the `part-00000` file further to see it for yourself.

> The command fs -getmerge will simply concatenate any files it finds in the directory you specify. This means that the merged file might (and most likely will) **not be sorted**.

## Hadoop Web Interfaces

Hadoop comes with several web interfaces which are by default (see `conf/hadoop-default.xml`) available at these locations:

* [http://localhost:50070/](http://localhost:50070/) – web UI of the NameNode daemon
* [http://localhost:50030/](http://localhost:50030/) – web UI of the JobTracker daemon
* [http://localhost:50060/](http://localhost:50060/) – web UI of the TaskTracker daemon

These web interfaces provide concise information about what's happening in your Hadoop cluster. You might want to give them a try.

### NameNode Web Interface (HDFS layer)

The NameNode web UI shows you a cluster summary including information about total/remaining capacity, and live and dead nodes. Additionally, it allows you to browse the HDFS namespace and view the contents of its files in the web browser. It also gives access to the local machine's Hadoop log files. By default, it's available at [http://localhost:50070/](http://localhost:50070/).

![](http://www.michael-noll.com/blog/uploads/Hadoop-namenode-screenshot.png "A screenshot of Hadoop's NameNode web interface")

### JobTracker Web Interface (MapReduce layer)

The JobTracker web UI provides information about general job statistics of the Hadoop cluster, running/completed/failed jobs and a job history log file. It also gives access to the local machine's Hadoop log files (the machine on which the web UI is running). By default, it's available at [http://localhost:50030/](http://localhost:50030/).

![](http://www.michael-noll.com/blog/uploads/Hadoop-jobtracker-screenshot.png "A screenshot of Hadoop's Job Tracker web interface")

### TaskTracker Web Interface (MapReduce layer)

The TaskTracker web UI shows you running and non-running tasks. It also gives access to the local machine's Hadoop log files. By default, it's available at [http://localhost:50060/](http://localhost:50060/).

![](http://www.michael-noll.com/blog/uploads/Hadoop-tasktracker-screenshot.png "A screenshot of Hadoop's Task Tracker web interface")

# What's next?

If you're feeling comfortable, you can continue your Hadoop experience with my follow-up tutorial [Running Hadoop On Ubuntu Linux (Multi-Node Cluster)](http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/) where I describe how to build a Hadoop multi-node cluster with two Ubuntu boxes (this will increase your current cluster size by 100%, heh).

In addition, I wrote [a tutorial](http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/) on [how to code a simple MapReduce job](http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/) in the Python programming language which can serve as the basis for writing your own MapReduce programs.
# Related Links

From yours truly:

* [Running Hadoop On Ubuntu Linux (Multi-Node Cluster)](http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/)
* [Writing An Hadoop MapReduce Program In Python](http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/)

From other people:

* [How to debug MapReduce programs](http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms)
* [Hadoop API Overview](http://hadoop.apache.org/core/docs/current/api/overview-summary.html) (for Hadoop 2.x)

Just as I was happily downloading the INFOCOM 2013 papers, I overlooked one problem: after downloading them I had no idea which paper was which, and I promptly fell apart.

Then I found this post. The code has some fairly serious problems, but it is basically usable.

The later part, which takes the title from the first line of the PDF, puts no limit on the length of the resulting file name. I added a limit of 32 characters (NTFS seems to cap file names at 128 characters).

The modified code snippet is as follows:

 

 

 

        # Build the new file name: the directory path plus the cleaned-up title plus the ".pdf" suffix
        title = title[:32]   # my added line: truncate the title so the new name never exceeds the length limit
        new = string + title + ".pdf"
        print "old = %s" % old

 

 

There is another problem in his code: string encoding. Many symbols cannot be handled, and characters such as a stray ( ? ) make the script throw an error.

While I was at it, I also changed the encoding from GBK to GB2312. Done and done.

 

        print "new = %s" % new
        # Re-encode the new file name to the local Windows code page
        # (Chinese Windows defaults to GBK; I switched this call to GB2312)
        new = new.encode('GB2312')
        # Close the file stream, otherwise the file cannot be renamed
        stream.close()

 

 

This article is reposted from the original post (source link in the original).

It reads as follows:

These past couple of days the whole group has been press-ganged by shadi into downloading papers from all sorts of conferences. Some conferences are fine and allow batch downloads. For IEEE and ACM conferences, though, you can only go through the university library's databases and open and save each paper one by one, and the saved PDF files are named with nothing but digits. Others come from Google Scholar searches, and their names are even more random.

Seeing this, the resident guru jhonglei volunteered to search the web and came up with a piece of Python code that batch-renames PDF files according to their title attribute.

Batch-renaming PDFs with Python:

#encoding:utf-8
'''
Requires the pyPdf library, available from: http://pybrary.net/pyPdf/
'''
import os
from pyPdf import PdfFileReader

# Clean up the extracted title: replace characters that are illegal in Windows file names
def format(filename):
    if isinstance(filename, str):
        illegal = ('?', '\\', '*', '/', ',', '"', '<', '>', '|', ':', ',', '“', '”', '‘', '’')
        for char in illegal:
            if filename.find(char) != -1:
                filename = filename.replace(char, ' ')
        return filename
    else:
        return 'None'

# Recursively walk the directory tree; every file whose extension is .pdf gets processed
def VisitDir(path):
    li = os.listdir(path)
    for p in li:
        pathname = os.path.join(path, p)
        if not os.path.isfile(pathname):
            VisitDir(pathname)
        else:
            back = os.path.splitext(pathname)
            backname = back[1]
            if backname == '.pdf':
                print pathname
                rename(pathname)

# Rename one PDF file after its title attribute
def rename(pathname):
    stream = file(pathname, 'rb')
    input1 = PdfFileReader(stream)
    isEncrypted = input1.isEncrypted
    if not isEncrypted:
        # pathname holds both the directory and the file name; split on the path
        # separator and keep the full old path and the directory part separately
        parts = pathname.split('\\')
        oldname = ''
        for strname in parts:
            oldname += strname + '\\'
        old = oldname[0:len(oldname) - 1]     # the full old path
        parts.pop()                           # drop the file name, keep the directory
        string = ''
        for strname in parts:
            string += strname + '\\'
        print 'string = %s' % string
        title = str(input1.getDocumentInfo().title)
        print 'title = %s' % (input1.getDocumentInfo().title)
        title = format(title)
        # New name = directory + cleaned title + '.pdf'
        new = string + title + '.pdf'
        print 'old = %s' % old
        print 'new = %s' % new
        # Re-encode the new name to GBK, the default code page on Chinese Windows,
        # otherwise the rename fails
        new = new.encode('GBK')
        # Close the stream, otherwise the file cannot be renamed
        stream.close()
        if str(title) != 'None':
            try:
                os.rename(old, new)
            except WindowsError, e:
                # print str(e)
                print e
        else:
            print 'The file contains no title attribute!'
    else:
        print 'This file is encrypted!'

if __name__ == '__main__':
    path = r"F:\Papers\ICDE'09\Demos"
    VisitDir(path)

But this code still has problems:

1. Some PDF files have an empty title attribute, or one that is not the real paper title (see the screenshot in the original post); renaming based on it is meaningless.

2. Some PDF files have non-Latin characters in their title attribute, which makes the script fail with: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)

Later the guru found an even more powerful library, pdfminer (http://www.unixuser.org/~euske/python/pdfminer/). The main idea is to use pdfminer's pdf2txt module to convert the first page of the PDF to text and take the first line as the file name. Thanks to pdfminer's strength, this naming is very accurate: as long as the PDF is not a scanned (image-only) document, the title can be extracted correctly.

For the code below to run, it must be placed in the same directory as pdf2txt.py from pdfminer.

#encoding:utf-8
'''
Purpose: rename each PDF file after the paper's title.

Two ways of obtaining the title are used:
Option 1: read the PDF's title attribute (what you see when you right-click the file
          and view its properties) and rename the document after it.
          Requires pyPdf: http://pybrary.net/pyPdf/
Option 2: extract the title from the PDF content itself. Requires pdfminer.
'''
import os
from pyPdf import PdfFileReader

# Clean up the extracted title: replace characters that are illegal in Windows file names
def format(filename):
    if isinstance(filename, str):
        illegal = ('?', '\\', '*', '/', ',', '"', '<', '>', '|', ':', ',', '“', '”', '‘', '’')
        for char in illegal:
            if filename.find(char) != -1:
                filename = filename.replace(char, ' ')
        return filename
    else:
        return 'None'

# Map mojibake produced by the PDF-to-text conversion back to plain characters
key_value = {'\xef\xac\x81': 'fi'}   # UTF-8 bytes of the 'fi' ligature

# Recursively walk the directory tree; every file whose extension is .pdf gets processed
def VisitDir(path):
    li = os.listdir(path)
    for p in li:
        pathname = os.path.join(path, p)
        if not os.path.isfile(pathname):
            VisitDir(pathname)
        else:
            back = os.path.splitext(pathname)
            backname = back[1]
            if backname == '.pdf':
                print pathname
                rename(pathname)

# Rename one PDF file
def rename(pathname):
    stream = file(pathname, 'rb')
    input1 = PdfFileReader(stream)
    isEncrypted = input1.isEncrypted
    if not isEncrypted:
        # pathname holds both the directory and the file name; keep the full old
        # path and the directory part separately
        parts = pathname.split('\\')
        oldname = ''
        for strname in parts:
            oldname += strname + '\\'
        old = oldname[0:len(oldname) - 1]     # the full old path
        parts.pop()                           # drop the file name, keep the directory
        string = ''
        for strname in parts:
            string += strname + '\\'
        print 'string = %s' % string

        ## Option 1: use the title attribute
        title = str(input1.getDocumentInfo().title)
        #print 'title = %s' % (input1.getDocumentInfo().title)

        ############ jiang added
        #if str(title) == 'None':
        ## Option 2: extract the title from the first page of the PDF content
        os.system('python pdf2txt.py -p 1 "' + old + '" >c.txt')
        f = open('c.txt', 'rb')
        title = ''
        a = f.readline()
        while a not in ('\r\n', '\n', ''):
            title += a
            a = f.readline()
        f.close()
        title = title.replace('\r\n', ' ').strip()
        ############ jiang added

        title = format(title)
        for key, value in key_value.iteritems():
            title = title.replace(key, value)

        # New name = directory + cleaned title + '.pdf'
        new = string + title + '.pdf'
        print 'old = %s' % old
        print 'new = %s' % new
        # Re-encode the new name to GBK, the default code page on Chinese Windows,
        # otherwise the rename fails
        new = new.encode('GBK')
        # Close the stream, otherwise the file cannot be renamed
        stream.close()
        if str(title) != 'None':
            try:
                os.rename(old, new)
            except WindowsError, e:
                print str(e)
        else:
            # python pdf2txt.py -p 1 p43-nazir.pdf >c.txt
            print 'The file contains no title attribute!'
    else:
        print 'This file is encrypted!'

if __name__ == '__main__':
    path = r'.'   # the current directory
    VisitDir(path)

 

Reposted from http://blog.csdn.net/lovelion/article/details/1350127

Many readers have probably seen Google's recruiting advertisement from a few years ago. Its first puzzle read: {first 10-digit prime found in consecutive digits of e}.com, that is, the first prime formed by ten consecutive digits of e. The ad reportedly appeared on large billboards at subway exits across the United States; if you solved it and typed the answer into your browser's address bar, you entered the next round of testing, and the whole process was like a mathematical maze leading, step by step, to a job at Google. Intel once asked a similar interview question: Banach died on 31 August 1945, and his year of birth happens to be the square of his age in some year of his life; in which year was he born? Can you answer this seemingly simple question quickly? And here is a recruiting question from Microsoft, the world's largest software company: two primes separated by a single number are called a prime pair, for example 5 and 7, or 17 and 19. Prove that the number between a prime pair is always divisible by 6 (assuming both primes are greater than 6), and then prove that there is no prime "pair" made up of three primes. There are many more questions like these. At first glance they are just math problems, yet some of the world's best-known companies use them in hiring, which shows how much they value the mathematical grounding of new employees. Mathematics and programming questions are the most sharply targeted class of interview questions at many large software companies; they test a candidate's mathematical and computing ability. A senior consultant at a consulting firm once put it this way: Microsoft is a software company, so of course it requires a certain level of computing and mathematical competence from its employees, and the interview naturally tests for it. Microsoft's interview questions probe the candidate's grasp of fundamentals, the ability to apply them, and even an implicit understanding of the basic principles of computing. Questions like these really are "vicious" enough to select the right people.

Professor Cao Guangfu of the School of Mathematics at Sichuan University once said: "What a university student achieves later in life has a great deal to do with his mathematical cultivation." Computer science students all know that the hardest courses in the curriculum are discrete mathematics, compiler principles and data structures, and that courses such as combinatorics, cryptography and computer graphics also give many people a hard time. Many students who consider themselves good at databases feel lost when faced with mathematically flavored concepts such as normal forms, functional dependency and transitive dependency. All of this stems from a weak mathematical foundation, or a lack of mathematical knowledge. Mathematics is the foundation of computing; this is why the graduate entrance examination for computer science uses the hardest mathematics paper (Mathematics I), and it has also driven the rapid growth of new interdisciplinary programs such as Mathematics and Applied Software and Information and Computing Science. Many brilliant programmers were themselves math stars. Bill Gates's mathematics, as is well known, was always outstanding; he even once hoped to become a professor of mathematics. Fred Wright, head of the mathematics department at his alma mater, Lakeside School, said of him: "He could solve an algebra or computer problem in the simplest possible way, and he could use mathematics to find a shortcut to a problem. In all my years of teaching I have never seen a mathematical talent like his. He can even rival the excellent mathematicians I have worked with for many years. Of course, Bill excelled in every respect, not only mathematics; his knowledge is very broad, and mathematics is just one of his many strengths." Qiu Bojun, chairman of Kingsoft and an influence on a generation of Chinese programmers, scored full marks in mathematics in the college entrance examination, which further illustrates the point. People with a strong mathematical foundation, once they become familiar with a programming language, can quickly grasp the essence of an algorithm, apply it with ease, and may well write versions with clearly better time and space complexity.

A considerable portion of the problems solved in programming involves scientific computation of one kind or another. What kind of foundation does this demand of a programmer? Turning a real problem into a program requires abstracting the problem and building a sound mathematical model; only then can we build a well-designed program. From this it is easy to see the importance of mathematics in programming. Algorithms and the theory of computation are the soul of program design; they are the instruments through which a programmer's rigorous and sharp thinking is exercised, and every programming language tries to exploit them to the full. A programmer needs some mathematical cultivation, not only for programming itself but also to develop logical thinking and a rigorous programming style. Mathematics trains our thinking, helps us solve real problems, and even helps us study philosophy better. Why do some people stand helpless before a scientific computing program? They can read every line of code yet cannot predict what the program will produce, and only half understand its structure and function; given a slightly complicated formula, they may not know how to turn it into a program. Many programmers are still stuck writing simple MIS applications, designing MDI interfaces, writing simple classes or running SQL queries, and they steer clear of any task that requires mathematics. Of course, writing an accumulator or a tax-rate converter is easy, because it needs no advanced mathematics.

A programmer with more than ten years of development experience once said: "The essence of all programs is logic. You have already mastered the technology reasonably well, but only when you have also raised your logical ability can you become a professional programmer. To use an analogy: you have mastered all eighteen kinds of weapons, but you lack the strength, so you can never take the battlefield. For a programmer, that strength is logical ability, and its essence is one's mathematical cultivation (note: cultivation, not knowledge)."

A programmer's mathematical cultivation cannot be built overnight. Cultivation is different from knowledge: cultivation requires a long process, whereas a piece of knowledge can be learned in a short time. Below are some of my personal views on how a programmer can improve and develop his mathematical cultivation.

First, recognize the importance of mathematical cultivation. For an excellent programmer, a certain level of mathematical cultivation is both important and necessary. Mathematics is the foundation of the natural sciences, and computer science is in effect a branch of mathematics. Computer theory is a fusion of much mathematics: software engineering needs graph theory, cryptography needs number theory, software testing needs combinatorics, and writing programs needs set theory, queueing theory, discrete mathematics, statistics and, of course, calculus. A defining feature of computer science is how quickly its information and knowledge are refreshed. As mathematics combines further with computer theory, branches such as data mining, pattern recognition and neural networks have developed rapidly, and cybernetics, fuzzy mathematics, dissipation theory and fractal science have all advanced software theory and information management technology. Strictly speaking, a programmer without a solid mathematical foundation is not a qualified programmer; many books on computer algorithms are themselves manuals of applied mathematics and its computer implementation.

Second, accumulate mathematical knowledge and train your spatial thinking and logical judgment. Mathematics has many branches, and no one can learn them all in a lifetime; functional analysis, chaos theory and various nonlinear problems cannot be mastered in a few days. Mathematical cultivation is not measured by how much mathematics you know, but it does require good mathematical learning ability: the ability to quickly connect some piece of mathematics to the problem you are solving. Many great scientists were not mathematicians by training, yet their strong mathematical understanding and keen observation gave birth to whole new disciplines such as computational chemistry, computational biology, bioinformatics, chemoinformatics, computational physics and computational materials science. Mathematics is the foundation of the natural sciences, and computing, as a union of theory and practice, needs the essence of mathematics built into it even more. The computer itself was born out of mathematics; the simple binary system of 0 and 1 is an ancient mathematical topic. Programming is a highly creative occupation; it requires the programmer to have a certain mathematical cultivation and some accumulated mathematical knowledge, so that mathematical principles and ideas can be better applied to real programming work. Learning never ends, and continuous learning is the only road to better cultivation.

Third, use mathematics in practice. Some universities offer a course called Mathematical Modeling. I took it as an undergraduate; it is a very rich course that connects many disciplines with mathematics and uses mathematical models to solve real problems of production and daily life, many of which must be implemented as computer programs. I took part in mathematical modeling competitions both as an undergraduate and as a graduate student, gained a great deal of experience, and further improved my own mathematical cultivation. In fact, seen from certain angles, programming today is itself a process of mathematical modeling: the quality of the model determines the success or failure of the system, and the idea of mathematical modeling is now used in many computing disciplines, not only program design and algorithm analysis. Mathematics is a science whose charm shows itself in practice, and programs are written to help solve real problems, so the two should be combined as much as possible. In this respect cryptography is, in my view, the field that uses mathematics most deeply and most broadly: behind every good encryption algorithm stands a mathematical theory, such as elliptic curves, the knapsack problem or the theory of primes. An excellent programmer should apply mathematics flexibly as real work requires, develop some modeling ability, be good at summarizing and generalizing, and gradually make his mathematical knowledge more complete and his mathematical cultivation deeper.
Fourth, reform programmer training and teaching. Many programmer training schemes are flawed: from day one they demand that students quickly master some language, put the language at the center, and skim over the core ideas of algorithms and the related mathematics. This turns many programmers into machines that recite programs, which helps neither their rapid growth nor their ability to solve new problems. In my own long experience of programmer training and computer teaching I have used some methods that differ from the traditional approach, with some success. Beginners often hit a mental block while writing programs, or feel unable to start on slightly harder ones, so before class I pose small mathematical problems to spark interest. These are not mere brain teasers; many are quite representative exercises in mathematical thinking. Warming up for programming with mathematics lets students sharpen their thinking on math problems. An expert once remarked that doing mathematics regularly makes you smarter, while staying away from mathematical problems for a long time dulls your thinking. Classic problems train both the rigor and the agility of a student's thinking. Many people scoff at this, but some seemingly simple problems cannot be answered quickly, and the brain only becomes more agile with constant use. Don't believe it? If you are interested, try the following problem and see whether you can find the answer within one minute; it is only an exercise from a primary-school mathematics book. Many people think their mathematical foundation is excellent, yet reportedly more than 90% of them cannot give a correct answer within an hour. Try it, if you think I am wrong.

Prove: AB + AC > DB + DC, where D is a point inside triangle ABC.

Finally, learn widely, ask questions, and read good books, the classics. Here I recommend two classic algorithm textbooks that most readers probably already know; much of their content is in fact an introduction to mathematics. The first is Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Its principal authors come from the computer science department at MIT, and one of them, Ronald L. Rivest, received the Turing Award for his contribution to the RSA public-key algorithm. The book is currently the standard algorithms text: many leading American computer science departments use it, some Chinese universities have adopted it for their algorithms courses, and practitioners cite it constantly. It covers essentially all the classic algorithms, with every program given in pseudocode, which makes it useful to programmers working in any language; the writing is accessible, and it works well both as a textbook and for self-study. The other is Donald E. Knuth's The Art of Computer Programming. Knuth spent the most brilliant years of his life in the computer science department at Stanford and is a recipient of the ACM Turing Award, the undisputed giant of the field. There is a joke that a programmer who does not know Knuth is like a physicist who does not know Einstein, a mathematician who does not know Euler, or a chemist who does not know Dalton. TAOCP, as the work is commonly abbreviated, is vast and profound, covering nearly all the most important material of algorithms and the theory of programming. Only three volumes have appeared so far: Fundamental Algorithms, Seminumerical Algorithms, and Sorting and Searching (as I write this, the fourth volume has come out, and I grabbed a copy at once). Drawing on a great deal of mathematics, the book analyzes algorithms from different application domains, studies their complexity, that is, their time and space efficiency, and discusses which algorithms suit which problems; its theoretical and practical value is recognized by computer scientists worldwide. Many of the terms it introduced and the results it established have become standard terminology and widely cited results in the field. Knuth also studied the history of the relevant areas in depth, so while presenting a wealth of research results the book also traces their origins and development, a rarity among scientific works anywhere. As for its value, Bill Gates's words suffice: "If you think you're a really good programmer, read Knuth's Art of Computer Programming; if you can read the whole thing, please send me your résumé." The author's mathematical depth gives the book its rigorous style, and although it is not written in a currently fashionable programming language, that does not diminish its standing as the "epic of programming", because the design ideas it embodies will never go out of date. Unless your English really is a problem, read the English edition; I read the English edition myself, and although it cost a fair amount of money and time, the reward was well worth it.

In short, if you want to become a programmer with potential and a future, or one of the best among programmers, you must cultivate a solid mathematical foundation. Remember: for someone who can write all kinds of programs with ease, mathematics is the soul of the program.

References

[1] Lin Qingzhong. Cultivating the professional standard of a programmer. http://blog.csdn.net/imwj/archive/2005/02/02/27723.
[2] Liu Rujia, Huang Liang. The Art of Algorithms and Informatics Competitions. Tsinghua University Press: 2004.
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms (Second Edition). The MIT Press: 2002.
[4] Donald E. Knuth. The Art of Computer Programming. Tsinghua University Press: 2002.
[5] Jiang Qiyuan. Mathematical Models. Tsinghua University Press: 1993.
[6] Zbigniew Michalewicz, David B. Fogel. How to Solve It: Modern Heuristics. China Water & Power Press: 2003.

About the author: Liu Wei is a nationally certified systems analyst and certified database systems engineer, holds a PhD in computer applications technology from Central South University, and has more than ten years of experience in software development and computer teaching. He has taught at several schools and training institutions and has organized or taken part in the development of more than 30 information systems. As a student he won one first prize and one second prize in the national graduate mathematical contest in modeling. He has co-translated one foreign monograph and published more than ten papers.