emaizetto Memo: November 2010

Tuesday, November 30, 2010

How to Install Subversive on Eclipse Helios

I installed Subversive on Eclipse Helios, so here are my notes on the procedure. I followed the page in [1] for the installation.

Select "Help" -> "Install New Software..."
In "Work with", select "Helios - http://download.eclipse.org/releases/helios"
Select "Collaboration" -> "Subversive SVN Team Provider (Incubation)"
Select "Next"
Select "Next"
Select "I accept the terms of the license agreements"
Select "Finish"
Select "Next"
Select "Next"
Select "I accept the terms of the license agreements"
Select "Finish"
Select "OK"
Select "Restart Now"

[1] Installing the Subversion plugin Subversive on Eclipse 3.6 (Helios)

Thursday, November 11, 2010

Differences and similarities in information seeking: children and adults as Web users

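I read a paper [1] that seems relevant to reading comprehension support, so here are my notes.

The paper reports a study of the differences and similarities between children and adults in information seeking on the Web. The study examines users' thoughts, feelings, and behavior while they carry out a fact-finding task using Yahooligans. The adults found the correct answer more often than the children. The two groups also differed in how they recovered from breakdowns, in how they navigated (adults spent more of their time browsing, children more on searching), and in how well they stayed focused on the task.

Notable phrases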

Research on children's use of the Internet/Web and adult use of the Internet/Web shows that both user groups have cognitive difficulties constructing effective search queries, and that most of these users do not use the Web effectively.
Children were interactive information seekers, preferring to browse rather than plan or employ systematic and analytic search strategies.
Children had difficulty finding relevant information, but were more successful in finding information on the open-ended task than the fact-finding task.
Users did not have many queries per search, rarely modified queries, and used advanced search syntax minimally in constructing queries.
Following a link and using the Back command were the most frequent Web actions.
58% of the pages the participants visited were re-visits, and that these pages were re-visited through activating Back command.
Zipf's distribution was the best fit for the frequency of user node visiting.
The type of search task did not influence the path patterns user followed.
Frequent use of Back command seems to be common among Web users, regardless of age.
Information skills programs must consider levels of cognitive development and, as importantly, pay attention to the process skills students need to plan and evaluate all aspects of information utilization and retrieval.
Age is not a factor that influences information seeking behavior.
Both student groups were unsuccessful when they searched by keyword and more successful when they browsed subject hierarchies.
Both children and adults used Netscape Back command to navigate among Web pages.
Many children apply natural language in querying search engines.

[1] Dania Bilal, Joe Kirby: Differences and similarities in information seeking: children and adults as Web users, Information Processing and Management, Vol. 38, No. 5, pp. 649-670, 2002-09

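Saturday, November 6, 2010

Writing a Bayesian Filter in Ruby

I figured it was about time I tried Ruby, so I wrote a Bayesian filter in it.

That said, all I did was rewrite a program originally written in Perl. The original program is the one explained in the WEB+DB PRESS Vol. 56 article [1]. I wrote two files, bayes_sample.rb and Classifier.rb, and both are listed below.

This is my first time using Ruby, so the code may be clumsy, but it should work for now. Note that MeCab and MeCab-ruby must be installed beforehand.

[1] Naoya Ito: Taking on Bayesian Filters: Learning from Unknown Data to Classify, Algorithm Practice Classroom, Part 1, WEB+DB PRESS, Vol. 56, pp. 134-142, 2010-05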


bayes_sample.rb



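# Requires the MeCab Ruby binding (mecab-ruby).
require 'MeCab'
# Classifier.rb is expected in the same directory as this script.
# (Ruby 1.9 and later no longer search the current directory with a plain require.)
require_relative 'Classifier'

# Convert text into a term-frequency vector: { surface form => count }.
def text2vec(text)
  mecab = MeCab::Tagger.new
  node = mecab.parseToNode(text)
  vec = Hash.new(0)
  while node
    # Keep nodes whose part-of-speech ID is between 1 and 4.
    # Note: posid is an Integer, so the comparison with "?" is always false.
    if (node.posid >= 1 && node.posid <= 4) || node.posid == "?"
      vec[node.surface] += 1
    end
    node = node.next
  end
  vec
end

# Train on three short documents in two categories.
cl = Classifier.new
cl.train(text2vec("perlやpythonはスクリプト言語です"), "it")
cl.train(text2vec("perlでベイジアンフィルタを作りました"), "it")
cl.train(text2vec("pythonはニシキヘビ科のヘビの総称"), "science")

# Classify unseen text.
print "1, Predicted category: ", cl.predict(text2vec("perlは楽しい")), "\n"
print "2, Predicted category: ", cl.predict(text2vec("pythonとperl")), "\n"
print "3, Predicted category: ", cl.predict(text2vec("pythonとヘビ")), "\n"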

Classifier.rb



class Classifier
  def initialize
    # term => { category => occurrences of the term in that category }
    @term_count = Hash.new { |h, k| h[k] = Hash.new(0) }
    # category => number of term entries trained for that category
    @category_count = Hash.new(0)
  end

  def train(vec, cat)
    vec.each do |term, count|
      @term_count[term][cat] += count
      # Increment the counter of the category itself (the original indexed it by term).
      @category_count[cat] += 1
    end
  end

  # Return the category with the highest score.
  # Ties are broken in favor of the category name that sorts first.
  def predict(vec)
    scores = {}
    @category_count.keys.each do |cat|
      scores[cat] = score(vec, cat)
    end
    classes = scores.to_a
    classes.sort! do |a, b|
      (b[1] <=> a[1]) * 2 + (a[0] <=> b[0])
    end
    classes[0][0]
  end

  # Log prior of the category plus the log probability of each term, weighted by its count.
  def score(vec, cat)
    log_prior = Math.log(cat_prob(cat))
    # Small fallback probability for terms never seen in this category.
    not_likely = 1.0 / (total_term_count * 10)
    doc_prob = 0.0
    vec.each do |term, count|
      prob = term_prob(term, cat)
      prob = not_likely if prob == 0.0
      doc_prob += Math.log(prob) * count
    end
    log_prior + doc_prob
  end

  def cat_prob(cat)
    @category_count[cat].to_f / total_term_count
  end

  def term_prob(term, cat)
    term_count(term, cat).to_f / @category_count[cat]
  end

  def term_count(term, cat)
    @term_count[term][cat]
  end

  def total_term_count
    @category_count.values.inject(0) { |total, count| total + count }
  end

  # Print the internal counters for debugging.
  def dump
    p @term_count
    p @category_count
  end
end
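
To try the scoring logic without installing MeCab, the classifier can be fed hand-built term vectors. The following is only a rough sketch: MiniClassifier is a hypothetical, condensed restatement of the same scoring rule (with train incrementing the category's own counter), and the English tokens stand in for what text2vec would produce.

```ruby
# Condensed stand-in for Classifier (hypothetical name), so this runs on its own.
class MiniClassifier
  def initialize
    @term_count = Hash.new { |h, k| h[k] = Hash.new(0) }
    @category_count = Hash.new(0)
  end

  def train(vec, cat)
    vec.each do |term, count|
      @term_count[term][cat] += count
      @category_count[cat] += 1
    end
  end

  # Log prior plus count-weighted log term probabilities,
  # with a small fallback probability for unseen terms.
  def score(vec, cat)
    total = @category_count.values.sum
    not_likely = 1.0 / (total * 10)
    prior = Math.log(@category_count[cat].to_f / total)
    vec.reduce(prior) do |acc, (term, count)|
      prob = @term_count[term][cat].to_f / @category_count[cat]
      prob = not_likely if prob == 0.0
      acc + Math.log(prob) * count
    end
  end

  def predict(vec)
    @category_count.keys.max_by { |cat| score(vec, cat) }
  end
end

cl = MiniClassifier.new
# Hand-built vectors (illustrative tokens, not MeCab output).
cl.train({ "perl" => 1, "python" => 1, "script" => 1 }, "it")
cl.train({ "perl" => 1, "bayesian" => 1, "filter" => 1 }, "it")
cl.train({ "python" => 1, "snake" => 2 }, "science")

puts cl.predict({ "perl" => 1, "fun" => 1 })     # prints "it"
puts cl.predict({ "python" => 1, "snake" => 1 }) # prints "science"
```

In the second query, "snake" never occurs under "it", so that category is penalized by the fallback probability and "science" wins.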

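Wednesday, November 3, 2010

Estimating the Difficulty of Japanese Text Using a Textbook Corpus

I read a paper [1] that seems relevant to reading comprehension support, so here are my notes.

The paper proposes a difficulty estimation system for Japanese text, with the aim of enabling smooth communication of information. The estimation method is multinomial naive Bayes classification with character unigrams as the language model. The language model is built from a reference corpus consisting of textbooks for all subjects except English, from elementary school through university. Evaluating the system by cross-validation showed a very high correlation between the given difficulty levels and the estimated ones.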
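
The general idea of character-unigram multinomial naive Bayes can be sketched roughly as follows. This is not the authors' implementation: the class name, the add-one smoothing, the uniform prior over grades, and the two tiny training strings are all assumptions made for illustration.

```ruby
# Toy character-unigram model per difficulty grade (hypothetical sketch).
class CharUnigramModel
  def initialize
    @char_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # grade => char => count
    @totals = Hash.new(0)                                  # grade => total characters
    @vocab = {}                                            # set of characters seen
  end

  def train(text, grade)
    text.each_char do |ch|
      @char_counts[grade][ch] += 1
      @totals[grade] += 1
      @vocab[ch] = true
    end
  end

  # Log-likelihood of the text under one grade's unigram model (add-one smoothing).
  def log_likelihood(text, grade)
    v = @vocab.size + 1 # one extra slot reserves mass for unseen characters
    text.each_char.sum do |ch|
      Math.log((@char_counts[grade][ch] + 1.0) / (@totals[grade] + v))
    end
  end

  # With a uniform prior, the estimate is the grade with the highest likelihood.
  def estimate(text)
    @totals.keys.max_by { |grade| log_likelihood(text, grade) }
  end
end

model = CharUnigramModel.new
model.train("やまにのぼる", 1) # made-up "easy" sample written in hiragana
model.train("山に登る", 3)     # made-up "harder" sample written with kanji

puts model.estimate("山")     # prints 3
puts model.estimate("のぼる") # prints 1
```

A real system would train on a full graded corpus; the two strings here only show how characters typical of a grade pull the estimate toward it.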

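Notable phrases

Providing the writer of a text with an "objective assessment of difficulty" is one form of computer support for achieving smooth communication of information.
Difficulty estimation has been studied since the 1920s for English and since the 1940s for Japanese.
Difficulty formulas for English
- Flesch Reading Ease
- Kincaid Grade Level
- Widely used, for example to estimate the difficulty of reading materials.
Difficulty formulas for Japanese
- Method of Tateishi et al. [2]
- Method of Kawamura [3]
- Method of Shibasaki et al. [4]
- Not yet in practical use
Difficulty estimation methods for English text
- Method of Collins-Thompson et al. [5]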
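
For reference, the two English formulas named in the list can be computed from word, sentence, and syllable counts. The sketch below uses the commonly published coefficients; the function names and the sample counts are made up for illustration, and counting syllables is left to the caller.

```ruby
# Flesch Reading Ease: higher scores mean easier text.
def flesch_reading_ease(words, sentences, syllables)
  206.835 - 1.015 * (words.to_f / sentences) - 84.6 * (syllables.to_f / words)
end

# Flesch-Kincaid Grade Level: approximate U.S. school grade.
def flesch_kincaid_grade(words, sentences, syllables)
  0.39 * (words.to_f / sentences) + 11.8 * (syllables.to_f / words) - 15.59
end

# Hypothetical counts for a short passage: 100 words, 5 sentences, 150 syllables.
ease  = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
puts "Reading Ease: #{ease.round(2)}, Grade Level: #{grade.round(2)}"
```

Both formulas depend only on average sentence length and average syllables per word, which is part of why they do not transfer directly to Japanese text.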

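[1] Yosuke Kondo, Suguru Matsuyoshi, Satoshi Sato: Estimating the Difficulty of Japanese Text Using a Textbook Corpus, Proceedings of the 14th Annual Meeting of the Association for Natural Language Processing, pp. 1113-1116, 2008-03
[2] Yuka Tateishi, Yoshihiko Ono, Hisao Yamada: An Evaluation Formula for the Readability of Japanese Text, IPSJ SIG Technical Report, Vol. 1988, No. 25, pp. 1-8, 1988-05
[3] Yoshiko Kawamura: Analysis of Reading Texts Using a Vocabulary Checker, Koza Nihongo Kyoiku (Waseda University Center for Japanese Language Education), Vol. 34, pp. 1-22, 1998
[4] Hideko Shibasaki, Yasutaka Sawai: A Basic Study toward Building a Japanese Readability Measure Using a Japanese-Language Textbook Corpus, IEICE Technical Report, Vol. 2007, No. 32, pp. 19-24, 2007-10
[5] Kevyn Collins-Thompson, Jamie Callan: Predicting Reading Difficulty with Statistical Language Models, Journal of the American Society for Information Science and Technology, Vol. 56, No. 13, pp. 1448-1462, 2005-11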

How to Install MeCab 0.98 + MeCab-ruby 0.98 on Cygwin 1.7.7

I installed MeCab 0.98 and MeCab-ruby 0.98 on Cygwin 1.7.7, so here are my notes on how. I installed into /opt/mecab rather than /usr/local, which requires some extra configuration compared with the usual procedure. Installing into /usr/local should be simpler.

$ cd /tmp/mecab-0.98
$ ./configure --prefix=/opt/mecab --with-charset=utf8 CPPFLAGS=-DNOMINMAX LIBS=-liconv
$ make
$ make install

$ cd /tmp/mecab-ipadic-2.7.0-20070801
$ ./configure --prefix=/opt/mecab --with-charset=utf8 --with-mecab-config=/opt/mecab/bin/mecab-config
$ make
$ make install

$ cd /tmp/mecab-ruby-0.98
$ ruby extconf.rb --with-opt-dir=/opt/mecab
$ vi Makefile
--
CC = g++
LDSHARED = g++ -shared -s
LIBS = $(LIBRUBYARG_SHARED)  -ldl -lcrypt -lmecab -liconv
--
$ make
$ make install
$ ruby test.rb

I followed the page in [1], but my results differed slightly, probably because the install location is different.
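
Besides test.rb, a short script like the following can confirm that Ruby finds the binding. It is only a sketch: the helper name is hypothetical, and with a non-default prefix such as /opt/mecab the directory holding the MeCab shared library may also need to be on PATH (an assumption about the Cygwin setup, not something verified here).

```ruby
# Hypothetical helper: true if the MeCab binding can be loaded, false otherwise.
def mecab_binding_loadable?
  require 'MeCab'
  true
rescue LoadError
  false
end

if mecab_binding_loadable?
  # Tagger.new raises if MeCab cannot find its dictionary or configuration.
  tagger = MeCab::Tagger.new
  puts tagger.parse("すもももももももものうち")
else
  warn "MeCab binding not found; check the install prefix and the library search path."
end
```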

[1] Notes on installing MeCab 0.98 + MeCab-ruby on Cygwin 1.7
