Broken UTF-8 handling in newest Rubygems when environment locales are not set

Paweł Wilk

27 Feb, 2011 01:05 AM

Hi,

I'm encountering a problem after upgrading to the newest RubyGems.

It happens when the locale settings in the environment are empty (not even the "C" locale is set) and someone tries to install a gem that was built with the newest RubyGems. The main precondition for the error is a UTF-8 character in one of the descriptive fields (e.g. developer). The locale settings used when building the gem are irrelevant.

rubygems-update (1.5.2)
gemcutter (0.6.1)

Steps to reproduce:

unset LANG
unset LC_ALL
gem install i18n-inflector

  ERROR:  While executing gem ... (ArgumentError)
     invalid byte sequence in US-ASCII

But the same package built one day before (before Rubygems upgrade):

unset LANG
unset LC_ALL
gem install i18n-inflector --version 2.5.0

  Fetching: i18n-inflector-2.5.0.gem (100%)
  Successfully installed i18n-inflector-2.5.0
  1 gem installed
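
The error message can be reproduced outside RubyGems. The following is a minimal sketch, not code taken from RubyGems: it assumes the installer ends up treating UTF-8 metadata as US-ASCII, and the variable names are illustrative only.

```ruby
# UTF-8 bytes for "Pawel" with a stroke on the l, written with escapes
# so that this snippet itself stays ASCII-only.
utf8_bytes = "Pawe\xC5\x82".b

# The same bytes, labelled with two different encodings.
as_ascii = utf8_bytes.dup.force_encoding(Encoding::US_ASCII)
as_utf8  = utf8_bytes.dup.force_encoding(Encoding::UTF_8)

puts as_ascii.valid_encoding? # false: bytes above 0x7F are not valid US-ASCII
puts as_utf8.valid_encoding?  # true: 0xC5 0x82 is one well-formed UTF-8 character

# Many string operations refuse to work on a string with a broken encoding.
# A regexp match is used here as one such operation.
error = begin
  as_ascii =~ /Pawe/
  nil
rescue ArgumentError => e
  e
end

puts error.message if error
```

The bytes are identical in both strings; only the encoding label differs, which is why the locale of the installing machine matters and the locale of the building machine does not.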

Regards,
Paweł

  1. Support Staff 1 Posted by Nick Quaranto on 27 Feb, 2011 06:29 PM

    Wow, strange...not sure what's up here, the guys that work on the RG client library will most likely have a better idea.

    What OS is this on? Can you show us your gem env output?

  2. 2 Posted by Paweł Wilk on 28 Feb, 2011 12:38 AM

    Hi,

    The problem is OS-independent. It happened on Mac OS X Snow Leopard and on Ubuntu Server. The problem was detected on Ruby 1.9.

    BTW, prerequisites to trigger the error when installing a gem:

    unset LANG
    unset LC_ALL
    unset LC_COLLATE
    unset LC_CTYPE
    unset LC_MESSAGES
    unset LC_MONETARY
    unset LC_NUMERIC
    unset LC_TIME
    
    locale
    
      # =>  LANG=
      # =>  LC_COLLATE="C"
      # =>  LC_CTYPE="C"
      # =>  LC_MESSAGES="C"
      # =>  LC_MONETARY="C"
      # =>  LC_NUMERIC="C"
      # =>  LC_TIME="C"
      # =>  LC_ALL=
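
To see what Ruby makes of such an environment, one can start a child interpreter with the locale variables cleared and ask for its default external encoding. This is a sketch under assumptions: the helper name and the constant are hypothetical, and the result depends on the platform and Ruby version (on a POSIX system without a locale it is typically US-ASCII).

```ruby
require "rbconfig"

# Locale variables cleared in the child process, mirroring the unset lines above.
LOCALE_VARS = %w[
  LANG LC_ALL LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME
].freeze

# Hypothetical helper: runs the current Ruby binary with the locale variables
# removed and returns the name of its default external encoding.
def default_external_without_locale
  cleared = LOCALE_VARS.to_h { |name| [name, nil] } # nil removes the variable
  script  = "print Encoding.default_external.name"
  IO.popen([cleared, RbConfig.ruby, "-e", script], &:read)
end

puts default_external_without_locale
```

Files read without an explicit encoding are tagged with this default, so a gem specification containing raw UTF-8 bytes would be labelled with whatever encoding is printed here.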
    

    First failing env:

    RubyGems Environment:
      - RUBYGEMS VERSION: 1.5.2
      - RUBY VERSION: 1.9.2 (2011-02-18 patchlevel 180) [x86_64-darwin10.6.0]
      - INSTALLATION DIRECTORY: /usr/local/Cellar/ruby/1.9.2-p180/lib/ruby/gems/1.9.1
      - RUBY EXECUTABLE: /usr/local/Cellar/ruby/1.9.2-p180/bin/ruby
      - EXECUTABLE DIRECTORY: /usr/local/Cellar/ruby/1.9.2-p180/bin
      - RUBYGEMS PLATFORMS:
        - ruby
        - x86_64-darwin-10
      - GEM PATHS:
         - /usr/local/Cellar/ruby/1.9.2-p180/lib/ruby/gems/1.9.1
         - /var/root/.gem/ruby/1.9.1
      - GEM CONFIGURATION:
         - :update_sources => true
         - :verbose => true
         - :benchmark => false
         - :backtrace => false
         - :bulk_threshold => 1000
         - :sources => ["http://gems.rubyforge.org/", "http://gems.github.com/"]
      - REMOTE SOURCES:
         - http://gems.rubyforge.org/
         - http://gems.github.com/
    

    Second failing env:

    RubyGems Environment:
      - RUBYGEMS VERSION: 1.5.2
      - RUBY VERSION: 1.9.2 (2010-08-18 patchlevel 0) [x86_64-darwin10.6.0]
      - INSTALLATION DIRECTORY: /usr/local/rvm/gems/ruby-1.9.2-p0
      - RUBY EXECUTABLE: /usr/local/rvm/rubies/ruby-1.9.2-p0/bin/ruby
      - EXECUTABLE DIRECTORY: /usr/local/rvm/gems/ruby-1.9.2-p0/bin
      - RUBYGEMS PLATFORMS:
        - ruby
        - x86_64-darwin-10
      - GEM PATHS:
         - /usr/local/rvm/gems/ruby-1.9.2-p0
         - /usr/local/rvm/gems/ruby-1.9.2-p0@global
      - GEM CONFIGURATION:
         - :update_sources => true
         - :verbose => true
         - :benchmark => false
         - :backtrace => false
         - :bulk_threshold => 1000
      - REMOTE SOURCES:
         - http://rubygems.org/
    

    RubyGems 1.5.2 on Ruby 1.8 seems unaffected (but since 1.8 is largely encoding-unaware,
    this might just be one bug masking another).

    Note that this error occurs when installing packages that were built
    with RubyGems 1.5.2 and contain UTF-8 characters in some fields.
    Packages built with the previous version of RubyGems install successfully.

    There is a difference in the spec files that might help in tracking down
    the cause of this problem. Package i18n-inflector-2.5.0 was built with the old RG,
    package i18n-inflector-2.5.1 with the new RG. Both were installed using
    the new RG, but the installation of i18n-inflector-2.5.1 failed
    with the error quoted in my previous post.

    Look:

      $ diff i18n-inflector-2.5.0.gemspec i18n-inflector-2.5.1.gemspec
    

    Result: http://pastie.org/1615020

    (I used Pastie because the markup here interprets backslash-escaped character codes.)

    See the authors line: the new RubyGems produces an unescaped version.
    I don't know, maybe that's intended, but the new RG client
    has problems installing such a gem when the environment
    locales are not set.

    Regards,
    Paweł

  3. 3 Posted by Paweł Wilk on 06 Mar, 2011 11:04 PM

    It seems that this problem exists in many previous RG versions and is not a bug in RG itself, but it would be nice to have a workaround for it in RG.

  4. Support Staff 4 Posted by Nick Quaranto on 27 May, 2011 12:29 AM

    Is this still happening in later versions? Digging up open tickets, I haven't heard or seen other reports of this.

  5. 5 Posted by Grzegorz Kazula... on 27 May, 2011 10:34 PM

    I got this today when using "pusher-gem" (https://github.com/pusher/pusher-gem) on Ruby 1.9.2-p180 and RG 1.8.4

  6. Support Staff 6 Posted by Luis Lavena on 19 Jun, 2011 04:07 PM

    Hello,

    Can you open a bug report / issue for RubyGems itself?

    https://github.com/rubygems/rubygems/issues

    That is the right place for us to work on fixes for RubyGems.

    Thank you.

  7. Support Staff 7 Posted by Luis Lavena on 19 Jun, 2011 04:14 PM

    Sorry, by mistake I provided the wrong link to RubyGems issue tracker.

    Please use this: https://github.com/rubygems/rubygems/issues

    Thank you.

  8. 8 Posted by zoras on 10 Aug, 2011 08:57 AM

    Use the following commands to fix the problem:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
    
    ± locale
    LANG="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_ALL="en_US.UTF-8"
    
  9. 9 Posted by Paweł Wilk on 10 Aug, 2011 09:32 AM

    Since I am not in control of the systems that use my gem, I cannot apply this workaround.

    Yesterday I discovered that it may be related to the psych parser. When the syck parser is used while building the gem, the problem disappears during installation, since UTF-8 characters are escaped in the gem's manifest file.

    To apply the workaround, the developer has to set the YAML engine to syck in the Rakefile and/or gemspec:

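     require 'yaml'

     # Switch to syck where YAML::ENGINE is available, so that
     # non-ASCII characters are escaped in the gem metadata.
     if defined?(YAML::ENGINE) && YAML::ENGINE.respond_to?(:yamler)
       YAML::ENGINE.yamler = 'syck'
     end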
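
Why escaping helps can be sketched with a current Ruby, where Psych is the only bundled YAML engine. The escaped document below is hand-written for illustration using a YAML `\u` escape; it is an assumption, not the exact output syck produced.

```ruby
require "yaml"

author = "Pawe\u0142 Wilk" # the name from this thread, written with an escape

# Psych writes non-ASCII characters into the YAML document as raw UTF-8.
unescaped = YAML.dump({ "authors" => [author] })

# Hand-written equivalent that stays pure ASCII by escaping the character.
escaped = %q(authors: ["Pawe\u0142 Wilk"]) + "\n"

puts unescaped.ascii_only? # false: contains raw UTF-8 bytes
puts escaped.ascii_only?   # true: safe to read under any ASCII-compatible encoding

# Both documents carry the same data once parsed.
puts YAML.load(escaped)["authors"].first == author
puts YAML.load(unescaped)["authors"].first == author
```

An ASCII-only document is valid whether the reader assumes US-ASCII or UTF-8, which would explain why gems built with escaped metadata install cleanly even when no locale is set.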
    

    See also:

  10. Support Staff 10 Posted by Eric Hodel on 10 Oct, 2011 05:43 PM

    This is fixed.

  11. Eric Hodel closed this discussion on 10 Oct, 2011 05:43 PM.
